Motivation for white space after entities in HTMLStripReader

Dawid Weiss Fri, 21 Nov 2008 14:30:09 -0800

Hi folks. What's the motivation for adding white space after an entity in HTMLStripReader, apparently exactly as many spaces as it takes to match the length of the original entity? It basically looks like this:


"l&oacute;d"

(UTF: lód, "ice" in Polish) is translated into:

"ló       d"

This happens both with numeric entities and named entities. Needless to say, these added spaces in the character stream do no good as they effectively split a single term "lód" into two meaningless terms "l" and "d".
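
To make the effect concrete, here is a minimal sketch. It is not the actual HTMLStripReader code: the class EntityPaddingDemo and its methods are hypothetical, it handles only the single entity &oacute;, and it assumes a plain whitespace split stands in for the tokenizer. The padded replacement keeps the output the same length as the input, and the result then breaks into two tokens instead of one.

```java
import java.util.Arrays;
import java.util.List;

public class EntityPaddingDemo {
    private static final String ENTITY = "&oacute;";
    private static final String CHAR = "\u00f3"; // the character the entity stands for

    // Hypothetical: replace the entity and pad with spaces so that the
    // output has the same length as the input (the behaviour in question).
    public static String replacePadded(String s) {
        StringBuilder replacement = new StringBuilder(CHAR);
        for (int i = CHAR.length(); i < ENTITY.length(); i++) {
            replacement.append(' ');
        }
        return s.replace(ENTITY, replacement.toString());
    }

    // Hypothetical: replace the entity without any padding.
    public static String replacePlain(String s) {
        return s.replace(ENTITY, CHAR);
    }

    // Assumption: a simple whitespace split stands in for the tokenizer.
    public static List<String> tokens(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String input = "l&oacute;d";
        System.out.println(tokens(replacePadded(input)).size() + " token(s) with padding");
        System.out.println(tokens(replacePlain(input)).size() + " token(s) without padding");
    }
}
```

The only thing the padding preserves in this sketch is the length of the string, so every character after the entity stays at the same position it had in the input.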

I can fix this in the code easily, but it looks like it was intentional, so before I write test cases and file a JIRA issue I would like to understand what the original reasons might have been (I really don't see anything this would be useful for). Apologies if I'm being dim here.


Dawid
