Hi folks. What's the motivation for adding white spaces (apparently an exact
number of them) after an entity reference in HTMLStripReader? It basically
looks like this:
"l&#243;d"
(UTF: lód, "ice" in Polish) is translated into:
"ló d"
This happens both with numeric entities and named entities. Needless to say,
these added spaces in the character stream do no good as they effectively split
a single term "lód" into two meaningless terms "ló" and "d".
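
To make the effect concrete, here is a minimal, self-contained sketch. It does
not use HTMLStripReader itself: the class EntityPaddingDemo and its methods
decode and tokenize are hypothetical names made up for illustration. It also
assumes that the padding fills the output up to the length of the original
entity text, which is only my guess at what "exactly the number" refers to, and
it handles decimal numeric entities only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only mimics the observed behaviour and is not Solr code.
public class EntityPaddingDemo {
    // Matches decimal numeric entities such as &#243; (named entities omitted for brevity).
    private static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(\\d+);");

    /**
     * Decodes each numeric entity. If pad is true, spaces are appended after the
     * decoded character so the output is as long as the input (an assumption).
     */
    public static String decode(String input, boolean pad) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            String decoded = new String(Character.toChars(Integer.parseInt(m.group(1))));
            out.append(decoded);
            if (pad) {
                int padding = m.group().length() - decoded.length();
                for (int i = 0; i < padding; i++) {
                    out.append(' ');
                }
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    /** Naive whitespace tokenizer standing in for a real analyzer. */
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String source = "l&#243;d";
        System.out.println(tokenize(decode(source, true)));   // padded variant
        System.out.println(tokenize(decode(source, false)));  // unpadded variant
    }
}
```

With a whitespace-based tokenizer, the padded variant produces two tokens for
the sample word, while the unpadded variant keeps it as a single token.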
I can fix this in the code easily, but it looks intentional, so before I write
test cases and file a JIRA issue I would like to understand what the original
reasons might have been (I really don't see what this would be useful for).
Apologies if I'm being dim here.
Dawid