Hi,

I'm seeing some extra spaces after special/encoded characters in RTF files.
Example:
- An RTF file whose only content is "Übersicht" (I created one in
WordPad and one in Word 2010; both exhibit the same problem).
- Tika 0.7 extracts "Ü bersicht"

I haven't tested 0.6 yet, but I know the problem was not present in 0.4.
My only guess at this point is that it is related to TIKA-392, which
adds spaces after subsequent text runs; the encoded umlaut may be
generating a separate text run.
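
To make that guess concrete, here is a minimal Python sketch of the
suspected mechanism. It is not Tika's code: the fragment, the run
splitting and the function names are all assumptions for illustration.
It assumes the umlaut is stored as the hex escape \'dc (cp1252) and that
each escape ends up as its own text run.

```python
import re

# Assumed RTF body fragment for "Übersicht": the umlaut is written as the
# hex escape \'dc (cp1252), followed by plain text.
RTF_FRAGMENT = r"\'dcbersicht"


def split_runs(fragment):
    """Split a fragment into runs, treating each \\'xx escape as its own run.

    This is a hypothetical model of the suspected behaviour, not Tika's parser.
    """
    runs = []
    for escape, plain in re.findall(r"\\'([0-9a-fA-F]{2})|([^\\]+)", fragment):
        if escape:
            # Decode the single escaped byte using the assumed code page.
            runs.append(bytes.fromhex(escape).decode("cp1252"))
        else:
            runs.append(plain)
    return runs


def join_with_space_after_each_run(runs):
    # Suspected behaviour: a space is appended after every text run.
    return "".join(run + " " for run in runs).strip()


def join_without_spaces(runs):
    # Expected behaviour: runs are concatenated as-is.
    return "".join(runs)


if __name__ == "__main__":
    runs = split_runs(RTF_FRAGMENT)
    print(runs)
    print(join_with_space_after_each_run(runs))
    print(join_without_spaces(runs))
```

If the extractor really works like the first join function, the word
would come out split exactly as described above.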

Any suggestions?

Thanks,
Cristian Vat
