Hi,

I'm seeing extra spaces after special/encoded characters in text extracted from RTF files.

Example:
- RTF file whose only content is "Übersicht" (I created one in WordPad and one in Word 2010; both exhibit the same problem).
- Tika 0.7 extracts "Ü bersicht".

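In case it helps with reproducing this, here is a rough sketch of a minimal test file and of what I suspect is happening. It assumes the editors store "Ü" as the cp1252 hex escape \'dc (I have not compared the two files byte for byte). The names write_sample and split_runs and the file name are made up for illustration, and the run splitting is only my guess at the behaviour, not Tika's actual code.

```python
import re

# Minimal RTF document. \'dc is the cp1252 hex escape for "Ü"
# (assumption: this matches what WordPad / Word 2010 write).
RTF = r"{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0 Arial;}}\f0 \'dcbersicht\par}"


def write_sample(path="uebersicht.rtf"):
    """Write the sample document to disk so it can be fed to an extractor."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(RTF)


def split_runs(body):
    """Hypothetical model: every \\'xx escape becomes its own text run."""
    runs = []
    for part in re.split(r"(\\'[0-9a-fA-F]{2})", body):
        if not part:
            continue  # re.split yields an empty string before a leading escape
        if part.startswith("\\'"):
            # Decode the two hex digits as a single cp1252 byte.
            runs.append(bytes([int(part[2:], 16)]).decode("cp1252"))
        else:
            runs.append(part)
    return runs


runs = split_runs(r"\'dcbersicht")
joined_with_space = " ".join(runs)  # what I see with 0.7, if the guess is right
joined_plain = "".join(runs)        # what I would expect

print(repr(joined_with_space))
print(repr(joined_plain))
```

If the escape really does end up as a separate run, joining runs with a space would explain the output I'm getting, while plain concatenation gives back the original word.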
I haven't tested 0.6 yet, but I know this problem was not present in 0.4. My only guess at this point is that it might be related to TIKA-392, which adds spaces after subsequent text runs; the encoded umlaut may be generating a separate text run.

Any suggestions?

Thanks,
Cristian Vat
