Hi Tika support team,
I'd like to know if there are any updates regarding the question I submitted on 26/06/2011. Here is the question:

I'd like to follow up on the non-Western European language support issue, since I have run into one more problem. I'm using Tika 0.9. Although I first hit the problem through the API, it reproduces from the command line as well.

The test data is an RTF document in Japanese. When I examined the extracted text, I found a very strange result: almost all of the Japanese characters are returned twice. Here is the output.

This is how I run it from the command line:

    java -jar tika-app-0.9.jar Jp_euc-jp_rtf1.rtf > jp_euc-jp_rtf.out

Thanks,
Best regards,
Denis Voloshin
Software Engineer
Phone: +972-2-649-1162
Mobile: +972-54-642-2269
