Hi Tika support team

I'd like to know whether there are any updates regarding the question I submitted 
on 26/06/2011.


The question:

I'd like to continue the non-West European language support issue, since 
I have run into one more problem.
I'm using Tika 0.9.

Although I first hit the problem through the API, it can be reproduced from the 
command line as well.
The test data is an RTF document in Japanese.

When I examined the extracted data I found a very strange result: almost 
all Japanese characters are returned twice.
Here is the output:
 


Here is how I run it from the command line:

java -jar tika-app-0.9.jar Jp_euc-jp_rtf1.rtf > jp_euc-jp_rtf.out


Thanks





 


Best Regards. 

Denis Voloshin 
Software engineer 
Phone: +972-2-649-1162 
Mobile: +972-54-642-2269 


 

