Is Chinese content extraction supported by Tika?

yu shen Sat, 13 Nov 2010 02:17:30 -0800

Hi all,

I just started using Tika. I tried extracting English words from HTML files,
and it works fine.
Then I tried integrating a Chinese word tokenizer into Solr, and when I searched
again, many English words that previously matched no longer return hits.


Does Tika already provide a way to extract Chinese content from an
HTML file?

Thanks in advance.
