Hi all, I just started using Tika. I tried extracting English words from HTML files, and that works fine. Then I integrated a Chinese word tokenizer into Solr and searched again, but many English words that used to get hits no longer match.
Is there already a way in Tika to extract Chinese content from an HTML file? Thanks in advance.
