Hi Sandhya,
We have observed that Tika does not extract the Content-Language for
documents encoded in UTF-8. For natively encoded documents, it works
fine. Any idea how we can resolve this?
I would post this question to the u...@tika.apache.org mailing list,
and include more details about the type of document.
Tika's language detection is fairly weak, and when the encoding is
universal (language-independent), such as UTF-8, the resulting
confidence level is often low enough that Tika doesn't assume it has a
good match, and thus doesn't report a language.
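To make the "low confidence means no language" behaviour concrete, here is a toy sketch in Java. It is NOT Tika's algorithm: real detectors typically compare character n-gram profiles, while this one just counts hits against tiny stopword lists. The class name, the profiles, and the 0.3 cut-off are all hypothetical. The point is only the gate: a best guess always exists, but it is reported only when its confidence clears a threshold.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy illustration only, NOT Tika's implementation: score text against tiny
// stopword lists and report a language only when confidence is high enough.
class ConfidenceGatedDetector {

    // Hypothetical mini-profiles; real detectors use character n-gram statistics.
    // TreeMap keeps iteration order deterministic, so ties go to the
    // alphabetically first language code.
    private static final Map<String, Set<String>> PROFILES = new TreeMap<>(Map.of(
        "de", Set.of("der", "die", "und", "ist", "das", "zu"),
        "en", Set.of("the", "and", "is", "of", "to", "in"),
        "fr", Set.of("le", "la", "et", "est", "les", "de")));

    // Hypothetical cut-off: below this share of recognised words, report nothing.
    static final double MIN_CONFIDENCE = 0.3;

    /** Best-matching language code, or Optional.empty() when confidence is too low. */
    static Optional<String> detect(String text) {
        // Split on runs of non-letters; \p{L} matches any Unicode letter.
        List<String> tokens = Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\P{L}+"))
            .filter(t -> !t.isEmpty())
            .collect(Collectors.toList());
        if (tokens.isEmpty()) {
            return Optional.empty();
        }

        // Pick the profile that recognises the most tokens.
        String bestLang = null;
        long bestHits = 0;
        for (Map.Entry<String, Set<String>> e : PROFILES.entrySet()) {
            long hits = tokens.stream().filter(e.getValue()::contains).count();
            if (hits > bestHits) {
                bestHits = hits;
                bestLang = e.getKey();
            }
        }

        // Confidence = fraction of tokens explained by the winning profile.
        double confidence = (double) bestHits / tokens.size();
        return confidence >= MIN_CONFIDENCE ? Optional.ofNullable(bestLang) : Optional.empty();
    }
}
```

For example, `detect("The cat is in the house and the dog is out")` matches 7 of 11 words and returns `Optional[en]`, while `detect("Lorem ipsum dolor sit amet consectetur the")` still has a best guess ("en", 1 of 7 words) but falls below the cut-off and returns `Optional.empty`, the analogue of Tika leaving Content-Language unset.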
-- Ken
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g