Hey,

In my use case of Tika, I render a webpage, take the contents of the page
and feed them into Tika. The contents are encoded in UTF-8 when I feed them
into Tika, but the HtmlParser uses the AutoDetectReader to try to determine
the charset. This means Tika uses the charset declared in the page's meta
tag rather than the UTF-8 encoding the content actually has.
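To illustrate what I mean, here is a rough, simplified sketch of what a
meta-tag-based charset sniffer does. It is NOT Tika's actual HtmlParser or
AutoDetectReader code; the class MetaCharsetSniffer, the method
sniffMetaCharset and the 1024-byte scan limit are hypothetical, made up for
illustration. A page that was re-encoded as UTF-8 but still declares
ISO-8859-1 gets decoded with the declared charset, which garbles any
non-ASCII text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetSniffer {

    // Hypothetical limit: only the start of the document is scanned.
    private static final int SNIFF_LIMIT = 1024;

    // Matches <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:+-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared in a meta tag, ignoring what the bytes really are. */
    public static Optional<Charset> sniffMetaCharset(byte[] html) {
        int len = Math.min(html.length, SNIFF_LIMIT);
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives intact.
        String head = new String(html, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            String name = m.group(1);
            return Charset.isSupported(name) ? Optional.of(Charset.forName(name)) : Optional.empty();
        } catch (IllegalArgumentException e) {
            // Illegal charset name in the markup: treat as "no declaration".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A rendered page re-encoded as UTF-8 that still declares ISO-8859-1.
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-1\"></head>"
                + "<body>caf\u00e9</body></html>";
        byte[] bytes = page.getBytes(StandardCharsets.UTF_8);

        // Trust the meta tag, falling back to UTF-8 only if none is declared.
        Charset sniffed = sniffMetaCharset(bytes).orElse(StandardCharsets.UTF_8);
        String decoded = new String(bytes, sniffed);

        System.out.println("sniffed charset: " + sniffed); // ISO-8859-1
        // The UTF-8 bytes C3 A9 for "é" come out as "Ã©".
        System.out.println("mojibake: " + decoded.contains("caf\u00c3\u00a9")); // true
    }
}
```

In other words, whatever the meta tag says wins over the encoding I actually
used, which is why I would like to pass the charset in myself.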

Is there a way to bypass the AutoDetectReader and just specify the charset?
Or better yet, inject the Detector that will be used?

Thanks for your help,
Dave

