Hey,

In my use of Tika, I render a webpage, take the contents of the page, and feed them into Tika. The contents are encoded in UTF-8 when I feed them in, but HtmlParser uses AutoDetectReader to try to determine the charset. As far as I can tell, this means Tika is using the page's meta tag to determine the charset instead of the encoding I already know.
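
To illustrate the kind of mismatch I'm worried about, here is a minimal sketch that uses only the JDK and makes no Tika calls. The sample markup, the ISO-8859-1 meta tag, and the `CharsetMismatchDemo` / `decode` names are made up for illustration. It only shows what happens when UTF-8 bytes are decoded with the charset a meta tag declares; it does not show what Tika does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Decode raw page bytes with an explicitly chosen charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Hypothetical page: the meta tag declares ISO-8859-1,
        // but the bytes I actually hand over are UTF-8.
        String html = "<html><head><meta charset=\"ISO-8859-1\"></head>"
                + "<body>caf\u00e9</body></html>";
        byte[] utf8Bytes = html.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset I know to be correct keeps the text intact.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).contains("caf\u00e9"));

        // Decoding with the charset named in the meta tag garbles the accented character.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).contains("caf\u00e9"));
    }
}
```

Running this prints `true` and then `false`, which is why I'd like to tell the parser the charset directly.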

Is there a way to bypass AutoDetectReader and just specify the charset? Or, better yet, to inject the Detector that will be used?

Thanks for your help,
Dave
