I have some very simple code to call Tika:

    Parser parser = new AutoDetectParser();
    ContentHandler contentHandler = new BodyContentHandler(writer);
    ParseContext parseContext = new ParseContext();
    Metadata metadata = new Metadata();
    parser.parse(input, contentHandler, metadata, parseContext);

It has been working fine on many inputs, but when I feed it a file
encoded in Shift-JIS, the content handler receives no text and the
metadata comes back with a content type of application/octet-stream.
I thought I'd better write here before opening a JIRA issue, in case I'm
missing something trivial.
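
One check that might help narrow this down, independent of Tika, is to confirm that the file's bytes really decode as Shift-JIS. The following is only a minimal sketch using the JDK alone; the class name ShiftJisCheck and the method decodeStrict are made up for illustration, and it assumes the JDK in use ships the "Shift_JIS" charset (standard builds normally do, but it is not one of the guaranteed charsets).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika: strictly decodes bytes as Shift_JIS.
public class ShiftJisCheck {

    // Returns the decoded text, or null if the bytes are not valid Shift_JIS.
    static String decodeStrict(byte[] bytes) {
        // Assumption: the running JDK provides the "Shift_JIS" charset.
        CharsetDecoder decoder = Charset.forName("Shift_JIS")
                .newDecoder()
                // Fail on bad input instead of silently substituting characters.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ShiftJisCheck <file>");
            return;
        }
        String text = decodeStrict(Files.readAllBytes(Paths.get(args[0])));
        if (text == null) {
            System.out.println("File is not valid Shift_JIS");
        } else {
            System.out.println("Decoded " + text.length() + " chars as Shift_JIS");
        }
    }
}
```

If the file decodes cleanly here, the input itself is probably fine and the problem is more likely on the detection side.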