On Wed, 28 Sep 2016, Mark Kerzner wrote:
Probably yes, but how do I tell it which parser to use? Today I just do
this:
String text = tika.parseToString(inputStream, metadata);
and it knows the parser.
That might be your issue. It's quite hard to identify the language of a
piece of source code from just the first few hundred bytes of text. If you
tell Tika the filename, including the extension, it'll have much more luck
spotting the file is code and using the appropriate parser!
(Binary files often have well-known magic bytes at or near the start that
help Tika identify the file type; source code is text-based and lacks that.)
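A minimal sketch in Java of both points above, assuming Tika 2.x, where the filename key is `TikaCoreProperties.RESOURCE_NAME_KEY` (in Tika 1.x, current when this thread was written, the same "resourceName" key is `Metadata.RESOURCE_NAME_KEY`). Detection works with tika-core alone. Actual text extraction also needs Tika's parser modules on the classpath. The class and helper names (`TikaFilenameHint`, `detectType`, `parseWithName`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;

public class TikaFilenameHint {

    // Hypothetical helper: detect a media type, optionally passing the original
    // filename as a hint. With fileName == null, Tika only sees the leading bytes.
    static String detectType(Tika tika, byte[] bytes, String fileName) throws Exception {
        InputStream in = new ByteArrayInputStream(bytes);
        return fileName == null ? tika.detect(in) : tika.detect(in, fileName);
    }

    // Hypothetical helper: the same parseToString call as in the question, but with
    // the filename stored under the "resourceName" key so detection can use the extension.
    static String parseWithName(Tika tika, InputStream in, String fileName) throws Exception {
        Metadata metadata = new Metadata();
        metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, fileName);
        return tika.parseToString(in, metadata); // closes the stream when done
    }

    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.UTF_8);
        byte[] source = "public class Hello {}\n".getBytes(StandardCharsets.UTF_8);

        // Binary format: the "%PDF-" magic at offset 0 is enough on its own.
        System.out.println("pdf, bytes only:    " + detectType(tika, pdf, null));
        // Source code: no magic, so the bytes alone only say "some kind of text".
        System.out.println("source, bytes only: " + detectType(tika, source, null));
        // With the filename, the extension can refine the text type.
        System.out.println("source, with name:  " + detectType(tika, source, "Hello.java"));

        // Text extraction with the name hint (requires the parser modules).
        System.out.println(parseWithName(tika, new ByteArrayInputStream(source), "Hello.java"));
    }
}
```

Passing the name through `Metadata` instead of choosing a parser by hand leaves auto-detection in charge, so the same call still works for binary formats, which are recognised by their magic bytes whether or not a name is given.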
Nick