On Wed, 28 Sep 2016, Mark Kerzner wrote:
> Probably yes, but how do I tell it which parser to use? Today, I just do
> this:
>
> String text = tika.parseToString(inputStream, metadata);
>
> and it knows the parser.

That might be your issue. It's quite hard to identify the language of a piece of source code from just the first few hundred bytes of text. If you tell Tika the filename, including the extension, it'll have a much better chance of spotting that the file is code and using the appropriate parser!

(Binary files often have common magic bytes at or near the start that help Tika identify the file type; source code is text-based and lacks them.)
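
As a rough sketch of what passing the filename could look like, assuming you keep using the Tika facade and the same parseToString call: set the resource name on the Metadata object before parsing. The class name FilenameHintExample, the helper parseWithFilename and the sample file name "Hello.java" are made up for illustration. The key "resourceName" is the value behind Metadata.RESOURCE_NAME_KEY in Tika 1.x (TikaCoreProperties.RESOURCE_NAME_KEY in 2.x); it is written as a literal here only so the sketch does not depend on one version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

public class FilenameHintExample {

    // Value of Metadata.RESOURCE_NAME_KEY (Tika 1.x) and
    // TikaCoreProperties.RESOURCE_NAME_KEY (Tika 2.x).
    static final String RESOURCE_NAME_KEY = "resourceName";

    // Hypothetical helper: records the original filename as a detection hint,
    // then parses exactly as before.
    static String parseWithFilename(Tika tika, InputStream in, String fileName, Metadata metadata)
            throws IOException, TikaException {
        metadata.set(RESOURCE_NAME_KEY, fileName);
        return tika.parseToString(in, metadata);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real source file; in practice this would be your own stream.
        byte[] source = "public class Hello { }\n".getBytes(StandardCharsets.UTF_8);
        Metadata metadata = new Metadata();
        try (InputStream in = new ByteArrayInputStream(source)) {
            String text = parseWithFilename(new Tika(), in, "Hello.java", metadata);
            // Print the type Tika settled on, then whatever text the parser extracted.
            System.out.println(metadata.get(Metadata.CONTENT_TYPE));
            System.out.println(text);
        }
    }
}
```

Which parser ends up handling the file still depends on which Tika parser modules are on your classpath, so treat the detected type as the thing to check first.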

Nick
