Nick, Marcus,

thank you for your help. It works great, and one of the problems that I saw
was indeed with my code, not Tika.

Mark

Mark Kerzner, SHMsoft <http://shmsoft.com/>,
Book a call with me here <http://www.meetme.so/markkerzner>

Mobile: 713-724-2534
Skype: mark.kerzner1

On Thu, Sep 29, 2016 at 5:21 PM, Nick Burch <[email protected]> wrote:

> On Wed, 28 Sep 2016, Mark Kerzner wrote:
>
>> Probably yes, but how do I tell it which parser to use? Today, I just do
>> this:
>>
>> String text = tika.parseToString(inputStream, metadata);
>>
>> and it knows the parser.
>>
>
> That might be your issue. It's quite hard to identify the language of a
> piece of source code from just the first few hundred bytes of text. If you
> tell Tika the filename, including the extension, it'll have much more luck
> spotting the file is code and using the appropriate parser!
>
> (Binary files often have common magic at/near the start that helps Tika
> identify the file type; source code is text-based and lacks that.)
>
> Nick
>
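
For anyone finding this thread later, Nick's suggestion can be sketched
roughly as follows. This is a minimal illustration, not code from the thread:
the class name FilenameHintExample, the helper names, and the sample file name
"Foo.java" are made up. It assumes tika-core and tika-parsers are on the
classpath, and it uses the literal "resourceName", which is the value behind
Tika's RESOURCE_NAME_KEY constant.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;

public class FilenameHintExample {

    // Metadata key under which Tika looks for the file name hint.
    static final String RESOURCE_NAME_KEY = "resourceName";

    // Builds a Metadata object carrying the file name (hypothetical helper).
    static Metadata metadataFor(String fileName) {
        Metadata metadata = new Metadata();
        metadata.set(RESOURCE_NAME_KEY, fileName);
        return metadata;
    }

    // Same parseToString call as in the thread, but with the name hint set first.
    static String extractText(InputStream in, String fileName) throws Exception {
        Tika tika = new Tika();
        return tika.parseToString(in, metadataFor(fileName));
    }

    public static void main(String[] args) throws Exception {
        byte[] source = "public class Foo { }".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(source)) {
            // "Foo.java" stands in for the real name of the file being parsed.
            System.out.println(extractText(in, "Foo.java"));
        }
    }
}
```

The only change from the original one-liner is the metadata.set(...) call
before parsing; with the extension available, detection no longer has to rely
on the first bytes of the text alone.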
