parse-tika config

Matthias Paul Mon, 27 Sep 2010 09:52:22 -0700

Hi,

I'm using parse-tika, but how can I choose which MIME types get parsed and
which don't? For example, what if I'm only interested in PDFs and not DOCs?
Do I have to edit the tika-mimetypes.xml file, or is there some other way
to configure this?


Second question: if I don't use parse-tika, I get a lot of parse exceptions
for JPGs etc., as Nutch doesn't know what to do with this content.
Is it normal to see these exceptions? I find it a bit strange to see all
these errors...

Thanks
Matthias
