Hi, I'm using parse-tika, but how can I decide which MIME types to parse and which to skip? E.g., what if I'm only interested in PDFs and not DOCs? Do I have to edit the tika-mimetypes.xml file, or is there some other way to configure this?
Second question: if I don't use parse-tika, I get a lot of parse exceptions for JPGs etc., as Nutch doesn't know what to do with this content. Is it normal to see these exceptions? I find it a bit strange to see all these errors... Thanks, Matthias

