Jeff,

> Hi, in Nutch 1.0 I was able to replace the parse-html plugin with my own
> html parser to parse html files, through modifying the mime types in
> parse-plugins.xml.
>
> I have been trying to do the same thing in Nutch 1.1, but my own html
> parser is not picked up when crawling, leading to "no parser" exceptions.
>

You should be able to override Tika for a given mime-type provided that you
declare the association between your plugin and the mime-type in
parse-plugins.xml. Have you checked that your plugin is listed in
plugin.includes? Can you see it listed in the log?
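
As a rough sketch of the two pieces of configuration involved, it could look
something like the fragments below. The plugin id "parse-myhtml" and the class
name "com.example.parse.MyHtmlParser" are made up for illustration, and the
plugin.includes value is only an example, not the default; substitute the
values from your own plugin.xml and your existing configuration.

```xml
<!-- conf/parse-plugins.xml (fragment): map text/html to the custom plugin.
     "parse-myhtml" is a hypothetical plugin id. -->
<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-myhtml" />
  </mimeType>

  <aliases>
    <!-- extension-id must match the id of the parser implementation declared
         in the plugin's own plugin.xml; this class name is hypothetical. -->
    <alias name="parse-myhtml"
           extension-id="com.example.parse.MyHtmlParser" />
  </aliases>
</parse-plugins>

<!-- conf/nutch-site.xml (fragment): the plugin id must also match the
     plugin.includes regular expression, otherwise it is never loaded.
     The value below is illustrative only. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(myhtml|tika)|index-(basic|anchor)|scoring-opic</value>
</property>
```

If the id in parse-plugins.xml, the alias, and the pattern in plugin.includes
do not all agree with what the plugin declares, the parser will most likely not
be selected for that mime-type.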

J.
-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
