Hi, in Nutch 1.0 I was able to replace the parse-html plugin with my own
HTML parser to parse HTML files, by modifying the mime types in
parse-plugins.xml.

I have been trying to do the same thing in Nutch 1.1, but my own HTML
parser is not picked up when crawling, which leads to "no parser" exceptions.

I would like to know how to replace one or two of Tika's parsing
capabilities. More importantly, it would be good to know how Tika works.

Thanks 
