Hi, in Nutch 1.0 I was able to replace the parse-html plugin with my own HTML parser by modifying the MIME type mappings in parse-plugins.xml.
I have been trying to do the same thing in Nutch 1.1, but my own HTML parser is not picked up when crawling, which results in "no parser" exceptions. I would like to know how to replace one or two of Tika's parsing capabilities with my own parsers. More importantly, I would like to understand how Tika works. Thanks.

