How do Tika parsers work?

jeff Sun, 18 Jul 2010 16:25:52 -0700

Hi, in Nutch 1.0 I was able to replace the parse-html plugin with my own
HTML parser for HTML files by modifying the MIME type mappings in
parse-plugins.xml.


I have been trying to do the same thing in Nutch 1.1, but my own HTML
parser is not picked up when crawling, leading to no parser exceptions.

I would like to know how to replace one or two of Tika's parsing
capabilities. More importantly, I would like to understand how Tika works.

Thanks
