I am using a Hadoop cluster, so I put my conf files into the Nutch source
conf directory, build a Nutch job file, and then put the job file on the
classpath. I thought this was working, since regex-urlfilter.txt seems to be
read from there. However, I am getting messages like this:

2011-01-04 07:41:55,259 WARN  parse.ParserFactory - ParserFactory:Plugin:
org.apache.nutch.parse.html.HtmlParser mapped to contentType
application/xhtml+xml via parse-plugins.xml, but its plugin.xml file does
not claim to support contentType: application/xhtml+xml

But in my parse-plugins.xml file I have this:

        <mimeType name="application/xhtml+xml">
                <plugin id="parse-tika" />
        </mimeType>


Shouldn't Nutch be using parse-tika for this content type, as my
parse-plugins.xml says? If Nutch is reading a different parse-plugins.xml,
how do I find out which one it is using?
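
One check I could imagine running is a small Java program that lists every
copy of parse-plugins.xml visible on the classpath. This is only a sketch:
ClasspathResourceFinder and locate are names I made up, not part of Nutch,
and it assumes Nutch ends up resolving the file through the classpath, which
I have not confirmed. It would need to be run with the same classpath as the
Nutch job.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Nutch: lists every copy of a named
// resource that the current classpath can see.
class ClasspathResourceFinder {

    // Returns the URLs of all classpath entries that contain resourceName.
    static List<URL> locate(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return new ArrayList<>(Collections.list(loader.getResources(resourceName)));
    }

    public static void main(String[] args) throws IOException {
        // Default to the file in question; another name can be passed as an argument.
        String name = args.length > 0 ? args[0] : "parse-plugins.xml";
        List<URL> hits = locate(name);
        if (hits.isEmpty()) {
            System.out.println(name + " not found on the classpath");
        }
        for (URL url : hits) {
            System.out.println(url);
        }
    }
}
```

If more than one URL is printed, there are several copies, and the one
listed first is typically the one a plain classpath lookup would return.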

Thanks,
Steve Cohen
