I'm running Nutch 1.1 in distributed mode. The slave and the master have the same
parsing-related configuration:
nutch-site.xml: parse-(html|text|js|zip|tika)
parse-plugins.xml: Nutch parsers enabled for html, text, js, zip
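The nutch-site.xml value above is a regular expression. It normally lives in the plugin.includes property, and Nutch matches it against plugin ids. The following is a minimal sketch of which ids that pattern selects. It assumes a whole-string match, and the plugin id list is hypothetical, for illustration only.

```python
import re

# Parse-related part of the plugin.includes pattern from nutch-site.xml.
PATTERN = r"parse-(html|text|js|zip|tika)"

# Hypothetical list of plugin ids, for illustration only.
PLUGIN_IDS = ["parse-html", "parse-text", "parse-js", "parse-zip",
              "parse-tika", "parse-pdf", "parse-swf"]


def enabled_plugins(pattern, plugin_ids):
    """Return the ids selected by the pattern, assuming a whole-string match."""
    regex = re.compile(pattern)
    return [pid for pid in plugin_ids if regex.fullmatch(pid)]


print(enabled_plugins(PATTERN, PLUGIN_IDS))
# ['parse-html', 'parse-text', 'parse-js', 'parse-zip', 'parse-tika']
```

Under this assumption, parse-tika is active alongside the four Nutch parsers on both machines.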

These settings should make Nutch use its own parsers for html, text, js and zip,
and Tika for PDF and everything else.
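The intended routing is simple: explicit Nutch parsers for a few mime-types, with Tika as the fallback for everything else. It can be sketched as a plain lookup with a default. This is a hypothetical, simplified model and not Nutch's own code. The mime-type strings, especially the JavaScript one, are assumptions.

```python
# Hypothetical, simplified model of the intended parse-plugins.xml mapping:
# explicit entries for the Nutch parsers, parse-tika for everything else.
EXPLICIT = {
    "text/html": "parse-html",
    "text/plain": "parse-text",
    "application/x-javascript": "parse-js",  # assumed mime-type string
    "application/zip": "parse-zip",
}
DEFAULT_PARSER = "parse-tika"


def parser_for(mime_type):
    """Return the explicitly mapped parser, falling back to Tika."""
    return EXPLICIT.get(mime_type, DEFAULT_PARSER)


for mime in ["text/html", "application/pdf", "application/msword"]:
    print(mime, "->", parser_for(mime))
```

Under this simplified model, text/html would not be routed to parse-tika at all. How Nutch itself resolves the mapping, including any fallbacks, may differ.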

During the ParseSegment step, the log messages on the master look normal. But the
log file on the slave machine is full of the following errors, for every
mime-type:
2010-08-05 09:22:31,916 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type text/html
2010-08-05 09:22:32,048 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/pdf
...and the same for every other mime-type

Any idea why the slave machine gets Tika errors for all mime-types?

Thanks,
-aj
