I'm running Nutch 1.1 in distributed mode. The master and slave have the same parsing-related configuration:

  nutch-site.xml: parse-(html|text|js|zip|tika)
  parse-plugins.xml: Nutch parsers enabled for html, text, js, zip

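For readers unfamiliar with these two files, the setup described above might look roughly like the fragments below. This is a sketch only, not the actual files from this cluster: the `...` stands for the rest of the `plugin.includes` value, which is not shown in this post, and the mime-type names and the `*` catch-all entry are assumptions based on the stock parse-plugins.xml layout. Only two of the four Nutch parser mappings are written out.

```xml
<!-- nutch-site.xml (sketch; the rest of the value is elided, not known) -->
<property>
  <name>plugin.includes</name>
  <value>...|parse-(html|text|js|zip|tika)|...</value>
</property>

<!-- parse-plugins.xml (sketch; mime-type names are assumed) -->
<mimeType name="text/html">
  <plugin id="parse-html" />
</mimeType>
<mimeType name="text/plain">
  <plugin id="parse-text" />
</mimeType>
<!-- assumed catch-all: anything not mapped above falls through to Tika -->
<mimeType name="*">
  <plugin id="parse-tika" />
</mimeType>
```
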
These settings use the Nutch parsers for html, text, js, and zip, and Tika for pdf and everything else. During the ParseSegment step, the log messages on the master look normal. However, the log file on the slave machine is full of the following errors, for every mime-type:

  2010-08-05 09:22:31,916 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type text/html
  2010-08-05 09:22:32,048 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/pdf
  ...same for all other mime-types

Any idea why the slave machine gets Tika errors for all mime-types?

Thanks,
-aj