Re: parse.ParserFactory

Tolga Mon, 28 May 2012 23:31:42 -0700

Hi,

I know this issue should have been closed, but I thought I'd continue this rather than start a new thread.


Regards,

On 5/23/12 12:27 AM, Lewis John Mcgibbney wrote:

Unless you're using Nutch 1.2 or earlier, you should not be including
msexcel|mspowerpoint|msword|oo|pdf in your plugin.includes. All
of these document formats are (and have been for some time)
implemented as Apache Tika parsers.
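For illustration, here is a minimal sketch of the property with just those five entries dropped from the value quoted below, with everything else left unchanged. It assumes the override goes in conf/nutch-site.xml, the usual place for local overrides of nutch-default.xml. Whether parse-swf and parse-zip still belong in the list depends on your Nutch version, so treat the exact value as an assumption rather than a recommended setting.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Sketch: same value as in the original question, minus
       msexcel|mspowerpoint|msword|oo|pdf. Those formats are left to parse-tika. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika|js|swf|zip)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```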

hth



On Tue, May 22, 2012 at 9:20 PM, Tolga <[email protected]> wrote:

Hi,

I crawl / index PDF files just fine, but I get the following warning.

parse.ParserFactory - ParserFactory: Plugin: parse-pdf mapped to contentType
application/pdf via parse-plugins.xml, but not enabled via plugin.includes
in nutch-default.xml.

I've got the value
protocol-http|urlfilter-regex|parse-(html|tika|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)
for the plugin.includes property in nutch-default.xml. What am I missing?

Regards,
