> I am doing it in NUTCH_HOME/runtime/local/conf. I thought I could use
> nutch-default.xml, and nutch-site.xml just overrode nutch-default.xml.

That's the case. I was just mentioning a recommended practice, not a strict
requirement.

> On 5/29/12 9:48 AM, Julien Nioche wrote:
>
>> if you are seeing this warning then this means that parse-pdf IS being
>> used. You should modify nutch-site.xml and not nutch-default and my bet is
>> that you are doing this in NUTCH_HOME/conf and not in
>> NUTCH_HOME/runtime/local/conf (see tutorial on WIKI)
>>
>> On 29 May 2012 07:31, Tolga <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I know this issue should have been closed, but I thought I'd continue
>>> this rather than starting a new thread.
>>>
>>> Anyway, I'm getting this: parse.ParserFactory - ParserFactory: Plugin:
>>> parse-pdf mapped to contentType application/pdf via parse-plugins.xml,
>>> but not enabled via plugin.includes in nutch-default.xml and I have tika
>>> in my nutch-default.xml:
>>> <value>protocol-http|urlfilter-regex|parse-(html|tika|js|swf|zip|xml)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>.
>>> What's the point of seeing this warning if I already have tika? This
>>> should be removed IMHO.
>>>
>>> Regards,
>>>
>>> On 5/23/12 12:27 AM, Lewis John Mcgibbney wrote:
>>>
>>>> Unless you're using <= Nutch 1.2 you should not be using
>>>> msexcel|mspowerpoint|msword|oo|pdf| within your plugin.includes... all
>>>> of these document formats are (and have been for some time)
>>>> implemented as Apache Tika parsers.
>>>>
>>>> hth
>>>>
>>>> On Tue, May 22, 2012 at 9:20 PM, Tolga <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I crawl / index PDF files just fine, but I get the following warning.
>>>>>
>>>>> parse.ParserFactory - ParserFactory: Plugin: parse-pdf mapped to
>>>>> contentType application/pdf via parse-plugins.xml, but not enabled
>>>>> via plugin.includes in nutch-default.xml.
>>>>>
>>>>> I've got the value
>>>>> protocol-http|urlfilter-regex|parse-(html|tika|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)
>>>>>
>>>>> for plugin.includes property in nutch-default.xml. What am I missing?
>>>>>
>>>>> Regards,

-- 
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
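
For reference, the recommended practice mentioned above (override in nutch-site.xml, leave nutch-default.xml alone) might look roughly like the fragment below. This is only a sketch: the file location NUTCH_HOME/runtime/local/conf/nutch-site.xml and the plugin.includes value are taken from the messages in this thread, and it is an assumption that nothing else in your nutch-site.xml needs to change. It shows the override mechanism only; whether the parse-pdf warning goes away also depends on the mapping in parse-plugins.xml, which is not shown here.

```xml
<?xml version="1.0"?>
<!-- Sketch of NUTCH_HOME/runtime/local/conf/nutch-site.xml.
     Properties defined here override the same-named ones in nutch-default.xml. -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Value copied from the thread. There are no separate
         msexcel|mspowerpoint|msword|oo|pdf entries: per the earlier reply,
         those formats are handled by parse-tika on Nutch versions after 1.2. -->
    <value>protocol-http|urlfilter-regex|parse-(html|tika|js|swf|zip|xml)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

Keeping the change in nutch-site.xml means nutch-default.xml stays identical to the shipped copy, so local settings are easy to spot and survive an upgrade.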

