Hi all,
I am using Nutch 2.3.1 with MongoDB. I have created a plugin by implementing the Parser interface and updated parse-plugins.xml, build.xml, and plugin.xml accordingly. However, it does not work when I run parsechecker. The logs are given below.

2016-04-07 17:52:46,693 INFO  parse.ParserChecker - fetching: http://www.thehindu.com/?service=rss
2016-04-07 17:52:46,820 WARN  plugin.PluginRepository - Missing dependency * for plugin parse-rome
2016-04-07 17:52:47,195 INFO  http.Http - http.proxy.host = null
2016-04-07 17:52:47,195 INFO  http.Http - http.proxy.port = 8080
2016-04-07 17:52:47,195 INFO  http.Http - http.timeout = 10000
2016-04-07 17:52:47,195 INFO  http.Http - http.content.limit = -1
2016-04-07 17:52:47,196 INFO  http.Http - http.agent = My Crawler/Nutch-2.3.1
2016-04-07 17:52:47,196 INFO  http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2016-04-07 17:52:47,196 INFO  http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2016-04-07 17:52:48,411 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2016-04-07 17:52:48,451 WARN  parse.ParserFactory - ParserFactory: Plugin: org.apache.nutch.parse.feed.FeedParser mapped to contentType application/rss+xml via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml
2016-04-07 17:52:48,451 WARN  parse.ParserFactory - ParserFactory: Plugin: org.apache.nutch.parse.rome.RomeParser mapped to contentType application/rss+xml via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml
2016-04-07 17:52:49,920 INFO  parse.ParserChecker - parsing: http://www.thehindu.com/?service=rss
2016-04-07 17:52:49,920 INFO  parse.ParserChecker - contentType: application/rss+xml
2016-04-07 17:52:49,920 INFO  parse.ParserChecker - signature: caf37ee6dd27677612a4d3d0b528ed65
2016-04-07 17:52:49,920 INFO  parse.ParserChecker - ---------
Url
---------------

2016-04-07 17:52:49,920 INFO  parse.ParserChecker - ---------
Metadata
---------

2016-04-07 17:52:49,921 INFO  parse.ParserChecker - ---------
Outlinks
---------

2016-04-07 17:52:50,055 INFO  parse.ParserChecker - ---------
Headers
---------


What does the following line mean?

"2016-04-07 17:52:48,451 WARN parse.ParserFactory - ParserFactory: Plugin: org.apache.nutch.parse.rome.RomeParser mapped to contentType application/rss+xml via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml"

I have already included the plugin in nutch-site.xml, in the plugin.includes property.
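
One way to double-check that setting is sketched below. It is only an illustration, under the assumption that Nutch treats the plugin.includes value as a regular expression that must match a whole plugin id. The sample configuration and the `read_property` and `enabled_plugins` helpers are hypothetical and are not taken from the actual nutch-site.xml; `parse-rome` comes from the log above, and `feed` is assumed to be the id of the plugin that provides FeedParser. Python and Java regex syntax also differ in some corner cases, so treat the result as a hint only.

```python
import re
import xml.etree.ElementTree as ET


def read_property(conf_xml, name):
    """Return the value of the named <property> in Hadoop-style config XML text, or None."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def enabled_plugins(includes_regex, plugin_ids):
    """Return the plugin ids that the regex matches in full (assumed Nutch behaviour)."""
    pattern = re.compile(includes_regex)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


# Hypothetical sample; replace with the real contents of conf/nutch-site.xml.
SAMPLE_CONF = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    regex = read_property(SAMPLE_CONF, "plugin.includes")
    # Plugin ids to test: parse-rome appears in the log; feed is an assumed id.
    print(enabled_plugins(regex, ["parse-html", "parse-rome", "feed"]))
```

With this sample value, neither `parse-rome` nor `feed` matches, which is the kind of mismatch the ParserFactory warning appears to describe.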

Thanks
Harsh


