Hi all,
I am using Nutch 2.3.1 with MongoDB. I have created a plugin that
implements the Parser interface and changed parse-plugins.xml,
build.xml and plugin.xml accordingly. However, the plugin does not work
when I run parsechecker. The logs are given below.
2016-04-07 17:52:46,693 INFO parse.ParserChecker - fetching:
http://www.thehindu.com/?service=rss
2016-04-07 17:52:46,820 WARN plugin.PluginRepository - Missing
dependency * for plugin parse-rome
2016-04-07 17:52:47,195 INFO http.Http - http.proxy.host = null
2016-04-07 17:52:47,195 INFO http.Http - http.proxy.port = 8080
2016-04-07 17:52:47,195 INFO http.Http - http.timeout = 10000
2016-04-07 17:52:47,195 INFO http.Http - http.content.limit = -1
2016-04-07 17:52:47,196 INFO http.Http - http.agent = My
Crawler/Nutch-2.3.1
2016-04-07 17:52:47,196 INFO http.Http - http.accept.language =
en-us,en-gb,en;q=0.7,*;q=0.3
2016-04-07 17:52:47,196 INFO http.Http - http.accept =
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2016-04-07 17:52:48,411 INFO crawl.SignatureFactory - Using Signature
impl: org.apache.nutch.crawl.MD5Signature
2016-04-07 17:52:48,451 WARN parse.ParserFactory - ParserFactory:
Plugin: org.apache.nutch.parse.feed.FeedParser mapped to contentType
application/rss+xml via parse-plugins.xml, but not enabled via
plugin.includes in nutch-default.xml
2016-04-07 17:52:48,451 WARN parse.ParserFactory - ParserFactory:
Plugin: org.apache.nutch.parse.rome.RomeParser mapped to contentType
application/rss+xml via parse-plugins.xml, but not enabled via
plugin.includes in nutch-default.xml
2016-04-07 17:52:49,920 INFO parse.ParserChecker - parsing:
http://www.thehindu.com/?service=rss
2016-04-07 17:52:49,920 INFO parse.ParserChecker - contentType:
application/rss+xml
2016-04-07 17:52:49,920 INFO parse.ParserChecker - signature:
caf37ee6dd27677612a4d3d0b528ed65
2016-04-07 17:52:49,920 INFO parse.ParserChecker - ---------
Url
---------------
2016-04-07 17:52:49,920 INFO parse.ParserChecker - ---------
Metadata
---------
2016-04-07 17:52:49,921 INFO parse.ParserChecker - ---------
Outlinks
---------
2016-04-07 17:52:50,055 INFO parse.ParserChecker - ---------
Headers
---------
What does the following line mean?
"2016-04-07 17:52:48,451 WARN parse.ParserFactory - ParserFactory:
Plugin: org.apache.nutch.parse.rome.RomeParser mapped to contentType
application/rss+xml via parse-plugins.xml, but not enabled via
plugin.includes in nutch-default.xml"
I have already included the plugin in nutch-site.xml, in the
plugin.include parameter.
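For reference, an entry of this kind generally has the shape sketched
below. The property name plugin.includes is the one named in the warning
above, and parse-rome is the plugin id from the log. The other plugin ids
in the value are illustrative assumptions, not a copy of my actual file.

```xml
<!-- Sketch of a nutch-site.xml override; the value is a regular expression
     matched against plugin ids. Ids other than parse-rome are assumed here
     for illustration only. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika|rome)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
</property>
```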
Thanks
Harsh