Isn't Tika responsible for XML parsing? I ask because I got this message:

parse.ParserFactory - ParserFactory: Plugin: org.apache.nutch.parse.feed.FeedParser mapped to contentType application/rss+xml via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml.

Should I just include xml?

The plugin "feed" contains org.apache.nutch.parse.feed.FeedParser, so that is the plugin the message refers to.
However, both plugins ("feed" and "parse-tika") should be able to parse RSS feeds.
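
If you do want FeedParser to handle application/rss+xml, a minimal sketch of enabling the plugin could look like the fragment below. It assumes you override plugin.includes in conf/nutch-site.xml. Only the added "feed" entry comes from this thread; the rest of the value is an assumed example, so start from the value in your own nutch-default.xml, which differs between Nutch versions.

```xml
<!-- conf/nutch-site.xml: overrides the property of the same name in nutch-default.xml -->
<property>
  <name>plugin.includes</name>
  <!-- Assumed example value: keep your existing entries and add "feed" as one more alternative -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|feed|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The value is a regular expression matched against plugin ids, so "feed" is added as its own alternative between the "|" separators.
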
Have a look at:
http://lucene.472066.n3.nabble.com/RSS-parser-td3719558.html
and also:
https://issues.apache.org/jira/browse/NUTCH-1053
https://issues.apache.org/jira/browse/NUTCH-887
Sebastian