2010-10-12 19:54:19,976 WARN parse.ParserFactory -
ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to
contentType application/xhtml+xml via parse-plugins.xml, but its
plugin.xml file does not claim to support contentType:
application/xhtml+xml
2010-10-12 19:54:19,991 WARN parse.ParseUtil - Unable to successfully
parse content http://www.lucidimagination.com/ of type
application/xhtml+xml
2010-10-12 19:54:19,991 WARN fetcher.Fetcher - Error parsing:
http://www.lucidimagination.com/: failed(2,200):
org.apache.nutch.parse.ParseException: Unable to successfully parse content
I am trying to crawl http://www.lucidimagination.com/ with Nutch 1.2. I
tried both the Tika and HTML parsers (the log above is from the HTML parser),
but neither works. Any suggestions?
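The first WARN line describes a consistency check. parse-plugins.xml maps application/xhtml+xml to HtmlParser, but the parser's own plugin.xml does not list that type. The sketch below models that check in simplified form. The plugin.xml fragment, the helper names `claimed_types` and `check_mapping`, and the rule that `contentType` is a '|'-separated list of literal MIME types (or `*`) are all hypothetical. They are not taken from the Nutch 1.2 source, and Nutch's actual matching logic may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical plugin.xml fragment, loosely shaped like a Nutch parser plugin
# descriptor (assumption: not copied from Nutch 1.2).
PLUGIN_XML = """
<plugin id="parse-html">
  <extension point="org.apache.nutch.parse.Parser">
    <implementation id="org.apache.nutch.parse.html.HtmlParser"
                    class="org.apache.nutch.parse.html.HtmlParser">
      <parameter name="contentType" value="text/html"/>
    </implementation>
  </extension>
</plugin>
"""


def claimed_types(plugin_xml, impl_id):
    """Return the set of content types an implementation claims.

    Assumption: the contentType parameter is a '|'-separated list of
    literal MIME types; Nutch itself may interpret it differently.
    """
    root = ET.fromstring(plugin_xml)
    for impl in root.iter("implementation"):
        if impl.get("id") != impl_id:
            continue
        for param in impl.iter("parameter"):
            if param.get("name") == "contentType":
                value = param.get("value", "")
                return {t.strip() for t in value.split("|") if t.strip()}
    return set()


def check_mapping(plugin_xml, impl_id, content_type):
    """True if the plugin claims content_type (or the wildcard '*')."""
    types = claimed_types(plugin_xml, impl_id)
    return "*" in types or content_type in types


if __name__ == "__main__":
    impl = "org.apache.nutch.parse.html.HtmlParser"
    print(check_mapping(PLUGIN_XML, impl, "text/html"))
    print(check_mapping(PLUGIN_XML, impl, "application/xhtml+xml"))
```

In this toy model, the second call returns False, which corresponds to the situation in the WARN line. Changing the value to `text/html|application/xhtml+xml` would make it pass. Whether editing the real parse-html plugin.xml this way fixes the Nutch 1.2 error is untested here.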
Okke Klein
(Subject: Problem parsing application/xhtml+xml)