Hi, I am using Nutch 1.4 and Solr. My crawls were working perfectly fine before I made some changes to the SolrWriter (which I believe has nothing to do with my problem). Since then, I have been getting:

WARN : org.apache.nutch.parse.ParseUtil - Unable to successfully parse content <webpage> of type text/html
INFO : org.apache.nutch.parse.ParseSegment - Parsing: <webpage>
WARN : org.apache.nutch.parse.ParseSegment - Error parsing: <webpage>: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content

This happens for every <webpage> that I try to crawl!

My nutch-site.xml file reads:

<value>protocol-httpclient|urlfilter-regex|parse-(html|xhtml|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>

What could be going wrong?

Thanks,
--Sudip.

