Nope. That didn't help. In fact, I had added that entry only minutes before sending the mail to the group, after a couple of hours of frustration trying to get the parser to work.
On Thu, Jul 12, 2012 at 11:40 PM, Lewis John Mcgibbney <[email protected]> wrote:

> For starters there is no parse-xhtml plugin unless of course this is a
> custom one you've written yourself.
>
> Unless this is the case then remove this from the plugin.includes
> property and re-spin it
>
> hth
>
> On Thu, Jul 12, 2012 at 7:00 PM, Sudip Datta <[email protected]> wrote:
> > Hi,
> >
> > I am using Nutch 1.4 and Solr. My crawls were working perfectly fine before
> > I made some changes to the SolrWriter (which I believe has nothing to do
> > with my problem). Since then, I am getting:
> >
> > WARN : org.apache.nutch.parse.ParseUtil - Unable to successfully parse
> > content <webpage> of type text/html
> > INFO : org.apache.nutch.parse.ParseSegment - Parsing: <webpage>
> > WARN : org.apache.nutch.parse.ParseSegment - Error parsing: <webpage>:
> > failed(2,200): org.apache.nutch.parse.ParseException: Unable to
> > successfully parse content
> >
> > for any <webpage> that I try to crawl!
> >
> > My nutch-site.xml file reads:
> >
> > <value>protocol-httpclient|urlfilter-regex|parse-(html|xhtml|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >
> > What could be going wrong?
> >
> > Thanks,
> >
> > --Sudip.
> >
>
> --
> Lewis

