Strange, it should show the bad URL. But since you have only 9 URLs, the 
easiest approach is to run the parsechecker tool on each URL.
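
If the suspect XML is available as a local file, scanning it for comments 
that contain "--" may also help locate the offending line. The following is 
a minimal sketch in Python, not part of Nutch: the function name 
find_bad_comments is hypothetical, and the regular expression is a simple 
heuristic that does not account for CDATA sections.

```python
import re

# Matches an XML comment and captures its body (non-greedy, across lines).
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_bad_comments(xml_text):
    """Return (line, column, body) for each comment that XML would reject.

    XML forbids "--" inside a comment, and a comment body may not end
    with "-". Line and column are 1-based and point at the first
    offending hyphen.
    """
    problems = []
    for match in COMMENT.finditer(xml_text):
        body = match.group(1)
        offset = body.find("--")
        if offset == -1 and body.endswith("-"):
            offset = len(body) - 1
        if offset == -1:
            continue
        pos = match.start(1) + offset
        line = xml_text.count("\n", 0, pos) + 1
        column = pos - (xml_text.rfind("\n", 0, pos) + 1) + 1
        problems.append((line, column, body.strip()))
    return problems
```

To use it, read the file into a string (for example with 
open(path, encoding="utf-8").read()) and pass it to find_bad_comments; an 
empty list means no comment of this kind was found.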

 
 
-----Original message-----
> From:Ing. Eyeris Rodriguez Rueda <[email protected]>
> Sent: Mon 21-May-2012 19:42
> To: [email protected]
> Subject: Re: error parsing some xml
> 
> I use Nutch 1.4 and Solr 3.4.
> I think my error occurs when parsing an XML file with this structure:
> <!--text with -- inside the comment-->
> I have been reading but have not found much. This is my error log.
> Some help, please.
> *************************************************************************************************
> 2012-05-21 10:17:53,398 INFO  fetcher.Fetcher - Fetcher: starting at 2012-05-21 10:17:53
> 2012-05-21 10:17:53,399 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20120521101752
> 2012-05-21 10:17:53,762 INFO  fetcher.Fetcher - Using queue mode : byHost
> 2012-05-21 10:17:53,762 INFO  fetcher.Fetcher - Fetcher: threads: 20
> 2012-05-21 10:17:53,762 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
> 2012-05-21 10:17:53,777 INFO  fetcher.Fetcher - QueueFeeder finished: total 9 records + hit by time limit :0
> 2012-05-21 10:17:53,804 WARN  parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
> 2012-05-21 10:17:53,809 WARN  mapred.LocalJobRunner - job_local_0005
> java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
>       at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:73)
>       at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:53)
>       at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:581)
>       at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1075)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> ****************************************************************************************************
> 
> 
> 
> 
> ----- Original message -----
> From: "Markus Jelsma" <[email protected]>
> To: [email protected]
> Sent: Monday, 21 May 2012 11:41:40
> Subject: RE: error parsing some xml
> 
> Hi
> 
> Which version do you use? It should list the troubling URL. What's the stack 
> trace?
> 
> Cheers
> 
>  
>  
> -----Original message-----
> > From:Ing. Eyeris Rodriguez Rueda <[email protected]>
> > Sent: Mon 21-May-2012 17:07
> > To: [email protected]
> > Subject: error parsing some xml
> > 
> > Hi all.
> > When I try to crawl, I have a problem parsing some XML. I get the 
> > exception below, and I want to know which XML file causes the problem 
> > at parse time.
> > **************************************************************************************
> > WARN  parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
> > ***************************************************************************************
> > Any help will be appreciated.
> > 
> > 
> > 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS 
> > INFORMATICAS...
> > CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> > 
> > http://www.uci.cu
> > http://www.facebook.com/universidad.uci
> > http://www.flickr.com/photos/universidad_uci
> > 
> 
