Strange, it should show the bad URL. But since you have only 9 URLs, the easiest way is to run the parsechecker tool on each one.
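If you want to narrow it down outside Nutch as well, a small well-formedness check can point at the document that has "--" inside a comment. The following is only a rough sketch in Python, not anything that ships with Nutch: the function name `find_malformed` and the two sample documents are made up for illustration, and the error text it prints comes from Python's parser, so it will not match the SAXParseException wording in your log.

```python
import xml.etree.ElementTree as ET


def find_malformed(documents):
    """Return (name, line, column, message) for each document that fails to parse.

    `documents` maps a label (for example a URL or file name) to its XML text.
    """
    failures = []
    for name, text in documents.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as err:
            # ParseError.position is a (line, column) tuple.
            line, column = err.position
            failures.append((name, line, column, str(err)))
    return failures


# Hypothetical sample inputs; in practice these would be the XML bodies
# you want to check, keyed by wherever they came from.
samples = {
    "good.xml": "<root><!-- a normal comment --><item/></root>",
    "bad.xml": "<root><!-- text with -- inside the comment --><item/></root>",
}

for name, line, column, message in find_malformed(samples):
    print(f"{name}: line {line}, column {column}: {message}")
```

Only the second sample should be reported, because XML does not allow a double hyphen inside a comment.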
-----Original message-----
> From:Ing. Eyeris Rodriguez Rueda <[email protected]>
> Sent: Mon 21-May-2012 19:42
> To: [email protected]
> Subject: Re: error parsing some xml
>
> I use Nutch 1.4 and Solr 3.4.
> I think my error occurs when parsing an XML file with this structure:
> <!--text with -- inside the comentary-->
> I have been reading but did not find much. This is my error log.
> Please help.
> *************************************************************************************************
> 2012-05-21 10:17:53,398 INFO fetcher.Fetcher - Fetcher: starting at 2012-05-21 10:17:53
> 2012-05-21 10:17:53,399 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20120521101752
> 2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Using queue mode : byHost
> 2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: threads: 20
> 2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
> 2012-05-21 10:17:53,777 INFO fetcher.Fetcher - QueueFeeder finished: total 9 records + hit by time limit :0
> 2012-05-21 10:17:53,804 WARN parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
> 2012-05-21 10:17:53,809 WARN mapred.LocalJobRunner - job_local_0005
> java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
> at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:73)
> at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:53)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:581)
> at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1075)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> ****************************************************************************************************
>
> ----- Original message -----
> From: "Markus Jelsma" <[email protected]>
> To: [email protected]
> Sent: Monday, 21 May 2012 11:41:40
> Subject: RE: error parsing some xml
>
> Hi
>
> Which version do you use? It should list the troubling URL. What's the stack trace?
>
> Cheers
>
> > -----Original message-----
> > From:Ing. Eyeris Rodriguez Rueda <[email protected]>
> > Sent: Mon 21-May-2012 17:07
> > To: [email protected]
> > Subject: error parsing some xml
> >
> > Hi all.
> > When I try to crawl, I have a problem parsing some XML. I get the exception below, and I want to know which XML file fails at parse time.
> > **************************************************************************************
> > WARN parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
> > ***************************************************************************************
> > Any help will be appreciated.
> >
> > 10th ANNIVERSARY OF THE CREATION OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> > CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> >
> > http://www.uci.cu
> > http://www.facebook.com/universidad.uci
> > http://www.flickr.com/photos/universidad_uci

