I use Nutch 1.4 and Solr 3.4.
I think the error occurs when parsing an XML file that contains a comment with this structure:
<!--text with -- inside the comment-->
I have been reading about it but have not found much. This is my error log.
Any help would be appreciated.
*************************************************************************************************
2012-05-21 10:17:53,398 INFO fetcher.Fetcher - Fetcher: starting at 2012-05-21 10:17:53
2012-05-21 10:17:53,399 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20120521101752
2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Using queue mode : byHost
2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: threads: 20
2012-05-21 10:17:53,762 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2012-05-21 10:17:53,777 INFO fetcher.Fetcher - QueueFeeder finished: total 9 records + hit by time limit :0
2012-05-21 10:17:53,804 WARN parse.ParsePluginsReader - Unable to parse [null].Reason is [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string "--" is not permitted within comments.]
2012-05-21 10:17:53,809 WARN mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
    at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:73)
    at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:53)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:581)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1075)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
****************************************************************************************************
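For anyone who hits the same warning: the WARN line comes from parse.ParsePluginsReader and the exception says "Parse Plugins preferences could not be loaded", so the file being rejected is presumably the parse plugins configuration (conf/parse-plugins.xml in a default install; that path is an assumption) and not a fetched page. The snippet below is only a rough sketch, not part of Nutch: it uses Python's standard expat bindings to test whether a piece of XML is well-formed and to report where the parser gives up. The function name check_well_formed and the two sample strings are made up for illustration.

```python
import xml.parsers.expat


def check_well_formed(xml_data):
    """Return None if xml_data is well-formed XML.

    Otherwise return a (line, column, message) tuple describing the
    position where the expat parser stopped. Accepts str or bytes.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(xml_data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None


# Hypothetical samples: a comment containing "--" is not allowed in XML,
# while a single "-" inside a comment is fine.
sample_bad = "<root>\n<!-- text with -- inside the comment -->\n</root>"
sample_ok = "<root>\n<!-- text with - inside the comment -->\n</root>"

result_bad = check_well_formed(sample_bad)
result_ok = check_well_formed(sample_ok)
```

To check a real file, read it in binary mode and pass the bytes, for example check_well_formed(open("conf/parse-plugins.xml", "rb").read()). The line number returned should be close to the lineNumber in the SAXParseException (37 in the log above), although the exact column may differ between parsers.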
----- Original message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Monday, 21 May 2012 11:41:40
Subject: RE: error parsing some xml
Hi
Which version do you use? It should list the troubling URL. What's the stack
trace?
Cheers
-----Original message-----
> From:Ing. Eyeris Rodriguez Rueda <[email protected]>
> Sent: Mon 21-May-2012 17:07
> To: [email protected]
> Subject: error parsing some xml
>
> Hi all.
> When I try to crawl, I have a problem parsing some XML. I get the exception
> below, and I want to know which XML file causes the parsing problem.
> **************************************************************************************
> WARN parse.ParsePluginsReader - Unable to parse [null].Reason is
> [org.xml.sax.SAXParseException; lineNumber: 37; columnNumber: 7; The string
> "--" is not permitted within comments.]
> ***************************************************************************************
> Any help will be appreciated.
>
>
> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci
>