Hello Ashumeet,
Yes, that's a symptom of malformed XML. If you haven't changed the nutch-default.xml file, then it's probably the version of the SAX parser you are using. Which version of Java are you using?
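To answer that, a small class like the hypothetical `ParserInfo` below (a sketch, not part of Nutch or Hadoop) prints the JVM version and the JAXP parser classes the factories resolve to. Run it with the same classpath as Nutch (for example the jars in Nutch's lib directory, which is an assumption about your layout). The stack trace shows `org.apache.xerces.jaxp.DocumentBuilderImpl`, which suggests a separate Xerces jar is being picked up rather than the JDK's built-in parser.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class: reports the JVM version and which JAXP parser
// implementations are found on the current classpath.
public class ParserInfo {

    static String report() {
        String javaVersion = System.getProperty("java.version");
        // Standard JAXP lookup; the javax.xml.parsers.DocumentBuilder frame in the
        // trace suggests Hadoop's Configuration obtains its parser this way.
        String domImpl = DocumentBuilderFactory.newInstance().getClass().getName();
        String saxImpl = SAXParserFactory.newInstance().getClass().getName();
        return "java.version=" + javaVersion
                + "\nDocumentBuilderFactory=" + domImpl
                + "\nSAXParserFactory=" + saxImpl;
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Posting that output along with the Java version makes it easier to tell whether an old or conflicting parser jar is involved.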
Best regards,
---
Andreas P. Koenzen
On 14/02/2010, at 01:31 a.m., Ashumeet Singh wrote:
I am not sure which tag is missing in nutch-default.xml. Also, I don't understand what "Content is not allowed in prolog." is trying to say. I can view it in the browser, but the search is empty because no crawl has happened yet.
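In XML, the prolog is the part of the file before the root element (the optional `<?xml ...?>` declaration, comments, and so on). The position `nutch-default.xml:1:1` means the parser rejected the very first character of the file as text where it expected markup. A minimal sketch for seeing what is actually there: the hypothetical `PrologCheck` class reads the first bytes of the file and describes them. The default path `conf/nutch-default.xml` is an assumption; pass the path Nutch actually loads. It uses `java.nio.file`, so it needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class: reports what sits at the very start of an XML file.
public class PrologCheck {

    // Classifies the first bytes of a file. A well-formed XML file normally
    // begins with '<' (the "<?xml ...?>" declaration or the root element).
    static String describeStart(byte[] head) {
        if (head.length == 0) {
            return "empty file";
        }
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            // UTF-8 byte order mark; depending on how the file is read, it can
            // end up as stray content before the markup.
            return "UTF-8 byte order mark before the markup";
        }
        if (head[0] == '<') {
            return "starts with '<'";
        }
        return String.format("unexpected first byte 0x%02X", head[0] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; override with the real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "conf/nutch-default.xml");
        byte[] all = Files.readAllBytes(path);
        byte[] head = Arrays.copyOf(all, Math.min(all.length, 3));
        System.out.println(path + ": " + describeStart(head));
    }
}
```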
Thanks for the prompt reply.
Ashumeet Singh
On Feb 13, 2010, at 11:20 PM, Neera Sharma wrote:
You need to check your nutch-default.xml file for missing tags, etc. Are you able to view it in your browser?
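Rather than checking the file by eye for a missing tag, you can let a parser report the exact position. The hypothetical `WellFormedCheck` below is a sketch using the standard JAXP `DocumentBuilder`; it prints `well-formed` or the line and column of the first fatal error, which for the error in this thread would presumably be line 1, column 1. The default path is again an assumption, and try-with-resources needs Java 7 or later.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: checks whether an XML stream is well-formed.
public class WellFormedCheck {

    static String check(InputStream in) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing a
            // "[Fatal Error] ..." line to stderr, so the exception reaches the catch below.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(in);
            return "well-formed";
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration problems, I/O errors, other SAX errors.
            return "could not parse: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the file Nutch actually loads.
        String path = args.length > 0 ? args[0] : "conf/nutch-default.xml";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println(path + ": " + check(in));
        }
    }
}
```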
On Sat, Feb 13, 2010 at 4:33 PM, Ashumeet Singh <ashumeet.landm...@gmail.com> wrote:
Hey everyone! I am pleased to be a part of this community. I am a student trying to learn Nutch. I have installed Nutch and Tomcat properly, and they are running, but when I start a crawl by running the following command in the terminal:
./nutch crawl urls -dir crawl -depth 3 -topN 50
It is giving me the following error:
[Fatal Error] nutch-default.xml:1:1: Content is not allowed in prolog.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException: Content is not allowed in prolog.
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1049)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:940)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:891)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:345)
	at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:195)
	at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:205)
	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:150)
	at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:27)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:59)
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:968)
	... 8 more
Please help me!
Thanks
Ashumeet Singh