I haven't changed anything in nutch-default.xml. I am using Java version 1.6.0 on a MacBook. Is there a way I can fix the SAX parser?

Thanks everyone for your support. I am honored… :)

Ashumeet Singh

On Feb 14, 2010, at 8:32 AM, Andreas P. Koenzen wrote:

> Hello Ashumeet,
>
> Yes, thats a symptom of malformed XML, if you haven't change the
> nutch-default.xml file, then its probably the version of the SAX parser you
> are using. Which version of Java are you using...?
>
> Best regards,
>
> ---
> Andreas P. Koenzen
>
> On 14/02/2010, at 01:31 a.m., Ashumeet Singh wrote:
>
>> I am not sure which tag is missing in nutch-default.xml. Also "Content is
>> not allowed in prolog." I don't understand what is it trying to say. I can
>> view it in the browser but the search is empty because there is no crawl
>> happened till now.
>>
>> Thanks for the prompt reply.
>> Ashumeet Singh
>>
>> On Feb 13, 2010, at 11:20 PM, Neera Sharma wrote:
>>
>>> You need to check your nutch-default.xml file for missing tag etc. Are you
>>> able to view it in your browser?
>>>
>>> On Sat, Feb 13, 2010 at 4:33 PM, Ashumeet Singh <ashumeet.landm...@gmail.com> wrote:
>>>
>>>> Hey everyone…!!! I am pleased to be a part of this community. I am a
>>>> student trying to learn nutch. I have installed nutch and tomcat properly.
>>>> And they are running but at the time of crawl, when I am running the
>>>> following command in terminal:
>>>>
>>>> ./nutch crawl urls -dir crawl -depth 3 -topN 50
>>>>
>>>> It is giving me the following error:
>>>>
>>>> [Fatal Error] nutch-default.xml:1:1: Content is not allowed in prolog.
>>>> Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException: Content is not allowed in prolog.
>>>>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1049)
>>>>     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:940)
>>>>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:891)
>>>>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:345)
>>>>     at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:195)
>>>>     at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:205)
>>>>     at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:150)
>>>>     at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:27)
>>>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:59)
>>>> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
>>>>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
>>>>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:968)
>>>>     ... 8 more
>>>>
>>>> PLEASE HELP ME …….
>>>>
>>>> Thanks
>>>> Ashumeet Singh
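
A note on the error itself: "Content is not allowed in prolog" reported at line 1, column 1 generally means the parser found something other than the start of the XML markup at the very beginning of the file. Nothing in this thread confirms the cause, but a byte order mark, stray characters before `<?xml`, or an empty file are common suspects. The sketch below is one way to look at the first bytes of the file. The function names `describe_prolog` and `check_file` and the path in the usage comment are made up for illustration; they are not part of Nutch or Hadoop.

```python
import codecs

# Known byte order marks, longest first so UTF-32 LE is not
# mistaken for UTF-16 LE (they share the same first two bytes).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE BOM"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE BOM"),
    (codecs.BOM_UTF8, "UTF-8 BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),
]


def describe_prolog(data: bytes) -> str:
    """Describe what sits at the very start of the given file contents."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    if not data.strip():
        return "empty or whitespace-only file"
    if data.startswith(b"<"):
        return "starts with '<' as expected"
    # Show the raw bytes so invisible characters become visible.
    return "unexpected leading bytes: %r" % data[:16]


def check_file(path: str) -> str:
    # Only the first bytes matter for an error at line 1, column 1.
    with open(path, "rb") as f:
        return describe_prolog(f.read(64))


# Hypothetical usage (the path is an assumption about the install layout):
# print(check_file("conf/nutch-default.xml"))
```

A BOM is not always an error, since many parsers accept one, so treat the result as a hint about where to look and not as a diagnosis. If the file does start with `<`, the parser may be reading a different copy of nutch-default.xml than the one you opened in the browser.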