Alex (et al),

There was/is plenty of space on the drive (>3GB).

I was trying the command line from the tutorial:

bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log

I'm re-running it now to see what happens.  If I get that error again, I'll 
delete the dirs, as you and Xiao Yang suggested.

Jim

---- Alex McLintock <alex.mclint...@gmail.com> wrote: 
> > but I get a number of messages in crawl.log, like:
> >
> > Error parsing: http://lucene.apache.org/skin/getMenu.js: 
> > org.apache.nutch.parse.ParseException: parser not found for 
> > contentType=application/javascript 
> > url=http://lucene.apache.org/skin/getMenu.js
> >        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:74)
> >        at 
> > org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:766)
> >        at 
> > org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:552)
> 
> I don't see this as an error to worry about. It just means the crawler
> was directed to fetch a ".js" file but doesn't know how to parse it for
> values to index or links to crawl. I don't see the need to do that with
> JavaScript, so I would treat this "Error" as a warning.
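(An aside, in case it helps: if you'd rather Nutch skip .js URLs entirely instead of fetching them and failing to parse them, one option, assuming the stock Nutch config layout, is to add the .js suffix to the URL-filter exclusion rules in conf/regex-urlfilter.txt. Lines starting with "-" exclude matching URLs; the exact default patterns may differ in your version, so check the file first.)

```
# In conf/regex-urlfilter.txt: skip any URL ending in .js
# (or append js to the existing suffix-exclusion line, if your
# config already has one for gif, jpg, zip, etc.)
-\.js$
```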
> 
> 
> > Then, at the end of the log, I get:
> >
> > LinkDb: adding segment: 
> > file:/opt/nutch-1.0/crawl.test/segments/20090713171413
> > Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: 
> > Input path does not exist: 
> > file:/opt/nutch-1.0/crawl.test/segments/20090713171413/parse_data
> >        at 
> > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
> >        at 
> > org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
> >
> > I must have missed something, but being new, I can't figure out what is 
> > causing the problem.
> >
> > Thanks,
> > Jim
> 
> Have you told us what commands you ran? Is the hard disk full? What is
> actually in that segment? Does it contain perhaps an aborted run?
> 
> Can you simply delete that segment directory, if there isn't much data
> in there that you'd mind losing?
> 
> Good luck.
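(For anyone hitting the same thing: deleting the aborted segment is a one-liner. The path below is the segment from the log above; adjust it to your own crawl directory before running.)

```shell
# Remove the partially-written segment so the LinkDb step no longer
# trips over its missing parse_data subdirectory.
rm -rf /opt/nutch-1.0/crawl.test/segments/20090713171413
```

Re-running the crawl afterwards will create a fresh segment.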
> 
> Alex
