Alex (et al.),

There was (and still is) plenty of space on the drive (>3 GB).
I was trying the command line from the tutorial:

bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log

I'm re-running it now to see what happens. If I get that error again, I'll delete the dirs, as you and Xiao Yang suggested.

Jim

---- Alex McLintock <alex.mclint...@gmail.com> wrote:
> > but I get a number of messages in crawl.log, like:
> >
> > Error parsing: http://lucene.apache.org/skin/getMenu.js:
> > org.apache.nutch.parse.ParseException: parser not found for
> > contentType=application/javascript
> > url=http://lucene.apache.org/skin/getMenu.js
> > at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:74)
> > at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:766)
> > at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:552)
>
> I don't see this as an error to worry about. It is just saying that it
> has been directed to fetch a ".js" file but doesn't know how to parse
> it for values to index or links to crawl. I don't see the need to do
> that with JavaScript, so I would treat this "Error" as a warning.
>
> > Then, at the end of the log, I get:
> >
> > LinkDb: adding segment:
> > file:/opt/nutch-1.0/crawl.test/segments/20090713171413
> > Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> > Input path does not exist:
> > file:/opt/nutch-1.0/crawl.test/segments/20090713171413/parse_data
> > at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
> > at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
> >
> > I must have missed something, but being new, I can't figure out what is
> > causing that problem.
> >
> > Thanks,
> > Jim
>
> Have you told us what commands you ran? Is the hard disk full? What is
> actually in that segment? Does it perhaps contain an aborted run?
>
> Can you simply delete that segment directory, if there isn't much data
> in there that you'd mind losing?
>
> Good luck.
>
> Alex
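For anyone hitting the same LinkDb error, here is a rough shell sketch of Alex's suggestion: list the segments under the crawl directory that have no parse_data subdirectory (the path LinkDb reported as missing) and show what each one contains before deciding to delete it. The function name find_incomplete_segments is made up for illustration and is not a Nutch command. The script assumes the crawl.test/segments/<timestamp>/ layout seen in the log. It only prints; nothing is removed until you run rm -r on a segment yourself.

```shell
#!/bin/sh
# Sketch only: find_incomplete_segments is a hypothetical helper, not part of Nutch.
# Assumes segments live under <crawl_dir>/segments/<timestamp>/, as in the log above.

# Print every segment directory that has no parse_data subdirectory,
# i.e. the input path LinkDb reported as missing.
find_incomplete_segments() {
  crawl_dir="$1"
  for seg in "$crawl_dir"/segments/*/; do
    [ -d "$seg" ] || continue   # glob matched nothing: no segments yet
    seg="${seg%/}"              # strip the trailing slash from the glob match
    if [ ! -d "$seg/parse_data" ]; then
      echo "$seg"
    fi
  done
}

# Only inspect here; delete a segment by hand (rm -r) once you're sure it is an aborted run.
find_incomplete_segments crawl.test | while read -r seg; do
  echo "Incomplete segment: $seg"
  ls "$seg"
done
```

Listing the segment contents first lets you see whether it is an aborted run (for example, fetch output but no parse output) before you throw anything away.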