> but I get a number of messages in crawl.log, like:
>
> Error parsing: http://lucene.apache.org/skin/getMenu.js:
> org.apache.nutch.parse.ParseException: parser not found for contentType=application/javascript url=http://lucene.apache.org/skin/getMenu.js
>        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:74)
>        at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:766)
>        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:552)

I don't see this as an error to worry about. It is just saying that the
crawler was directed to fetch a ".js" file but doesn't know how to parse
it for text to index or links to crawl. I don't see much need to do that
with JavaScript, so I would treat this "Error" as a warning.
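If you'd rather not fetch ".js" files at all, you can exclude the suffix
in the regex URL filter (this assumes you're using the default
regex-urlfilter plugin). The stock conf/regex-urlfilter.txt already skips
a list of suffixes it can't parse; just add "js" to that pattern, e.g.
(abbreviated here):

    # conf/regex-urlfilter.txt -- skip suffixes we can't (or don't want to) parse
    -\.(gif|GIF|jpg|JPG|png|PNG|css|js|zip|gz|exe)$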


> Then, at the end of the log, I get:
>
> LinkDb: adding segment: file:/opt/nutch-1.0/crawl.test/segments/20090713171413
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist: file:/opt/nutch-1.0/crawl.test/segments/20090713171413/parse_data
>        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
>        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
>
> I must have missed something, but being new, I can't figure out what is
> causing that problem.
>
> Thanks,
> Jim

Can you tell us what commands you ran? Is the hard disk full? What is
actually in that segment? Does it perhaps contain an aborted run?
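For what it's worth, the parse_data directory is written by the parse
step, so a missing parse_data usually means that segment was fetched but
never parsed, or the run was interrupted partway through. Listing the
segment should tell you; a fully parsed segment normally contains
content, crawl_fetch, crawl_generate, crawl_parse, parse_data and
parse_text:

    ls /opt/nutch-1.0/crawl.test/segments/20090713171413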

Can you simply delete that segment directory, if there isn't much data
in it that you'd mind losing?
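For example (destructive, so only do this if you don't need the data):

    rm -rf /opt/nutch-1.0/crawl.test/segments/20090713171413

Alternatively, if the fetch itself completed and only the parse is
missing, I believe you can parse the existing segment with
"bin/nutch parse <segment-dir>" and then re-run the linkdb step.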

Good luck.

Alex
