> but I get a number of messages in crawl.log, like:
>
> Error parsing: http://lucene.apache.org/skin/getMenu.js:
> org.apache.nutch.parse.ParseException: parser not found for
> contentType=application/javascript
> url=http://lucene.apache.org/skin/getMenu.js
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:74)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:766)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:552)
I don't see this as an error to worry about. It is just saying that it has been directed to fetch a ".js" file but it doesn't know how to parse it looking for values to index or links to crawl. I don't see the need to do that with JavaScript, so I would treat this "Error" as a warning.

> Then, at the end of the log, I get:
>
> LinkDb: adding segment: file:/opt/nutch-1.0/crawl.test/segments/20090713171413
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> file:/opt/nutch-1.0/crawl.test/segments/20090713171413/parse_data
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
>
> I must have missed something, but being new, I can't figure out what is
> causing that problem?
>
> Thanks,
> Jim

Have you told us what commands you ran? Is the hard disk full? What is actually in that segment? Does it contain perhaps an aborted run? Can you simply delete that segment/directory, if there isn't much data in there that you don't mind losing?

Good luck.

Alex
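If the ".js" messages clutter the log, one way to silence them is to keep javascript URLs out of the crawl entirely with the regex URL filter. This is only a sketch: it assumes a stock Nutch 1.0 setup with the urlfilter-regex plugin enabled, and that you edit conf/regex-urlfilter.txt (or conf/crawl-urlfilter.txt, whichever your crawl command actually reads).

```
# Sketch for conf/regex-urlfilter.txt (assumes urlfilter-regex is in plugin.includes).
# A "-" rule rejects any URL matching the pattern; this one skips javascript files
# so the fetcher never hands them to a parser. Place it before the final "+." accept rule.
-\.js$
```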
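To make Alex's "what is actually in that segment?" question concrete: a segment that finished fetching and parsing contains a parse_data subdirectory, which is exactly what the LinkDb exception says is missing, so an aborted run is a plausible cause. The following is a hypothetical sketch of the check (it simulates a segments directory under a temp path rather than touching the real /opt/nutch-1.0/crawl.test/segments from the log; the directory names are assumptions):

```shell
# Sketch: simulate an aborted segment and show how to find and remove it.
# A complete Nutch segment would also hold parse_text, crawl_parse, etc.
SEGMENTS=$(mktemp -d)/segments
mkdir -p "$SEGMENTS/20090713171413/crawl_fetch"   # aborted: parse_data never written

for SEG in "$SEGMENTS"/*; do
    if [ ! -d "$SEG/parse_data" ]; then
        echo "incomplete segment: $SEG"
        rm -r "$SEG"   # safe only if you don't mind losing that fetch's data
    fi
done
```

Deleting the incomplete segment (or re-fetching it) should let the LinkDb step run over the remaining good segments.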