Re: Just getting started w/tutorial- errors in crawl.log

2009-07-14 Thread ohaya
Alex (et al),
There was/is plenty of space on the drive (>3GB). I was trying the command line from the tutorial:
    bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
I'm re-running it to see what happens. If I get that error again, I'll delete the dirs, as you and xiao yang sug

Re: Just getting started w/tutorial- errors in crawl.log

2009-07-14 Thread xiao yang
Hi Jim,
I got the second error too. It happens because the previous crawl failed abnormally. There should be the following sub-directories in /segments/20090713171413:
    content
    crawl_fetch
    crawl_generate
    crawl_parse
    parse_data
    parse_text
My solution is to delete the corrupted directory and re-crawl.
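
The check described in the message above can be sketched in Python. This is only an illustration, assuming the crawl output is laid out as <crawl dir>/segments/<timestamp>/ and that a complete segment holds exactly the six sub-directories listed in the message. The function name find_incomplete_segments is hypothetical and not part of Nutch, and crawl.test is simply the -dir value from the tutorial command.

```python
from pathlib import Path

# Sub-directories a finished segment should contain, per the message above.
EXPECTED_SUBDIRS = (
    "content", "crawl_fetch", "crawl_generate",
    "crawl_parse", "parse_data", "parse_text",
)


def find_incomplete_segments(crawl_dir):
    """Map each incomplete segment name to the sub-directories it lacks.

    Hypothetical helper: it only inspects the directory layout and
    deletes nothing.
    """
    segments_root = Path(crawl_dir) / "segments"
    incomplete = {}
    if not segments_root.is_dir():
        return incomplete
    for segment in sorted(segments_root.iterdir()):
        if not segment.is_dir():
            continue
        missing = [
            name for name in EXPECTED_SUBDIRS
            if not (segment / name).is_dir()
        ]
        if missing:
            incomplete[segment.name] = missing
    return incomplete


if __name__ == "__main__":
    # "crawl.test" is the -dir value used in the tutorial command.
    for name, missing in find_incomplete_segments("crawl.test").items():
        print(f"{name}: missing {', '.join(missing)}")
```

Any segment it reports is a candidate for the delete-and-re-crawl fix. Deletion is left as a manual step so that nothing is removed by accident.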

Re: Just getting started w/tutorial- errors in crawl.log

2009-07-14 Thread Beats
h.crawl.Crawl.main(Crawl.java:129)
>
> I must have missed something, but being new, I can't figure out what is
> causing that problem?
>
> Thanks,
> Jim
>
>
--
View this message in context: http://www.nabble.com/Just-getting-started-w-tutorial--errors-in-crawl.log-tp24472043p24476828.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Just getting started w/tutorial- errors in crawl.log

2009-07-14 Thread Alex McLintock
> but I get a number of messages in crawl.log, like:
>
> Error parsing: http://lucene.apache.org/skin/getMenu.js:
> org.apache.nutch.parse.ParseException: parser not found for
> contentType=application/javascript
> url=http://lucene.apache.org/skin/getMenu.js
>        at org.apache.nutch.parse.P
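
To see how many of these messages a crawl produced, and for which content types, something like the following Python sketch could be run over crawl.log. It assumes the log contains the "parser not found for contentType=..." text exactly as quoted above. The function name count_unparsed_types is hypothetical and not part of Nutch.

```python
import re
import sys
from collections import Counter

# Matches the content type in messages such as:
#   ParseException: parser not found for contentType=application/javascript url=...
PARSER_NOT_FOUND = re.compile(r"parser not found for\s+contentType=(\S+)")


def count_unparsed_types(log_text):
    """Count 'parser not found' messages per content type."""
    # Collapse whitespace first, in case a message wraps across lines.
    flattened = " ".join(log_text.split())
    return Counter(PARSER_NOT_FOUND.findall(flattened))


if __name__ == "__main__":
    # Usage (assumed): python count_unparsed.py crawl.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts = count_unparsed_types(handle.read())
        for content_type, total in counts.most_common():
            print(f"{content_type}: {total}")
```

The tally only summarizes what is already in the log. It does not change how Nutch handles those content types.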

Just getting started w/tutorial- errors in crawl.log

2009-07-13 Thread ohaya
Hi,
I've just gotten Nutch installed, and am stepping through the tutorial at: http://lucene.apache.org/nutch/tutorial8.html
It seems to be working, but I get a number of messages in crawl.log, like:
Error parsing: http://lucene.apache.org/skin/getMenu.js: org.apache.nutch.parse.ParseException