Alex (et al),
There was/is plenty of space on the drive (>3GB).
I was trying the command line from the tutorial:
bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
I'm re-running it now to see what happens. If I get that error again, I'll
delete the dirs, as yourself and xiao yang sug
Hi, Jim
I got the second error too. It's because the previous crawl failed
abnormally.
There should be the following sub-directories in /segments/20090713171413:
content crawl_fetch crawl_generate crawl_parse parse_data parse_text
My solution is to delete the corrupted segment directory and re-crawl.
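In case it helps, here is a rough sketch of how one could check which segments are incomplete before deleting anything. It only looks for the six sub-directories listed above. The function names check_segment and check_all_segments are made up for this example, and it assumes the crawl directory (crawl.test in the tutorial command) has the layout <crawl-dir>/segments/<timestamp>. It only reports; it does not delete.

```shell
#!/bin/sh
# Sketch only: report Nutch segments that lack any of the six expected
# sub-directories. Function names are hypothetical, not part of Nutch.

# check_segment <segment-dir>
# Prints a status line; returns 0 if all six sub-directories exist, 1 otherwise.
check_segment() {
    seg=$1
    missing=""
    for sub in content crawl_fetch crawl_generate crawl_parse parse_data parse_text; do
        if [ ! -d "$seg/$sub" ]; then
            missing="$missing $sub"
        fi
    done
    if [ -n "$missing" ]; then
        echo "incomplete: $seg (missing:$missing)"
        return 1
    fi
    echo "complete: $seg"
    return 0
}

# check_all_segments <crawl-dir>
# Assumes the layout <crawl-dir>/segments/<timestamp>.
check_all_segments() {
    for dir in "$1"/segments/*/; do
        # Skip the unexpanded pattern when there are no segments at all.
        [ -d "$dir" ] || continue
        check_segment "${dir%/}" || true
    done
}
```

For example, "check_all_segments crawl.test" prints one line per segment. Any segment reported as incomplete is a candidate for deletion before re-crawling.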
h.crawl.Crawl.main(Crawl.java:129)
>
> I must have missed something, but being new, I can't figure out what is
> causing that problem?
>
> Thanks,
> Jim
--
View this message in context:
http://www.nabble.com/Just-getting-started-w-tutorial--errors-in-crawl.log-tp24472043p24476828.html
Sent from the Nutch - User mailing list archive at Nabble.com.
> but I get a number of messages in crawl.log, like:
>
> Error parsing: http://lucene.apache.org/skin/getMenu.js:
> org.apache.nutch.parse.ParseException: parser not found for
> contentType=application/javascript
> url=http://lucene.apache.org/skin/getMenu.js
> at org.apache.nutch.parse.P
Hi,
I've just gotten Nutch installed, and am stepping through the tutorial at:
http://lucene.apache.org/nutch/tutorial8.html
It seems to be working, but I get a number of messages in crawl.log, like:
Error parsing: http://lucene.apache.org/skin/getMenu.js:
org.apache.nutch.parse.ParseException