I'm attempting to get a crawl working using the individual step scripts, but I keep getting "Skipping <url>; different batch id (null)" messages, and nothing new shows up in Solr. So I've reverted to trying the all-in-one "crawl" command of the nutch script:
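For reference, the step-by-step sequence I was running looked roughly like this (a sketch of my setup; the exact flags and paths are from my environment, so treat them as approximate). The "batch id (null)" skips showed up during the fetch/parse steps:

```shell
# Step-by-step Nutch 2.x workflow I was attempting before
# falling back to the all-in-one "crawl" command.
bin/nutch inject ../urls/              # seed the webtable from seed.txt
bin/nutch generate -topN 100           # mark a batch of URLs for fetching
bin/nutch fetch -all -threads 5        # fetch every generated batch
bin/nutch parse -all                   # parse the fetched content
bin/nutch updatedb                     # update the webtable with new links
bin/nutch solrindex http://localhost/nutchsolr -all   # push to Solr
```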
./nutch crawl ../urls/ -solr "http://localhost/nutchsolr" -threads 5 -depth 3 -topN 100

The urls directory contains a "seed.txt" file with some sites. The crawl is definitely able to fetch pages (I can see other hostnames in the lists scrolling past on screen), but it still skips everything it finds with the same "batch id (null)" message. Any guidance/advice would be appreciated. Thanks! -- Chris

