Re: nutch crawl

feng lu Mon, 20 May 2013 09:26:14 -0700

Hi Christopher

It checks the updatedb mark when indexing, but here the updatedb mark is
null, so it skips the URL. Maybe this URL was not parsed successfully; you
can check the log to see what happened.
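One rough way to follow that advice: the shell sketch pulls every URL that the indexer skipped with a null batch id out of the log, then prints the other log lines that mention the same URL, such as fetch or parse messages. It assumes the log is at logs/hadoop.log (adjust for your runtime), that the skip message matches the one quoted in this thread, and that the URL appears in the same form elsewhere in the log. The function names skipped_urls and explain_skips are hypothetical, made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers for reading a Nutch log; not part of Nutch itself.
# Assumes skip lines look like the message quoted in this thread:
#   Skipping <url>; different batch id (null)

# skipped_urls LOGFILE
# Print each URL that was skipped with a null batch id, once, sorted.
skipped_urls() {
  sed -n 's/.*Skipping \(.*\); different batch id (null).*/\1/p' "$1" | sort -u
}

# explain_skips LOGFILE
# For every skipped URL, print a header and then all other log lines
# mentioning that URL (fetch/parse messages), hiding the skip lines.
explain_skips() {
  skipped_urls "$1" | while IFS= read -r url; do
    printf '== %s\n' "$url"
    # -F matches the URL as a fixed string, so "." and "?" are literal
    grep -F -- "$url" "$1" | grep -v -F 'different batch id' || true
  done
}
```

For example, after sourcing the functions: explain_skips logs/hadoop.log (log path assumed).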



On Mon, May 20, 2013 at 9:44 PM, Christopher Gross <[email protected]> wrote:

> I'm attempting to get a crawl working using scripts, but I've been getting
> a "Skipping <url>; different batch id (null)" error and then nothing new in
> Solr.  So I've gone back to trying the "crawl" command of the nutch
> script:
>
> ./nutch crawl ../urls/ -solr "http://localhost/nutchsolr" -threads 5 -depth 3 -topN 100
>
> The urls directory has a "seed.txt" file with some sites.  It definitely is
> able to get pages (I can see other hostnames in the lists scrolling through
> the screen), but then it still skips everything it finds with the
> "batch id (null)" message.
>
> Any guidance/advice would be appreciated.
>
> Thanks!
>
> -- Chris
>



-- 
Don't Grow Old, Grow Up... :-)
