Hi All,

I'm using Nutch 2.x along with HBase and Solr, and I have the following
question.

(a) Let's say I run a crawl (generate, fetch, parse, update, etc.) with
Batch ID '1' and set the depth to 3.
(b) After this, I may still have some unfetched pages, and they should be
marked with Batch ID '1'.

(c) I then inject additional URLs.
(d) I run another crawl (generate, fetch, parse, update, etc.) with Batch ID '2'.

My question is: which pages get assigned this new batch ID? Do the unfetched
pages from the previous crawl get assigned it as well, or only the newly
injected pages?

I guess I don't fully understand the concept of a batch ID and how to use
it. I have already searched the Nutch site and past posts, but could not find
any clarification on this.

Thank you for your help.
