The crawl script doesn't accept a Batch ID. To use a Batch ID you would run the commands separately, and that does not involve depth. Depth is just the number of times the generate, fetch, parse, update cycle is run.
Any unfetched pages will not have a Batch ID. The Batch ID only applies to the pages that were generated. By default, all of the unfetched pages and the newly injected pages are available to be generated with Batch ID 2. Batch ID is useful because it allows you to run the fetch, parse, and index commands only on the generated URLs instead of the entire database.
Hope that makes sense.

On Wed, Jul 10, 2013 at 3:52 PM, Mariam Salloum <[email protected]> wrote:

> Hi All,
>
> I'm using Nutch 2.x along with Hbase and SOLR. I have the following
> question.
>
> (a) Lets say I run a crawl (generate, fetch, parse, update, etc.) with
> Batch ID '1' and set the depth to 3.
> (b) After this, I may still have some pages unfetched and they should be
> marked with Batch ID 1'
>
> (c) I then inject additional URLS
> (d) Run a crawl (generate, fetch, parse, update, etc.) with Batch ID '2'
>
> My question is what pages get assigned this new batch id? Do the pages from
> the previous crawl (unfetched pages) get assigned this new batch id? Or
> only newly injected pages.
>
> I guess I don't fully understand the concept of batch id and how to utilize
> it. I already searched the Nutch site and past posts, but could not find
> clarification on this.
>
> Thank you for your help
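
As a rough illustration of the answer at the top of this message, here is a toy Python model of how a batch ID gets attached to pages. It is only a sketch and not Nutch code: `Page`, `inject`, `generate`, `fetch`, and the `top_n` limit are names made up for this example, and the real generator applies scoring and scheduling that are ignored here.

```python
# Toy model only: this is NOT Nutch code, and every name here is hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    url: str
    fetched: bool = False
    batch_id: Optional[str] = None  # set only by generate()


def inject(db: Dict[str, Page], urls: List[str]) -> None:
    """Add new URLs to the store; injected pages carry no batch ID yet."""
    for url in urls:
        db.setdefault(url, Page(url))


def generate(db: Dict[str, Page], batch_id: str, top_n: int) -> List[str]:
    """Stamp up to top_n unfetched pages with batch_id and return their URLs."""
    # Any unfetched page is eligible: old leftovers and fresh injects alike.
    candidates = [p for p in db.values() if not p.fetched][:top_n]
    for page in candidates:
        page.batch_id = batch_id
    return [p.url for p in candidates]


def fetch(db: Dict[str, Page], batch_id: str) -> List[str]:
    """Fetch only the pages stamped with batch_id, not the whole store."""
    done = []
    for page in db.values():
        if page.batch_id == batch_id and not page.fetched:
            page.fetched = True
            done.append(page.url)
    return done


db: Dict[str, Page] = {}
inject(db, ["a", "b", "c"])
generate(db, "1", top_n=2)   # "a" and "b" join batch 1; "c" is left over
fetch(db, "1")               # touches only the batch 1 pages

inject(db, ["d"])
batch_2 = generate(db, "2", top_n=10)
print(batch_2)               # ['c', 'd']
```

In this model the leftover page "c" never had a batch ID, so it is picked up in batch 2 together with the newly injected "d", while the pages already fetched in batch 1 are left alone.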

