Thanks for the response, Lewis. I'll give Nutch 2.3.1 a spin later tonight.
I didn't have success with batchId. I thought I could overwrite it in the DB with 123, and then ./fetch 123 would fetch all URLs marked with 123. I seem to be missing where the generate command stores its segments.

For now I'm happy looking through the code for the first time. I think I'll try building a generator or fetch job that can prioritize/boost domains. I'm no Java wiz, but it'll be a good exercise regardless of whether it works.

Thanks,
Lex

On Tue, Jan 12, 2016 at 4:13 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Lex,
>
> On Mon, Jan 11, 2016 at 2:16 PM, <[email protected]> wrote:
>
> > I'm using Nutch 2.3.
>
> Please note we are on the very cusp of releasing Apache Nutch 2.3.1 which
> has a number of bug fixes and improvements. There is a VOTE out right now
> for it. If you have time please consider taking it for a spin and providing
> us with feedback. Thanks.
>
> > After thinking about it more I see batchId. And after running ./generate
> > -topN x I see a batch id generated. I wonder if it's safe to overwrite the
> > batchId to 123 and then run ./fetch 123?
>
> When you say overwrite the batch id you mean passing the -batchId 123
> argument to GeneratorJob? Yes I think that this is OK. Bearing in mind that
> the batchId is autogenerated anyways, I am not sure that this would matter
> much. All that it would do is enable you to remember that you previously
> generated a batch with ID 123 :)
> Thanks

