Hi Bai Shen,

Thanks for your comments. Could you kindly add them to the relevant JIRA issue [0] so that they get tracked?

[0] https://issues.apache.org/jira/browse/NUTCH-1545

Thanks,
Tejas

On Fri, May 17, 2013 at 5:36 AM, Bai Shen <[email protected]> wrote:

> I just tested the GeneratorJob portion and it works fine. I have two
> comments, though.
>
> 1. I added braces around the -batchId arg if statement. I don't like if's
> without them.
> 2. BatchIds never get cleared. So if you use the same batchId for
> multiple crawl cycles your urls per batch will continue to grow. There
> should probably be some sort of note in the help printout.
>
> On Tue, Apr 30, 2013 at 10:37 AM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi James,
> > Please look for NUTCH-1545 capture batchid...
> > If you could review and use this patch it would be very very helpful.
> > thank you
> > lewis
> >
> > On Tuesday, April 30, 2013, James Ford <[email protected]> wrote:
> > > Thanks for your answer!
> > >
> > > I think I will create my own modified crawlscript then. But I am pretty
> > > confused of how to get a generated batchId? Should I just parse the id
> > > from the output:
> > >
> > > GeneratorJob: generated batch id: 1367327604-149897259
> > >
> > > Or should I get the newly generated batchId from the datastore in my
> > > script? Any best practices?
> > >
> > > Thanks
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Example-crawl-script-Nutch-2-1-tp4059960p4059985.html
> > > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
> > --
> > *Lewis*
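
On the question quoted above about parsing the id from the output: until the patch in NUTCH-1545 is available, one option is to pull the id out of the generate step's output in your own script. Below is a minimal sketch in Python. It assumes the line looks exactly like the single sample quoted in this thread; the function name `extract_batch_id` and the pattern are illustrative, not part of Nutch, and where Nutch writes that line (stdout or a log file) is something you would need to check on your setup.

```python
import re
from typing import Optional

# Pattern built from the one sample line quoted in this thread:
#   GeneratorJob: generated batch id: 1367327604-149897259
# Assumption: the id is the run of non-whitespace right after the label.
BATCH_ID_PATTERN = re.compile(r"GeneratorJob: generated batch id: (\S+)")


def extract_batch_id(output: str) -> Optional[str]:
    """Return the last batch id reported in `output`, or None if absent."""
    matches = BATCH_ID_PATTERN.findall(output)
    return matches[-1] if matches else None


if __name__ == "__main__":
    sample = "GeneratorJob: generated batch id: 1367327604-149897259"
    print(extract_batch_id(sample))
```

Taking the last match rather than the first means that if the captured output spans several generate runs, the script picks up the most recent id. Given the second comment above about batchIds never being cleared, passing a fresh id to each cycle is probably the safer choice.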

