Re: Example crawl script Nutch 2.1

Tejas Patil Fri, 17 May 2013 13:41:23 -0700

Hi Bai Shen,

Thanks for your comments. Can you kindly add those to the relevant JIRA
issue [0] so that they get tracked?


[0] https://issues.apache.org/jira/browse/NUTCH-1545

Thanks,
Tejas


On Fri, May 17, 2013 at 5:36 AM, Bai Shen <[email protected]> wrote:

> I just tested the GeneratorJob portion and it works fine.  I have two
> comments, though.
>
> 1.  I added braces around the -batchId arg if statement.  I don't like if's
> without them.
> 2.  BatchIds never get cleared.  So if you use the same batchId for
> multiple crawl cycles, your URLs per batch will continue to grow.  There
> should probably be some sort of note in the help printout.
>
>
>
>
> On Tue, Apr 30, 2013 at 10:37 AM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi James,
> > Please look for NUTCH-1545 capture batchid...
> > If you could review and use this patch it would be very very helpful.
> > thank you
> > lewis
> >
> > On Tuesday, April 30, 2013, James Ford <[email protected]> wrote:
> > > Thanks for your answer!
> > >
> > > I think I will create my own modified crawl script then. But I am pretty
> > > confused about how to get a generated batchId. Should I just parse the id
> > > from the output:
> > >
> > > GeneratorJob: generated batch id: 1367327604-149897259
> > >
> > > Or should I get the newly generated batchId from the datastore in my
> > > script? Any best practices?
> > >
> > > Thanks
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Example-crawl-script-Nutch-2-1-tp4059960p4059985.html
> > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > >
> >
> > --
> > *Lewis*
> >
>
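
On James's question about capturing the generated batch id: one of the two
options he mentions is parsing the GeneratorJob output. Below is a minimal
sketch of that approach, assuming the log line looks exactly like the one
quoted in this thread. The function name extract_batch_id is hypothetical,
and this is not necessarily how the NUTCH-1545 patch handles it. The demo
runs on the sample line only, so it does not need a Nutch install.

```shell
#!/usr/bin/env bash

# Print the batch id found in GeneratorJob output read from stdin.
# Assumes the line format quoted in this thread:
#   GeneratorJob: generated batch id: 1367327604-149897259
# If several such lines appear, the last one wins; prints nothing if none match.
extract_batch_id() {
  sed -n 's/.*GeneratorJob: generated batch id: \([^[:space:]]*\).*/\1/p' | tail -n 1
}

# Demonstration on the sample line from this thread.
sample_output='GeneratorJob: generated batch id: 1367327604-149897259'
batch_id=$(printf '%s\n' "$sample_output" | extract_batch_id)
echo "batch id: $batch_id"

# In a real crawl script the input would come from the generate step, e.g.
#   batch_id=$(bin/nutch generate -topN 50 2>&1 | extract_batch_id)
# Assumption: the line reaches the console. Depending on the logging
# configuration it may only be written to the Nutch log file instead.
```

Parsing log output is brittle if the message wording changes between
releases, so it is worth checking that batch_id is non-empty before passing
it on to the fetch step.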
