Re: Example crawl script Nutch 2.1

Bai Shen Fri, 17 May 2013 05:37:15 -0700

I just tested the GeneratorJob portion and it works fine.  I have two
comments, though.

1.  I added braces around the -batchId arg if statement.  I don't like if
statements without them.
2.  BatchIds never get cleared.  So if you reuse the same batchId across
multiple crawl cycles, the number of URLs in that batch will keep growing.
The help printout should probably include a note about this.

On Tue, Apr 30, 2013 at 10:37 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi James,
> Please look for NUTCH-1545 capture batchid...
> If you could review and use this patch it would be very very helpful.
> thank you
> lewis
>
> On Tuesday, April 30, 2013, James Ford <[email protected]> wrote:
> > Thanks for your answer!
> >
> > I think I will create my own modified crawl script then. But I am pretty
> > confused about how to get a generated batchId. Should I just parse the id
> > from the output:
> >
> > GeneratorJob: generated batch id: 1367327604-149897259
> >
> > Or should I get the newly generated batchId from the datastore in my
> > script?
> > Any best practices?
> >
> > Thanks
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Example-crawl-script-Nutch-2-1-tp4059960p4059985.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
>
> --
> *Lewis*
>
