Hi Jesse,

No problem. Feel free to post your comments / bug fixes / suggestions on the
JIRA issue NUTCH-762.

Thanks
-- 
DigitalPebble Ltd
http://www.digitalpebble.com

2009/11/4 Jesse Hires <jhi...@gmail.com>

> My apologies, I missed a patch option :-P
> Must need more coffee.
> Jesse
>
> int GetRandomNumber()
> {
>   return 4; // Chosen by fair roll of dice
>                // Guaranteed to be random
> } // xkcd.com
>
>
>
> On Tue, Nov 3, 2009 at 8:08 PM, Jesse Hires <jhi...@gmail.com> wrote:
>
> > Julien,
> > I tried to apply your patch because I was curious.
> > $ patch < NUTCH-762-MultiGenerator.patch
> >
> > but this seems to drop the two java files into the root directory instead of
> > src/java/org/apache/nutch/crawl/URLPartitioner.java
> > src/java/org/apache/nutch/crawl/MultiGenerator.java
> >
> > But if I copy the files to those locations, I get compile errors.
> > I'm up to date on the svn trunk.
> > Did I miss a step?
> >
> >
> > Jesse
> >
> > int GetRandomNumber()
> > {
> >    return 4; // Chosen by fair roll of dice
> >                 // Guaranteed to be random
> > } // xkcd.com
> >
> >
> >
> >
> > On Tue, Nov 3, 2009 at 7:09 AM, Julien Nioche <
> > lists.digitalpeb...@gmail.com> wrote:
> >
> >> FYI : there is an implementation of such a modified Generator in
> >> http://issues.apache.org/jira/browse/NUTCH-762
> >>
> >> Julien
> >> --
> >> DigitalPebble Ltd
> >> http://www.digitalpebble.com
> >>
> >> 2009/10/5 Andrzej Bialecki <a...@getopt.org>
> >>
> >> > Eric wrote:
> >> >
> >> >> My plan is to crawl ~1.6M TLDs to a depth of 2. Is there a way I can
> >> >> crawl it in increments of 100K? E.g., crawl 100K 16 times for the TLDs,
> >> >> then crawl the links generated from the TLDs in increments of 100K?
> >> >>
> >> >
> >> > Yes. Make sure that you have the "generate.update.db" property set to
> >> > true, and then generate 16 segments, each having 100k URLs. After you
> >> > finish generating them, you can start fetching.
> >> >
> >> > Similarly, you can do the same for the next level, only you will have to
> >> > generate more segments.
> >> >
> >> > This could be done much more simply with a modified Generator that
> >> > outputs multiple segments from one job, but it's not implemented yet.
> >> >
> >> >
> >> > --
> >> > Best regards,
> >> > Andrzej Bialecki     <><
> >> >  ___. ___ ___ ___ _ _   __________________________________
> >> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> >> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> >> > http://www.sigram.com  Contact: info at sigram dot com
> >> >
> >> >
> >>
> >
> >
>
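
For anyone following the thread, the incremental approach Andrzej describes in the quoted message (16 segments of 100k URLs each) could be sketched roughly as below. This is only an illustration under assumptions: the crawldb and segments paths are hypothetical, the helper name is made up for this note, and "generate.update.db" is assumed to already be set to true in your Nutch configuration. The sketch only builds the `bin/nutch generate` command lines and does not run them.

```python
def build_generate_commands(total_urls, urls_per_segment,
                            crawldb="crawl/crawldb",
                            segments_dir="crawl/segments"):
    """Build one `bin/nutch generate` command per segment.

    Hypothetical helper: the default paths are assumptions, not taken
    from the thread. Assumes generate.update.db is already set to true,
    so that successive generate runs select different URLs.
    """
    if urls_per_segment <= 0:
        raise ValueError("urls_per_segment must be positive")
    # Ceiling division: number of segments needed to cover all URLs.
    num_segments = -(-total_urls // urls_per_segment)
    command = ["bin/nutch", "generate", crawldb, segments_dir,
               "-topN", str(urls_per_segment)]
    # Each run is the same command; copy it so the lists are independent.
    return [list(command) for _ in range(num_segments)]


# The numbers from the thread: ~1.6M URLs in increments of 100K.
commands = build_generate_commands(1_600_000, 100_000)
print(len(commands))
print(" ".join(commands[0]))
```

Once all segments are generated, each one would be fetched in turn, and the same procedure repeated (with more segments) for the next depth level.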
