[email protected] wrote:
On Feb 27, 2009 5:14pm, Andrzej Bialecki <[email protected]> wrote:
Michael Chan wrote:


Hi,

I'm trying to generate multiple segments so that I can run several fetching
tasks on a *single* machine. This is just to reduce the effort needed to
refetch after a crash. Is the -numFetchers option still available in 0.9?
When I use -numFetchers 4, it seems to be ignored and the generator
generates one partition. Has it been deprecated? If so, is there an
alternative?


The numFetchers option is poorly named. It still works with the current code, but not in the same way as in Nutch 0.7: it now determines the number of fetching tasks, and ONLY when you run in distributed mode (on a Hadoop cluster). In local mode it has no effect.

Currently there is no support for generating multiple segments in one go. However, if you set generator.update.crawldb to true, you can generate multiple segments in multiple runs of Generator, and then fetch / update these segments in arbitrary order.


I see. How do I indicate how large each segment should be? Thanks.

Use the -topN option.
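
As a rough illustration of the multi-run approach, the sketch below builds one `bin/nutch generate` command per segment, each capped with -topN. It assumes generator.update.crawldb (name as written in this thread; check nutch-default.xml for the exact spelling in your version) is already set to true in your Nutch configuration. The paths, the run count, and the helper names `plan_generate_runs` and `run_all` are hypothetical.

```python
import subprocess


def plan_generate_runs(crawldb, segments_dir, runs, top_n, nutch="bin/nutch"):
    """Return one 'nutch generate' argv list per segment to create.

    Assumption: generator.update.crawldb is true in the Nutch config, so
    each Generator run marks the URLs it selected and the next run picks
    different ones. Without it, repeated runs may select the same URLs.
    """
    if runs < 1 or top_n < 1:
        raise ValueError("runs and top_n must be positive")
    # -topN caps how many URLs go into the segment produced by one run.
    cmd = [nutch, "generate", crawldb, segments_dir, "-topN", str(top_n)]
    return [list(cmd) for _ in range(runs)]


def run_all(commands):
    """Run the planned commands one after another, stopping on failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical layout: crawl/crawldb and crawl/segments.
    for argv in plan_generate_runs("crawl/crawldb", "crawl/segments",
                                   runs=4, top_n=1000):
        print(" ".join(argv))
```

Each resulting segment can then be fetched and used to update the crawldb independently, in any order, so a crash only costs the refetch of one segment.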

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
