FYI: there is an implementation of such a modified Generator in

DigitalPebble Ltd

2009/10/5 Andrzej Bialecki <>

> Eric wrote:
>> My plan is to crawl ~1.6M TLDs to a depth of 2. Is there a way I can
>> crawl them in increments of 100K? E.g., crawl 100K 16 times for the TLDs, then
>> crawl the links generated from the TLDs in increments of 100K?
> Yes. Make sure that you have the "generate.update.db" property set to true,
> and then generate 16 segments, each having 100K URLs. After you finish
> generating them, you can start fetching.
> You can do the same for the next level, only you will have to
> generate more segments.
> This could be done much more simply with a modified Generator that outputs
> multiple segments from one job, but it's not implemented yet.
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>  Contact: info at sigram dot com
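
For reference, the approach quoted above (repeated generate runs with "generate.update.db" set to true) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the Nutch 1.x command-line form `bin/nutch generate <crawldb> <segments_dir> -topN N`, the paths crawl/crawldb and crawl/segments are hypothetical placeholders, and the helper name plan_generate_commands is made up for illustration. It only prints the commands and does not run them.

```python
def plan_generate_commands(total_urls, batch_size,
                           crawldb="crawl/crawldb",
                           segments_dir="crawl/segments"):
    """Return one `bin/nutch generate` command per batch.

    Assumes "generate.update.db" is true in nutch-site.xml, so each run
    marks the URLs it selected and the next run picks different ones.
    The crawldb and segments_dir defaults are hypothetical paths.
    """
    if total_urls <= 0 or batch_size <= 0:
        raise ValueError("total_urls and batch_size must be positive")
    # Integer ceiling division: number of segments needed to cover all URLs.
    num_segments = (total_urls + batch_size - 1) // batch_size
    return [
        ["bin/nutch", "generate", crawldb, segments_dir,
         "-topN", str(batch_size)]
        for _ in range(num_segments)
    ]


if __name__ == "__main__":
    # 1.6M URLs in batches of 100K gives 16 generate runs.
    for cmd in plan_generate_commands(1_600_000, 100_000):
        print(" ".join(cmd))
```

Once all segments exist, each one would still need to be fetched and the crawldb updated before generating the next level.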
