Hi,

Generating from a 100K crawlDB should be quite fast. Have you checked that the IP resolution is turned off? Do you have any special URL filters that could take a lot of time to process? Generating and merging tend to take more and more time as the crawlDB grows, but this should not be too much of an issue at your scale.
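To give a rough feel for the scale involved, here is a minimal Python sketch. It is not Nutch's generator, which does more work per entry (URL filtering, for example). It only shows that picking the 1000 best-scored entries out of 100k needs one pass over the whole list, and that such a pass is cheap at this size. The names `build_fake_crawldb` and `select_top_n`, the URLs, and the random scores are all made up for illustration.

```python
import heapq
import random
import time


def build_fake_crawldb(size, seed=0):
    """Build a hypothetical in-memory 'crawlDB': a list of (url, score) pairs."""
    rng = random.Random(seed)
    return [(f"http://example.com/page/{i}", rng.random()) for i in range(size)]


def select_top_n(entries, top_n):
    """Return the top_n highest-scoring entries.

    Every entry has to be looked at once, so the cost grows with the
    size of the whole list, not with top_n.
    """
    return heapq.nlargest(top_n, entries, key=lambda entry: entry[1])


if __name__ == "__main__":
    crawldb = build_fake_crawldb(100_000)
    start = time.perf_counter()
    batch = select_top_n(crawldb, 1000)
    elapsed = time.perf_counter() - start
    print(f"selected {len(batch)} of {len(crawldb)} entries in {elapsed:.3f}s")
```

If the selection itself is this cheap, a slow generate step most likely comes from per-URL work such as the filters or IP resolution mentioned above.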
Could you dump the stats of your crawlDB and tell us how long the generation step takes?

> One problem I've run into so far is the amount of time the generate
> command increases with each iteration. The only item that really seems
> to grow out of control is the unfetched URLs, which is expected with
> such a small sample of web pages, but it doesn't make sense to me as to
> why it would take so long to generate a list of 1000 urls to fetch out
> of a list of 100k. Those are small numbers in terms of database and
> computing in general.

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com