>
>
> This could be done much simpler with a modified Generator that outputs
> multiple segments from one job, but it's not implemented yet.
>

This would also be more efficient, since crawlDB operations such as generate
or update take longer as the crawlDB grows (unlike fetch and parse, whose cost
is proportional to the size of the fetchlist). When the crawlDB holds billions
of URLs, fetching and parsing take relatively little time by comparison.
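
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It
is a toy cost model only, not Nutch code: the function name, the per-URL cost
constants and the assumption that each generate job makes one full pass over
the crawlDB are all hypothetical, and the update step is left out for brevity.

```python
def total_cost(db_size, fetchlist_size, num_segments, segments_per_generate,
               db_cost_per_url=1.0, fetch_cost_per_url=10.0):
    """Toy estimate of the work needed to produce and fetch num_segments segments.

    Assumption: every generate job scans the whole crawlDB once, while
    fetch/parse cost depends only on the size of each fetchlist.
    """
    # Number of generate jobs needed (ceiling division).
    generate_jobs = -(-num_segments // segments_per_generate)
    generate_cost = generate_jobs * db_size * db_cost_per_url
    fetch_cost = num_segments * fetchlist_size * fetch_cost_per_url
    return generate_cost + fetch_cost


# Hypothetical numbers: 1 billion URLs in the crawlDB, 1 million per fetchlist.
one_segment_per_job = total_cost(1_000_000_000, 1_000_000, 10, 1)
ten_segments_per_job = total_cost(1_000_000_000, 1_000_000, 10, 10)
```

With these made-up numbers the crawlDB scans dominate when each job yields a
single segment, which is the effect described above.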

generate.update.db requires reading and writing the whole crawlDB every time,
but I suppose that would be fine for a small crawlDB.

J.

-- 
DigitalPebble Ltd
http://www.digitalpebble.com
