Hi,
I have about 300 sites on a specific subject to start with, and I have tried both the crawl method and the whole-web method. For testing purposes, I crawled those sites to depth 2 with an expiry time of just 1 day (I set this in the site.xml file) and got about 3,000 pages. After that day I ran the command "bin/nutch generate db segments" with only the "-refetchonly" flag. When I fetched the generated segment, I got about 30,000 pages. If, besides -refetchonly, I also used -topN 3000, for instance, I would get different pages, not the original ones. So I really don't know how, beginning with an initial set of fetched or crawled sites, to just maintain them, adding only modified or new pages to the ones I already have.
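For reference, this is roughly the sequence I ran (the crawl directory and segment name below are placeholders, not my actual paths; only the flags mentioned above were used):

```shell
# Initial crawl of the ~300 seed sites to depth 2,
# with page expiry set to 1 day in the configuration
bin/nutch crawl urls -dir crawl -depth 2

# After the 1-day expiry: generate a new segment containing
# only the pages that are due for refetching
bin/nutch generate db segments -refetchonly

# Fetch the generated segment -- this is the step that
# produced ~30,000 pages instead of the original ~3,000
bin/nutch fetch segments/<new-segment>
```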
Thanks
