Hi,

I need to run Nutch restricted to the URLs in my seed file, with no new URLs added in any cycle.

I am running Nutch in the following two scenarios:

1. Invoking the crawl script without the updatedb job: each cycle takes about 15 minutes, but the URLs processed are the same in every cycle. The total Nutch run time is around 16 hours. Why are the URLs the same in every cycle?

2. Normal crawling (with updatedb): if I use the updatedb job, how can I make Nutch fetch only the URLs in the seed file without adding new URLs to the crawldb? I am trying to run Nutch using the updatedb job with -noAdditions. Is that what this option is for? I have read the Nutch wiki, but it is not clear how the -noAdditions option behaves.

Conditions in both cases: 360 URLs are processed in every cycle. The seed file contains around 25000 URLs (in the crawl bash script, the limit parameter is 25000 and sizeFetchlist is 360).

Thanks,
Andres
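
P.S. For reference, a rough check of the numbers above. This is only a sketch: it assumes that each cycle fetches a distinct batch of 360 URLs and takes about 15 minutes, neither of which is confirmed by the runs described. The function names are made up for illustration and are not part of Nutch.

```python
import math

# Figures taken from the message above.
SEED_URLS = 25000        # approximate number of URLs in the seed file
URLS_PER_CYCLE = 360     # sizeFetchlist in the crawl script
MINUTES_PER_CYCLE = 15   # observed duration of one cycle


def cycles_needed(total_urls: int, per_cycle: int) -> int:
    """Cycles required to cover every URL once, assuming no URL repeats."""
    return math.ceil(total_urls / per_cycle)


def total_hours(cycles: int, minutes_per_cycle: float) -> float:
    """Wall-clock hours for the given number of equal-length cycles."""
    return cycles * minutes_per_cycle / 60


if __name__ == "__main__":
    cycles = cycles_needed(SEED_URLS, URLS_PER_CYCLE)
    hours = total_hours(cycles, MINUTES_PER_CYCLE)
    print(f"{cycles} cycles, about {hours} hours")
```

Under those assumptions, covering the whole seed file would take 70 cycles, or 17.5 hours, which is in the same range as the roughly 16 hours reported.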

