Hi and thanks Ferdy,

It seems that since I started using -noFilter and -noNorm with "nutch generate ...", everything is running much more quickly (by the way, my version of Nutch is 1.6).
Now I would like to optimize my crawling loop, since I don't want to reindex everything with solrindex and I only want to add newly discovered links to the linkdb. Here is my loop content:

bin/nutch generate crawl/crawldb crawl/segments -topN 10000 -noFilter -noNorm
s2=`ls -d crawl/segments/2* | tail -1`
bin/nutch fetch $s2
bin/nutch parse $s2
bin/nutch updatedb crawl/crawldb $s2
bin/nutch generate crawl/crawldb crawl/segments -topN 10000 -noFilter -noNorm
s3=`ls -d crawl/segments/2* | tail -1`
bin/nutch fetch $s3
bin/nutch parse $s3
bin/nutch updatedb crawl/crawldb $s3
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

I have read the docs for invertlinks and solrindex, but I still don't understand how to run invertlinks / solrindex only on the latest segments (here $s2 and $s3). Could someone tell me how to change my command lines to something like:

bin/nutch invertlinks crawl/linkdb -dir $s2 $s3
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb $s2 $s3

I already have about 1,000,000 indexed URLs and I don't really want to break anything by running the wrong tests. My tool will be used for press coverage (finding new articles and storing them for data reporting), so I need the loop to be fast enough that the site database (currently 2000 URLs) always has all of its URLs indexed (it would be critical to miss important news just because the crawl takes too long).
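To make my question concrete, here is the kind of loop I am hoping to end up with. It is only an untested sketch: it assumes that invertlinks and solrindex accept individual segment directories as positional arguments (instead of -dir crawl/segments), which is exactly the point I am not sure about.

#!/bin/bash
# Untested sketch of the optimized loop.
# Assumption: invertlinks and solrindex accept individual segment paths
# as positional arguments instead of "-dir crawl/segments".

NUTCH=bin/nutch
CRAWLDB=crawl/crawldb
LINKDB=crawl/linkdb
SEGMENTS=crawl/segments
SOLR=http://127.0.0.1:8983/solr/

NEW_SEGS=""                              # only the segments created in this run

for round in 1 2; do                     # two rounds, like $s2 and $s3 above
  $NUTCH generate $CRAWLDB $SEGMENTS -topN 10000 -noFilter -noNorm
  SEG=`ls -d $SEGMENTS/2* | tail -1`     # the segment generate just created
  $NUTCH fetch $SEG
  $NUTCH parse $SEG
  $NUTCH updatedb $CRAWLDB $SEG
  NEW_SEGS="$NEW_SEGS $SEG"
done

# Invert links and index only the new segments, not the whole segments dir:
$NUTCH invertlinks $LINKDB $NEW_SEGS
$NUTCH solrindex $SOLR $CRAWLDB -linkdb $LINKDB $NEW_SEGS

If passing segments like this is not supported, I would also be happy with any other way of restricting invertlinks and solrindex to just the latest segments.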

