OK, now with the -noFilter and -noNorm options on generate, the fetch starts almost immediately.
I would really like to have an exhaustive picture of how URL filtering/normalization is done across all the different steps of a crawl, so I can understand the side effects of what I'm doing. From what I've found, updatedb can also filter/normalize URLs, but it then normalizes the existing crawldb URLs as well (which should take a very long time too). What I want (I think ^^) is to filter/normalize only the newly discovered URLs, and only once. Is there a way to do that? Or am I completely wrong?
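For reference, here is a minimal sketch of the single crawl cycle I'm running, with comments on where filtering/normalization can kick in. The paths, the seed directory, and -topN are placeholders, and the exact updatedb behaviour depends on the Nutch version; on my 1.x build updatedb only seems to filter/normalize when -filter / -normalize are passed, so I'm assuming that omitting them keeps the work on the parse-time outlinks only.

#!/bin/bash
# Hypothetical single crawl cycle; paths and -topN are placeholders.
CRAWLDB=crawl/crawldb
SEGMENTS=crawl/segments

# inject: seed URLs are filtered/normalized once as they enter the crawldb
bin/nutch inject $CRAWLDB urls/

# generate: skip filtering/normalization here, as discussed above
bin/nutch generate $CRAWLDB $SEGMENTS -topN 50000 -noFilter -noNorm
SEGMENT=$SEGMENTS/$(ls -t $SEGMENTS | head -1)

# fetch + parse: outlinks discovered at parse time get filtered/normalized
# when parse.filter.urls / parse.normalize.urls are true (the defaults, I believe)
bin/nutch fetch $SEGMENT
bin/nutch parse $SEGMENT

# updatedb: assuming filtering/normalization here is opt-in via -filter / -normalize,
# omitting them should avoid re-normalizing the whole crawldb on every cycle
bin/nutch updatedb $CRAWLDB $SEGMENT

Is that roughly the right way to think about it, i.e. filter/normalize at inject and at parse time, and nowhere else?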

