Hi Pierre,

Can you supply some additional information:
1. What is the status of that URL now? If, say, it is un-fetched in the first round, it will be considered again in the second round, and so on. There may be something about that URL that causes an exception, so Nutch re-tries it in every subsequent round. You can check the stored status with the readdb command below.

2. I guess you have not modified the fetch interval for URLs. Typically it is set to 30 days, but if a user changes it to, say, 4 seconds, the URL becomes eligible to be fetched again in the very next round. The grep below shows how to check your configuration for this.

3. Did you observe any exceptions in any of the logs? Please share those; a grep one-liner for the local runtime log follows.
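For (1), assuming you are using the local runtime, something like the following should print the stored row for the URL, including its status, fetch time, and retry count (this uses the -url option of the Nutch 2.x readdb/WebTableReader tool; the example URL is only a placeholder):

    # Print the stored entry for a single URL (status, fetchTime, retries, ...)
    # Run from the runtime/local directory; replace the URL with the one in question.
    bin/nutch readdb -url http://www.example.com/somepage.html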
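For (2), the re-fetch interval is controlled by the db.fetch.interval.default property, which defaults to 2592000 seconds (30 days). A quick way to see whether it was overridden anywhere:

    # Show the fetch-interval properties in both the default and site configs
    grep -A 2 "db.fetch.interval" conf/nutch-default.xml conf/nutch-site.xml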
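For (3), the local runtime logs to logs/hadoop.log by default, so something like this should surface any recent problems:

    # List the most recent exceptions/errors from the crawl log
    grep -iE "exception|error" logs/hadoop.log | tail -n 50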
Thanks,
Tejas

On Sat, Oct 13, 2012 at 10:07 AM, Pierre Nogues <[email protected]> wrote:
>
> Hello,
>
> I'm using Nutch 2.1 with MySQL, and when I run a simple "bin/nutch crawl
> seed/ -depth 5 -topN 10000", I notice Nutch fetches the same URL 3 or 4
> times during the crawl. Why?
>
> I have only configured Nutch to crawl one website locally (the restriction
> is in regex-urlfilter), and everything else looks OK in MySQL.
>
> nutch-site.xml: http://pastebin.com/Mx9s5Kfz
