Doğacan Güney wrote:
> Hi all,
>
> I have been working on Fetcher2 code lately and I came across this
> particular code (in FetchItemQueue.getFetchItem) that I didn't quite
> understand:
>
> public FetchItem getFetchItem() {
>   ...
>   long last = endTime.get() + (maxThreads > 1 ? crawlDelay : minCrawlDelay);
>   ...
> }
>
> Now, the 'default' politeness behaviour should be 1 thread per host
> and delaying n seconds between successive requests to that host,
> right? But, won't this code wait only minCrawlDelay(which, by default,
> is 0) if maxThreads == 1.

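To make the question concrete, here is a minimal sketch that isolates the quoted expression and evaluates it for one and for two threads per host. Only the expression itself comes from the quoted snippet; the class name DelaySketch, the helper earliestNextFetch, the use of AtomicLong for endTime, and the millisecond values are assumptions made for illustration, not the actual Fetcher2 code.

```java
import java.util.concurrent.atomic.AtomicLong;

public class DelaySketch {

    // Hypothetical helper wrapping the expression quoted from
    // FetchItemQueue.getFetchItem; the parameters stand in for the
    // queue's fields of the same names.
    static long earliestNextFetch(AtomicLong endTime, int maxThreads,
                                  long crawlDelay, long minCrawlDelay) {
        return endTime.get() + (maxThreads > 1 ? crawlDelay : minCrawlDelay);
    }

    public static void main(String[] args) {
        // Illustrative values only (assumed to be milliseconds).
        AtomicLong endTime = new AtomicLong(10_000L); // last request finished at t = 10000
        long crawlDelay = 5_000L;
        long minCrawlDelay = 0L;                      // the default mentioned in the question

        // maxThreads == 1: the expression adds minCrawlDelay.
        System.out.println(earliestNextFetch(endTime, 1, crawlDelay, minCrawlDelay)); // prints 10000

        // maxThreads == 2: the expression adds crawlDelay.
        System.out.println(earliestNextFetch(endTime, 2, crawlDelay, minCrawlDelay)); // prints 15000
    }
}
```

With these assumed values, the single-thread case becomes eligible for the next request immediately, which is the behaviour the question is about.
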
Yes, that was the intended behavior - normally, you should never use more than 1 thread per host unless you have explicit permission to do so. If multiple threads make requests to the same host, then the crawl delay parameter loses its usual meaning - see the details in the comments to NUTCH-385. However, the sensible thing to do is to still provide a way to limit the maximum rate of requests, and this is what the minCrawlDelay parameter is for.

> I also did not understand why there is a maxThread check at all. Each
> individual thread should wait crawl delay before making another
> request to the same host. Am I missing something here?

See the ASCII-art graphs and comments in NUTCH-385 - this is likely not what is expected. Although this JIRA issue is still open, the Fetcher2 code tries to implement this middle ground solution.

-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers