On 4/24/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
> > Hi all,
> >
> > I have been working on Fetcher2 code lately and I came across this
> > particular code (in FetchItemQueue.getFetchItem) that I didn't quite
> > understand:
> >
> > public FetchItem getFetchItem() {
> >   ...
> >   long last = endTime.get() + (maxThreads > 1 ? crawlDelay : minCrawlDelay);
> >   ...
> > }
> >
> > Now, the 'default' politeness behaviour should be 1 thread per host
> > and delaying n seconds between successive requests to that host,
> > right? But, won't this code wait only minCrawlDelay(which, by default,
> > is 0) if maxThreads == 1.
>
> Yes, that was the intended behavior - normally, you should never use
> more than 1 thread per host, unless you have an explicit permission to
> do so.
>
> If multiple threads make requests to the same host, then the crawl delay
> parameter loses its usual meaning - see the details of this in comments
> to NUTCH-385. However, the sensible way to do is to still provide a way
> to limit the maximum rate of requests, and this is what the
> minCrawlDelay parameter is for.

I don't get it. The code seems to do exactly the opposite of what you
are saying. If maxThreads == 1, then maxThreads > 1 is false, so the
expression evaluates to minCrawlDelay, not crawlDelay. Shouldn't the
expression be (maxThreads > 1 ? minCrawlDelay : crawlDelay)?

> > I also did not understand why there is a maxThread check at all. Each
> > individual thread should wait crawl delay before making another
> > request to the same host. Am I missing something here?
>
> See the ASCII-art graphs and comments in NUTCH-385 - this is likely not
> what is expected.
>
> Although this JIRA issue is still open, the Fetcher2 code tries to
> implement this middle ground solution.

OK. I guess this approach is good enough.

> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com

--
Doğacan Güney

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
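
To make the disagreement over the ternary concrete, here is a minimal Java sketch that evaluates both orderings of the expression for one and two threads per host. It is not the Fetcher2 source: the class name CrawlDelaySketch, both method names, and the 5000 ms crawl delay are hypothetical and chosen only for illustration. Only the expression itself and the default minCrawlDelay of 0 come from the thread.

```java
// Hypothetical sketch for illustration; not the actual Fetcher2 code.
final class CrawlDelaySketch {

    // The expression as quoted from FetchItemQueue.getFetchItem().
    static long delayAsQuoted(int maxThreads, long crawlDelay, long minCrawlDelay) {
        return maxThreads > 1 ? crawlDelay : minCrawlDelay;
    }

    // The swapped ordering proposed in this message.
    static long delayAsProposed(int maxThreads, long crawlDelay, long minCrawlDelay) {
        return maxThreads > 1 ? minCrawlDelay : crawlDelay;
    }

    public static void main(String[] args) {
        long crawlDelay = 5000L;  // assumed value, in milliseconds
        long minCrawlDelay = 0L;  // default mentioned in the thread

        for (int maxThreads = 1; maxThreads <= 2; maxThreads++) {
            System.out.println("maxThreads=" + maxThreads
                + " quoted=" + delayAsQuoted(maxThreads, crawlDelay, minCrawlDelay)
                + " proposed=" + delayAsProposed(maxThreads, crawlDelay, minCrawlDelay));
        }
    }
}
```

With one thread per host, the quoted ordering yields minCrawlDelay (0 here), while the proposed ordering yields crawlDelay; with two threads the results swap.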