On 4/24/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
> > Hi all,
> >
> > I have been working on Fetcher2 code lately and I came across this
> > particular code (in FetchItemQueue.getFetchItem) that I didn't quite
> > understand:
> >
> > public FetchItem getFetchItem() {
> >  ...
> >  long last = endTime.get() + (maxThreads > 1 ? crawlDelay : minCrawlDelay);
> >  ...
> > }
> >
> > Now, the 'default' politeness behaviour should be 1 thread per host
> > and delaying n seconds between successive requests to that host,
> > right? But won't this code wait only minCrawlDelay (which, by default,
> > is 0) if maxThreads == 1?
>
> Yes, that was the intended behavior - normally, you should never use
> more than 1 thread per host, unless you have an explicit permission to
> do so.
>
> If multiple threads make requests to the same host, then the crawl delay
> parameter loses its usual meaning - see the details of this in comments
> to NUTCH-385. However, the sensible thing to do is still to provide a way
> to limit the maximum rate of requests, and this is what the
> minCrawlDelay parameter is for.

I don't get it. The code seems to do exactly the opposite of what you
are saying. If maxThreads == 1, then maxThreads > 1 is false, so the
expression evaluates to minCrawlDelay, not crawlDelay. Shouldn't the
expression be (maxThreads > 1 ? minCrawlDelay : crawlDelay)?
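
To make the difference concrete, here is a minimal standalone sketch of the
two forms of the expression. It is not the actual Fetcher2 code: the class
name, the method names, and the 5000 ms crawl delay are made up for
illustration, and only the ternary expressions themselves come from the
snippet quoted above.

```java
public class CrawlDelayCheck {

    // The expression as it appears in the quoted getFetchItem() snippet.
    static long delayAsWritten(int maxThreads, long crawlDelay, long minCrawlDelay) {
        return maxThreads > 1 ? crawlDelay : minCrawlDelay;
    }

    // The swapped form suggested in this mail.
    static long delaySwapped(int maxThreads, long crawlDelay, long minCrawlDelay) {
        return maxThreads > 1 ? minCrawlDelay : crawlDelay;
    }

    public static void main(String[] args) {
        long crawlDelay = 5000L;   // hypothetical value, in milliseconds
        long minCrawlDelay = 0L;   // the default mentioned in this thread

        for (int maxThreads = 1; maxThreads <= 2; maxThreads++) {
            System.out.println("maxThreads=" + maxThreads
                + " asWritten=" + delayAsWritten(maxThreads, crawlDelay, minCrawlDelay)
                + " swapped=" + delaySwapped(maxThreads, crawlDelay, minCrawlDelay));
        }
    }
}
```

With one thread per host, the form as written selects minCrawlDelay and the
swapped form selects crawlDelay, which is the behaviour I would have expected
from the default politeness setting.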

>
>
> >
> > I also did not understand why there is a maxThreads check at all. Each
> > individual thread should wait for the crawl delay before making another
> > request to the same host. Am I missing something here?
>
>
> See the ASCII-art graphs and comments in NUTCH-385 - this is likely not
> what is expected.
>
> Although this JIRA issue is still open, the Fetcher2 code tries to
> implement this middle ground solution.

OK. I guess this approach is good enough.

>
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
Doğacan Güney
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
