Doğacan Güney wrote:
> Hi all,
> 
> I have been working on Fetcher2 code lately and I came across this
> particular code (in FetchItemQueue.getFetchItem) that I didn't quite
> understand:
> 
> public FetchItem getFetchItem() {
>  ...
>  long last = endTime.get() + (maxThreads > 1 ? crawlDelay : minCrawlDelay);
>  ...
> }
> 
> Now, the 'default' politeness behaviour should be 1 thread per host
> and delaying n seconds between successive requests to that host,
> right? But won't this code wait only minCrawlDelay (which, by default,
> is 0) if maxThreads == 1?

Yes, that was the intended behavior. Normally, you should never use 
more than 1 thread per host unless you have explicit permission to 
do so.

If multiple threads make requests to the same host, then the crawl delay 
parameter loses its usual meaning - see the details of this in comments 
to NUTCH-385. However, the sensible thing to do is still to provide a way 
to limit the maximum rate of requests, and this is what the 
minCrawlDelay parameter is for.
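
A minimal sketch of the rule as described in the paragraph above, for illustration only. The field names crawlDelay, minCrawlDelay and maxThreads come from the quoted snippet, but the class PolitenessSketch and the method nextFetchTime are hypothetical, and the selection rule (minCrawlDelay applies only when more than one thread per host is allowed) is an assumption based on this description, not a copy of the Fetcher2 source.

```java
public class PolitenessSketch {

    /**
     * Hypothetical helper: earliest time (ms) at which the next request to a
     * host may start, given when the previous request finished.
     *
     * Assumption: with a single thread per host the regular crawlDelay
     * applies; with several threads per host crawlDelay loses its usual
     * meaning, so minCrawlDelay is used only to cap the request rate.
     */
    static long nextFetchTime(long endTime, int maxThreads,
                              long crawlDelay, long minCrawlDelay) {
        long delay = (maxThreads > 1) ? minCrawlDelay : crawlDelay;
        return endTime + delay;
    }

    public static void main(String[] args) {
        long endTime = 10_000L;      // previous request finished at t = 10 s
        long crawlDelay = 5_000L;    // illustrative value, not a Nutch default
        long minCrawlDelay = 0L;     // the default mentioned in the question

        // One thread per host: wait the full crawlDelay.
        System.out.println(nextFetchTime(endTime, 1, crawlDelay, minCrawlDelay));

        // Several threads per host: only minCrawlDelay limits the rate.
        System.out.println(nextFetchTime(endTime, 3, crawlDelay, minCrawlDelay));
    }
}
```

Under these assumptions, setting minCrawlDelay above 0 is what keeps a multi-threaded fetch from sending requests to one host back to back.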


> 
> I also did not understand why there is a maxThread check at all. Each
> individual thread should wait crawl delay before making another
> request to the same host. Am I missing something here?


See the ASCII-art graphs and comments in NUTCH-385 - this is likely not 
what is expected.

Although this JIRA issue is still open, the Fetcher2 code tries to 
implement this middle-ground solution.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
