Is there a bug in 0.7.1 that causes the fetcher.threads.per.host setting to be ignored?

Why do you think it's getting ignored?

Is it because of the "Exceeded http.max.delays" errors below?

These show up when the fetcher.threads.per.host limit forces a thread to wait and then retry in a loop, because another thread is already fetching a page from the same host. When a thread has looped more than http.max.delays times, it gives up with that error. So it's actually a sign that fetcher.threads.per.host is being enforced, not ignored.
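The wait-and-retry behaviour described above can be sketched roughly like this. It's a minimal illustration, not Nutch's actual fetcher code: the class `HostThrottle`, its `RetryLaterException` (standing in for Nutch's `RetryLater`), and the fixed `delayMillis` pause between checks are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host limiter illustrating the wait-and-retry loop; not Nutch's actual code.
class HostThrottle {
    // Stand-in for org.apache.nutch.protocol.RetryLater.
    static class RetryLaterException extends Exception {
        RetryLaterException(String message) { super(message); }
    }

    private final int threadsPerHost; // plays the role of fetcher.threads.per.host
    private final int maxDelays;      // plays the role of http.max.delays
    private final long delayMillis;   // assumed pause between checks
    private final Map<String, Integer> active = new HashMap<>(); // host -> threads fetching it

    HostThrottle(int threadsPerHost, int maxDelays, long delayMillis) {
        this.threadsPerHost = threadsPerHost;
        this.maxDelays = maxDelays;
        this.delayMillis = delayMillis;
    }

    // Claims a slot for host; while the host is full, sleeps and re-checks,
    // giving up after maxDelays sleeps.
    void acquire(String host) throws RetryLaterException, InterruptedException {
        int delays = 0;
        while (true) {
            synchronized (this) {
                int count = active.getOrDefault(host, 0);
                if (count < threadsPerHost) {
                    active.put(host, count + 1);
                    return;
                }
            }
            if (delays == maxDelays) {
                throw new RetryLaterException("Exceeded http.max.delays: retry later.");
            }
            delays++;
            Thread.sleep(delayMillis);
        }
    }

    // Frees the slot so a waiting thread can claim it on its next check.
    synchronized void release(String host) {
        Integer count = active.get(host);
        if (count == null) throw new IllegalStateException("host not held: " + host);
        if (count == 1) active.remove(host); else active.put(host, count - 1);
    }

    public static void main(String[] args) throws Exception {
        HostThrottle throttle = new HostThrottle(1, 3, 10);
        throttle.acquire("www.fas.org");     // first fetch takes the only slot
        try {
            throttle.acquire("www.fas.org"); // stands in for a second thread on the same host
        } catch (RetryLaterException e) {
            System.out.println("second fetch gave up: " + e.getMessage());
        }
        throttle.release("www.fas.org");
    }
}
```

Run on its own, `main` prints "second fetch gave up: Exceeded http.max.delays: retry later." after three short pauses, because the single slot for www.fas.org is already held.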

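Looks like you're fetching a lot of pages from the same host (www.fas.org). With fetcher.threads.per.host set to 1, only one thread can fetch from that host at a time while the others wait, so you're going to get a bunch of these errors even with just three threads.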
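To see why a single-host crawl produces these errors even with few threads, here's a toy, single-threaded model. All names and the tick-based timing are hypothetical, not taken from Nutch: every URL is on one host, only one thread may fetch from it at a time, one tick equals one delay interval, and each fetch holds the host for `fetchTicks` ticks. When a fetch lasts longer than a waiting thread is willing to delay, the other threads give up.

```java
// Hypothetical tick-based model of threads contending for one host; not Nutch's code.
class SameHostSimulation {
    // Returns {succeeded, failed} for `urls` URLs that all live on one host.
    static int[] simulate(int threads, int urls, int fetchTicks, int maxDelays) {
        if (threads < 1 || fetchTicks < 1 || maxDelays < 0) {
            throw new IllegalArgumentException("invalid parameters");
        }
        int pending = urls;                        // URLs not yet handed to a thread
        int[] remaining = new int[threads];        // ticks left in current fetch (0 = not fetching)
        boolean[] waiting = new boolean[threads];  // holding a URL but blocked on the host
        int[] delays = new int[threads];           // delays taken for the current URL
        boolean hostBusy = false;                  // threads.per.host = 1: a single slot
        int ok = 0, failed = 0;

        while (true) {
            for (int t = 0; t < threads; t++) {
                if (remaining[t] > 0) {
                    remaining[t]--;
                    if (remaining[t] == 0) { hostBusy = false; ok++; } // fetch done, slot freed
                    continue; // a thread takes its next URL on the following tick
                }
                if (!waiting[t]) {
                    if (pending == 0) continue;
                    pending--;
                    waiting[t] = true;
                    delays[t] = 0;
                }
                if (!hostBusy) {
                    hostBusy = true;           // claim the host's only slot
                    waiting[t] = false;
                    remaining[t] = fetchTicks;
                } else if (delays[t] == maxDelays) {
                    waiting[t] = false;        // give up: "Exceeded http.max.delays"
                    failed++;
                } else {
                    delays[t]++;               // wait one interval, then re-check
                }
            }
            boolean allIdle = true;
            for (int t = 0; t < threads; t++) {
                if (remaining[t] > 0 || waiting[t]) allIdle = false;
            }
            if (allIdle && pending == 0) return new int[] {ok, failed};
        }
    }

    public static void main(String[] args) {
        int[] slow = simulate(3, 9, 5, 2);
        int[] fast = simulate(3, 9, 1, 3);
        System.out.println("slow fetches: ok=" + slow[0] + " failed=" + slow[1]);
        System.out.println("fast fetches: ok=" + fast[0] + " failed=" + fast[1]);
    }
}
```

With `simulate(3, 9, 5, 2)` some URLs fail, while `simulate(3, 9, 1, 3)` fetches all nine. The fetch log below looks consistent with the first case: the failures arrive in pairs, as if one thread holds www.fas.org while the other two run out of delays.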

-- Ken


[snip]

<property>
 <name>fetcher.threads.per.host</name>
 <value>1</value>
 <description>This number is the maximum number of threads that
   should be allowed to access a host at one time.</description>
</property>




Fetch Log

060109 202235 fetching http://www.fas.org/irp/news/1998/06/prs_rel21.html
060109 202250 fetch of http://www.fas.org/irp/news/1998/04/t04141998_t0414asd-3.html failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202250 fetch of http://www.fas.org/asmp/campaigns/smallarms/sawgconf.PDF failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202250 fetching http://www.fas.org/irp/commission/testhaas.htm
060109 202250 fetching http://www.fas.org/asmp/profiles/bahrain.htm
060109 202250 fetching http://www.fas.org/irp/cia/product/dci_speech_03082001.html
060109 202306 fetching http://www.fas.org/irp/news/1998/06/980609-drug10.htm
060109 202321 fetch of http://www.fas.org/irp/commission/testhaas.htm failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202321 fetch of http://www.fas.org/asmp/profiles/bahrain.htm failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202321 fetching http://www.fas.org/irp/news/1998/04/980422-terror2.htm
060109 202321 fetching http://www.fas.org/irp//congress/2004_cr/index.html
060109 202321 fetching http://www.fas.org/irp//congress/2001_rpt/index.html
060109 202338 fetching http://www.fas.org/irp/budget/fy98_navy/0601152n.htm
060109 202354 fetching http://www.fas.org/irp/dia/product/cent21strat.htm
060109 202408 fetch of http://www.fas.org/irp/news/1998/04/980422-terror2.htm failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202408 fetch of http://www.fas.org/irp//congress/2004_cr/index.html failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202408 fetching http://www.fas.org/faspir/2001/v54n2/qna.htm
060109 202408 fetching http://www.fas.org/graphics/predator/index.htm
060109 202409 fetching http://www.fas.org/irp/doddir/dod/5200-1r/chapter_6.htm
060109 202425 fetching http://www.fas.org/irp//congress/1995_hr/140.htm


--
Ken Krugler
Krugle, Inc.
+1 530-470-9200


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
