Is there a bug in 0.7.1 that causes the fetcher.threads.per.host
setting to be ignored?
Why do you think it's getting ignored?
Is it because of the "Exceeded http.max.delays" errors below?
These errors show up when the fetcher.threads.per.host limit makes a thread
delay and then loop, because another thread is already fetching a
page from the same host. Once a thread has looped more than
http.max.delays times, it gives up on that page for now and reports this
error. So the error is actually a sign that fetcher.threads.per.host is
being enforced, not ignored.
Looks like you're going after a bunch of pages from the same domain
(fas.org), which means you're going to get a bunch of these errors
even with just three threads.
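A minimal Java sketch of the loop described above may make the mechanism clearer. It is a simplified model, not Nutch's actual code: HostThrottle, RetryLaterException, acquire/release and delayMillis are hypothetical names. maxThreadsPerHost stands in for fetcher.threads.per.host, and maxDelays stands in for http.max.delays.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-host thread limit with a bounded number of
// delays. Not Nutch's actual implementation; names are illustrative only.
class HostThrottle {
    // Stand-in for org.apache.nutch.protocol.RetryLater (hypothetical).
    static class RetryLaterException extends RuntimeException {
        RetryLaterException(String message) { super(message); }
    }

    private final int maxThreadsPerHost; // role of fetcher.threads.per.host
    private final int maxDelays;         // role of http.max.delays
    private final long delayMillis;      // assumed wait between re-checks
    private final Map<String, Integer> active = new HashMap<>();

    HostThrottle(int maxThreadsPerHost, int maxDelays, long delayMillis) {
        this.maxThreadsPerHost = maxThreadsPerHost;
        this.maxDelays = maxDelays;
        this.delayMillis = delayMillis;
    }

    // Waits for a free slot on host; gives up after more than maxDelays loops.
    void acquire(String host) {
        int delays = 0;
        while (true) {
            synchronized (this) {
                int inUse = active.getOrDefault(host, 0);
                if (inUse < maxThreadsPerHost) {
                    active.put(host, inUse + 1);
                    return;
                }
            }
            // Another thread is already fetching from this host: count a delay.
            delays++;
            if (delays > maxDelays) {
                throw new RetryLaterException("Exceeded http.max.delays: retry later.");
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted waiting for " + host, e);
            }
        }
    }

    // Frees a slot obtained by acquire(host).
    synchronized void release(String host) {
        int inUse = active.getOrDefault(host, 0);
        if (inUse <= 1) {
            active.remove(host);
        } else {
            active.put(host, inUse - 1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HostThrottle throttle = new HostThrottle(1, 3, 50);
        throttle.acquire("www.fas.org"); // first fetcher thread takes the only slot
        Thread second = new Thread(() -> {
            try {
                throttle.acquire("www.fas.org"); // loops, then gives up
                System.out.println("second fetch got a slot");
            } catch (RetryLaterException e) {
                System.out.println("second fetch failed: " + e.getMessage());
            }
        });
        second.start();
        second.join();
        throttle.release("www.fas.org");
        // prints: second fetch failed: Exceeded http.max.delays: retry later.
    }
}
```

With one slot per host and every URL on www.fas.org, the second thread keeps finding the host busy and eventually throws, which is the same pattern as in your log: the error shows the per-host limit doing its job.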
-- Ken
[snip]
<property>
<name>fetcher.threads.per.host</name>
<value>1</value>
<description>This number is the maximum number of threads that
should be allowed to access a host at one time.</description>
</property>
Fetch Log
060109 202235 fetching http://www.fas.org/irp/news/1998/06/prs_rel21.html
060109 202250 fetch of http://www.fas.org/irp/news/1998/04/t04141998_t0414asd-3.html failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202250 fetch of http://www.fas.org/asmp/campaigns/smallarms/sawgconf.PDF failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202250 fetching http://www.fas.org/irp/commission/testhaas.htm
060109 202250 fetching http://www.fas.org/asmp/profiles/bahrain.htm
060109 202250 fetching http://www.fas.org/irp/cia/product/dci_speech_03082001.html
060109 202306 fetching http://www.fas.org/irp/news/1998/06/980609-drug10.htm
060109 202321 fetch of http://www.fas.org/irp/commission/testhaas.htm failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202321 fetch of http://www.fas.org/asmp/profiles/bahrain.htm failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202321 fetching http://www.fas.org/irp/news/1998/04/980422-terror2.htm
060109 202321 fetching http://www.fas.org/irp//congress/2004_cr/index.html
060109 202321 fetching http://www.fas.org/irp//congress/2001_rpt/index.html
060109 202338 fetching http://www.fas.org/irp/budget/fy98_navy/0601152n.htm
060109 202354 fetching http://www.fas.org/irp/dia/product/cent21strat.htm
060109 202408 fetch of http://www.fas.org/irp/news/1998/04/980422-terror2.htm failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202408 fetch of http://www.fas.org/irp//congress/2004_cr/index.html failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 202408 fetching http://www.fas.org/faspir/2001/v54n2/qna.htm
060109 202408 fetching http://www.fas.org/graphics/predator/index.htm
060109 202409 fetching http://www.fas.org/irp/doddir/dod/5200-1r/chapter_6.htm
060109 202425 fetching http://www.fas.org/irp//congress/1995_hr/140.htm
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general