Hi Sami,
The machine has direct connectivity (no NAT) and is not running IPv6.
Cheers...
Roger
--------------------------------------------------
From: "Sami Siren" <ssi...@gmail.com>
Sent: Monday, March 30, 2009 5:42 PM
To: <nutch-user@lucene.apache.org>
Subject: Re: Fetcher2 Slow
Roger Dunk wrote:
> Andrzej stated in NUTCH-669 that "some people reported performance issues
> with Fetcher2, i.e. that it doesn't use the available bandwidth. These
> reports are unconfirmed, and they may have been caused by suboptimal URL
> / host distribution in a fetchlist - but it would be good to review the
> synchronization and threading aspects of Fetcher2."
> To address this, I've tried just now generating a fetchlist using
> generate.max.per.host = 1 (which gave me 35,000 unique hosts) to
> guarantee unique hosts, but the problem still remains.
> Therefore, I believe it's clearly not an issue of suboptimal URL / host
> distribution. If you require any further information to confirm my
> report, you need only ask!
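
One way to double-check the host distribution of a fetchlist is to count URLs per host. The following is only a minimal sketch: it assumes the fetchlist URLs have already been exported to a plain list of strings (how they are exported is not covered here), and the function names host_counts and max_per_host are made up for illustration, not part of Nutch.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_counts(urls):
    """Count how many URLs in the list belong to each host."""
    counts = Counter()
    for url in urls:
        # .hostname is lower-cased and has any port stripped
        host = urlsplit(url.strip()).hostname
        if host:  # skip lines that do not parse as absolute URLs
            counts[host] += 1
    return counts


def max_per_host(urls):
    """Return the largest number of URLs sharing one host (0 if none)."""
    counts = host_counts(urls)
    return max(counts.values()) if counts else 0
```

If max_per_host returns 1 for the exported list, every URL is on a distinct host name. Note that this compares host names only; several names can still resolve to the same IP address.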
I have so far seen two sources of slowness; I don't know if they are
related to your case:
1. You are using Nutch from behind a NAT box. I experienced this problem
when I did some test crawling from a machine sitting behind an ADSL router
that did NAT. Soon after starting a crawl, the maximum number of NAT
connections was reached in the router, and further connections could only be
made after old ones timed out of the NAT table. These connections were
mostly DNS connections.
2. Your machine has IPv6 enabled. I noticed this more recently when I was
wondering about the relatively slow fetching speed on a box. After disabling
IPv6 completely, I was able to fetch 2-4 times faster without any other
config changes.
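
To see whether IPv6 is in play on a crawl machine at all, one option is to look at which address families the local resolver returns for a few of the hosts being fetched. The sketch below is an assumption-laden illustration, not what was done in this thread: the function names count_by_family and resolve are hypothetical, and the script only reports address families, it does not disable IPv6.

```python
import ipaddress
import socket


def count_by_family(addresses):
    """Count IPv4 and IPv6 entries in an iterable of address strings."""
    counts = {4: 0, 6: 0}
    for addr in addresses:
        # Drop any IPv6 scope suffix such as "%eth0" before parsing
        bare = addr.split("%", 1)[0]
        counts[ipaddress.ip_address(bare).version] += 1
    return counts


def resolve(host, port=80):
    """Return the distinct addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6
    return sorted({info[4][0] for info in infos})
```

For example, count_by_family(resolve("some.host.example")) shows how many IPv4 and IPv6 addresses come back for that (placeholder) host name. On the Java side, the JVM also has a java.net.preferIPv4Stack system property; whether that corresponds to how IPv6 was disabled above is not stated in the thread.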
--
Sami Siren