Hi Sami,

The machine has direct connectivity -- no NAT, and is not running IPv6.

Cheers...
Roger

--------------------------------------------------
From: "Sami Siren" <ssi...@gmail.com>
Sent: Monday, March 30, 2009 5:42 PM
To: <nutch-user@lucene.apache.org>
Subject: Re: Fetcher2 Slow

Roger Dunk wrote:
Andrzej stated in NUTCH-669 that "some people reported performance issues with Fetcher2, i.e. that it doesn't use the available bandwidth. These reports are unconfirmed, and they may have been caused by suboptimal URL / host distribution in a fetchlist - but it would be good to review the synchronization and threading aspects of Fetcher2."

To address this, I've tried just now generating a fetchlist using generate.max.per.host = 1 (which gave me 35,000 unique hosts) to guarantee unique hosts, but the problem still remains.

Therefore, I believe it's clearly not an issue of suboptimal URL / host distribution. If you require any further information to confirm my report, you need only ask!
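
For anyone wanting to double-check the host distribution of a fetchlist independently, here is a minimal sketch. It assumes the URLs have already been dumped to a plain list of strings, one URL per entry; that dump step and the function name hosts_per_url are hypothetical and not part of Nutch. It only counts URLs per host, so with generate.max.per.host = 1 the highest count should come out as 1.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_per_url(urls):
    """Count how many URLs in the list belong to each host.

    Entries that do not parse to a hostname are skipped.
    """
    counts = Counter()
    for url in urls:
        host = urlsplit(url.strip()).hostname  # lowercased by urlsplit, or None
        if host:
            counts[host] += 1
    return counts


# Hypothetical sample standing in for a dumped fetchlist.
sample = [
    "http://a.example/page1",
    "http://a.example/page2",
    "http://b.example/",
    "not a url",
]
counts = hosts_per_url(sample)
unique_hosts = len(counts)
max_per_host = max(counts.values())
print(unique_hosts, "unique hosts, at most", max_per_host, "URLs per host")
```

If the highest count is far above the average, a few hosts dominate the list and politeness delays would limit throughput regardless of thread count.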


I have so far seen two sources of slowness; I don't know if they are related to your case:

1. You are using Nutch from behind a NAT box. I experienced this problem when I did some test crawling from a machine sitting behind an ADSL router that did NAT. Soon after starting a crawl, the maximum number of NAT connections was reached in the router, and further connections could only be made after old ones timed out of the NAT table. These connections were mostly DNS connections.

2. Your machine has IPv6 enabled. I noticed this more recently when I was wondering about the relatively slow fetching speed on a box. After disabling IPv6 totally, I was able to fetch 2-4 times faster without any other config changes.
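
One rough way to see whether IPv6 is involved in name resolution on a box is to look at which address families the resolver hands back. The sketch below is only an illustration: the helper names count_by_family and resolved_addresses are made up here, and the sample addresses are documentation-range placeholders rather than real crawl data. It does not measure fetch speed; it only shows whether IPv6 addresses turn up at all.

```python
import ipaddress
import socket


def count_by_family(addresses):
    """Count IPv4 and IPv6 entries in a list of address strings."""
    counts = {"IPv4": 0, "IPv6": 0}
    for text in addresses:
        # Drop a scope id such as "%eth0" that link-local IPv6 results may carry.
        ip = ipaddress.ip_address(text.split("%")[0])
        counts["IPv6" if ip.version == 6 else "IPv4"] += 1
    return counts


def resolved_addresses(host, port=80):
    """Return the distinct addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


# Placeholder addresses; on a real box, pass resolved_addresses("some.host").
sample = ["192.0.2.10", "2001:db8::1", "198.51.100.7"]
family_counts = count_by_family(sample)
print(family_counts)
```

If hosts in the fetchlist come back with IPv6 addresses on a machine without working IPv6 connectivity, that would be consistent with the slowdown described above, though it does not prove it.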

--
 Sami Siren
