OK, I see what you mean. You run a DNS server inside your network,
right?
We run BIND 9.2.2 with a 1000 Mbit internal connection to our crawler
boxes.
The box has 2 PPC CPUs and 2 GB of RAM; however, at peak times the CPUs
are 50% busy just doing DNS lookups.
However, I don't think the real problem is CPU power or memory.
As the link I posted earlier today mentions, the BIND and Java lookup
implementations are not multithreaded, and that's why I personally guess
the Nutch crawler ends up in a kind of deadlock.
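To make the "blocked lookup" idea concrete, here is a minimal sketch (not
Nutch code) of one way to keep a slow lookup from stalling a fetcher
thread: run the lookup on a pool thread and stop waiting after a timeout.
The class name BoundedLookup, the helper resolve() and the timeout values
are hypothetical names and numbers chosen for illustration only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedLookup {
    // Hypothetical helper: resolve a host on a pool thread and give up
    // waiting after timeoutMillis, so the caller is never blocked longer.
    static InetAddress resolve(ExecutorService pool, String host, long timeoutMillis)
            throws Exception {
        Future<InetAddress> pending = pool.submit(() -> InetAddress.getByName(host));
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel() only requests an interrupt; the underlying lookup may
            // keep occupying its pool thread until the resolver returns.
            pending.cancel(true);
            throw new UnknownHostException("lookup timed out: " + host);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            System.out.println(resolve(pool, "localhost", 2000));
        } finally {
            pool.shutdown();
        }
    }
}
```

This only bounds how long the caller waits. It does not make the resolver
itself any faster, so a cache is still needed to reduce the load.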
A good read is:
http://buzzsurf.com/java/dns/
Heritrix uses dnsjava, and I'm working on a similar but simpler solution
that may not need any code changes, just some system administration:
a real DNS cache running locally on the crawler box.
If people are interested, I can post results.
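For anyone who wants to compare numbers, a small timing program along
these lines might help to check whether a local cache makes a difference.
It is only a sketch: the class name LookupTimer and the default host are
made up for illustration. It sets the standard JVM security properties
networkaddress.cache.ttl and networkaddress.cache.negative.ttl to 0 so
that the JVM's own cache does not hide the resolver's behaviour.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

class LookupTimer {
    // Times a single lookup in milliseconds; returns -1 if the host
    // cannot be resolved.
    static double timeLookupMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return -1;
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Turn off the JVM's positive and negative lookup caches so every
        // call goes to the system resolver. These must be set before the
        // first lookup in the JVM, otherwise they are ignored.
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        String host = args.length > 0 ? args[0] : "localhost"; // assumed default
        for (int i = 1; i <= 3; i++) {
            System.out.printf("lookup %d of %s: %.2f ms%n", i, host, timeLookupMillis(host));
        }
    }
}
```

If a caching name server is in the path, the second and third lookups of
the same host should be noticeably faster than the first.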
Stefan
On 03.08.2005 at 17:30, Jay Pound wrote:
You set up your own DNS server on a separate machine from your crawling
box. It doesn't have to be powerful; it can be a 500 MHz Pentium 3, but
it needs at least 512 MB of RAM, with 1 GB recommended. You point your
fetcher machine to that DNS server as its primary DNS server and, presto,
internal DNS caching!
-J
PS: The easiest DNS server to set up if you're a Windows person is
Windows 2000 Server or Windows 2003 Server; you just enable it and it
runs. There are many DNS servers for Linux, and most distributions come
with one on CD. Mac OS X Server has one as well.
----- Original Message -----
From: "Stefan Groschupf" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, August 03, 2005 11:05 AM
Subject: Re: dns lookup cache?
How do you do 'internal' domain caching?
Thanks.
Stefan
On 03.08.2005 at 16:51, Jay Pound wrote:
I've got a fast internal DNS cache, so Nutch won't need one, and it did
stop a lot of the Nutch "host not found" and timeout errors. Most ISPs'
DNS servers are already bogged down by client requests; if you dump
10000 clients' worth of DNS traffic on them, they can break or fail to
return results. So I made my own internal DNS server cache. The machine,
a quad Xeon with 4 GB of RAM, uses over 500 MB of RAM just for caching
the domains in memory!
-Jay
----- Original Message -----
From: "Stefan Groschupf" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, August 03, 2005 4:19 AM
Subject: dns lookup cache?
Hi there,
does Nutch cache DNS lookups in any way?
I found this paper and section 3.7 gives some very interesting
information.
We notice that our crawlers often crash after a series of unknown host
exceptions.
We already have one dual-CPU box with a 1 Gbit network connection
running BIND.
So I have 2 questions:
Do people think the Java domain lookup may be a bottleneck that
crashes the crawlers?
Other crawlers have some kind of DNS cache; would it make sense to
introduce one to Nutch as well?
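To illustrate what I mean by a DNS cache inside the crawler, here is a
rough sketch of an in-process cache with a time-to-live for both
successful and failed lookups. Nothing like this exists in Nutch as far
as I know; the class name DnsCache, its methods and the TTL values are
hypothetical and only meant to show the idea.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;

class DnsCache {
    private static final class Entry {
        final InetAddress address;   // null marks a cached failure
        final long expiresAtMillis;

        Entry(InetAddress address, long expiresAtMillis) {
            this.address = address;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;          // how long to keep successful lookups
    private final long negativeTtlMillis;  // how long to keep failed lookups

    DnsCache(long ttlMillis, long negativeTtlMillis) {
        this.ttlMillis = ttlMillis;
        this.negativeTtlMillis = negativeTtlMillis;
    }

    // Returns a cached address if it is still fresh, otherwise resolves
    // the host and stores the result. Two threads may occasionally resolve
    // the same host at the same time; the later result simply wins.
    InetAddress lookup(String host) throws UnknownHostException {
        String key = host.toLowerCase(Locale.ROOT);
        long now = System.currentTimeMillis();
        Entry entry = entries.get(key);
        if (entry == null || entry.expiresAtMillis <= now) {
            entry = resolve(key, now);
            entries.put(key, entry);
        }
        if (entry.address == null) {
            throw new UnknownHostException(host);
        }
        return entry.address;
    }

    private Entry resolve(String host, long now) {
        try {
            return new Entry(InetAddress.getByName(host), now + ttlMillis);
        } catch (UnknownHostException e) {
            return new Entry(null, now + negativeTtlMillis);
        }
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) throws Exception {
        DnsCache cache = new DnsCache(60 * 60 * 1000L, 10 * 1000L); // assumed TTLs
        System.out.println(cache.lookup("localhost"));
        System.out.println(cache.lookup("localhost")); // served from the cache
    }
}
```

The short negative TTL is there so that a host that fails once is not
hammered again immediately, but also not marked unknown forever.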
Thanks for any comments.
Stefan
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers