Run it on a non-PPC machine. I hate to say it, but I've benchmarked Nutch on every platform except SPARC, and it runs best on an AMD64 Opteron machine. I work next to a print shop that has G4s and G5s, and they didn't fare well with Nutch.

If your problem is the DNS server, then try an Intel machine. Microsoft's DNS server is very fast on Intel hardware: my CPU never goes over 2% when handling 30 lookup queries a second (I do have a quad Xeon, though). BIND is not as fast, unfortunately :( (some day, though). I had BIND crash on me when I was testing it; I set it up to restart, but it would drop the pages between restarts.

(Maybe someday Java will take advantage of AltiVec and Nutch will scream on PPC, then we can run it on our Xbox 360s!!!)

I'm not bashing Mac or PPC hardware; it's just that if you don't have a POWER5, it's not going to do it well.
-Jay

----- Original Message -----
From: "Stefan Groschupf" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, August 03, 2005 11:53 AM
Subject: Re: dns lookup cache?

> Ok, I see what you mean. You run a DNS server inside your network,
> right?
> We run BIND 9.2.2 with a 1000 Mbit internet connection to our crawler
> boxes.
> The box has 2 PPC CPUs and 2 GB of RAM; however, at peak times the CPUs
> are 50% busy just doing DNS lookups.
> However, I don't think the real problem is CPU power or memory.
> As the link I posted earlier today mentions, the BIND and Java lookup
> implementations are not multithreaded, and that's why I personally guess
> the Nutch crawler runs into a kind of deadlock.
> A good read is:
> http://buzzsurf.com/java/dns/
> Heritrix uses javadns, and I'm working on a similar, simple solution
> that may not need any code changes, just some system administration,
> but uses a real DNS cache local to the crawler box.
> If people are interested I can post results.
>
> Stefan
>
> On 03.08.2005 at 17:30, Jay Pound wrote:
>
> > You set up your own DNS server on a separate machine from your
> > crawling box. It doesn't have to be powerful; it can be a 500 MHz
> > Pentium 3, but you need at least 512 MB of RAM in it, 1 GB
> > recommended. You point your fetcher machine to the DNS server as its
> > primary DNS server and presto, internal DNS caching!!!
> > -J
> > PS: the easiest DNS server to set up if you're a Windows person is
> > Windows 2000 Server or Windows 2003 Server; you just enable it and it
> > runs. There are many DNS servers for Linux, most distributions come
> > with one on CD, and Mac OS X Server has one also.
> >
> > ----- Original Message -----
> > From: "Stefan Groschupf" <[EMAIL PROTECTED]>
> > To: <[email protected]>
> > Sent: Wednesday, August 03, 2005 11:05 AM
> > Subject: Re: dns lookup cache?
> >
> >> How do you do 'internal' domain caching?
> >> Thanks.
> >> Stefan
> >>
> >> On 03.08.2005 at 16:51, Jay Pound wrote:
> >>
> >>> I've got a fast internal DNS cache, so Nutch won't need one, and it
> >>> did stop a lot of the Nutch "host not found" / timeout errors.
> >>> Most ISPs' DNS servers are already bogged down by client requests;
> >>> if you dump 10000 clients' worth of DNS traffic on them, they can
> >>> break or not return results. So I made my own internal DNS server
> >>> cache. The machine, a quad Xeon with 4 GB of RAM, uses over 500 MB
> >>> of RAM just for caching the domains in memory!!!
> >>> -Jay
> >>>
> >>> ----- Original Message -----
> >>> From: "Stefan Groschupf" <[EMAIL PROTECTED]>
> >>> To: <[email protected]>
> >>> Sent: Wednesday, August 03, 2005 4:19 AM
> >>> Subject: dns lookup cache?
> >>>
> >>>> Hi there,
> >>>> does Nutch cache DNS lookups in any way?
> >>>> I found this paper, and section 3.7 gives some very interesting
> >>>> information.
> >>>> We notice that our crawlers often crash after a series of unknown
> >>>> host exceptions.
> >>>> We already have one dual-CPU box with a 1 Gbit network connection
> >>>> running BIND.
> >>>>
> >>>> So I have 2 questions:
> >>>> Do people think Java domain lookup may be a bottleneck that
> >>>> crashes the crawlers?
> >>>> Other crawlers have a kind of DNS cache; would it make sense to
> >>>> introduce one to Nutch as well?
> >>>>
> >>>> Thanks for any comments.
> >>>> Stefan

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
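
For anyone who wants to try the other route raised in this thread (a DNS cache inside the crawler process, rather than a separate caching DNS server), here is a minimal sketch of what that could look like. It is not Nutch code and not what Heritrix does: the class name DnsCache, the Resolver interface and the fixed TTL are all made up for illustration. The JVM also keeps its own lookup cache, tunable through the networkaddress.cache.ttl security property, so something like this only makes sense if you need control beyond that.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Nutch: caches successful lookups
// in-process for a fixed time-to-live.
final class DnsCache {
    /** Pluggable lookup so the cache can be exercised without a network. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private static final class Entry {
        final InetAddress address;
        final long expiresAtMillis;

        Entry(InetAddress address, long expiresAtMillis) {
            this.address = address;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final Resolver resolver;
    private final long ttlMillis;

    DnsCache(Resolver resolver, long ttlMillis) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
    }

    /** Convenience factory that delegates to the JVM's own resolver. */
    static DnsCache usingSystemResolver(long ttlMillis) {
        return new DnsCache(InetAddress::getByName, ttlMillis);
    }

    /**
     * Returns a cached address if it has not expired, otherwise resolves
     * the host and stores the result. Failed lookups are not cached.
     * Two threads may occasionally resolve the same host at the same time;
     * that costs one extra lookup but keeps the code lock-free.
     */
    InetAddress lookup(String host) throws UnknownHostException {
        String key = host.toLowerCase(Locale.ROOT); // host names are case-insensitive
        long now = System.currentTimeMillis();
        Entry cached = cache.get(key);
        if (cached != null && cached.expiresAtMillis > now) {
            return cached.address;
        }
        InetAddress fresh = resolver.resolve(key);
        cache.put(key, new Entry(fresh, now + ttlMillis));
        return fresh;
    }
}
```

A fetcher thread would call something like DnsCache.usingSystemResolver(10 * 60 * 1000L).lookup(host) in place of InetAddress.getByName(host). The TTL here ignores the TTL in the actual DNS records, which a real caching DNS server on the local network would honor, so the separate server described above is still the lower-risk option.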
