Yeah, I can saturate 10 Mbit with 150 threads; any more and web pages start to drop. Check your error rate when downloading: if you're getting a lot of error pages, drop your threads back to a lower number. You're going to have to find the sweet spot for your connection!
-J

----- Original Message -----
From: "Christophe Noel" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, August 02, 2005 8:53 AM
Subject: Re: Fetcher delays - benchmarks

> Ok, thank you very much.
>
> Something strange: I tried with 1600 threads (!!!!) instead of 800 and
> it goes from 2.5 Mbit/s (average) to 5 ...
>
> Aren't these parameters (1600 threads) really too big?
>
> Jay Pound wrote:
>
> >I'm able to easily saturate my 10 Mbit connection, but it takes a powerful
> >computer. If your computer is not so powerful, try to fetch with
> >the -noParsing flag; it will offload the parsing until later.
> >Even a quad Pentium 3 Xeon 700 MHz with 4 GB of RAM can only saturate about
> >5 Mbit. I've used a 3 GHz Xeon with hyperthreading and it can do 10 Mbit
> >(barely) with parsing on; my new dual-core Opteron has about 10% CPU load
> >with parsing on, and my Athlon 64 3500+ can also do it just fine.
> >-J
> >PS: if you have a slow(er) computer, fetch without parsing; you can use a
> >faster computer to parse the data after the fetch is completed.
> >
> >BTW: for those who do not know, fetching web pages with 100 threads takes
> >upstream bandwidth of about 10% of the download rate, so if you have a
> >10 Mbit connection but only 512 kbit upload, your max download is around
> >5-6 Mbit.
> >Found this out with Road Runner's gamer connection: 10 Mbit in, 512 kbit out.
> >----- Original Message -----
> >From: "Christophe Noel" <[EMAIL PROTECTED]>
> >To: <[email protected]>
> >Sent: Tuesday, August 02, 2005 6:09 AM
> >Subject: Fetcher delays - benchmarks
> >
> >>Hello,
> >>
> >>Following some discussions, developers' mails, ... I tried to get the
> >>best performance (pages/second) for the following case:
> >>
> >>- 120 web servers to crawl
> >>- 10 Mbit/s connection
> >>
> >>I reached about 3 Mbit/s average fetching speed with the following
> >>parameters (impolite mode):
> >>
> >>- fetcher.server.delay = 1.0
> >>- fetcher.per.host = 20
> >>- threads = 800
> >>- http.timeout = 5000
> >>
> >>I see that Nutch is very slow for the first minutes ... performance
> >>increases with time: it is now at 2500 kb/s and was at 2000 kb/s 5
> >>minutes ago.
> >>
> >>segment 20050802115311, 7200 pages, 446 errors, 231654440 bytes, 706020 ms
> >>
> >>050802 120623 148 status: 10.198011 pages/s, 2563.3838 kb/s, 32174.227
> >>bytes/page
> >>
> >>I read Doug Cutting's mail about fetcher.max.delay, but I still don't
> >>understand why I cannot reach 10 Mbit/s with 120 different servers.
> >>
> >>Any tips to increase my performance, please?
> >>
> >>Thank you very much.
> >>
> >>Christophe Noël
> >>Cetic Grid Data Mining

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
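
For anyone who wants to check the numbers in this thread, here is a rough Python sketch (not part of Nutch) that recomputes the figures in the quoted status line from the segment totals, plus the error rate and the upstream-limited download ceiling. The function names are made up for illustration. It assumes "kb/s" means kilobits of 1024 bits, which is inferred only from the fact that the numbers then match, and it assumes the 446 errors are counted within the 7200 pages. The 10% upstream figure is the rule of thumb quoted in the thread, not a measured constant.

```python
def fetch_stats(pages, errors, total_bytes, elapsed_ms):
    """Recompute fetcher status figures from segment totals.

    Assumption: 1 kb = 1024 bits (inferred from the quoted log, not confirmed).
    Assumption: errors are a subset of the page count.
    """
    seconds = elapsed_ms / 1000.0
    return {
        "pages_per_s": pages / seconds,
        "kbit_per_s": total_bytes * 8 / 1024 / seconds,
        "bytes_per_page": total_bytes / pages,
        "error_rate": errors / pages,
    }


def max_download_kbit(upload_kbit, upstream_fraction=0.10):
    """Estimate the download ceiling imposed by a limited uplink.

    upstream_fraction is the rule of thumb from the thread
    (upstream traffic is roughly 10% of download traffic).
    """
    return upload_kbit / upstream_fraction


# Totals taken from the segment line quoted in the thread.
stats = fetch_stats(pages=7200, errors=446,
                    total_bytes=231654440, elapsed_ms=706020)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")

# 512 kbit upload, as in the cable connection example from the thread.
print(f"download ceiling: {max_download_kbit(512):.0f} kbit/s")
```

With the segment totals from the thread this gives roughly 10.2 pages/s and about 2563 kb/s, in line with the status line, and an error rate of a bit over 6%. That error rate is the number to watch while raising or lowering the thread count.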
