Hello,

Following some discussions and developers' mails, I tried to get the best performance (pages/second) for the following case:

- 120 web servers to crawl
- 10 Mbit/s connection

I reached an average fetching speed of about 3 Mbit/s with the following parameters (impolite mode):

- fetcher.server.delay = 1.0
- fetcher.per.host = 20
- threads = 800
- http.timeout = 5000

I see that Nutch is very slow for the first minutes, and performance increases over time: it is now at 2500 kb/s and was at 2000 kb/s five minutes ago.

segment 20050802115311, 7200 pages, 446 errors, 231654440 bytes, 706020 ms
050802 120623 148 status: 10.198011 pages/s, 2563.3838 kb/s, 32174.227 bytes/page
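
As a sanity check, the status line can be recomputed from the segment totals. This is only a minimal sketch: the function names are made up for illustration, it assumes that "kb/s" means kilobits per second with 1 kb = 1024 bits (the reading that reproduces the logged value), and it assumes the 10 Mbit/s link carries 10,000,000 bits per second.

```python
def fetch_stats(pages, total_bytes, elapsed_ms):
    """Recompute pages/s, kb/s and bytes/page from the segment totals."""
    seconds = elapsed_ms / 1000.0
    pages_per_s = pages / seconds
    # Assumption: kb/s is kilobits per second, with 1 kb = 1024 bits.
    kbits_per_s = total_bytes * 8 / 1024 / seconds
    bytes_per_page = total_bytes / pages
    return pages_per_s, kbits_per_s, bytes_per_page


def link_utilization(total_bytes, elapsed_ms, link_bits_per_s=10_000_000):
    """Fraction of the link used on average (assumed decimal 10 Mbit/s)."""
    seconds = elapsed_ms / 1000.0
    return total_bytes * 8 / seconds / link_bits_per_s


# Totals taken from the segment line above.
pages_per_s, kbits_per_s, bytes_per_page = fetch_stats(7200, 231654440, 706020)
utilization = link_utilization(231654440, 706020)

print(f"{pages_per_s:.3f} pages/s, {kbits_per_s:.1f} kb/s, "
      f"{bytes_per_page:.1f} bytes/page, {utilization:.0%} of the link")
```

Under these assumptions the recomputed figures agree with the logged status line, and the average works out to roughly a quarter of the 10 Mbit/s link.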

I read Doug Cutting's mail about fetcher.max.delay, but I still don't understand why I cannot reach 10 Mbit/s with 120 different servers.

Any tips to increase performance, please?


Thank you very much.

Christophe Noël
Cetic Grid Data Mining

