Jason Camp wrote:
Hi,
I'm trying to gauge whether one crawl server is performing well, and I'm having a tough time determining whether I could increase settings to get faster crawls or whether I'm approaching the maximum the server can handle. The server is a dual AMD Athlon 2200 with 2GB of RAM hanging off a dedicated 10Mb connection. When processing a 1 million URL segment, I see these speeds in the log:

281147 pages, 142413 errors, 11.4 pages/s, 1918 kb/s,

What do you have the "fetcher.threads.fetch" value set to (in nutch-site.xml)?

You may also want to ensure you are using good values for http.max.delays and http.timeout.

With a similar machine I am able to pull ~30 pages/sec using the following settings:

        - http.max.delays 5
        - http.timeout 5000
        - fetcher.threads.fetch 256
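
For reference, here is a rough sketch of how those three settings might look as overrides in nutch-site.xml. The property names and values are the ones listed above; the layout assumes the usual name/value property format, and the entries would go inside whatever root element your existing nutch-site.xml already uses, so check against your own copy and nutch-default.xml before relying on it.

```xml
<!-- Sketch only: place these inside the existing root element of nutch-site.xml. -->
<!-- Values are the ones quoted in this post, not general recommendations. -->
<property>
  <name>http.max.delays</name>
  <value>5</value>
</property>

<property>
  <name>http.timeout</name>
  <value>5000</value>
</property>

<property>
  <name>fetcher.threads.fetch</name>
  <value>256</value>
</property>
```

Anything set here overrides the default of the same name, so only the properties you actually want to change need to appear.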

HTH,
-Shawn
