A P4 2.4/HTT with 512MB RAM and a 10Mb pipe, running FreeBSD 7.0, is my current machine of choice. Without our custom plugins it reaches ~32 pages/s.

I've tried a number of different machines/configurations so far as well:

Celeron 800, 256MB with FreeBSD 6.0 - This one seemed to max out at about 4.5 - 5 pages/s, far less than I expected. The profiling information suggested it was CPU bound.

Dual 600, 1GB with FreeBSD 6.0 - ~20 pages/s

P4 2.4/HTT, 1GB with FreeBSD 4.1 - This was the biggest surprise, as it maxed out at about 4 - 5 pages/s.

Celeron 2.4, 256MB with FreeBSD 5.3 - ~30 pages/s.

All of them used the same pipe, I tried a few DNS servers for each, and all ran JDK 1.4.2.

Configurations ranged from 50 to 200 threads, and the results listed here are from the best-performing setting for each box. The rest of the settings were default, or close to it. I may have tweaked max threads/host and max delays by a couple on one or two of the boxes. This resulted in fewer errors but didn't affect the overall speed significantly.

From the profiling information, the Celeron 800 seemed to be CPU bound, and that was its limiting factor. As far as I can tell, the FreeBSD 4.1 box was slowed down by OS issues that seem to have been fixed in 5.x. I believe 5.x has much improved threading and networking performance, which seems to have made the difference.

As a side note, does anyone have any recommendations for profiling software? I've used the standard hprof, which slows the process down too much for my needs, and JMP, which seems pretty unstable.

-Ken

AJ Chen wrote:
I noticed the same problem. My temporary solution is to fetch a smaller
number of pages per cycle, say 200k, so that the slow-down in each cycle
doesn't have too much impact.
But I also run into another problem: the start-up download speed varies
from run to run. Most of the time it runs at a speed (1 page/s) that is much
slower than my bandwidth (1.5 Mbps) allows. What bandwidth and hardware do
you use?

AJ
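
A rough way to see how far 1 page/s is from what a link can carry: divide the link rate in bytes per second by an average page size. The 20 KB average page size below is a hypothetical figure, not something measured in this thread, and protocol overhead (TCP/HTTP headers, DNS lookups) is ignored, so treat the result as an optimistic upper bound.

```python
# Back-of-envelope ceiling on fetch rate when the link is the only bottleneck.
# ASSUMPTION: ASSUMED_PAGE_BYTES is a hypothetical average page size; real
# crawls vary widely, and protocol overhead is ignored entirely.

def max_pages_per_sec(link_bits_per_sec: float, avg_page_bytes: float) -> float:
    """Upper bound on pages/s that a link of the given speed could carry."""
    bytes_per_sec = link_bits_per_sec / 8  # convert bits/s to bytes/s
    return bytes_per_sec / avg_page_bytes


ASSUMED_PAGE_BYTES = 20_000  # hypothetical average page size (20 KB)

if __name__ == "__main__":
    # Link speeds mentioned in the thread: 1.5 Mbps and a 10Mb pipe.
    for label, bps in [("1.5 Mbps", 1.5e6), ("10 Mbps", 10e6)]:
        rate = max_pages_per_sec(bps, ASSUMED_PAGE_BYTES)
        print(f"{label}: at most ~{rate:.1f} pages/s")
```

Under that assumption a 1.5 Mbps link could sustain roughly 9 pages/s and a 10Mb pipe roughly 60, so both 1 page/s and ~32 pages/s sit well below the bandwidth ceiling, which suggests the limit lies elsewhere (CPU, DNS, threading or server politeness delays).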

On 10/27/05, Ken van Mulder <[EMAIL PROTECTED]> wrote:

Hey folks,

I'm using the mapred branch on a FreeBSD 7.0 box to fetch a 300k
URL list.

Initially, it's able to reach ~25 pages/s with 150 threads. The fetcher
gets progressively slower, though, dropping to about ~15 pages/s after
2-3 hours or so, and it continues to slow down. I've seen a few
references to the issue on these lists, but I'm not clear on whether it's
expected behaviour or whether there's a solution. I've also noticed
that the process takes up more and more memory as it runs; is this
expected as well?
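
One way to read the slow-down is Little's law (concurrency = throughput x time per item): with a fixed number of fetcher threads, a lower pages/s figure means each thread is spending longer per page. The sketch below applies it to the 150-thread figures above; it assumes a steady state in which every thread is busy, which is an idealization, and the function name is purely illustrative.

```python
# Little's law, L = lambda * W, rearranged to W = L / lambda.
# L: number of fetcher threads (concurrency), lambda: pages/s (throughput),
# W: average time each thread spends per page.
# ASSUMPTION: steady state with all threads busy; time a thread spends on
# non-network work (parsing, GC pauses, lock waits) is folded into W too.

def implied_seconds_per_fetch(threads: int, pages_per_sec: float) -> float:
    """Average seconds each thread spends per page, by Little's law."""
    if pages_per_sec <= 0:
        raise ValueError("pages_per_sec must be positive")
    return threads / pages_per_sec


if __name__ == "__main__":
    # Figures from the message: 150 threads, ~25 pages/s falling to ~15 pages/s.
    for rate in (25, 15):
        secs = implied_seconds_per_fetch(150, rate)
        print(f"{rate} pages/s with 150 threads -> ~{secs:.0f} s per page per thread")
```

Going from about 6 s to about 10 s per page with the same thread count shows each thread is taking longer per page, but the estimate alone cannot say whether the extra time is network latency, politeness waits, or CPU work inside the JVM.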

Also, I seem to have a problem with the fetcher hanging at a certain
point. About half way through the list it continues to run (chewing
up CPU cycles) but with no output, stack traces or anything. CPU
usage will be near 100%, memory usage will have gotten pretty close to
the box's limit, and it will sit there for hours. I'm trying to run it
again with a profiler to see if I can figure out what it's doing.

Has anyone run into a similar problem?

--
Ken van Mulder
Wavefire Technologies Corporation

http://www.wavefire.com
250.717.0200 (ext 113)



