Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Dave Love
Gerry Creager gerry.crea...@tamu.edu writes: Can you say something about any tuning you did to get decent results? To get the lowest latency, turn off rx interrupt coalescence, either with ethtool or module parameters, depending on the driver. Of course, you may not want to turn it off
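The coalescence tuning Dave describes can be sketched as below; the interface name `eth0` and the e1000 example are assumptions, not from the thread, and the right knob depends on your driver:

```shell
# Disable rx interrupt coalescence on eth0 (interface name is an
# assumption; check yours with `ip link`).  Requires root.
ethtool -C eth0 rx-usecs 0 rx-frames 0

# Some drivers take this as a module parameter instead, e.g. e1000:
#   modprobe e1000 InterruptThrottleRate=0

# Verify what the driver actually accepted:
ethtool -c eth0
```

As the post notes, you may not want this in production: with coalescence off, every received frame raises an interrupt, which lowers latency but raises CPU load under heavy traffic.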

[Beowulf] Re: HPC fault tolerance using virtualization)

2009-06-29 Thread Dave Love
Greg Lindahl lind...@pbm.com writes: What I typically see from smartd is alerts when one or more sectors has already gone bad, although that tends not to be something that will clobber the running job. How should it be configured to do better (without noise)? That isn't noise, that's

[Beowulf] Re: dedupe filesystem

2009-06-29 Thread Dave Love
Ashley Pittman ash...@pittman.co.uk writes: If you relied on the md5 sum alone there would be collisions and those collisions would result in you losing data. The question is whether the probability of collisions is high compared with other causes -- presumably hardware, assuming no-one puts
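The collision-probability comparison above can be made concrete with the birthday bound: for n random blocks and a b-bit hash, P(collision) ≈ n²/2^(b+1). A rough sketch in shell via awk, where n = 2^47 is an assumption (roughly an exabyte of unique 8 KiB dedupe blocks):

```shell
# Birthday-bound estimate of an MD5 (128-bit) collision among n blocks.
# n = 2^47 is an assumption: ~1 EiB of data in 8 KiB dedupe blocks.
awk 'BEGIN {
  n = 2^47        # number of unique blocks (assumed)
  b = 2^128       # MD5 output space
  printf "p ~ %.3g\n", n * n / (2 * b)
}'
# -> p ~ 2.91e-11
```

That is many orders of magnitude below typical hardware error rates, which is the thread's point: undetected disk or memory corruption dominates the risk long before hash collisions do.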

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Gerry Creager
Dave Love wrote: Gerry Creager gerry.crea...@tamu.edu writes: Can you say something about any tuning you did to get decent results? To get the lowest latency, turn off rx interrupt coalescence, either with ethtool or module parameters, depending on the driver. Of course, you may not want to

Re: [Beowulf] Re: dedupe filesystem

2009-06-29 Thread Gerry Creager
Dave Love wrote: Ashley Pittman ash...@pittman.co.uk writes: If you relied on the md5 sum alone there would be collisions and those collisions would result in you losing data. The question is whether the probability of collisions is high compared with other causes -- presumably hardware,

Re: [Beowulf] Re: dedupe filesystem

2009-06-29 Thread Joe Landman
Gerry Creager wrote: Ob-Beowulf: You can run Venti on GNU/Linux, but I don't know how the current implementation performs. Also, GlusterFS has a `data de-duplication translator' on its roadmap, which I didn't see mentioned. Our initial results with a GlusterFS implementation led us back to

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Dave Love
Gerry Creager gerry.crea...@tamu.edu writes: I had rather nasty results with tg3 and abandoned it. We're using bnx2 now. The latest iteration seems (guardedly) better than the last one. I thought that they were for different hardware (NetXtreme I cf. NetXtreme II, according to

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Dave Love
Scott Atchley atch...@myri.com writes: When I test Open-MX, I turn interrupt coalescing off. I run omx_pingpong to determine the lowest latency (LL). If the NIC's driver allows one to specify the interrupt value, I set it to LL-1. Right, and that's what I did before, with sensible results

[Beowulf] Xeon Nehalem 5500 series (socket 1366) DP motherboard recommendations/experiences ...

2009-06-29 Thread richard . walsh
All, I am putting together a bill of materials for a small cluster based on the Xeon Nehalem 5500 series. What dual-socket motherboards (ATX and ATX-extended) are people happy with? Which ones should I avoid? Thanks much, Richard Walsh Thrashing River Computing

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Scott Atchley
On Jun 29, 2009, at 12:10 PM, Dave Love wrote: When I test Open-MX, I turn interrupt coalescing off. I run omx_pingpong to determine the lowest latency (LL). If the NIC's driver allows one to specify the interrupt value, I set it to LL-1. Right, and that's what I did before, with sensible

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Scott Atchley
On Jun 29, 2009, at 1:44 PM, Scott Atchley wrote: Right, and that's what I did before, with sensible results I thought. Repeating it now on Centos 5.2 and OpenSuSE 10.3, it doesn't behave sensibly, and I don't know what's different from the previous SuSE results apart, probably, from the minor

Re: [Beowulf] typical latencies for gigabit ethernet

2009-06-29 Thread Rahul Nabar
On Sat, Jun 27, 2009 at 5:21 PM, Mark Hahn h...@mcmaster.ca wrote: seems to be fairly variable.  let's say 50 +- 20 microseconds. Setup is simple: server--switch--server. it may be instructive to try a server-server test case. Hmm...well I must be doing something terribly wrong then. Our

Re: [Beowulf] typical latencies for gigabit ethernet

2009-06-29 Thread Mark Hahn
Hmm...well I must be doing something terribly wrong then. Our latencies are in the 140 microseconds range (as revealed by ping) well, ping is not _really_ a benchmark ;) but it does sound like you have interrupt coalescing enabled. on our dl145g2 nodes (BCM95721), I can peel ~40 us off ping
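A way to check Mark's diagnosis, with `eth0` and the peer hostname `node2` as placeholder assumptions:

```shell
# Show current interrupt-coalescing settings (eth0 is an assumption).
# Non-zero rx-usecs/rx-frames means rx coalescing is enabled.
ethtool -c eth0

# Quick latency comparison: many small pings at a short interval
# (sub-0.2s intervals may require root on some systems).
ping -c 1000 -i 0.01 -s 16 node2 | tail -1
```

Repeating the ping run before and after changing the coalescing settings should reproduce the ~40 us difference Mark reports, keeping in mind the caveat that ping measures a round trip through the kernel ICMP path, not MPI latency.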

Re: [Beowulf] typical latencies for gigabit ethernet

2009-06-29 Thread Rahul Nabar
On Mon, Jun 29, 2009 at 3:15 PM, Mark Hahn h...@mcmaster.ca wrote: Thanks for all the help Mark! well, ping is not _really_ a benchmark ;) I thought so! :) Lazy person's first shot. Now I will try ethtool. but it does sound like you have interrupt coalescing enabled. Any way to verify if I

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Dave Love
Scott Atchley atch...@myri.com writes: That is odd. I have only tested with Intel e1000 and our myri10ge Ethernet driver. The Intel driver does not let you specify value other than certain settings (0, 25, etc.). I can't remember if I tried it, but it's documented to be adjustable in the

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Dave Love
Scott Atchley atch...@myri.com writes: As Patrick kindly pointed out, you are using rx-frames and not rx- usec. They are not equivalent. That's something I haven't seen. However, I'm only using rx-frames=1 because simply adjusting rx-usec doesn't behave as expected. (It's documented, but

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Patrick Geoffray
Dave Love wrote: That's something I haven't seen. However, I'm only using rx-frames=1 because simply adjusting rx-usec doesn't behave as expected. Instead of rx-usecs being the time between interrupts, it is sometimes implemented as the delay between the first packet and the following
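Given drivers that interpret rx-usecs as a delay from the first packet rather than a time between interrupts, forcing an interrupt per frame sidesteps the ambiguity; a sketch, with `eth0` assumed:

```shell
# Force an interrupt for every received frame, which avoids depending
# on how the driver interprets rx-usecs (eth0 is an assumption).
ethtool -C eth0 rx-frames 1 rx-usecs 0
ethtool -c eth0   # confirm what the driver actually accepted
```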

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Patrick Geoffray
Dave, Scott, Dave Love wrote: Scott Atchley atch...@myri.com writes: When I test Open-MX, I turn interrupt coalescing off. I run omx_pingpong to determine the lowest latency (LL). If the NIC's driver allows one to specify the interrupt value, I set it to LL-1. Note that it is only

[Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Dave Love
Rahul Nabar rpna...@gmail.com writes: I thought so! :) Lazy person's first shot. Now I will try ethtool. It's not relevant with all NICs. Some use driver module parameters. Any way to verify if I do? Consult the NIC's documentation. Ah! So my real latencies are 140/2 = 70 microsecs. I

RE: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Tom Elken
Ah! So my real latencies are 140/2 = 70 microsecs. I see ping times of ~70μs between the nVidias I posted data on, and they have an MPI latency of ~12μs. If you want a measurement/benchmark for MPI performance, use IMB, like for the results I posted. In addition to Intel MPI Benchmarks
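The IMB measurement Tom recommends can be reproduced with the PingPong test between two nodes; the hostnames and binary path below are placeholders:

```shell
# Run the Intel MPI Benchmarks PingPong test across two nodes
# (hostnames and path to the IMB-MPI1 binary are assumptions).
mpirun -np 2 -host node1,node2 ./IMB-MPI1 PingPong
```

Unlike ping, this reports one-way MPI latency and bandwidth per message size directly, so there is no need to halve round-trip times by hand.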