Dear OMPI Community:

I have a modest personal cluster (3 nodes, 6 single-core Opteron processors: two 242s and four 844s; each machine has 4 GB of RAM) connected over gigabit Ethernet through an unmanaged switch. I put it together to run computational chemistry projects for my doctoral studies. I'm running the 844s in dual-processor configurations because I got a good deal on the lot of four 844 chips.

The 844-based systems are on Arima / Rioworks HDAMA motherboards, with the RAM configured as 2 x 2 GB sticks in the cpu 0 bank. (The motherboard manual numbers the CPUs 0 and 1 but the DIMMs 1 - 4 for each CPU; going by the manual, the DIMMs are in slots 1 and 2 of the cpu 0 bank, or DIMM 0 and 1 under a consistent zero-based numbering scheme.) The 242-based system is on a Tyan 2875 motherboard with a 1 GB stick in each of the four slots of its one bank of DIMM slots. I am running OpenSUSE 10.2 on each system.

I did some benchmarking of the same executable running the same job on just the 242 system (using both processors) versus the entire cluster. The program (CPMD, www.cpmd.org) reports CPU time and elapsed time. I'm reporting the times below in hours:minutes, rounded to the nearest minute. I trust that everyone will agree that it is insignificant if I inadvertently truncated instead of rounded some of the minutes.

For just the one system with two processors:
  CPU time:     32:43
  Elapsed time: 36:52
  Peak memory:  373 MB

For the entire cluster (six processors):
  CPU time:     12:23
  Elapsed time: 20:30
  Peak memory:  131 MB

Is this typical scaling, or should I be thinking about tweaking the [network / ompi] system at some point? The CPU time is scaling about right, but the elapsed time is getting hammered. With the low memory overhead it has to be a communications issue rather than a swap issue, right?

Would it be helpful to see a serial time point using the same executable? If so, I'd probably repeat all the runs with a smaller job; I don't know that I want to spend half a week just for benchmarking.

I have included the appropriate btl_tcp_if_include configuration so that OMPI only uses the gigabit ports (and not the internet connections that some of the machines have). I am already planning on doing some benchmark comparisons to determine the effect of compiler / math library on speed.

Thank you,
Mark Kosmowski
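
P.S. To put numbers on the scaling question, here is a minimal Python sketch that turns the reported hours:minutes figures into speedup and parallel efficiency when going from 2 to 6 processors. The helper names (to_minutes, relative_scaling) are made up for illustration. It also assumes that all six processors are equally fast, which is not strictly true for a mix of 242s and 844s, and that CPMD reports CPU time the same way in both runs, so treat the "ideal" factor of 3 as a rough yardstick only.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'hours:minutes' string such as '32:43' to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def relative_scaling(base_time, base_procs, new_time, new_procs):
    """Return (speedup, efficiency) of the larger run relative to the smaller one.

    speedup    = base_time / new_time
    efficiency = speedup / (new_procs / base_procs)
    An efficiency of 1.0 would mean perfect scaling under the
    equal-processor assumption.
    """
    speedup = base_time / new_time
    ideal = new_procs / base_procs
    return speedup, speedup / ideal


# Times as reported in the message (hours:minutes).
runs = {
    "one node": {"procs": 2, "cpu": "32:43", "elapsed": "36:52"},
    "cluster": {"procs": 6, "cpu": "12:23", "elapsed": "20:30"},
}

base, full = runs["one node"], runs["cluster"]

for metric in ("cpu", "elapsed"):
    speedup, efficiency = relative_scaling(
        to_minutes(base[metric]), base["procs"],
        to_minutes(full[metric]), full["procs"],
    )
    print(f"{metric:8s} speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# Fraction of wall-clock time accounted for by reported CPU time in each run.
for name, run in runs.items():
    ratio = to_minutes(run["cpu"]) / to_minutes(run["elapsed"])
    print(f"{name:8s} cpu/elapsed = {ratio:.2f}")
```

With the figures above this works out to roughly a 2.6x speedup on CPU time but only about 1.8x on elapsed time, against an ideal of 3x. The CPU-to-elapsed ratio also drops from about 0.89 on the single node to about 0.60 on the cluster, which is the gap I am asking about.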