so this isn't really an OpenMPI question (I don't think), but you guys will have hit the problem if anyone has...

basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR when I use different kernels. I'm testing with netpipe-3.6.2's NPmpi, but a home-grown pingpong sees the same thing.

the default 2.6.9-42.0.3.ELsmp (and also sles10's kernel) gives ok bandwidth (50% of peak I guess is good?) at ~10 Gbit/s, but a pile of newer kernels (2.6.19.2, 2.6.20-rc4, 2.6.18-1.2732.4.2.el5.OFED_1_1(*)) all max out at ~5.3 Gbit/s. half the bandwidth! :-( latency is the same.

the same OpenMPI (1.1.1 from OSCAR, rebuilt for openib support) and the same NPmpi were used with all kernels. I see an intermediate bandwidth if one node runs the 'fast' 2.6.9 kernel and the other runs a 'slow' one, so they don't appear to be using completely different protocols.

it doesn't make any difference if I try to make extra-sure it's using openib with:

  mpirun --mca btl openib --mca btl_tcp_if_exclude lo,eth0 ...

the OS is CentOS 4.4 x86_64, which AFAICT includes packages based on OFED 1.0. lspci says the PCIe card is:

  InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

and dmesg says that all kernels are using:

  ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)

but it also whinges that 'HCA FW version 1.0.700 is old'.

any ideas? it's very odd that all the newer kernels (including the RHEL5 one) are slow. will OFED 1.1 make any difference? it didn't build cleanly when I tried, but I can try again...

thanks for any hints.

cheers,
robin

(*) rhel5 + OFED 1.1 test kernel, rebuilt for centos4.4 from the src.rpm at
http://people.redhat.com/dledford/Infiniband/kernel/2.6.18/1.2732.4.2.el5.OFED_1_1/x86_64/
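p.s. in case it helps to know exactly what I mean by the 'home-grown pingpong': it's essentially just a timed MPI_Send/MPI_Recv loop over increasing message sizes between two ranks, roughly like the sketch below (simplified, not the exact code I'm running, just to show what's being measured):

  /* minimal MPI pingpong bandwidth sketch -- compile with mpicc, run with
   * the same mpirun line as above across two nodes */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank, iters = 100;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* sweep message sizes from 1 KiB up to 4 MiB */
      for (size_t len = 1024; len <= 4 * 1024 * 1024; len *= 2) {
          char *buf = malloc(len);
          MPI_Barrier(MPI_COMM_WORLD);
          double t0 = MPI_Wtime();
          for (int i = 0; i < iters; i++) {
              if (rank == 0) {
                  MPI_Send(buf, len, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(buf, len, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
              } else if (rank == 1) {
                  MPI_Recv(buf, len, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
                  MPI_Send(buf, len, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
              }
          }
          double t = MPI_Wtime() - t0;
          /* 2 transfers per iteration; report Gbit/s */
          if (rank == 0)
              printf("%8zu bytes  %8.2f Gbit/s\n", len,
                     (2.0 * iters * len * 8) / t / 1e9);
          free(buf);
      }
      MPI_Finalize();
      return 0;
  }

the numbers it reports for large messages track what NPmpi shows on each kernel, so it doesn't look like a quirk of netpipe itself.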
basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR when I use different kernels. I'm testing with netpipe-3.6.2's NPmpi, but a home-grown pingpong sees the same thing. the default 2.6.9-42.0.3.ELsmp (and also sles10's kernel) gives ok bandwidth (50% of peak I guess is good?) at ~10 Gbit/s, but a pile of newer kernels (2.16.19.2, 2.6.20-rc4, 2.6.18-1.2732.4.2.el5.OFED_1_1(*)) all max out at ~5.3 Gbit/s. half the bandwidth! :-( latency is the same. the same OpenMPI (1.1.1 from OSCAR, rebuild for openib support) and NPmpi was used with all kernels. I see an intermediate bandwidth if one kernel is the 'fast' 2.6.9 and another is a 'slow', so they don't appear to be using completely different protocols. it doesn't make any difference if I try to make extra-sure it's using openib with: mpirun --mca btl openib --mca btl_tcp_if_exclude lo,eth0 ... OS is CentOS 4.4 x86_64 which AFAICT includes packages based on OFED 1.0. lspci says the PCIe card is: InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20) and dmesg says that all kernels are using ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) but also winges that 'HCA FW version 1.0.700 is old'. any ideas? very odd that all new kernels (including for RHEL5) are slow. will OFED 1.1 make any difference? it didn't build cleanly when I tried, but I can try and try again... thanks for any hints. cheers, robin (*) rhel5 + OFED 1.1 test kernel, rebuilt for centos4.4 from src.rpm at http://people.redhat.com/dledford/Infiniband/kernel/2.6.18/1.2732.4.2.el5.OFED_1_1/x86_64/