Let me send you an off-list patch that prints some extra information to
see if we can figure out where things are going wrong.
We basically depend on the information reported by hwloc, so the patch will
print extra detail to check whether we are getting good data from hwloc.
From: George Bosilca
First and foremost, the two datatype markers (MPI_LB and MPI_UB) have been
deprecated as of MPI 3.0 for exactly the reason you encountered. Once a
datatype is annotated with these markers, they are propagated to all
derived types, leading to an
Brilliant! Thank you, Rolf. This works: all ranks have reported using
the expected port number, and performance is twice what I was
observing before :)
I can certainly live with this workaround, but I will be happy to do
some debugging to find the problem. If you tell me what is needed /
I am not sure why the distances are being computed the way you are seeing. I do
not have a dual-rail card system to reproduce with. However, in the short term,
I think you can get what you want by running as follows. The first argument
tells the selection logic to ignore locality, so both cards
I have a 4-socket machine with two dual-port InfiniBand cards (devices
mlx4_0 and mlx4_1). The cards are connected to PCI slots of different
CPUs (I hope..), both ports are active on both cards, and everything is
connected to the same physical network.
I use openmpi-1.10.0 and run the IBM-MPI1