Re: [OMPI users] Wrong distance calculations in multi-rail setup?

2015-08-28 Thread Rolf vandeVaart
Let me send you a patch off list that prints some extra information, so we can figure out where things are going wrong. We basically depend on the information reported by hwloc, so the output should show whether we are getting good data from it.

Re: [OMPI users] MPI_LB in a recursive type

2015-08-28 Thread Roy Stogner
From: George Bosilca First and foremost the two datatype markers (MPI_LB and MPI_UB) have been deprecated from MPI 3.0 for exactly the reason you encountered. Once a datatype is annotated with these markers, they are propagated to all derived types, leading to an

Re: [OMPI users] Wrong distance calculations in multi-rail setup?

2015-08-28 Thread Marcin Krotkiewski
Brilliant! Thank you, Rolf. This works: all ranks report using the expected port number, and performance is twice what I was observing before :) I can certainly live with this workaround, but I will be happy to do some debugging to find the problem. If you tell me what is needed /

Re: [OMPI users] Wrong distance calculations in multi-rail setup?

2015-08-28 Thread Rolf vandeVaart
I am not sure why the distances are being computed the way you are seeing. I do not have a dual-rail system to reproduce this with. However, in the short term, I think you could get what you want by running like the following. The first argument tells the selection logic to ignore locality, so both cards

[OMPI users] Wrong distance calculations in multi-rail setup?

2015-08-28 Thread marcin.krotkiewski
I have a 4-socket machine with two dual-port InfiniBand cards (devices mlx4_0 and mlx4_1). The cards are connected to PCI slots of different CPUs (I hope), both ports are active on both cards, and everything is connected to the same physical network. I use openmpi-1.10.0 and run the IMB-MPI1