On 01/09/2015 15:59, marcin.krotkiewski wrote:
> Dear Rolf and Brice,
>
> Thank you very much for your help. I have now moved the 'dubious' IB
> card from Slot 1 to Slot 5. It is now reported by hwloc as bound to a
> separate NUMA node. In this case OpenMPI works as could be expected:
>
> - NUM
Hello
It's a float because we normalize to 1 on the diagonal (some AMD
machines have values like 10 on the diagonal and 16 or 22 otherwise, so
you get 1.0, 1.6 or 2.2 after normalization), and also because some users
wanted to specify their own distance matrix.
I'd like to clean up the distance API
Brice,
as a side note, what is the rationale for defining the distance as a
floating point number?
I remember I had to fix a bug in OMPI a while ago
/* e.g. replace if (d1 == d2) with if (fabs(d1 - d2) < epsilon) */
Cheers,
Gilles
On 9/1/2015 5:28 AM, Brice Goglin wrote:
The locality of mlx4_0, as reported by lstopo, is "near the entire
machine" (while mlx4_1 is reported near NUMA node #3). I would vote for
buggy PCI-NUMA affinity being reported by the BIOS. But I am not very
familiar with 4x E5-4600 machines, so please make sure this PCI slot is
really attached to a
What is the output of /sbin/lspci -tv?
On Aug 31, 2015, at 4:06 PM, Rolf vandeVaart wrote:
> There was a problem reported on the User's list about Open MPI always picking
> one Mellanox card when there were two in the machine.
>
> http://www.open-mpi.org/community/lists/users/2015/08/27507.php
We dug a little deeper and I think this has to do with how hwloc is figuring
out where one of the