Re: [OMPI devel] mapper issue with heterogeneous topologies

2017-05-31 Thread r...@open-mpi.org
I don’t believe we check topologies prior to making that decision - this is why 
we provide map-by options. Seems to me that this oddball setup has a simple 
solution - all he has to do is set a mapping policy for that environment. That can
even be done in the default MCA parameter file.
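A minimal sketch of that suggestion, assuming an Open MPI release of this era in which `--map-by` corresponds to the `rmaps_base_mapping_policy` MCA parameter. It uses the per-user parameter file; the system-wide default file is `<prefix>/etc/openmpi-mca-params.conf` and accepts the same `name = value` syntax.

```shell
# Make socket mapping the default for this user, assuming --map-by is backed
# by the rmaps_base_mapping_policy MCA parameter in this Open MPI release.
conf_dir="$HOME/.openmpi"
conf_file="$conf_dir/mca-params.conf"
mkdir -p "$conf_dir"

# Append only if no rmaps_base_mapping_policy line is present yet, so reruns
# do not add duplicates (an existing value is left untouched).
if ! grep -qs '^rmaps_base_mapping_policy' "$conf_file"; then
  echo 'rmaps_base_mapping_policy = socket' >> "$conf_file"
fi

# One-off alternative for a single shell session, same parameter name:
#   export OMPI_MCA_rmaps_base_mapping_policy=socket
```

With that in place, `mpiexec -np 3 --host loki:2,exin hello_1_mpi` should behave as if `--map-by socket` had been given on the command line.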

I wouldn’t modify the code for these corner cases, as doing so is just as likely to
introduce errors.

> On May 31, 2017, at 5:46 PM, Gilles Gouaillardet  wrote:
> 
> Hi Ralph,
> 
> 
> this is a follow-up on Siegmar's post that started at 
> https://www.mail-archive.com/users@lists.open-mpi.org/msg31177.html
> 
> 
>> mpiexec -np 3 --host loki:2,exin hello_1_mpi
>> --
>> There are not enough slots available in the system to satisfy the 3 slots
>> that were requested by the application:
>>   hello_1_mpi
>> 
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --
> 
> 
> loki is a physical machine with 2 NUMA nodes, 2 sockets, ...
> 
> *but* exin is a virtual machine with *no* NUMA nodes, 2 sockets, ...
> 
> 
> My guess is that mpirun is able to find some NUMA objects on 'loki', so it
> uses the default mapping policy (aka --map-by numa). Unfortunately, exin has
> no NUMA objects, and mpirun fails with an error message that is hard to
> interpret.
> 
> 
> As a workaround, it is possible to use
> 
> mpirun --map-by socket
> 
> 
> So if I understand and remember correctly, mpirun should make the decision to
> map by numa *after* it receives the topology from exin, not before.
> 
> Does that make sense?
> 
> Can you please take care of that?
> 
> 
> FWIW, I ran
> 
> lstopo --of xml > /tmp/topo.xml
> 
> on two nodes, manually removed the NUMANode and Bridge objects from the
> topology of the second node, and then ran
> 
> mpirun --mca hwloc_base_topo_file /tmp/topo.xml --host n0:2,n1 -np 3 hostname
> 
> in order to reproduce the issue.
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


[OMPI devel] mapper issue with heterogeneous topologies

2017-05-31 Thread Gilles Gouaillardet

Hi Ralph,


this is a follow-up on Siegmar's post that started at
https://www.mail-archive.com/users@lists.open-mpi.org/msg31177.html

mpiexec -np 3 --host loki:2,exin hello_1_mpi
--
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
   hello_1_mpi

Either request fewer slots for your application, or make more slots available
for use.
--

loki is a physical machine with 2 NUMA nodes, 2 sockets, ...

*but* exin is a virtual machine with *no* NUMA nodes, 2 sockets, ...


My guess is that mpirun is able to find some NUMA objects on 'loki', so it
uses the default mapping policy (aka --map-by numa). Unfortunately, exin has
no NUMA objects, and mpirun fails with an error message that is hard to
interpret.


As a workaround, it is possible to use

mpirun --map-by socket


So if I understand and remember correctly, mpirun should make the decision
to map by numa *after* it receives the topology from exin, not before.

Does that make sense?

Can you please take care of that?
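As a rough illustration of this proposal (not how mpirun is actually implemented), the decision could look like the hypothetical shell helper below. It assumes one hwloc XML topology file per node, such as the output of `lstopo --of xml`, and picks numa only when every node reports NUMANode objects.

```shell
# Hypothetical helper (not part of Open MPI): given one hwloc XML topology
# file per node, print the mapping policy to use as the default once *all*
# topologies are known.
pick_mapping_policy() {
  for topo in "$@"; do
    # hwloc XML writes each object as <object type="..." ...>.
    if ! grep -q 'type="NUMANode"' "$topo"; then
      # This node reports no NUMA objects, so mapping by numa cannot place
      # any process on it; fall back to socket (assumes sockets are present).
      echo socket
      return 0
    fi
  done
  # Every node reports NUMA objects: keep the numa default.
  echo numa
}
```

For Siegmar's setup, `pick_mapping_policy loki.xml exin.xml` (hypothetical file names) would print `socket`, since exin has no NUMA objects.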


FWIW, I ran

lstopo --of xml > /tmp/topo.xml

on two nodes, manually removed the NUMANode and Bridge objects from the
topology of the second node, and then ran

mpirun --mca hwloc_base_topo_file /tmp/topo.xml --host n0:2,n1 -np 3 hostname

in order to reproduce the issue.
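Before running the reproducer, it may help to confirm that the hand-edited file really has no such objects left. A small sketch, where `count_stripped_objects` is a hypothetical helper name and the check assumes hwloc writes one `<object ...>` element per line, as `lstopo --of xml` normally does:

```shell
# Hypothetical helper: print how many lines of an hwloc XML topology file
# still declare a NUMANode or Bridge object.
count_stripped_objects() {
  # grep -c prints the number of matching lines (0 if none) but exits 1 when
  # nothing matches; "|| true" keeps that from aborting under "set -e".
  grep -cE 'type="(NUMANode|Bridge)"' "$1" || true
}
```

On the second node, `[ "$(count_stripped_objects /tmp/topo.xml)" -eq 0 ]` should succeed once the edit is complete.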


Cheers,


Gilles
