Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-09-29 Thread Gilles Gouaillardet
Bennet, my guess is mapping/binding to sockets was deemed the best compromise from an "out of the box" performance point of view. IIRC, we did fix some bugs that occurred when running under asymmetric cpusets/cgroups. If you still have some issues with the latest Open MPI version (2.0.1)

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-09-29 Thread Bennet Fauber
Pardon my naivete, but why is bind-to-none not the default, so that if the user wants to specify something, they can get into trouble knowingly? We have had all manner of problems with binding when using cpusets/cgroups. -- bennet

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-09-29 Thread Gilles Gouaillardet
David, I guess you would have expected the default mapping/binding scheme to be core instead of socket. IIRC, we decided *not* to bind to cores by default because it is "safer": if you simply run OMP_NUM_THREADS=8 mpirun -np 2 a.out then a default mapping/binding scheme by core means the OpenMP
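A minimal hybrid MPI+OpenMP probe illustrates what is at stake here. This sketch is not from the thread; the program name and the launch lines in the comments are assumptions. With per-core binding, all of a rank's OpenMP threads are confined to the CPUs of that rank's single core, whereas the socket-level default leaves them room to spread.

```c
/* Illustrative probe (not from the thread): print the CPU each OpenMP
 * thread of each MPI rank is running on.  Launched e.g. as
 *   OMP_NUM_THREADS=8 mpirun -np 2 ./probe                  (default binding)
 *   OMP_NUM_THREADS=8 mpirun --bind-to core -np 2 ./probe   (core binding)
 * With core binding, every thread of a rank reports a CPU belonging to
 * that rank's single core. */
#define _GNU_SOURCE
#include <mpi.h>
#include <omp.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* sched_getcpu() is glibc-specific, hence _GNU_SOURCE above */
        printf("rank %d thread %d runs on cpu %d\n",
               rank, omp_get_thread_num(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}
```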

Re: [OMPI users] openmpi 2.1 large messages

2016-09-29 Thread Gilles Gouaillardet
Rick, can you please provide some more information: - Open MPI version - interconnect used - number of tasks / number of nodes - does the hang occur in the first MPI_Bcast of 8000 bytes? Note there is a known issue if you MPI_Bcast with different but matching signatures (e.g. some
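For readers unfamiliar with the term, the sketch below shows one way an MPI_Bcast can use "different but matching signatures": the count/datatype arguments differ across ranks while the datatype signatures still match, which the MPI standard permits. This is an illustration only, not the poster's code and not necessarily the example Gilles had in mind.

```c
/* Illustration of "different but matching signatures" in MPI_Bcast:
 * the root broadcasts 2000 MPI_INT, the other ranks describe the same
 * data as 1000 contiguous pairs of ints.  Count and datatype differ,
 * but the type signature (a sequence of 2000 ints) is identical. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[2000];
    if (rank == 0) {
        MPI_Bcast(buf, 2000, MPI_INT, 0, MPI_COMM_WORLD);
    } else {
        MPI_Datatype pair;
        MPI_Type_contiguous(2, MPI_INT, &pair);
        MPI_Type_commit(&pair);
        MPI_Bcast(buf, 1000, pair, 0, MPI_COMM_WORLD);
        MPI_Type_free(&pair);
    }

    MPI_Finalize();
    return 0;
}
```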

[OMPI users] openmpi 2.1 large messages

2016-09-29 Thread Marlborough, Rick
Folks; I am attempting to set up a task that sends large messages via the MPI_Bcast API. I am finding that small messages work OK, anything less than 8000 bytes. Anything more than this and the whole scenario hangs, with most of the worker processes pegged at 100% CPU usage. Tried
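A minimal reproducer along the lines Rick describes might look like the sketch below. This is an assumption on my part; his actual code is not shown in the post, and the 1 MiB payload is an arbitrary choice above the reported 8000-byte threshold.

```c
/* Sketch of a reproducer: broadcast a payload well above the ~8000-byte
 * threshold reported in the post.  Buffer size is an arbitrary choice. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;            /* 1 MiB */
    char *buf = malloc(count);
    if (rank == 0)
        memset(buf, 1, count);            /* root fills the payload */

    MPI_Bcast(buf, count, MPI_CHAR, 0, MPI_COMM_WORLD);

    free(buf);
    MPI_Finalize();
    return 0;
}
```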

[OMPI users] what was the rationale behind rank mapping by socket?

2016-09-29 Thread David Shrader
Hello all, does anyone know why the default mapping scheme is socket for jobs with more than 2 ranks? Could someone please take some time to explain the reasoning? Please note I am not railing against the decision, but rather trying to gather as much information about it as I can

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread Cabral, Matias A
Hi Gilles et al., you are right, ptl.c is in the PSM2 code. As Ralph mentions, dynamic process support was/is not working in OMPI when using PSM2 because of an issue related to the transport keys. This was fixed in PR #1602 (https://github.com/open-mpi/ompi/pull/1602) and should be included in

[OMPI users] MPI_Comm_spawn

2016-09-29 Thread juraj2...@gmail.com
The solution was to use the "tcp", "sm" and "self" BTLs for the transport of MPI messages, restricting TCP to communicate only over the eth0 interface and using ob1 as the point-to-point management layer: mpirun --mca btl_tcp_if_include eth0 --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
Ah, that may be why it wouldn't show up in the OMPI code base itself. If that is the case here, then no - OMPI v2.0.1 does not support comm_spawn for PSM. It is fixed in the upcoming 2.0.2.

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread Gilles Gouaillardet
Ralph, my guess is that ptl.c comes from the PSM lib ... Cheers, Gilles

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
Spawn definitely does not work with srun. I don't recognize the name of the file that segfaulted - what is "ptl.c"? Is that in your manager program?

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread Gilles Gouaillardet
Hi, I do not expect spawn can work with direct launch (e.g. srun). Do you have PSM (e.g. InfiniPath) hardware? That could be linked to the failure. Can you please try mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts ./manager 1 and see if it helps? Note if you have the

[OMPI users] MPI_Comm_spawn

2016-09-29 Thread juraj2...@gmail.com
Hello, I am using MPI_Comm_spawn to dynamically create new processes from a single manager process. Everything works fine when all the processes are running on the same node. But imposing a restriction to run only a single process per node does not work. Below are the errors produced during multinode
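For context, a bare-bones manager using MPI_Comm_spawn looks roughly like the sketch below. This is an assumption about the shape of the poster's program, not the actual code; the worker executable name "./worker" is hypothetical, and the command-line argument mirrors the "./manager 1" invocation quoted later in the thread.

```c
/* Bare-bones manager sketch (assumed shape, not the poster's code):
 * spawn N worker processes from a single manager process. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nworkers = (argc > 1) ? atoi(argv[1]) : 1;
    int *errcodes = malloc(nworkers * sizeof *errcodes);
    MPI_Comm intercomm;

    /* With a one-process-per-node restriction, the spawned workers have
     * to land on remote nodes, which is where the reported failure
     * appears. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    printf("manager: spawned %d worker(s)\n", nworkers);

    MPI_Comm_disconnect(&intercomm);
    free(errcodes);
    MPI_Finalize();
    return 0;
}
```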