Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-31 Thread Andy Riebs
As always, thanks for your help, Ralph! Cutting over to PMIx 1.2.4 solved the problem for me. (Slurm wasn't happy building with PMIx v2.) And yes, I had ssh access to node04. (And Gilles, thanks for your note as well.) Andy
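For context, pointing the Open MPI 3.0.x build at an external PMIx 1.2.4 typically comes down to the configure step; a minimal sketch, with hypothetical install paths not taken from the thread:

    # Sketch only: build Open MPI 3.0.x against an external PMIx 1.2.4.
    # Paths are placeholders; adjust to the actual install locations.
    ./configure --prefix=/opt/openmpi-3.0.1 \
                --with-slurm \
                --with-pmix=/opt/pmix-1.2.4
    make -j8 && make install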

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-29 Thread Gilles Gouaillardet
Andy, the crash occurs in the orted daemon and not in the mpi_hello MPI app, so you will not see anything useful in gdb. You can use the attached launch agent script to get a stack trace of orted. Your mpirun command line should be updated like this: mpirun --mca
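Gilles' actual script was sent as an attachment and is not reproduced in the archive; a minimal sketch of the idea (script name and orted path are hypothetical) is to wrap orted in gdb batch mode so a backtrace is printed if the daemon aborts:

    #!/bin/sh
    # orted-gdb.sh (hypothetical): run the orted daemon under gdb so a
    # full backtrace is emitted when the daemon crashes.
    exec gdb -batch -ex run -ex "bt full" --args /opt/openmpi-3.0.1/bin/orted "$@"

mpirun is then pointed at the wrapper through the orte_launch_agent MCA parameter, e.g. mpirun --mca orte_launch_agent /path/to/orted-gdb.sh ... (the rest of the original command line is cut off in the archive).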

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread r...@open-mpi.org
Two questions: 1. Are you running this on node04? Or do you have ssh access to node04? 2. I note you are building this against an old version of PMIx for some reason. Does it work okay if you build it with the embedded PMIx (which is 2.0)? Does it work okay if you use PMIx v1.2.4, the latest
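A quick way to confirm which PMIx components an Open MPI installation was actually built with is to query ompi_info (a general check, not one prescribed in the thread):

    # List the PMIx component(s) compiled into this Open MPI install.
    ompi_info | grep -i pmix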

[OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread Andy Riebs
We have built a version of Open MPI 3.0.x that works with Slurm (our primary use case), but it fails when executed without Slurm. If I srun an MPI "hello world" program, it works just fine. Likewise, if I salloc a couple of nodes and use mpirun from there, life is good. But if I just try to
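To make the comparison concrete, the three launch patterns being contrasted look roughly like this (node names, process counts, and the exact failing command are assumptions; the original line is truncated in the archive):

    # 1. Direct launch under Slurm: works.
    srun -N 2 -n 2 ./mpi_hello

    # 2. mpirun from inside a Slurm allocation: works.
    salloc -N 2
    mpirun -np 2 ./mpi_hello

    # 3. Plain mpirun outside Slurm: the failing case.
    mpirun -np 2 -H node03,node04 ./mpi_hello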