Re: [OMPI users] srun works, mpirun does not

2018-06-18 r...@open-mpi.org
This is on an ARM processor? I suspect that is the root of the problems as we aren’t seeing anything like this elsewhere.

> On Jun 18, 2018, at 1:27 PM, Bennet Fauber wrote:
>
> If it's of any use, 3.0.0 seems to hang at
>
> Making check in class
> make[2]: Entering directory `/tmp/build/open

2018-06-18 Bennet Fauber
If it's of any use, 3.0.0 seems to hang at

Making check in class
make[2]: Entering directory `/tmp/build/openmpi-3.0.0/test/class'
make ompi_rb_tree opal_bitmap opal_hash_table opal_proc_table opal_tree opal_list opal_value_array opal_pointer_array opal_lifo opal_fifo
make[3]: Entering directory

2018-06-18 Bennet Fauber
No such luck. If it matters, mpirun does seem to work with processes on the local node that have no internal MPI code. That is,

[bennet@cavium-hpc ~]$ mpirun -np 4 hello
Hello, ARM
Hello, ARM
Hello, ARM
Hello, ARM

but it fails with a similar error if run while a SLURM job is active; i.e.,

[ben

2018-06-18 r...@open-mpi.org
I doubt Slurm is the issue. For grins, let's try adding “--mca plm rsh” to your mpirun cmd line and see if that works.

> On Jun 18, 2018, at 12:57 PM, Bennet Fauber wrote:
>
> To eliminate possibilities, I removed all other versions of OpenMPI
> from the system, and rebuilt using the same build
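A minimal sketch of the suggested test, assuming `hello` is the program from this thread (substitute whichever MPI application is failing). The `plm` framework controls how mpirun launches its daemons; forcing the `rsh` component bypasses the Slurm launcher that Open MPI typically selects when run inside a Slurm allocation. MCA parameters can also be set through `OMPI_MCA_<name>` environment variables, which is shown as an equivalent alternative.

```shell
# List the plm (process launch) components this Open MPI build contains.
ompi_info | grep "MCA plm"

# Force the rsh/ssh launcher on the command line instead of letting
# Open MPI pick one (inside a Slurm job it would usually pick slurm).
mpirun --mca plm rsh -np 4 hello

# Equivalent: set the same MCA parameter via the environment.
OMPI_MCA_plm=rsh mpirun -np 4 hello
```

If the rsh launch succeeds where the default fails, that points at the launcher selection rather than at the application itself.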

2018-06-18 Bennet Fauber
To eliminate possibilities, I removed all other versions of OpenMPI from the system, and rebuilt using the same build script as was used to generate the prior report.

[bennet@cavium-hpc bennet]$ ./ompi-3.1.0bd.sh
Checking compilers and things
OMPI is ompi
COMP_NAME is gcc_7_1_0
SRC_ROOT is /sw/arc

2018-06-18 r...@open-mpi.org
Hmmm...well, the error has changed from your initial report. Turning off the firewall was the solution to that problem. This problem is different: it isn’t the orted that failed in the log you sent, but the application proc that couldn’t initialize. It looks like that app was compiled against