Hmmm...well, the error has changed from your initial report. Turning off the 
firewall was the solution to that problem.

This problem is different - it isn't the orted that failed in the log you sent, 
but the application proc that couldn't initialize. It looks like that app was 
compiled against some earlier version of OMPI and is looking for something that 
no longer exists. I also noticed that you compiled it with a plain "gcc" instead 
of our wrapper compiler "mpicc" - any particular reason? My guess is that your 
compile picked up an older version of OMPI on the system.
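A quick way to check is to rebuild the test with the wrapper and see which MPI
library the binary actually resolves at run time. A minimal sketch, using the
source and binary names from your earlier mail:

  $ which mpicc                        # should point into the 3.1.0-b install
  $ mpicc -o test_mpi test_mpi.c -lm   # wrapper supplies the correct -I/-L/-rpath flags
  $ ldd ./test_mpi | grep -i mpi       # confirm libmpi resolves from 3.1.0-b, not an older install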

Ralph


> On Jun 17, 2018, at 2:51 PM, Bennet Fauber <ben...@umich.edu> wrote:
> 
> I rebuilt with --enable-debug, then ran with
> 
> [bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
> salloc: Pending job allocation 158
> salloc: job 158 queued and waiting for resources
> salloc: job 158 has been allocated resources
> salloc: Granted job allocation 158
> 
> [bennet@cavium-hpc ~]$ srun ./test_mpi
> The sum = 0.866386
> Elapsed time is:  5.426759
> The sum = 0.866386
> Elapsed time is:  5.424068
> The sum = 0.866386
> Elapsed time is:  5.426195
> The sum = 0.866386
> Elapsed time is:  5.426059
> The sum = 0.866386
> Elapsed time is:  5.423192
> The sum = 0.866386
> Elapsed time is:  5.426252
> The sum = 0.866386
> Elapsed time is:  5.425444
> The sum = 0.866386
> Elapsed time is:  5.423647
> The sum = 0.866386
> Elapsed time is:  5.426082
> The sum = 0.866386
> Elapsed time is:  5.425936
> The sum = 0.866386
> Elapsed time is:  5.423964
> Total time is:  59.677830
> 
> [bennet@cavium-hpc ~]$ mpirun --mca plm_base_verbose 10 ./test_mpi 2>&1 | tee debug2.log
> 
> The zipped debug log should be attached.
> 
> I did that after using systemctl to turn off the firewall on the login
> node from which the mpirun is executed, as well as on the host on
> which it runs.
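For reference, turning off the firewall on a CentOS 7 node would typically look
something like the lines below; the service name firewalld is an assumption,
since the mail only mentions systemctl:

  $ sudo systemctl stop firewalld      # stop the firewall on this node
  $ systemctl status firewalld         # verify it reports inactive (dead)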
> 
> [bennet@cavium-hpc ~]$ mpirun hostname
> --------------------------------------------------------------------------
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun. This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
> (including firewalls and network routing requirements).
> --------------------------------------------------------------------------
> 
> [bennet@cavium-hpc ~]$ squeue
>             JOBID PARTITION     NAME     USER ST       TIME  NODES
> NODELIST(REASON)
>               158  standard     bash   bennet  R      14:30      1 cav01
> [bennet@cavium-hpc ~]$ srun hostname
> cav01.arc-ts.umich.edu
> [ repeated 23 more times ]
> 
> As always, your help is much appreciated,
> 
> -- bennet
> 
> On Sun, Jun 17, 2018 at 1:06 PM r...@open-mpi.org <r...@open-mpi.org> wrote:
>> 
>> Add --enable-debug to your OMPI configure cmd line, and then add --mca 
>> plm_base_verbose 10 to your mpirun cmd line. For some reason, the remote 
>> daemon isn’t starting - this will give you some info as to why.
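Concretely, that advice amounts to something like the following (the configure
options and log name are the ones used elsewhere in this thread):

  $ ./configure --enable-debug [...same options as before...]
  $ make all install
  $ mpirun --mca plm_base_verbose 10 ./test_mpi 2>&1 | tee debug2.log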
>> 
>> 
>>> On Jun 17, 2018, at 9:07 AM, Bennet Fauber <ben...@umich.edu> wrote:
>>> 
>>> I have a compiled binary that will run with srun but not with mpirun.
>>> The attempts to run with mpirun all result in failures to initialize.
>>> I have tried this on one node, and on two nodes, with firewall turned
>>> on and with it off.
>>> 
>>> Am I missing some command line option for mpirun?
>>> 
>>> OMPI built from this configure command
>>> 
>>> $ ./configure --prefix=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b
>>> --mandir=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/share/man
>>> --with-pmix=/opt/pmix/2.0.2 --with-libevent=external
>>> --with-hwloc=external --with-slurm --disable-dlopen CC=gcc CXX=g++
>>> FC=gfortran
>>> 
>>> All tests from `make check` passed, see below.
>>> 
>>> [bennet@cavium-hpc ~]$ mpicc --show
>>> gcc -I/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/include -pthread
>>> -L/opt/pmix/2.0.2/lib -Wl,-rpath -Wl,/opt/pmix/2.0.2/lib -Wl,-rpath
>>> -Wl,/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib
>>> -Wl,--enable-new-dtags
>>> -L/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib -lmpi
>>> 
>>> The test_mpi was compiled with
>>> 
>>> $ gcc -o test_mpi test_mpi.c -lm
>>> 
>>> This is the runtime library path
>>> 
>>> [bennet@cavium-hpc ~]$ echo $LD_LIBRARY_PATH
>>> /opt/slurm/lib64:/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/opt/pmix/2.0.2/lib:/sw/arcts/centos7/hpc-utils/lib
>>> 
>>> 
>>> These commands are given in exact sequence in which they were entered
>>> at a console.
>>> 
>>> [bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
>>> salloc: Pending job allocation 156
>>> salloc: job 156 queued and waiting for resources
>>> salloc: job 156 has been allocated resources
>>> salloc: Granted job allocation 156
>>> 
>>> [bennet@cavium-hpc ~]$ mpirun ./test_mpi
>>> --------------------------------------------------------------------------
>>> An ORTE daemon has unexpectedly failed after launch and before
>>> communicating back to mpirun. This could be caused by a number
>>> of factors, including an inability to create a connection back
>>> to mpirun due to a lack of common network interfaces and/or no
>>> route found between them. Please check network connectivity
>>> (including firewalls and network routing requirements).
>>> --------------------------------------------------------------------------
>>> 
>>> [bennet@cavium-hpc ~]$ srun ./test_mpi
>>> The sum = 0.866386
>>> Elapsed time is:  5.425439
>>> The sum = 0.866386
>>> Elapsed time is:  5.427427
>>> The sum = 0.866386
>>> Elapsed time is:  5.422579
>>> The sum = 0.866386
>>> Elapsed time is:  5.424168
>>> The sum = 0.866386
>>> Elapsed time is:  5.423951
>>> The sum = 0.866386
>>> Elapsed time is:  5.422414
>>> The sum = 0.866386
>>> Elapsed time is:  5.427156
>>> The sum = 0.866386
>>> Elapsed time is:  5.424834
>>> The sum = 0.866386
>>> Elapsed time is:  5.425103
>>> The sum = 0.866386
>>> Elapsed time is:  5.422415
>>> The sum = 0.866386
>>> Elapsed time is:  5.422948
>>> Total time is:  59.668622
>>> 
>>> Thanks,    -- bennet
>>> 
>>> 
>>> make check results
>>> ----------------------------------------------
>>> 
>>> make  check-TESTS
>>> make[3]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
>>> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
>>> PASS: predefined_gap_test
>>> PASS: predefined_pad_test
>>> SKIP: dlopen_test
>>> ============================================================================
>>> Testsuite summary for Open MPI 3.1.0
>>> ============================================================================
>>> # TOTAL: 3
>>> # PASS:  2
>>> # SKIP:  1
>>> # XFAIL: 0
>>> # FAIL:  0
>>> # XPASS: 0
>>> # ERROR: 0
>>> ============================================================================
>>> [ elided ]
>>> PASS: atomic_cmpset_noinline
>>>   - 5 threads: Passed
>>> PASS: atomic_cmpset_noinline
>>>   - 8 threads: Passed
>>> ============================================================================
>>> Testsuite summary for Open MPI 3.1.0
>>> ============================================================================
>>> # TOTAL: 8
>>> # PASS:  8
>>> # SKIP:  0
>>> # XFAIL: 0
>>> # FAIL:  0
>>> # XPASS: 0
>>> # ERROR: 0
>>> ============================================================================
>>> [ elided ]
>>> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/class'
>>> PASS: ompi_rb_tree
>>> PASS: opal_bitmap
>>> PASS: opal_hash_table
>>> PASS: opal_proc_table
>>> PASS: opal_tree
>>> PASS: opal_list
>>> PASS: opal_value_array
>>> PASS: opal_pointer_array
>>> PASS: opal_lifo
>>> PASS: opal_fifo
>>> ============================================================================
>>> Testsuite summary for Open MPI 3.1.0
>>> ============================================================================
>>> # TOTAL: 10
>>> # PASS:  10
>>> # SKIP:  0
>>> # XFAIL: 0
>>> # FAIL:  0
>>> # XPASS: 0
>>> # ERROR: 0
>>> ============================================================================
>>> [ elided ]
>>> make  opal_thread opal_condition
>>> make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
>>> CC       opal_thread.o
>>> CCLD     opal_thread
>>> CC       opal_condition.o
>>> CCLD     opal_condition
>>> make[3]: Leaving directory `/tmp/build/openmpi-3.1.0/test/threads'
>>> make  check-TESTS
>>> make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
>>> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
>>> ============================================================================
>>> Testsuite summary for Open MPI 3.1.0
>>> ============================================================================
>>> # TOTAL: 0
>>> # PASS:  0
>>> # SKIP:  0
>>> # XFAIL: 0
>>> # FAIL:  0
>>> # XPASS: 0
>>> # ERROR: 0
>>> ============================================================================
>>> [ elided ]
>>> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/datatype'
>>> PASS: opal_datatype_test
>>> PASS: unpack_hetero
>>> PASS: checksum
>>> PASS: position
>>> PASS: position_noncontig
>>> PASS: ddt_test
>>> PASS: ddt_raw
>>> PASS: unpack_ooo
>>> PASS: ddt_pack
>>> PASS: external32
>>> ============================================================================
>>> Testsuite summary for Open MPI 3.1.0
>>> ============================================================================
>>> # TOTAL: 10
>>> # PASS:  10
>>> # SKIP:  0
>>> # XFAIL: 0
>>> # FAIL:  0
>>> # XPASS: 0
>>> # ERROR: 0
>>> ============================================================================
>>> [ elided ]
>>> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/util'
>>> PASS: opal_bit_ops
>>> PASS: opal_path_nfs
>>> PASS: bipartite_graph
>>> ============================================================================
>>> Testsuite summary for Open MPI 3.1.0
>>> ============================================================================
>>> # TOTAL: 3
>>> # PASS:  3
>>> # SKIP:  0
>>> # XFAIL: 0
>>> # FAIL:  0
>>> # XPASS: 0
>>> # ERROR: 0
>>> ============================================================================
>>> [ elided ]
>>> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/dss'
>>> PASS: dss_buffer
>>> PASS: dss_cmp
>>> PASS: dss_payload
>>> PASS: dss_print
>>> ============================================================================
>>> Testsuite summary for Open MPI 3.1.0
>>> ============================================================================
>>> # TOTAL: 4
>>> # PASS:  4
>>> # SKIP:  0
>>> # XFAIL: 0
>>> # FAIL:  0
>>> # XPASS: 0
>>> # ERROR: 0
>>> ============================================================================
>> 
> <debug2.log.gz>
