Add --enable-debug to your OMPI configure command line, and then add --mca plm_base_verbose 10 to your mpirun command line. For some reason the remote daemon isn't starting; this will give you some info as to why.
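FWIW, that would look roughly like the following. This is just a sketch reusing the options from your existing configure line, with the mpirun step run inside the same salloc allocation you showed:

# rebuild OMPI with debugging support enabled
$ ./configure --prefix=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b \
      --mandir=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/share/man \
      --with-pmix=/opt/pmix/2.0.2 --with-libevent=external \
      --with-hwloc=external --with-slurm --disable-dlopen \
      --enable-debug CC=gcc CXX=g++ FC=gfortran

# then, inside the allocation, run with launcher verbosity turned up
$ mpirun --mca plm_base_verbose 10 ./test_mpi

The plm_base_verbose output shows what mpirun does while it tries to launch the remote daemon (orted), which should tell us where the launch is falling over.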
> On Jun 17, 2018, at 9:07 AM, Bennet Fauber <ben...@umich.edu> wrote:
>
> I have a compiled binary that will run with srun but not with mpirun.
> The attempts to run with mpirun all result in failures to initialize.
> I have tried this on one node, and on two nodes, with the firewall turned
> on and with it off.
>
> Am I missing some command line option for mpirun?
>
> OMPI was built from this configure command:
>
> $ ./configure --prefix=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b
> --mandir=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/share/man
> --with-pmix=/opt/pmix/2.0.2 --with-libevent=external
> --with-hwloc=external --with-slurm --disable-dlopen CC=gcc CXX=g++
> FC=gfortran
>
> All tests from `make check` passed; see below.
>
> [bennet@cavium-hpc ~]$ mpicc --show
> gcc -I/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/include -pthread
> -L/opt/pmix/2.0.2/lib -Wl,-rpath -Wl,/opt/pmix/2.0.2/lib -Wl,-rpath
> -Wl,/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib
> -Wl,--enable-new-dtags
> -L/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib -lmpi
>
> The test_mpi program was compiled with
>
> $ gcc -o test_mpi test_mpi.c -lm
>
> This is the runtime library path:
>
> [bennet@cavium-hpc ~]$ echo $LD_LIBRARY_PATH
> /opt/slurm/lib64:/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/opt/pmix/2.0.2/lib:/sw/arcts/centos7/hpc-utils/lib
>
>
> These commands are given in the exact sequence in which they were entered
> at a console.
>
> [bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
> salloc: Pending job allocation 156
> salloc: job 156 queued and waiting for resources
> salloc: job 156 has been allocated resources
> salloc: Granted job allocation 156
>
> [bennet@cavium-hpc ~]$ mpirun ./test_mpi
> --------------------------------------------------------------------------
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun. This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
> (including firewalls and network routing requirements).
> --------------------------------------------------------------------------
>
> [bennet@cavium-hpc ~]$ srun ./test_mpi
> The sum = 0.866386
> Elapsed time is: 5.425439
> The sum = 0.866386
> Elapsed time is: 5.427427
> The sum = 0.866386
> Elapsed time is: 5.422579
> The sum = 0.866386
> Elapsed time is: 5.424168
> The sum = 0.866386
> Elapsed time is: 5.423951
> The sum = 0.866386
> Elapsed time is: 5.422414
> The sum = 0.866386
> Elapsed time is: 5.427156
> The sum = 0.866386
> Elapsed time is: 5.424834
> The sum = 0.866386
> Elapsed time is: 5.425103
> The sum = 0.866386
> Elapsed time is: 5.422415
> The sum = 0.866386
> Elapsed time is: 5.422948
> Total time is: 59.668622
>
> Thanks,  -- bennet
>
>
> make check results
> ----------------------------------------------
>
> make check-TESTS
> make[3]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
> PASS: predefined_gap_test
> PASS: predefined_pad_test
> SKIP: dlopen_test
> ============================================================================
> Testsuite summary for Open MPI 3.1.0
> ============================================================================
> # TOTAL: 3
> # PASS:  2
> # SKIP:  1
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
> ============================================================================
> [ elided ]
> PASS: atomic_cmpset_noinline
>  - 5 threads: Passed
> PASS: atomic_cmpset_noinline
>  - 8 threads: Passed
> ============================================================================
> Testsuite summary for Open MPI 3.1.0
> ============================================================================
> # TOTAL: 8
> # PASS:  8
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
> ============================================================================
> [ elided ]
> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/class'
> PASS: ompi_rb_tree
> PASS: opal_bitmap
> PASS: opal_hash_table
> PASS: opal_proc_table
> PASS: opal_tree
> PASS: opal_list
> PASS: opal_value_array
> PASS: opal_pointer_array
> PASS: opal_lifo
> PASS: opal_fifo
> ============================================================================
> Testsuite summary for Open MPI 3.1.0
> ============================================================================
> # TOTAL: 10
> # PASS:  10
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
> ============================================================================
> [ elided ]
> make opal_thread opal_condition
> make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
>   CC       opal_thread.o
>   CCLD     opal_thread
>   CC       opal_condition.o
>   CCLD     opal_condition
> make[3]: Leaving directory `/tmp/build/openmpi-3.1.0/test/threads'
> make check-TESTS
> make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> ============================================================================
> Testsuite summary for Open MPI 3.1.0
> ============================================================================
> # TOTAL: 0
> # PASS:  0
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
> ============================================================================
> [ elided ]
> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/datatype'
> PASS: opal_datatype_test
> PASS: unpack_hetero
> PASS: checksum
> PASS: position
> PASS: position_noncontig
> PASS: ddt_test
> PASS: ddt_raw
> PASS: unpack_ooo
> PASS: ddt_pack
> PASS: external32
> ============================================================================
> Testsuite summary for Open MPI 3.1.0
> ============================================================================
> # TOTAL: 10
> # PASS:  10
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
> ============================================================================
> [ elided ]
> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/util'
> PASS: opal_bit_ops
> PASS: opal_path_nfs
> PASS: bipartite_graph
> ============================================================================
> Testsuite summary for Open MPI 3.1.0
> ============================================================================
> # TOTAL: 3
> # PASS:  3
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
> ============================================================================
> [ elided ]
> make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/dss'
> PASS: dss_buffer
> PASS: dss_cmp
> PASS: dss_payload
> PASS: dss_print
> ============================================================================
> Testsuite summary for Open MPI 3.1.0
> ============================================================================
> # TOTAL: 4
> # PASS:  4
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
> ============================================================================
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users