Thanks Siegmar; I posted this in https://github.com/open-mpi/ompi/issues/1569.
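For anyone who wants to try reproducing this without the original attachments (they are not included inline below), a stripped-down master/slave pair along the lines of the quoted report could look like the sketch that follows. This is not Siegmar's actual spawn_master.c; the slave binary name "spawn_slave", the slave count, and the output strings are assumptions on my part. The backtraces show the assertion firing inside the slaves' MPI_Init, so the slave side only needs MPI_Init/MPI_Finalize, and the spawn_multiple_master case differs only in calling MPI_Comm_spawn_multiple instead of MPI_Comm_spawn.

/* spawn_master.c (sketch; NOT the attachment from the quoted mail) */
#include <stdio.h>
#include <mpi.h>

#define NUM_SLAVES 4   /* assumed from "I create 4 slave processes" */

int main(int argc, char *argv[])
{
    int  rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);
    printf("Parent process %d running on %s\n", rank, host);
    printf("I create %d slave processes\n", NUM_SLAVES);

    /* Spawn the slaves; the quoted failures occur while the slaves
     * are still inside their own MPI_Init. */
    MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES,
                   MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                   &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

/* spawn_slave.c (sketch): the quoted backtraces end in MPI_Init here */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);          /* assertion/abort reported in here */
    MPI_Comm_get_parent(&parent);
    if (parent != MPI_COMM_NULL)
        MPI_Comm_disconnect(&parent);
    MPI_Finalize();
    return 0;
}

Built with mpicc against the affected install and launched with the same kind of command line as in the report (e.g. "mpiexec -np 1 --host tyr,sunpc1,linpc1,linpc1,ruester spawn_master"), this should exercise the same MPI_Comm_spawn / MPI_Init path.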
> On Apr 20, 2016, at 1:14 PM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> Hi,
>
> I have built openmpi-v1.10.2-142-g5cd9490 on my machines
> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux
> 12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. Unfortunately I get
> runtime errors for some programs.
>
>
> Sun C 5.13:
> ===========
>
> tyr spawn 116 mpiexec -np 1 --host tyr,sunpc1,linpc1,linpc1,ruester spawn_master
>
> Parent process 0 running on tyr.informatik.hs-fulda.de
>   I create 4 slave processes
>
> Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (proc_pointer))->obj_magic_id,
>   file ../../openmpi-v1.10.2-142-g5cd9490/ompi/group/group_init.c, line 215,
>   function ompi_group_increment_proc_count
> [ruester:10077] *** Process received signal ***
> [ruester:10077] Signal: Abort (6)
> [ruester:10077] Signal code: (-1)
> /usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x1c
> /usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:0x1b10f0
> /lib/sparcv9/libc.so.1:0xd8c28
> /lib/sparcv9/libc.so.1:0xcc79c
> /lib/sparcv9/libc.so.1:0xcc9a8
> /lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 2091943080 (?)]
> /lib/sparcv9/libc.so.1:abort+0xd0
> /lib/sparcv9/libc.so.1:_assert_c99+0x78
> /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0x10c
> /usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0xe758
> /usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0x113d4
> /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x188c
> /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:MPI_Init+0x26c
> /home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x18
> /home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x108
> [ruester:10077] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 3 with PID 0 on node ruester exited on
> signal 6 (Abort).
> --------------------------------------------------------------------------
>
>
> GCC-5.1.0:
> ==========
>
> tyr spawn 129 mpiexec -np 1 --host ruester,ruester,sunpc1,linpc1,linpc1 spawn_master
>
> Parent process 0 running on ruester.informatik.hs-fulda.de
>   I create 4 slave processes
>
> [ruester.informatik.hs-fulda.de:09823] [[60617,1],0] ORTE_ERROR_LOG: Unreachable in file
>   ../../../../../openmpi-v1.10.2-142-g5cd9490/ompi/mca/dpm/orte/dpm_orte.c at line 523
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
>   Process 1 ([[60617,1],0]) is on host: ruester
>   Process 2 ([[0,0],0]) is on host: unknown!
>   BTLs attempted: tcp self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> [ruester:9823] *** An error occurred in MPI_Comm_spawn
> [ruester:9823] *** reported by process [3972595713,0]
> [ruester:9823] *** on communicator MPI_COMM_WORLD
> [ruester:9823] *** MPI_ERR_INTERN: internal error
> [ruester:9823] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [ruester:9823] ***    and potentially your MPI job)
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[60617,1],0]
>   Exit code:    17
> --------------------------------------------------------------------------
> tyr spawn 130
>
>
> tyr spawn 133 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester spawn_multiple_master
>
> Parent process 0 running on tyr.informatik.hs-fulda.de
>   I create 3 slave processes.
>
> Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (proc_pointer))->obj_magic_id,
>   file ../../openmpi-v1.10.2-142-g5cd9490/ompi/group/group_init.c, line 215,
>   function ompi_group_increment_proc_count
> [ruester:09954] *** Process received signal ***
> [ruester:09954] Signal: Abort (6)
> [ruester:09954] Signal code: (-1)
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x2c
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:0xc2c0c
> /lib/sparcv9/libc.so.1:0xd8c28
> /lib/sparcv9/libc.so.1:0xcc79c
> /lib/sparcv9/libc.so.1:0xcc9a8
> /lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
> /lib/sparcv9/libc.so.1:abort+0xd0
> /lib/sparcv9/libc.so.1:_assert_c99+0x78
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0xf0
> /usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x6638
> /usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x948c
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x1978
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:MPI_Init+0x2a8
> /home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x10
> /home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x7c
> [ruester:09954] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 0 on node ruester exited on
> signal 6 (Abort).
> --------------------------------------------------------------------------
> tyr spawn 134
>
>
> I would be grateful if somebody can fix the problems. Thank you very
> much for any help in advance.
>
>
> Kind regards
>
> Siegmar
>
> <spawn_master.c><spawn_multiple_master.c>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/04/28984.php

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/