Thanks Siegmar; I posted this in https://github.com/open-mpi/ompi/issues/1569.

> On Apr 20, 2016, at 1:14 PM, Siegmar Gross 
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> Hi,
> 
> I have built openmpi-v1.10.2-142-g5cd9490 on my machines
> (Solaris 10 SPARC, Solaris 10 x86_64, and openSUSE Linux
> 12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. Unfortunately, I get
> runtime errors for some programs.
> 
> 
> Sun C 5.13:
> ===========
> 
> tyr spawn 116 mpiexec -np 1 --host tyr,sunpc1,linpc1,linpc1,ruester 
> spawn_master
> 
> Parent process 0 running on tyr.informatik.hs-fulda.de
>  I create 4 slave processes
> 
> Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) 
> (proc_pointer))->obj_magic_id, file 
> ../../openmpi-v1.10.2-142-g5cd9490/ompi/group/group_init.c, line 215, 
> function ompi_group_increment_proc_count
> [ruester:10077] *** Process received signal ***
> [ruester:10077] Signal: Abort (6)
> [ruester:10077] Signal code:  (-1)
> /usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x1c
> /usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:0x1b10f0
> /lib/sparcv9/libc.so.1:0xd8c28
> /lib/sparcv9/libc.so.1:0xcc79c
> /lib/sparcv9/libc.so.1:0xcc9a8
> /lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 2091943080 (?)]
> /lib/sparcv9/libc.so.1:abort+0xd0
> /lib/sparcv9/libc.so.1:_assert_c99+0x78
> /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0x10c
> /usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0xe758
> /usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0x113d4
> /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x188c
> /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:MPI_Init+0x26c
> /home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x18
> /home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x108
> [ruester:10077] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 3 with PID 0 on node ruester exited on 
> signal 6 (Abort).
> --------------------------------------------------------------------------
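
(Inline note for anyone following along without the attachments:
spawn_master.c was not posted inline, so below is a minimal sketch of
what such a master presumably looks like. The file name, the
"spawn_slave" binary, and the program structure are my assumptions
based on the output above; treat this as a reproducer template, not
Siegmar's actual code. Note that the backtrace shows the slave dying
inside its own MPI_Init, and the OPAL_OBJ_MAGIC_ID assertion in
ompi_group_increment_proc_count means a proc pointer in the newly
constructed group does not point at a properly constructed
opal_object_t, which points at the spawn handshake rather than user
code.)

/* spawn_master.c -- minimal sketch; names and counts are assumptions */
#include <stdio.h>
#include <mpi.h>

#define NUM_SLAVES 4    /* matches "I create 4 slave processes" above */

int main(int argc, char *argv[])
{
    char name[MPI_MAX_PROCESSOR_NAME];
    int rank, len;
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);
    printf("Parent process %d running on %s\n", rank, name);
    printf("  I create %d slave processes\n", NUM_SLAVES);

    /* Per the backtrace, the abort happens in the slaves' MPI_Init
       during the spawn handshake, before any user-level traffic. */
    MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES,
                   MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm,
                   MPI_ERRCODES_IGNORE);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
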
> 
> 
> 
> 
> 
> GCC-5.1.0:
> ==========
> 
> tyr spawn 129 mpiexec -np 1 --host ruester,ruester,sunpc1,linpc1,linpc1 
> spawn_master
> 
> Parent process 0 running on ruester.informatik.hs-fulda.de
>  I create 4 slave processes
> 
> [ruester.informatik.hs-fulda.de:09823] [[60617,1],0] ORTE_ERROR_LOG: 
> Unreachable in file 
> ../../../../../openmpi-v1.10.2-142-g5cd9490/ompi/mca/dpm/orte/dpm_orte.c at 
> line 523
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> 
>  Process 1 ([[60617,1],0]) is on host: ruester
>  Process 2 ([[0,0],0]) is on host: unknown!
>  BTLs attempted: tcp self
> 
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> [ruester:9823] *** An error occurred in MPI_Comm_spawn
> [ruester:9823] *** reported by process [3972595713,0]
> [ruester:9823] *** on communicator MPI_COMM_WORLD
> [ruester:9823] *** MPI_ERR_INTERN: internal error
> [ruester:9823] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
> now abort,
> [ruester:9823] ***    and potentially your MPI job)
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[60617,1],0]
>  Exit code:    17
> --------------------------------------------------------------------------
> tyr spawn 130
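
(Inline note: the "Unreachable" error above comes from the dpm
connect/accept step, and "Process 2 ([[0,0],0]) is on host: unknown!"
looks like an uninitialized process name rather than a genuine
transport problem. Still, the help text's hint about the "self" BTL is
easy to rule out by forcing the BTL list explicitly; if the failure
persists with this, BTL selection is not the culprit:)

mpiexec --mca btl tcp,self -np 1 \
    --host ruester,ruester,sunpc1,linpc1,linpc1 spawn_master
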
> 
> 
> tyr spawn 133 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester 
> spawn_multiple_master
> 
> Parent process 0 running on tyr.informatik.hs-fulda.de
>  I create 3 slave processes.
> 
> Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) 
> (proc_pointer))->obj_magic_id, file 
> ../../openmpi-v1.10.2-142-g5cd9490/ompi/group/group_init.c, line 215, 
> function ompi_group_increment_proc_count
> [ruester:09954] *** Process received signal ***
> [ruester:09954] Signal: Abort (6)
> [ruester:09954] Signal code:  (-1)
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x2c
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:0xc2c0c
> /lib/sparcv9/libc.so.1:0xd8c28
> /lib/sparcv9/libc.so.1:0xcc79c
> /lib/sparcv9/libc.so.1:0xcc9a8
> /lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
> /lib/sparcv9/libc.so.1:abort+0xd0
> /lib/sparcv9/libc.so.1:_assert_c99+0x78
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0xf0
> /usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x6638
> /usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x948c
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x1978
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:MPI_Init+0x2a8
> /home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x10
> /home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x7c
> [ruester:09954] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 0 on node ruester exited on 
> signal 6 (Abort).
> --------------------------------------------------------------------------
> tyr spawn 134
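
(Same failure mode via MPI_Comm_spawn_multiple. For completeness, a
minimal sketch of what spawn_multiple_master.c may look like; the two
identical commands and the 1+2 split of the three slaves are purely
illustrative assumptions, since the attachment was not shown inline:)

/* spawn_multiple_master.c -- hedged sketch, not the original source */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char *commands[2]  = { "spawn_slave", "spawn_slave" };
    int   maxprocs[2]  = { 1, 2 };   /* 3 slaves total; split assumed */
    MPI_Info infos[2]  = { MPI_INFO_NULL, MPI_INFO_NULL };
    char name[MPI_MAX_PROCESSOR_NAME];
    int rank, len;
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);
    printf("Parent process %d running on %s\n", rank, name);
    printf("  I create 3 slave processes.\n");

    /* Spawn two command blocks in one call; the slaves again abort in
       their MPI_Init, per the assertion and backtrace above. */
    MPI_Comm_spawn_multiple(2, commands, MPI_ARGVS_NULL, maxprocs,
                            infos, 0, MPI_COMM_WORLD, &intercomm,
                            MPI_ERRCODES_IGNORE);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
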
> 
> 
> 
> I would be grateful if somebody could fix these problems. Thank you
> very much in advance for any help.
> 
> 
> Kind regards
> 
> Siegmar
> <spawn_master.c><spawn_multiple_master.c>


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
