Hi,

I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server
12 (x86_64)" with Sun C 5.13 and gcc-6.1.0. Unfortunately I get
a segmentation fault when using "--slot-list" with one of my small
test programs.


loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
      OPAL repo revision: v1.10.2-201-gd23dda8
     C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc


loki spawn 120 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master

Parent process 0 running on loki
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:                    1
                  tasks in COMM_CHILD_PROCESSES local group:  1
                  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 0 of 4 running on loki
Slave process 1 of 4 running on loki
Slave process 2 of 4 running on loki
spawn_slave 2: argv[0]: spawn_slave
Slave process 3 of 4 running on loki
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave
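
In case it helps to reproduce the crash: spawn_master is essentially the
textbook MPI_Comm_spawn pattern. Since the real source is not attached,
the following is only a minimal sketch consistent with the output above
(the "spawn_slave" binary name, the exact print strings, and the
communicator variable name are assumptions):

```c
/* spawn_master.c -- hypothetical minimal reconstruction, not the
 * original source.  Build with:  mpicc spawn_master.c -o spawn_master */
#include <stdio.h>
#include <mpi.h>

#define NUM_SLAVES 4

int main(int argc, char *argv[])
{
  int rank, world_size, local_size, remote_size, len;
  char name[MPI_MAX_PROCESSOR_NAME];
  MPI_Comm COMM_CHILD_PROCESSES;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &len);
  printf("Parent process %d running on %s\n", rank, name);
  printf("  I create %d slave processes\n", NUM_SLAVES);

  /* Spawn the slaves; "spawn_slave" must be in the search path.
   * The segfault above occurs in the slaves' MPI_Init. */
  MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES,
                 MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                 &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);

  /* Query the sizes reported in the output above. */
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  MPI_Comm_size(COMM_CHILD_PROCESSES, &local_size);
  MPI_Comm_remote_size(COMM_CHILD_PROCESSES, &remote_size);
  printf("Parent process %d: tasks in MPI_COMM_WORLD:                    %d\n",
         rank, world_size);
  printf("                  tasks in COMM_CHILD_PROCESSES local group:  %d\n",
         local_size);
  printf("                  tasks in COMM_CHILD_PROCESSES remote group: %d\n",
         remote_size);

  MPI_Comm_free(&COMM_CHILD_PROCESSES);
  MPI_Finalize();
  return 0;
}
```

Running it exactly as in the commands above (with and without
"--slot-list") should show the same difference in behaviour.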




loki spawn 121 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:17326] *** Process received signal ***
[loki:17326] Signal: Segmentation fault (11)
[loki:17326] Signal code: Address not mapped (1)
[loki:17326] Failing at address: 0x8
[loki:17326] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f4e469b3870]
[loki:17326] [ 1] *** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[loki:17324] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f4e46c165b0]
[loki:17326] [ 2] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f4e46bf5b08]
[loki:17326] [ 3] *** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[loki:17325] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f4e46c1be8a]
[loki:17326] [ 4] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7f4e46c5828e]
[loki:17326] [ 5] spawn_slave[0x40097e]
[loki:17326] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4e4661db05]
[loki:17326] [ 7] spawn_slave[0x400a54]
[loki:17326] *** End of error message ***
-------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[56340,2],0]
  Exit code:    1
--------------------------------------------------------------------------
loki spawn 122




I would be grateful if somebody could fix the problem. Thank you
very much in advance for any help.


Kind regards

Siegmar
