Hi,

I'm running OFED 1.0 with OpenMPI 1.1b1-1 compiled for Intel Compiler 9.1. I get this error message during an MPI_Alltoall call:

    Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
    Failing at addr:0x1cd04fe0
    [0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
    [1] func:/lib64/libpthread.so.0 [0x2b569739b140]
    [2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540) [0x2b5697278cf0]
    *** End of error message ***

I have no idea what causes it. It arises if I exceed a specific number (10) of MPI nodes. The error occurs in this code:

    do i = 1,npuntos
       print *,'puntos',i
       tam = 2**(i-1)
       tmin = 1e5
       tavg = 0.0d0
       do j = 1,rep
          envio = 8.0d0*j
          call mpi_barrier(mpi_comm_world,ierr)
          time1 = mpi_wtime()
          do k = 1,rep2
             call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
          end do
          call mpi_barrier(mpi_comm_world,ierr)
          time2 = mpi_wtime()
          time = (time2 - time1)/(rep2)
          if (time < tmin) tmin = time
          tavg = tavg + time
       end do
       m_tmin(i) = tmin
       m_tavg(i) = tavg/rep
    end do

This code is said to run on another system (running IBGD 1.8.x). I have also tested mpich_mlx_intel-0.9.7_mlx2.1.0-1, but I get a similar error message when using 13 nodes:

    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image              PC                Routine   Line      Source
    libpthread.so.0    00002B65DA39B140  Unknown   Unknown   Unknown
    main.out           0000000000448BDB  Unknown   Unknown   Unknown
    [9] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [6] Registration failed, file : intra_rdma_alltoall.c, line : 163
    9 - MPI_ALLTOALL : Unknown error
    [9] [] Aborting Program!
    6 - MPI_ALLTOALL : Unknown error
    [6] [] Aborting Program!
    [2] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [11] Registration failed, file : intra_rdma_alltoall.c, line : 163
    11 - MPI_ALLTOALL : Unknown error
    [11] [] Aborting Program!
    2 - MPI_ALLTOALL : Unknown error
    [2] [] Aborting Program!
    [10] Registration failed, file : intra_rdma_alltoall.c, line : 163
    10 - MPI_ALLTOALL : Unknown error
    [10] [] Aborting Program!
    [5] Registration failed, file : intra_rdma_alltoall.c, line : 163
    5 - MPI_ALLTOALL : Unknown error
    [5] [] Aborting Program!
    [3] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [8] Registration failed, file : intra_rdma_alltoall.c, line : 163
    3 - MPI_ALLTOALL : Unknown error
    [3] [] Aborting Program!
    8 - MPI_ALLTOALL : Unknown error
    [8] [] Aborting Program!
    [4] Registration failed, file : intra_rdma_alltoall.c, line : 163
    4 - MPI_ALLTOALL : Unknown error
    [4] [] Aborting Program!
    [7] Registration failed, file : intra_rdma_alltoall.c, line : 163
    7 - MPI_ALLTOALL : Unknown error
    [7] [] Aborting Program!
    [0] Registration failed, file : intra_rdma_alltoall.c, line : 163
    0 - MPI_ALLTOALL : Unknown error
    [0] [] Aborting Program!
    [1] Registration failed, file : intra_rdma_alltoall.c, line : 163
    1 - MPI_ALLTOALL : Unknown error
    [1] [] Aborting Program!

I don't know whether this is a problem with MPI or with the Intel Compiler. Can anybody point me in the right direction as to what I could have done wrong? This is my first post (so be gentle), and I'm not yet used to the level of detail expected on this list, so if you need any further information, do not hesitate to request it.

Thanks in advance and kind regards,
--
Frank Gruellich
HPC-Techniker

Tel.: +49 3722 528 42
Fax: +49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/