Hi Howard/OpenMPI Users,

I hit a similar segfault this week using Open MPI 2.1.1 with GCC 4.9.3, so I 
tried compiling the example code in the email below. It shows the same behavior 
as a small benchmark we have in house (which uses inc rather than finc).

When I run on a single node (both PEs on the same node) I get the error below. 
If I run on multiple nodes instead (say 2 nodes with one PE per node), the code 
runs fine. The same thing happens with my benchmark, which uses 
shmem_longlong_inc. For reference, we are using InfiniBand on our cluster and 
dual-socket Haswell processors.
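
In case it is useful, the in-house benchmark boils down to roughly the sketch 
below (the names and the printf are illustrative, not the actual benchmark 
code; it assumes the same headers as Ben's example). On a single node it dies 
the same way; across two nodes with one PE each, launched with something along 
the lines of shmemrun -n 2 -npernode 1 ./testinc, it completes and prints the 
expected count.

#include <cstdio>

#include <mpp/shmem.h>

int main(void) {
  shmem_init();

  /* one long long counter on the symmetric heap */
  long long *counter = (long long *) shmem_malloc(sizeof(long long));
  *counter = 0;
  shmem_barrier_all();

  /* every PE atomically increments the counter on PE 0 */
  shmem_longlong_inc(counter, 0);
  shmem_barrier_all();

  if (shmem_my_pe() == 0)
    printf("counter = %lld (expected %d)\n", *counter, shmem_n_pes());

  shmem_finalize();
  return 0;
}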

Hope that helps,

S.

$ shmemrun -n 2 ./testfinc
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: shepard-lsm1
--------------------------------------------------------------------------
[shepard-lsm1:49505] *** Process received signal ***
[shepard-lsm1:49505] Signal: Segmentation fault (11)
[shepard-lsm1:49505] Signal code: Address not mapped (1)
[shepard-lsm1:49505] Failing at address: 0x18
[shepard-lsm1:49505] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7ffc4cd9e710]
[shepard-lsm1:49505] [ 1] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/openmpi/mca_spml_yoda.so(mca_spml_yoda_get+0x86d)[0x7ffc337cf37d]
[shepard-lsm1:49505] [ 2] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/openmpi/mca_atomic_basic.so(atomic_basic_lock+0x9a)[0x7ffc32f190aa]
[shepard-lsm1:49505] [ 3] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/openmpi/mca_atomic_basic.so(mca_atomic_basic_fadd+0x39)[0x7ffc32f19409]
[shepard-lsm1:49505] [ 4] /home/projects/x86-64-haswell/openmpi/2.1.1/gcc/4.9.3/lib/liboshmem.so.20(shmem_int_fadd+0x80)[0x7ffc4d2fc110]
[shepard-lsm1:49505] [ 5] ./testfinc[0x400888]
[shepard-lsm1:49505] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ffc4ca19d5d]
[shepard-lsm1:49505] [ 7] ./testfinc[0x400739]
[shepard-lsm1:49505] *** End of error message ***
--------------------------------------------------------------------------
shmemrun noticed that process rank 1 with PID 0 on node shepard-lsm1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[shepard-lsm1:49499] 1 more process has sent help message help-mpi-btl-openib.txt / no active ports found
[shepard-lsm1:49499] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA


From: users <users-boun...@lists.open-mpi.org> on behalf of Howard Pritchard 
<hpprit...@gmail.com>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Monday, November 20, 2017 at 4:11 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: [EXTERNAL] Re: [OMPI users] Using shmem_int_fadd() in OpenMPI's SHMEM

Hi Ben,

What version of Open MPI are you trying to use?

Also, could you describe your system a bit? If it's a cluster, what sort of
interconnect is being used?

Howard


2017-11-20 14:13 GMT-07:00 Benjamin Brock <br...@cs.berkeley.edu>:
What's the proper way to use shmem_int_fadd() in OpenMPI's SHMEM?

A minimal example seems to seg fault:

#include <cstdlib>
#include <cstdio>

#include <mpp/shmem.h>

int main(int argc, char **argv) {
  shmem_init();

  /* Allocate a small block on the symmetric heap. */
  const size_t shared_segment_size = 1024;
  void *shared_segment = shmem_malloc(shared_segment_size);

  int *arr = (int *) shared_segment;
  int *local_arr = (int *) malloc(sizeof(int) * 10);  /* not used below */

  /* PE 1 does an atomic fetch-and-add of 1 on the first int of PE 0's
     symmetric segment. */
  if (shmem_my_pe() == 1) {
    shmem_int_fadd((int *) shared_segment, 1, 0);
  }
  shmem_barrier_all();

  return 0;
}
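
For reference, my understanding of shmem_int_fadd from the OpenSHMEM spec is
int shmem_int_fadd(int *target, int value, int pe): it returns the value that
target held on PE pe before the add, and target must be remotely accessible
(symmetric), which shmem_malloc provides. The pattern I would expect to work,
as a sketch of just the body of main (the counter name, initialization, and
barriers are illustrative), is:

  int *counter = (int *) shmem_malloc(sizeof(int));
  *counter = 0;          /* initialize the symmetric location */
  shmem_barrier_all();   /* make sure PE 0's counter is ready before the atomic */

  if (shmem_my_pe() == 1) {
    /* atomically add 1 to PE 0's counter; 'old' receives the prior value */
    int old = shmem_int_fadd(counter, 1, 0);
    printf("previous value on PE 0: %d\n", old);
  }
  shmem_barrier_all();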

Where am I going wrong here?  This sort of thing works in Cray SHMEM.

Ben Brock


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
