Nathan,

I am trying to track down some memory corruption that leads to crashes in my application running on the Cray system with Open MPI (git-6093f2d). Valgrind reports quite a few invalid reads and writes inside Open MPI when running the benchmark that I sent you earlier.
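
In case you want to reproduce the reports, something along these lines should do it (the Valgrind options here are only a suggestion, not the exact invocation I used):

```
# two processes, one per node, each running under Valgrind
mpirun -n 2 -N 1 valgrind --track-origins=yes --num-callers=20 ./mpi_test_loop
```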

There are plenty of invalid reads in MPI_Init and MPI_Win_allocate. Valgrind also reports some invalid writes during communication:

```
==42751== Invalid write of size 8
==42751==    at 0x94C647D: GNII_POST_FMA_GET (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751==    by 0x94C8D74: GNI_PostFma (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751==    by 0x10FA21D0: mca_btl_ugni_get (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_btl_ugni.so)
==42751==    by 0x134AF6C5: ompi_osc_get_data_blocking (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751==    by 0x134D0CC4: ompi_osc_rdma_peer_lookup (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751==    by 0x134B4A1F: ompi_osc_rdma_rget (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751==    by 0x46C1D52: PMPI_Rget (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751==    by 0x20001EA9: main (in /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
==42751==  Address 0x2aaaaabc0000 is not stack'd, malloc'd or (recently) free'd

==42751== Invalid write of size 8
==42751==    at 0x94D76BC: GNII_SmsgSend (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751==    by 0x94D9D5C: GNI_SmsgSendWTag (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751==    by 0x10F9D9E6: mca_btl_ugni_sendi (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_btl_ugni.so)
==42751==    by 0x11BE5DDF: mca_pml_ob1_isend (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_pml_ob1.so)
==42751==    by 0x1201DC40: NBC_Progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
==42751==    by 0x1201DC91: NBC_Progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
==42751==    by 0x1201C692: ompi_coll_libnbc_progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
==42751==    by 0x631A503: opal_progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x632111C: ompi_sync_wait_mt (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x4669A4C: ompi_comm_nextcid (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751==    by 0x4667ECC: ompi_comm_dup_with_info (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751==    by 0x134C15AE: ompi_osc_rdma_component_select (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751==  Address 0x2aaaaabaf000 is not stack'd, malloc'd or (recently) free'd

```

And there is also a write-after-free during MPI_Finalize:

```
==42751== Invalid write of size 8
==42751==    at 0x6316E64: opal_rb_tree_delete (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x1076BA03: mca_mpool_hugepage_seg_free (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751==    by 0x1015EB33: mca_allocator_bucket_cleanup (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_allocator_bucket.so)
==42751==    by 0x1015DF5C: mca_allocator_bucket_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_allocator_bucket.so)
==42751==    by 0x1076BAE6: mca_mpool_hugepage_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751==    by 0x1076C202: mca_mpool_hugepage_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751==    by 0x633CED9: mca_base_component_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x633CE01: mca_base_components_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x63C6F31: mca_mpool_base_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x634AEF7: mca_base_framework_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x4687B6A: ompi_mpi_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751==    by 0x20001F35: main (in /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
==42751==  Address 0xa3aa348 is 16,440 bytes inside a block of size 16,568 free'd
==42751==    at 0x4428CDA: free (vg_replace_malloc.c:530)
==42751==    by 0x630FED2: opal_free_list_destruct (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x63160C1: opal_rb_tree_destruct (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x1076BACE: mca_mpool_hugepage_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751==    by 0x1076C202: mca_mpool_hugepage_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751==    by 0x633CED9: mca_base_component_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x633CE01: mca_base_components_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x63C6F31: mca_mpool_base_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x634AEF7: mca_base_framework_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751==    by 0x4687B6A: ompi_mpi_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751==    by 0x20001F35: main (in /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
```

I'm not sure whether the invalid writes (and reads) during initialization and communication are caused by Open MPI or by uGNI itself, or whether they are critical at all (the addresses seem to be "special"). The write-after-free in MPI_Finalize seems suspicious, though. I cannot say whether it causes the memory corruption I am seeing, but I thought I would report it. I will keep digging to figure out what causes the crashes (unfortunately, they are not deterministically reproducible).

Cheers,
Joseph

On 05/10/2018 03:24 AM, Nathan Hjelm wrote:
Thanks for confirming that it works for you as well. I have a PR open on v3.1.x 
that brings osc/rdma up to date with master. I will also be bringing some code 
that greatly improves the multi-threaded RMA performance on Aries systems (at 
least with the benchmarks at github.com/hpc/rma-mt). That will not make it into 
v3.1.x but will be in v4.0.0.

-Nathan

On May 9, 2018, at 1:26 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:

Nathan,

Thank you, I can confirm that it works as expected with master on our system. I 
will stick to this version then until 3.1.1 is out.

Joseph

On 05/08/2018 05:34 PM, Nathan Hjelm wrote:
Looks like it doesn't fail with master, so at some point I fixed this bug. The 
current plan is to bring all the master changes into v3.1.1. This includes a 
number of bug fixes.
-Nathan
On May 08, 2018, at 08:25 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:
Nathan,

Thanks for looking into that. My test program is attached.

Best
Joseph

On 05/08/2018 02:56 PM, Nathan Hjelm wrote:
I will take a look today. Can you send me your test program?

-Nathan

On May 8, 2018, at 2:49 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:

All,

I have been experimenting with Open MPI 3.1.0 on our Cray XC40 
(Haswell-based nodes, Aries interconnect) for multi-threaded MPI RMA. 
Unfortunately, a simple (single-threaded) test case consisting of two processes 
performing an MPI_Rget+MPI_Wait hangs when running on two nodes. It succeeds if 
both processes run on a single node.
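
The pattern boils down to the following sketch (a stripped-down illustration, 
not the actual test program; the use of MPI_Win_allocate and a passive-target 
epoch via MPI_Win_lock_all is an assumption of this sketch):

```
/*
 * Minimal sketch of the failing pattern, NOT the test program verbatim:
 * two ranks, a window created with MPI_Win_allocate, one MPI_Rget from the
 * other rank followed by MPI_Wait.
 */
#include <stdio.h>
#include <stdint.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    uint64_t *baseptr;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* one 64-bit element per process */
    MPI_Win_allocate(sizeof(uint64_t), sizeof(uint64_t), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &baseptr, &win);
    *baseptr = (uint64_t)rank;

    MPI_Win_lock_all(0, win);
    MPI_Win_sync(win);             /* make the local store visible to RMA */
    MPI_Barrier(MPI_COMM_WORLD);   /* both windows are initialized now */

    /* fetch the other rank's value and wait for completion */
    uint64_t result = 0;
    MPI_Request req;
    MPI_Rget(&result, 1, MPI_UINT64_T, 1 - rank, 0, 1, MPI_UINT64_T, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* hangs here when the ranks are on different nodes */

    printf("rank %d read %llu\n", rank, (unsigned long long)result);

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```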

For completeness, I am attaching the config.log. The build environment was set 
up to build Open MPI for the login nodes (I wasn't sure how to properly 
cross-compile the libraries):

```
# this seems necessary to avoid a linker error during build
export CRAYPE_LINK_TYPE=dynamic
module swap PrgEnv-cray PrgEnv-intel
module sw craype-haswell craype-sandybridge
module unload craype-hugepages16M
module unload cray-mpich
```

I am using mpirun to launch the test code. Below is the BTL debug log (with the 
tcp BTL disabled for clarity; enabling it makes no difference):

```
mpirun --mca btl_base_verbose 100 --mca btl ^tcp -n 2 -N 1 ./mpi_test_loop
[nid03060:36184] mca: base: components_register: registering framework btl 
components
[nid03060:36184] mca: base: components_register: found loaded component self
[nid03060:36184] mca: base: components_register: component self register 
function successful
[nid03060:36184] mca: base: components_register: found loaded component sm
[nid03061:36208] mca: base: components_register: registering framework btl 
components
[nid03061:36208] mca: base: components_register: found loaded component self
[nid03060:36184] mca: base: components_register: found loaded component ugni
[nid03061:36208] mca: base: components_register: component self register 
function successful
[nid03061:36208] mca: base: components_register: found loaded component sm
[nid03061:36208] mca: base: components_register: found loaded component ugni
[nid03060:36184] mca: base: components_register: component ugni register 
function successful
[nid03060:36184] mca: base: components_register: found loaded component vader
[nid03061:36208] mca: base: components_register: component ugni register 
function successful
[nid03061:36208] mca: base: components_register: found loaded component vader
[nid03060:36184] mca: base: components_register: component vader register 
function successful
[nid03060:36184] mca: base: components_open: opening btl components
[nid03060:36184] mca: base: components_open: found loaded component self
[nid03060:36184] mca: base: components_open: component self open function 
successful
[nid03060:36184] mca: base: components_open: found loaded component ugni
[nid03060:36184] mca: base: components_open: component ugni open function 
successful
[nid03060:36184] mca: base: components_open: found loaded component vader
[nid03060:36184] mca: base: components_open: component vader open function 
successful
[nid03060:36184] select: initializing btl component self
[nid03060:36184] select: init of component self returned success
[nid03060:36184] select: initializing btl component ugni
[nid03061:36208] mca: base: components_register: component vader register 
function successful
[nid03061:36208] mca: base: components_open: opening btl components
[nid03061:36208] mca: base: components_open: found loaded component self
[nid03061:36208] mca: base: components_open: component self open function 
successful
[nid03061:36208] mca: base: components_open: found loaded component ugni
[nid03061:36208] mca: base: components_open: component ugni open function 
successful
[nid03061:36208] mca: base: components_open: found loaded component vader
[nid03061:36208] mca: base: components_open: component vader open function 
successful
[nid03061:36208] select: initializing btl component self
[nid03061:36208] select: init of component self returned success
[nid03061:36208] select: initializing btl component ugni
[nid03061:36208] select: init of component ugni returned success
[nid03061:36208] select: initializing btl component vader
[nid03061:36208] select: init of component vader returned failure
[nid03061:36208] mca: base: close: component vader closed
[nid03061:36208] mca: base: close: unloading component vader
[nid03060:36184] select: init of component ugni returned success
[nid03060:36184] select: initializing btl component vader
[nid03060:36184] select: init of component vader returned failure
[nid03060:36184] mca: base: close: component vader closed
[nid03060:36184] mca: base: close: unloading component vader
[nid03061:36208] mca: bml: Using self btl for send to [[54630,1],1] on node 
nid03061
[nid03060:36184] mca: bml: Using self btl for send to [[54630,1],0] on node 
nid03060
[nid03061:36208] mca: bml: Using ugni btl for send to [[54630,1],0] on node 
(null)
[nid03060:36184] mca: bml: Using ugni btl for send to [[54630,1],1] on node 
(null)
```

It looks like the ugni BTL is initialized correctly but then fails to determine 
the node to communicate with (note the "(null)" node name in the last two lines). 
Is there a way to get more information? There doesn't seem to be an MCA parameter 
to increase the verbosity of the ugni BTL specifically.
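
For reference, this is how I would list the parameters that the ugni BTL does 
expose (as far as I know, none of them is verbosity-related):

```
# enumerate all MCA parameters of the ugni BTL, including internal ones
ompi_info --param btl ugni --level 9
```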

Any help would be appreciated!

Cheers
Joseph
<config.log.tgz>