The invalid writes inside uGNI itself are nothing to worry about. I suggest adding 
a suppression that matches any GNI_ call. The RB tree invalid write, on the other 
hand, looks like a bug. I will take a look and see what might be causing it.
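
For reference, a suppression along these lines should catch any error whose stack 
passes through a GNI_ call (just a rough sketch: the rule name is arbitrary, the 
error kind has to match what Valgrind reports, e.g. Addr8 for "Invalid write of 
size 8", and you may need one entry per error kind):

```
# blanket suppression for Memcheck errors that go through any GNI_* function
cat > ugni.supp <<'EOF'
{
   ugni-gni-calls
   Memcheck:Addr8
   ...
   fun:GNI_*
}
EOF

# then hand the file to Valgrind when launching the benchmark
valgrind --suppressions=ugni.supp ./mpi_test_loop
```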

BTW, you can pass --with-valgrind(=DIR) to configure. This suppresses some 
uninitialized-value errors in btl/vader and other components. It won't help 
with btl/ugni right now, though.
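
A minimal sketch of what that could look like (the install prefixes here are 
placeholders for whatever you use on your system):

```
# point configure at the Valgrind installation, i.e. the directory that
# contains include/valgrind/valgrind.h (placeholder paths)
./configure --prefix=$HOME/opt/openmpi --with-valgrind=/usr/local
```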

-Nathan

> On May 17, 2018, at 3:50 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:
> 
> Nathan,
> 
> I am trying to track down some memory corruption that leads to crashes in my 
> application running on the Cray system with Open MPI (git-6093f2d). Valgrind 
> reports quite a few invalid reads and writes inside Open MPI when running the 
> benchmark that I sent you earlier.
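> 
> (For reference, a typical way to run the benchmark under Valgrind with mpirun 
> is something along these lines; this is just a sketch, the flags and process 
> counts are placeholders:)
> 
> ```
> mpirun -n 2 -N 1 valgrind --track-origins=yes ./mpi_test_loop
> ```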
> 
> There are plenty of invalid reads in MPI_Init and MPI_Win_allocate. Valgrind 
> also reports some invalid writes during communication:
> 
> ```
> ==42751== Invalid write of size 8
> ==42751==    at 0x94C647D: GNII_POST_FMA_GET (in 
> /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
> ==42751==    by 0x94C8D74: GNI_PostFma (in 
> /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
> ==42751==    by 0x10FA21D0: mca_btl_ugni_get (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_btl_ugni.so)
> ==42751==    by 0x134AF6C5: ompi_osc_get_data_blocking (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
> ==42751==    by 0x134D0CC4: ompi_osc_rdma_peer_lookup (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
> ==42751==    by 0x134B4A1F: ompi_osc_rdma_rget (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
> ==42751==    by 0x46C1D52: PMPI_Rget (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
> ==42751==    by 0x20001EA9: main (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
> ==42751==  Address 0x2aaaaabc0000 is not stack'd, malloc'd or (recently) 
> free'd
> 
> ==42751== Invalid write of size 8
> ==42751==    at 0x94D76BC: GNII_SmsgSend (in 
> /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
> ==42751==    by 0x94D9D5C: GNI_SmsgSendWTag (in 
> /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
> ==42751==    by 0x10F9D9E6: mca_btl_ugni_sendi (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_btl_ugni.so)
> ==42751==    by 0x11BE5DDF: mca_pml_ob1_isend (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_pml_ob1.so)
> ==42751==    by 0x1201DC40: NBC_Progress (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
> ==42751==    by 0x1201DC91: NBC_Progress (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
> ==42751==    by 0x1201C692: ompi_coll_libnbc_progress (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
> ==42751==    by 0x631A503: opal_progress (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x632111C: ompi_sync_wait_mt (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x4669A4C: ompi_comm_nextcid (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
> ==42751==    by 0x4667ECC: ompi_comm_dup_with_info (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
> ==42751==    by 0x134C15AE: ompi_osc_rdma_component_select (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
> ==42751==  Address 0x2aaaaabaf000 is not stack'd, malloc'd or (recently) 
> free'd
> 
> ```
> 
> There is also a write-after-free during MPI_Finalize:
> 
> ```
> ==42751== Invalid write of size 8
> ==42751==    at 0x6316E64: opal_rb_tree_delete (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x1076BA03: mca_mpool_hugepage_seg_free (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
> ==42751==    by 0x1015EB33: mca_allocator_bucket_cleanup (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_allocator_bucket.so)
> ==42751==    by 0x1015DF5C: mca_allocator_bucket_finalize (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_allocator_bucket.so)
> ==42751==    by 0x1076BAE6: mca_mpool_hugepage_finalize (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
> ==42751==    by 0x1076C202: mca_mpool_hugepage_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
> ==42751==    by 0x633CED9: mca_base_component_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x633CE01: mca_base_components_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x63C6F31: mca_mpool_base_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x634AEF7: mca_base_framework_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x4687B6A: ompi_mpi_finalize (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
> ==42751==    by 0x20001F35: main (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
> ==42751==  Address 0xa3aa348 is 16,440 bytes inside a block of size 16,568 
> free'd
> ==42751==    at 0x4428CDA: free (vg_replace_malloc.c:530)
> ==42751==    by 0x630FED2: opal_free_list_destruct (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x63160C1: opal_rb_tree_destruct (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x1076BACE: mca_mpool_hugepage_finalize (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
> ==42751==    by 0x1076C202: mca_mpool_hugepage_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
> ==42751==    by 0x633CED9: mca_base_component_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x633CE01: mca_base_components_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x63C6F31: mca_mpool_base_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x634AEF7: mca_base_framework_close (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
> ==42751==    by 0x4687B6A: ompi_mpi_finalize (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
> ==42751==    by 0x20001F35: main (in 
> /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
> ```
> 
> I'm not sure whether the invalid writes (and reads) during initialization and 
> communication are caused by Open MPI or by uGNI itself, or whether they are 
> critical at all (the addresses seem to be "special"). The write-after-free in 
> MPI_Finalize seems suspicious, though. I cannot say whether it causes the 
> memory corruption I am seeing, but I thought I would report it anyway. I will 
> keep digging to figure out what causes the crashes (they are not 
> deterministically reproducible, unfortunately).
> 
> Cheers,
> Joseph
> 
> On 05/10/2018 03:24 AM, Nathan Hjelm wrote:
>> Thanks for confirming that it works for you as well. I have a PR open on 
>> v3.1.x that brings osc/rdma up to date with master. I will also be bringing in 
>> some code that greatly improves multi-threaded RMA performance on Aries 
>> systems (at least with benchmarks; see github.com/hpc/rma-mt). That will not 
>> make it into v3.1.x but will be in v4.0.0.
>> -Nathan
>>> On May 9, 2018, at 1:26 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:
>>> 
>>> Nathan,
>>> 
>>> Thank you, I can confirm that it works as expected with master on our 
>>> system. I will stick to this version then until 3.1.1 is out.
>>> 
>>> Joseph
>>> 
>>> On 05/08/2018 05:34 PM, Nathan Hjelm wrote:
>>>> Looks like it doesn't fail with master, so at some point I fixed this bug. 
>>>> The current plan is to bring all of the master changes into v3.1.1; these 
>>>> include a number of bug fixes.
>>>> -Nathan
>>>> On May 08, 2018, at 08:25 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:
>>>>> Nathan,
>>>>> 
>>>>> Thanks for looking into that. My test program is attached.
>>>>> 
>>>>> Best
>>>>> Joseph
>>>>> 
>>>>> On 05/08/2018 02:56 PM, Nathan Hjelm wrote:
>>>>>> I will take a look today. Can you send me your test program?
>>>>>> 
>>>>>> -Nathan
>>>>>> 
>>>>>>> On May 8, 2018, at 2:49 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:
>>>>>>> 
>>>>>>> All,
>>>>>>> 
>>>>>>> I have been experimenting with using Open MPI 3.1.0 on our Cray XC40 
>>>>>>> (Haswell-based nodes, Aries interconnect) for multi-threaded MPI RMA. 
>>>>>>> Unfortunately, a simple (single-threaded) test case consisting of two 
>>>>>>> processes performing an MPI_Rget+MPI_Wait hangs when running on two 
>>>>>>> nodes. It succeeds if both processes run on a single node.
>>>>>>> 
>>>>>>> For completeness, I am attaching the config.log. The build environment 
>>>>>>> was set up to build Open MPI for the login nodes (I wasn't sure how to 
>>>>>>> properly cross-compile the libraries):
>>>>>>> 
>>>>>>> ```
>>>>>>> # this seems necessary to avoid a linker error during build
>>>>>>> export CRAYPE_LINK_TYPE=dynamic
>>>>>>> module swap PrgEnv-cray PrgEnv-intel
>>>>>>> module sw craype-haswell craype-sandybridge
>>>>>>> module unload craype-hugepages16M
>>>>>>> module unload cray-mpich
>>>>>>> ```
>>>>>>> 
>>>>>>> I am using mpirun to launch the test code. Below is the BTL debug log 
>>>>>>> (with tcp disabled for clarity; enabling it makes no difference):
>>>>>>> 
>>>>>>> ```
>>>>>>> mpirun --mca btl_base_verbose 100 --mca btl ^tcp -n 2 -N 1 
>>>>>>> ./mpi_test_loop
>>>>>>> [nid03060:36184] mca: base: components_register: registering framework 
>>>>>>> btl components
>>>>>>> [nid03060:36184] mca: base: components_register: found loaded component 
>>>>>>> self
>>>>>>> [nid03060:36184] mca: base: components_register: component self 
>>>>>>> register function successful
>>>>>>> [nid03060:36184] mca: base: components_register: found loaded component 
>>>>>>> sm
>>>>>>> [nid03061:36208] mca: base: components_register: registering framework 
>>>>>>> btl components
>>>>>>> [nid03061:36208] mca: base: components_register: found loaded component 
>>>>>>> self
>>>>>>> [nid03060:36184] mca: base: components_register: found loaded component 
>>>>>>> ugni
>>>>>>> [nid03061:36208] mca: base: components_register: component self 
>>>>>>> register function successful
>>>>>>> [nid03061:36208] mca: base: components_register: found loaded component 
>>>>>>> sm
>>>>>>> [nid03061:36208] mca: base: components_register: found loaded component 
>>>>>>> ugni
>>>>>>> [nid03060:36184] mca: base: components_register: component ugni 
>>>>>>> register function successful
>>>>>>> [nid03060:36184] mca: base: components_register: found loaded component 
>>>>>>> vader
>>>>>>> [nid03061:36208] mca: base: components_register: component ugni 
>>>>>>> register function successful
>>>>>>> [nid03061:36208] mca: base: components_register: found loaded component 
>>>>>>> vader
>>>>>>> [nid03060:36184] mca: base: components_register: component vader 
>>>>>>> register function successful
>>>>>>> [nid03060:36184] mca: base: components_open: opening btl components
>>>>>>> [nid03060:36184] mca: base: components_open: found loaded component self
>>>>>>> [nid03060:36184] mca: base: components_open: component self open 
>>>>>>> function successful
>>>>>>> [nid03060:36184] mca: base: components_open: found loaded component ugni
>>>>>>> [nid03060:36184] mca: base: components_open: component ugni open 
>>>>>>> function successful
>>>>>>> [nid03060:36184] mca: base: components_open: found loaded component 
>>>>>>> vader
>>>>>>> [nid03060:36184] mca: base: components_open: component vader open 
>>>>>>> function successful
>>>>>>> [nid03060:36184] select: initializing btl component self
>>>>>>> [nid03060:36184] select: init of component self returned success
>>>>>>> [nid03060:36184] select: initializing btl component ugni
>>>>>>> [nid03061:36208] mca: base: components_register: component vader 
>>>>>>> register function successful
>>>>>>> [nid03061:36208] mca: base: components_open: opening btl components
>>>>>>> [nid03061:36208] mca: base: components_open: found loaded component self
>>>>>>> [nid03061:36208] mca: base: components_open: component self open 
>>>>>>> function successful
>>>>>>> [nid03061:36208] mca: base: components_open: found loaded component ugni
>>>>>>> [nid03061:36208] mca: base: components_open: component ugni open 
>>>>>>> function successful
>>>>>>> [nid03061:36208] mca: base: components_open: found loaded component 
>>>>>>> vader
>>>>>>> [nid03061:36208] mca: base: components_open: component vader open 
>>>>>>> function successful
>>>>>>> [nid03061:36208] select: initializing btl component self
>>>>>>> [nid03061:36208] select: init of component self returned success
>>>>>>> [nid03061:36208] select: initializing btl component ugni
>>>>>>> [nid03061:36208] select: init of component ugni returned success
>>>>>>> [nid03061:36208] select: initializing btl component vader
>>>>>>> [nid03061:36208] select: init of component vader returned failure
>>>>>>> [nid03061:36208] mca: base: close: component vader closed
>>>>>>> [nid03061:36208] mca: base: close: unloading component vader
>>>>>>> [nid03060:36184] select: init of component ugni returned success
>>>>>>> [nid03060:36184] select: initializing btl component vader
>>>>>>> [nid03060:36184] select: init of component vader returned failure
>>>>>>> [nid03060:36184] mca: base: close: component vader closed
>>>>>>> [nid03060:36184] mca: base: close: unloading component vader
>>>>>>> [nid03061:36208] mca: bml: Using self btl for send to [[54630,1],1] on 
>>>>>>> node nid03061
>>>>>>> [nid03060:36184] mca: bml: Using self btl for send to [[54630,1],0] on 
>>>>>>> node nid03060
>>>>>>> [nid03061:36208] mca: bml: Using ugni btl for send to [[54630,1],0] on 
>>>>>>> node (null)
>>>>>>> [nid03060:36184] mca: bml: Using ugni btl for send to [[54630,1],1] on 
>>>>>>> node (null)
>>>>>>> ```
>>>>>>> 
>>>>>>> It looks like the UGNI btl is initialized correctly but then fails to find 
>>>>>>> the node to communicate with (note the "node (null)" in the last two 
>>>>>>> lines). Is there a way to get more information? There doesn't seem to be 
>>>>>>> an MCA parameter to increase the verbosity of the UGNI btl specifically.
>>>>>>> 
>>>>>>> Any help would be appreciated!
>>>>>>> 
>>>>>>> Cheers
>>>>>>> Joseph
>>>>>>> <config.log.tgz>
