Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv
Things that read like they should be unsigned look suspicious to me:

    nbElems -909934592
    count -1819869184

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

> On Nov 1, 2018, at 10:34 PM, Ben Menadue wrote:
> [...]

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv
Hi,

I haven't heard back from the user yet, but I just put this example together which works on 1, 2, and 3 ranks but fails for 4. Unfortunately it needs a fair amount of memory, about 14.3 GB per process, so I was running it with -map-by ppr:1:node.

It doesn't fail with the segfault as the user's code does, but it does SIGABRT:

16:12 bjm900@r4320 MPI_TESTS > mpirun -mca pml ob1 -mca coll ^fca,hcoll -map-by ppr:1:node -np 4 ./a.out
[r4450:11544] ../../../../../opal/datatype/opal_datatype_pack.h:53
    Pointer 0x2bb7ceedb010 size 131040 is outside [0x2b9ec63cb010,0x2bad1458b010] for base ptr 0x2b9ec63cb010 count 1 and data
[r4450:11544] Datatype 0x145fe90[] size 3072000 align 4 id 0 length 7 used 6
true_lb 0 true_ub 6144000 (true_extent 6144000) lb 0 ub 6144000 (extent 6144000)
nbElems -909934592 loops 4 flags 104 (committed )
-c-GD--[---][---] contain OPAL_FLOAT4:*
--C[---][---]         OPAL_LOOP_S 192 times the next 2 elements extent 8000
--C---P-D--[---][---] OPAL_FLOAT4 count 2000 disp 0xaba95 (4608000) blen 0 extent 4 (size 8000)
--C[---][---]         OPAL_LOOP_E prev 2 elements first elem displacement 4608000 size of data 8000
--C[---][---]         OPAL_LOOP_S 192 times the next 2 elements extent 8000
--C---P-D--[---][---] OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 extent 4 (size 8000)
--C[---][---]         OPAL_LOOP_E prev 2 elements first elem displacement 0 size of data 8000
---G---[---][---]     OPAL_LOOP_E prev 6 elements first elem displacement 4608000 size of data 655228928
Optimized description
-cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0xaba95 (4608000) blen 1 extent 1 (size 1536000)
-cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0x0 (0) blen 1 extent 1 (size 1536000)
---G---[---][---]     OPAL_LOOP_E prev 2 elements first elem displacement 4608000
[r4450:11544] *** Process received signal ***
[r4450:11544] Signal: Aborted (6)
[r4450:11544] Signal code: (-6)

Cheers,
Ben

allgatherv_failure.c
Description: Binary data

On 2 Nov 2018, at 12:09 pm, Ben Menadue wrote:
> [...]
Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv
Hi Gilles,

> On 2 Nov 2018, at 11:03 am, Gilles Gouaillardet wrote:
> I noted the stack trace refers to opal_cuda_memcpy(). Is this issue specific
> to CUDA environments?

No, this is just on normal CPU-only nodes. But memcpy always goes through opal_cuda_memcpy when CUDA support is enabled, even if there are no GPUs in use (or indeed, even installed).

> The coll/tuned default collective module is known not to work when tasks use
> matching but different signatures.
> For example, one task sends one vector of N elements, and the other task
> receives N elements.

This is the call that triggers it:

    ierror = MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, S[0], recvcounts, displs, mpitype_vec_nobs, node_comm);

(and changing the source datatype to MPI_BYTE to avoid the NULL handle doesn't help).

> A workaround worth trying is to
> mpirun --mca coll basic ...

Thanks; using --mca coll basic,libnbc fixes it (basic on its own fails because it can't work out what to use for Iallgather).

> Last but not least, could you please post a minimal example (and the number
> of MPI tasks used) that can evidence the issue?

I'm just waiting for the user to get back to me with the okay to share the code. Otherwise, I'll see what I can put together myself. It works on 42 cores (at 14 per node = 3 nodes) but fails for 43 cores (so 1 rank on the 4th node). The communicator includes 1 rank per node, so it's going from a three-rank communicator to a four-rank communicator; perhaps the tuned algorithm changes at that point?

Cheers,
Ben
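For reference, the failing pattern can be sketched as a compilable program. S, recvcounts, displs, and mpitype_vec_nobs are the user's names from the snippet above; the vector geometry (192 blocks of 2000 floats with an assumed stride), the per-rank count of 2000, and the 4-rank run are guesses loosely based on the datatype dumps elsewhere in this thread, not the user's actual code:

```c
#include <mpi.h>
#include <stdlib.h>

/* Sketch of the reported pattern: an in-place MPI_Allgatherv over a
 * non-contiguous derived datatype, sized so the gathered total per rank
 * exceeds what a signed 32-bit byte count can hold (~2 GiB).
 * WARNING: as written this needs roughly 13 GB per process, in line
 * with the ~14.3 GB the thread's reproducer reports. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nranks;
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Hypothetical stand-in for the user's mpitype_vec_nobs:
     * 192 blocks of 2000 floats, stride 2100 floats (assumed). */
    MPI_Datatype vec;
    MPI_Type_vector(192, 2000, 2100, MPI_FLOAT, &vec);
    MPI_Type_commit(&vec);

    MPI_Aint lb, extent;
    MPI_Type_get_extent(vec, &lb, &extent);

    int *recvcounts = malloc(nranks * sizeof(int));
    int *displs     = malloc(nranks * sizeof(int));
    for (int i = 0; i < nranks; i++) {
        recvcounts[i] = 2000;      /* instances of vec per rank (assumed);
                                    * 2000 * 1536000 bytes > INT32_MAX */
        displs[i]     = i * 2000;  /* in units of vec's extent */
    }

    float *S = malloc((size_t)nranks * 2000 * (size_t)extent);

    /* The call shape from the report: in-place, NULL send datatype. */
    MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                   S, recvcounts, displs, vec, MPI_COMM_WORLD);

    MPI_Type_free(&vec);
    free(recvcounts);
    free(displs);
    free(S);
    MPI_Finalize();
    return 0;
}
```

Run with something like mpirun -np 4 -map-by ppr:1:node ./a.out so each process has enough memory; per the thread, 3 ranks should pass and 4 should abort with the datatype errors.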
Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv
Hi Ben,

I noted the stack trace refers to opal_cuda_memcpy(). Is this issue specific to CUDA environments?

The coll/tuned default collective module is known not to work when tasks use matching but different signatures. For example, one task sends one vector of N elements, and the other task receives N elements.

A workaround worth trying is to

    mpirun --mca coll basic ...

Last but not least, could you please post a minimal example (and the number of MPI tasks used) that can evidence the issue?

Cheers,
Gilles

On 11/2/2018 7:59 AM, Ben Menadue wrote:
> [...]
[OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv
Hi,

One of our users is reporting an issue using MPI_Allgatherv with a large derived datatype: it segfaults inside OpenMPI. Using a debug build of OpenMPI 3.1.2 produces a ton of messages like this before the segfault:

[r3816:50921] ../../../../../opal/datatype/opal_datatype_pack.h:53
    Pointer 0x2acd0121b010 size 131040 is outside [0x2ac5ed268010,0x2ac980ad8010] for base ptr 0x2ac5ed268010 count 1 and data
[r3816:50921] Datatype 0x42998b0[] size 592000 align 4 id 0 length 7 used 6
true_lb 0 true_ub 1536000 (true_extent 1536000) lb 0 ub 1536000 (extent 1536000)
nbElems 148000 loops 4 flags 104 (committed )
-c-GD--[---][---] contain OPAL_FLOAT4:*
--C[---][---]         OPAL_LOOP_S 4 times the next 2 elements extent 8000
--C---P-D--[---][---] OPAL_FLOAT4 count 2000 disp 0x380743000 (1504000) blen 0 extent 4 (size 8000)
--C[---][---]         OPAL_LOOP_E prev 2 elements first elem displacement 1504000 size of data 8000
--C[---][---]         OPAL_LOOP_S 70 times the next 2 elements extent 8000
--C---P-D--[---][---] OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 extent 4 (size 8000)
--C[---][---]         OPAL_LOOP_E prev 2 elements first elem displacement 0 size of data 8000
---G---[---][---]     OPAL_LOOP_E prev 6 elements first elem displacement 1504000 size of data 1625032704
Optimized description
-cC---P-DB-[---][---] OPAL_UINT1 count 32000 disp 0x380743000 (1504000) blen 1 extent 1 (size 32000)
-cC---P-DB-[---][---] OPAL_UINT1 count 1305032704 disp 0x0 (0) blen 1 extent 1 (size 56)
---G---[---][---]     OPAL_LOOP_E prev 2 elements first elem displacement 1504000 size of d

Here is the backtrace:

 0 0x0008987b memcpy()  ???:0
 1 0x000639b6 opal_cuda_memcpy()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_cuda.c:99
 2 0x0005cd7a pack_predefined_data()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.h:56
 3 0x0005e845 opal_generic_simple_pack()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.c:319
 4 0x0004ce6e opal_convertor_pack()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_convertor.c:272
 5 0xe3b6 mca_btl_openib_prepare_src()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib.c:1609
 6 0x00023c75 mca_bml_base_prepare_src()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/bml/bml.h:341
 7 0x00027d2a mca_pml_ob1_send_request_schedule_once()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:995
 8 0x0002473c mca_pml_ob1_send_request_schedule_exclusive()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:313
 9 0x0002479d mca_pml_ob1_send_request_schedule()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:337
10 0x000256fe mca_pml_ob1_frag_completion()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:321
11 0x0001baaf handle_wc()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3565
12 0x0001c20c poll_device()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3719
13 0x0001c6c0 progress_one_device()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3829
14 0x0001c763 btl_openib_component_progress()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3853
15 0x0002ff90 opal_progress()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/../../../../opal/runtime/opal_progress.c:228
16 0x0001114c ompi_request_wait_completion()  /short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/request/request.h: