Alltoallv has both a large-count and a large-displacement problem in its API.
You can work around the latter by switching to a neighborhood alltoall on a
duplicate of your original communicator that has a (fully connected) graph
topology attached: the neighborhood collectives' w variant,
MPI_Neighbor_alltoallw, takes MPI_Aint displacements instead of int.
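
A minimal sketch of that workaround, with illustrative names (nothing here is
from your code): attach a fully connected graph topology, then call
MPI_Neighbor_alltoallw with byte (MPI_Aint) displacements:

    /* Illustrative only: a "big displacement" alltoallv built on a fully
     * connected neighborhood communicator and MPI_Neighbor_alltoallw, whose
     * displacements are MPI_Aint byte offsets. */
    #include <mpi.h>
    #include <stdlib.h>

    int big_displ_alltoallv(const void *sendbuf, const int *sendcounts,
                            const MPI_Aint *sdispls_bytes,
                            void *recvbuf, const int *recvcounts,
                            const MPI_Aint *rdispls_bytes, MPI_Comm comm)
    {
        int np, err;
        MPI_Comm_size(comm, &np);

        /* Make every rank a neighbor of every other rank, so the
         * neighborhood collective degenerates to a plain all-to-all. */
        int *ranks = malloc(np * sizeof(int));
        for (int i = 0; i < np; i++) ranks[i] = i;

        MPI_Comm nbr;
        MPI_Dist_graph_create_adjacent(comm, np, ranks, MPI_UNWEIGHTED,
                                       np, ranks, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0 /* no reorder */, &nbr);

        /* One datatype per peer; for more than 2^31 elements per peer you
         * would substitute a large derived type here (as BigMPI does). */
        MPI_Datatype *types = malloc(np * sizeof(MPI_Datatype));
        for (int i = 0; i < np; i++) types[i] = MPI_BYTE;

        err = MPI_Neighbor_alltoallw(sendbuf, sendcounts, sdispls_bytes, types,
                                     recvbuf, recvcounts, rdispls_bytes, types,
                                     nbr);
        MPI_Comm_free(&nbr);
        free(types);
        free(ranks);
        return err;
    }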

If you need tests, the
https://github.com/jeffhammond/BigMPI test suite is nothing but large-count
MPI calls implemented with derived datatypes.
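
For reference, here is a stripped-down version of the derived-datatype trick
those tests exercise (my own simplification, not BigMPI's actual code): wrap a
buffer of more than 2^31 elements in a single committed type so it can be
passed with count = 1:

    /* Sketch: one datatype describing `bigcount` elements of `oldtype`,
     * so any MPI call can move it with an int count of 1. */
    #include <mpi.h>

    int make_big_type(MPI_Count bigcount, MPI_Datatype oldtype,
                      MPI_Datatype *newtype)
    {
        const MPI_Count chunk = 1 << 30;          /* elements per full chunk */
        MPI_Count nchunks   = bigcount / chunk;
        MPI_Count remainder = bigcount % chunk;

        MPI_Datatype chunks_t, rem_t;
        MPI_Type_vector((int)nchunks, (int)chunk, (int)chunk, oldtype, &chunks_t);
        MPI_Type_contiguous((int)remainder, oldtype, &rem_t);

        MPI_Aint lb, extent;
        MPI_Type_get_extent(oldtype, &lb, &extent);

        /* Glue the remainder right behind the chunked part. */
        int          blocklens[2] = { 1, 1 };
        MPI_Aint     displs[2]    = { 0, (MPI_Aint)(nchunks * chunk) * extent };
        MPI_Datatype types[2]     = { chunks_t, rem_t };
        MPI_Type_create_struct(2, blocklens, displs, types, newtype);
        MPI_Type_commit(newtype);

        MPI_Type_free(&chunks_t);
        MPI_Type_free(&rem_t);
        return MPI_SUCCESS;
    }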

Jeff

On Thu 2. Jun 2022 at 22.28 Eric Chamberland via users <
users@lists.open-mpi.org> wrote:

> Hi Josh,
>
> ok, thanks for the suggestion.  We are in the process of testing with
> IntelMPI right now.  I hope to do it with a newer version of OpenMPI too.
>
> Do you suggest a minimum version for the UCX library?
>
> Thanks,
>
> Eric
> On 2022-06-02 04:05, Josh Hursey via users wrote:
>
> I would suggest trying OMPI v4.1.4 (or the v5 snapshot)
>  * https://www.open-mpi.org/software/ompi/v4.1/
>  * https://www.mail-archive.com/announce@lists.open-mpi.org//msg00152.html
>
> We fixed some large-payload collective issues in that release, which might
> be what you are seeing here with MPI_Alltoallv and the tuned collective
> component.
>
>
>
> On Thu, Jun 2, 2022 at 1:54 AM Mikhail Brinskii via users <
> users@lists.open-mpi.org> wrote:
>
>> Hi Eric,
>>
>>
>>
>> Yes, UCX is supposed to be stable for large sized problems.
>>
>> Did you see the same crash with both OMPI-4.0.3 + UCX 1.8.0 and
>> OMPI-4.1.2 + UCX 1.11.2?
>>
>> Have you also tried running the large-sized problem tests with OMPI-5.0.x?
>>
>> Regarding the application, at some point it invokes MPI_Alltoallv sending
>> more than 2GB to some of the ranks (using a derived datatype), right?
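>>
>> If useful, here is a minimal, hypothetical check (the helper name and
>> variables are placeholders, not from the application) that sums the bytes
>> destined to each rank just before the MPI_Alltoallv call and flags anything
>> above the 2 GiB signed-int limit:
>>
>>     /* Hypothetical helper: report per-destination send volumes that would
>>      * overflow a signed 32-bit byte count.  sendcounts and sendtype are
>>      * whatever the application passes to MPI_Alltoallv. */
>>     #include <limits.h>
>>     #include <stdio.h>
>>     #include <mpi.h>
>>
>>     void check_alltoallv_volumes(const int *sendcounts, MPI_Datatype sendtype,
>>                                  MPI_Comm comm)
>>     {
>>         int np, me, tsize;
>>         MPI_Comm_size(comm, &np);
>>         MPI_Comm_rank(comm, &me);
>>         MPI_Type_size(sendtype, &tsize);
>>         for (int dst = 0; dst < np; ++dst) {
>>             long long bytes = (long long)sendcounts[dst] * tsize;
>>             if (bytes > (long long)INT_MAX)
>>                 fprintf(stderr, "rank %d -> %d: %lld bytes exceeds INT_MAX\n",
>>                         me, dst, bytes);
>>         }
>>     }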
>>
>>
>>
>> //WBR, Mikhail
>>
>>
>>
>> *From:* users <users-boun...@lists.open-mpi.org> *On Behalf Of *Eric
>> Chamberland via users
>> *Sent:* Thursday, June 2, 2022 5:31 AM
>> *To:* Open MPI Users <users@lists.open-mpi.org>
>> *Cc:* Eric Chamberland <eric.chamberl...@giref.ulaval.ca>; Thomas
>> Briffard <thomas.briff...@michelin.com>; Vivien Clauzon <
>> vivien.clau...@michelin.com>; dave.mar...@giref.ulaval.ca; Ramses van
>> Zon <r...@scinet.utoronto.ca>; charles.coulomb...@ulaval.ca
>> *Subject:* [OMPI users] Segfault in ucp_dt_pack function from UCX
>> library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI
>> 4.0.3 and 4.1.2
>>
>>
>>
>> Hi,
>>
>> In the past, we have successfully launched large (finite element)
>> computations using PARMetis as the mesh partitioner.
>>
>> We first succeeded in 2012 with OpenMPI (v2.?) and again in March 2019 with
>> OpenMPI 3.1.2.
>>
>> Today, we have a bunch of nightly (small) tests running nicely and
>> exercising all of OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and IntelMPI
>> 2021.6.
>>
>> Preparing to launch the same computation we did in 2012, and even larger
>> ones, we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and OpenMPI
>> 4.1.2+ucx-1.11.2 and ran computations on problems (meshes) from small to
>> large.
>>
>> For small meshes, it goes fine.
>>
>> But when we reach nearly 2^31 faces in the 3D mesh we are using and call
>> ParMETIS_V3_PartMeshKway, we always get a segfault with the same backtrace
>> pointing into the UCX library:
>>
>> Wed Jun  1 23:04:54
>> 2022<stdout>:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut
>> VmSize: 1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764 VmHWM: 359012
>> <etiq_18>
>> Wed Jun  1 23:07:07 2022<stdout>:Erreur    :  MEF++ Signal recu : 11 :
>>  segmentation violation
>> Wed Jun  1 23:07:07 2022<stdout>:Erreur    :
>> Wed Jun  1 23:07:07 2022<stdout>:------------------------------ (Début
>> des informations destinées aux développeurs C++)
>> ------------------------------
>> Wed Jun  1 23:07:07 2022<stdout>:La pile d'appels contient 27 symboles.
>> Wed Jun  1 23:07:07 2022<stdout>:# 000:
>> reqBacktrace(std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> >&)  >>>  probGD.opt
>> (probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71)
>> [0x4119f1])
>> Wed Jun  1 23:07:07 2022<stdout>:# 001: attacheDebugger()  >>>
>>  probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
>> Wed Jun  1 23:07:07 2022<stdout>:# 002:
>> /gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x1f9f)
>> [0x2ab3aef0e5cf]
>> Wed Jun  1 23:07:07 2022<stdout>:# 003: /lib64/libc.so.6(+0x36400)
>> [0x2ab3bd59a400]
>> Wed Jun  1 23:07:07 2022<stdout>:# 004:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_dt_pack+0x123)
>> [0x2ab3c966e353]
>> Wed Jun  1 23:07:07 2022<stdout>:# 005:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x536b7)
>> [0x2ab3c968d6b7]
>> Wed Jun  1 23:07:07 2022<stdout>:# 006:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd7)
>> [0x2ab3ca712137]
>> Wed Jun  1 23:07:07 2022<stdout>:# 007:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x52d3c)
>> [0x2ab3c968cd3c]
>> Wed Jun  1 23:07:07 2022<stdout>:# 008:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_tag_send_nbx+0x5ad)
>> [0x2ab3c9696dcd]
>> Wed Jun  1 23:07:07 2022<stdout>:# 009:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf2)
>> [0x2ab3c922e0b2]
>> Wed Jun  1 23:07:07 2022<stdout>:# 010:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x92)
>> [0x2ab3bbca5a32]
>> Wed Jun  1 23:07:07 2022<stdout>:# 011:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x141)
>> [0x2ab3bbcad941]
>> Wed Jun  1 23:07:07 2022<stdout>:# 012:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42)
>> [0x2ab3d4836da2]
>> Wed Jun  1 23:07:07 2022<stdout>:# 013:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(PMPI_Alltoallv+0x29)
>> [0x2ab3bbc7bdf9]
>> Wed Jun  1 23:07:07 2022<stdout>:# 014:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(libparmetis__gkMPI_Alltoallv+0x106)
>> [0x2ab3bb0e1c06]
>> Wed Jun  1 23:07:07 2022<stdout>:# 015:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_Mesh2Dual+0xdd6)
>> [0x2ab3bb0f10b6]
>> Wed Jun  1 23:07:07 2022<stdout>:# 016:
>> /scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_PartMeshKway+0x100)
>> [0x2ab3bb0f1ac0]
>>
>> PARMetis is compiled as part of PETSc-3.17.1 with 64-bit indices.  Here
>> are the PETSc configure options:
>>
>>
>> --prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1
>> COPTFLAGS=\"-O2 -march=native\"
>> CXXOPTFLAGS=\"-O2 -march=native\"
>> FOPTFLAGS=\"-O2 -march=native\"
>> --download-fftw=1
>> --download-hdf5=1
>> --download-hypre=1
>> --download-metis=1
>> --download-mumps=1
>> --download-parmetis=1
>> --download-plapack=1
>> --download-prometheus=1
>> --download-ptscotch=1
>> --download-scotch=1
>> --download-sprng=1
>> --download-superlu_dist=1
>> --download-triangle=1
>> --with-avx512-kernels=1
>> --with-blaslapack-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
>> --with-cc=mpicc
>> --with-cxx=mpicxx
>> --with-cxx-dialect=C++11
>> --with-debugging=0
>> --with-fc=mpifort
>> --with-mkl_pardiso-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
>> --with-scalapack=1
>>
>> --with-scalapack-lib=\"[/scinet/intel/oneapi/2021u4/mkl/2021.4.0/lib/intel64/libmkl_scalapack_lp64.so,/scinet/intel/oneapi/2021u4/mkl/2021.4.0/lib/intel64/libmkl_blacs_openmpi_lp64.so]\"
>> --with-x=0
>> --with-64-bit-indices=1
>> --with-memalign=64
>>
>> and the OpenMPI configure options:
>>
>>
>> '--prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2'
>> '--enable-mpi-cxx'
>> '--enable-mpi1-compatibility'
>> '--with-hwloc=internal'
>> '--with-knem=/opt/knem-1.1.3.90mlnx1'
>> '--with-libevent=internal'
>> '--with-platform=contrib/platform/mellanox/optimized'
>> '--with-pmix=internal'
>> '--with-slurm=/opt/slurm'
>> '--with-ucx=/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2'
>>
>> I am then wondering:
>>
>> 1) Is the UCX library considered "stable" for production use with very
>> large problems?
>>
>> 2) Is there a way to "bypass" UCX at runtime?
>>
>> 3) Any idea for debugging this?
>>
>> Of course, I do not yet have a "minimal reproducer" that triggers the bug,
>> since it happens only on "large" problems, but I think I could export the
>> data for a 512-process reproducer with the PARMetis call only...
>>
>> Thanks for helping,
>>
>> Eric
>>
>> --
>>
>> Eric Chamberland, ing., M. Ing
>>
>> Professionnel de recherche
>>
>> GIREF/Université Laval
>>
>> (418) 656-2131 poste 41 22 42
>>
>>
>
> --
> Josh Hursey
> IBM Spectrum MPI Developer
>
> --
> Eric Chamberland, ing., M. Ing
> Professionnel de recherche
> GIREF/Université Laval
> (418) 656-2131 poste 41 22 42
>
> --
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
