Hi Josh,

Ok, thanks for the suggestion. We are in the process of testing with IntelMPI right now. I hope to test with a newer version of OpenMPI too.

Do you suggest a minimum version for the UCX library?

Thanks,

Eric

On 2022-06-02 04:05, Josh Hursey via users wrote:
I would suggest trying OMPI v4.1.4 (or the v5 snapshot)
 * https://www.open-mpi.org/software/ompi/v4.1/
 * https://www.mail-archive.com/announce@lists.open-mpi.org//msg00152.html

We fixed some large-payload collective issues in that release, which might be what you are seeing here with MPI_Alltoallv and the tuned collective component.



On Thu, Jun 2, 2022 at 1:54 AM Mikhail Brinskii via users <users@lists.open-mpi.org> wrote:

    Hi Eric,

    Yes, UCX is supposed to be stable for large-sized problems.

    Did you see the same crash with both OMPI-4.0.3 + UCX 1.8.0 and OMPI-4.1.2 + UCX 1.11.2?

    Have you also tried running the large-sized problem test with OMPI-5.0.x?

    Regarding the application, at some point it invokes MPI_Alltoallv sending more than 2GB to some of the ranks (using a derived datatype), right?
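
    If so, just to make sure we mean the same pattern (a minimal sketch with
    made-up sizes, not your actual code): with plain MPI_BYTE the per-rank
    count would overflow the 32-bit int count argument, so a contiguous
    derived datatype is typically used to keep the counts themselves small,
    something like:

        /* Sketch: counts expressed in 1 MiB units so that >2GB per rank
           still fits in an int (buffers and count arrays assumed set up). */
        MPI_Datatype unit;
        MPI_Type_contiguous(1 << 20, MPI_BYTE, &unit);  /* 1 MiB element */
        MPI_Type_commit(&unit);
        /* a 3 GiB message to rank r is then just sendcounts[r] = 3072 */
        MPI_Alltoallv(sendbuf, sendcounts, sdispls, unit,
                      recvbuf, recvcounts, rdispls, unit, comm);
        MPI_Type_free(&unit);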

    //WBR, Mikhail

    From: users <users-boun...@lists.open-mpi.org> On Behalf Of Eric Chamberland via users
    Sent: Thursday, June 2, 2022 5:31 AM
    To: Open MPI Users <users@lists.open-mpi.org>
    Cc: Eric Chamberland <eric.chamberl...@giref.ulaval.ca>; Thomas Briffard <thomas.briff...@michelin.com>; Vivien Clauzon <vivien.clau...@michelin.com>; dave.mar...@giref.ulaval.ca; Ramses van Zon <r...@scinet.utoronto.ca>; charles.coulomb...@ulaval.ca
    Subject: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

    Hi,

    In the past, we have successfully launched large-sized (finite element) computations using PARMetis as the mesh partitioner.

    We first succeeded in 2012 with OpenMPI (v2.?) and again in March 2019 with OpenMPI 3.1.2.

    Today, we have a bunch of nightly (small) tests running nicely, testing all of OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and IntelMPI 2021.6.

    Preparing to launch the same computation we did in 2012, and even larger ones, we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and OpenMPI 4.1.2+ucx-1.11.2 and launched computations from small to large problems (meshes).

    For small meshes, it goes fine.

    But when we reach nearly 2^31 faces in the 3D mesh we are using and call ParMETIS_V3_PartMeshKway, we always get a segfault with the same backtrace pointing into the UCX library:

    Wed Jun  1 23:04:54 2022<stdout>:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut VmSize: 1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764 VmHWM: 359012 <etiq_18>
    Wed Jun  1 23:07:07 2022<stdout>:Erreur    :  MEF++ Signal recu : 11 :  segmentation violation
    Wed Jun  1 23:07:07 2022<stdout>:Erreur    :
    Wed Jun  1 23:07:07 2022<stdout>:------------------------------ (Début des informations destinées aux développeurs C++) ------------------------------
    Wed Jun  1 23:07:07 2022<stdout>:La pile d'appels contient 27 symboles.
    Wed Jun  1 23:07:07 2022<stdout>:# 000: reqBacktrace(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)  >>>  probGD.opt (probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71) [0x4119f1])
    Wed Jun  1 23:07:07 2022<stdout>:# 001: attacheDebugger()  >>>  probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
    Wed Jun  1 23:07:07 2022<stdout>:# 002: /gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x1f9f) [0x2ab3aef0e5cf]
    Wed Jun  1 23:07:07 2022<stdout>:# 003: /lib64/libc.so.6(+0x36400) [0x2ab3bd59a400]
    Wed Jun  1 23:07:07 2022<stdout>:# 004: /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_dt_pack+0x123) [0x2ab3c966e353]
    Wed Jun  1 23:07:07 2022<stdout>:# 005: /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x536b7) [0x2ab3c968d6b7]
    Wed Jun  1 23:07:07 2022<stdout>:# 006: /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd7) [0x2ab3ca712137]
    Wed Jun  1 23:07:07 2022<stdout>:# 007: /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x52d3c) [0x2ab3c968cd3c]
    Wed Jun  1 23:07:07 2022<stdout>:# 008: /scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_tag_send_nbx+0x5ad) [0x2ab3c9696dcd]
    Wed Jun  1 23:07:07 2022<stdout>:# 009: /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf2) [0x2ab3c922e0b2]
    Wed Jun  1 23:07:07 2022<stdout>:# 010: /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x92) [0x2ab3bbca5a32]
    Wed Jun  1 23:07:07 2022<stdout>:# 011: /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x141) [0x2ab3bbcad941]
    Wed Jun  1 23:07:07 2022<stdout>:# 012: /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42) [0x2ab3d4836da2]
    Wed Jun  1 23:07:07 2022<stdout>:# 013: /scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(PMPI_Alltoallv+0x29) [0x2ab3bbc7bdf9]
    Wed Jun  1 23:07:07 2022<stdout>:# 014: /scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(libparmetis__gkMPI_Alltoallv+0x106) [0x2ab3bb0e1c06]
    Wed Jun  1 23:07:07 2022<stdout>:# 015: /scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_Mesh2Dual+0xdd6) [0x2ab3bb0f10b6]
    Wed Jun  1 23:07:07 2022<stdout>:# 016: /scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_PartMeshKway+0x100) [0x2ab3bb0f1ac0]
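
    (For scale, a back-of-the-envelope estimate on my side: with 64-bit indices, ~2^31 faces times 8 bytes is already about 17 GB of connectivity data that ParMETIS_V3_Mesh2Dual has to redistribute, so individual Alltoallv messages above 2 GB to a single rank seem quite plausible.)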

    PARMetis is compiled as part of PETSc 3.17.1 with 64-bit indices. Here are the PETSc configure options:

    --prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1
    COPTFLAGS=\"-O2 -march=native\"
    CXXOPTFLAGS=\"-O2 -march=native\"
    FOPTFLAGS=\"-O2 -march=native\"
    --download-fftw=1
    --download-hdf5=1
    --download-hypre=1
    --download-metis=1
    --download-mumps=1
    --download-parmetis=1
    --download-plapack=1
    --download-prometheus=1
    --download-ptscotch=1
    --download-scotch=1
    --download-sprng=1
    --download-superlu_dist=1
    --download-triangle=1
    --with-avx512-kernels=1
    --with-blaslapack-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
    --with-cc=mpicc
    --with-cxx=mpicxx
    --with-cxx-dialect=C++11
    --with-debugging=0
    --with-fc=mpifort
    --with-mkl_pardiso-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
    --with-scalapack=1
    --with-scalapack-lib=\"[/scinet/intel/oneapi/2021u4/mkl/2021.4.0/lib/intel64/libmkl_scalapack_lp64.so,/scinet/intel/oneapi/2021u4/mkl/2021.4.0/lib/intel64/libmkl_blacs_openmpi_lp64.so]\"
    --with-x=0
    --with-64-bit-indices=1
    --with-memalign=64

    and OpenMPI configure options:

    '--prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2'
    '--enable-mpi-cxx'
    '--enable-mpi1-compatibility'
    '--with-hwloc=internal'
    '--with-knem=/opt/knem-1.1.3.90mlnx1'
    '--with-libevent=internal'
    '--with-platform=contrib/platform/mellanox/optimized'
    '--with-pmix=internal'
    '--with-slurm=/opt/slurm'
    '--with-ucx=/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2'

    I am then wondering:

    1) Is the UCX library considered "stable" for production use with very large-sized problems?

    2) Is there a way to "bypass" UCX at runtime?

    3) Any idea for debugging this?

    Of course, I do not yet have a "minimal reproducer" that fails, since this happens only on "large" problems, but I think I could export the data for a 512-process reproducer with the PARMetis call only...
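
    In the meantime, here is roughly what such a standalone reproducer would do (a sketch only, with made-up, hypothetical sizes; it is not our actual code nor the real ParMETIS communication pattern):

        /* alltoallv_big.c - minimal sketch: every rank sends >2 GiB to one
           peer through MPI_Alltoallv, counted in 1 MiB units so the int
           counts stay small.  Sizes are placeholders, not our real data. */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, nproc;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nproc);

            MPI_Datatype unit;                        /* 1 MiB per count */
            MPI_Type_contiguous(1 << 20, MPI_BYTE, &unit);
            MPI_Type_commit(&unit);

            int big = 3072;                           /* 3 GiB, in 1 MiB units */
            int *scounts = calloc(nproc, sizeof(int));
            int *sdispls = calloc(nproc, sizeof(int));
            int *rcounts = calloc(nproc, sizeof(int));
            int *rdispls = calloc(nproc, sizeof(int));
            scounts[(rank + 1) % nproc] = big;          /* send >2GB to next rank  */
            rcounts[(rank + nproc - 1) % nproc] = big;  /* receive it from previous */

            size_t bytes = (size_t)big << 20;
            char *sendbuf = malloc(bytes);            /* contents do not matter here */
            char *recvbuf = malloc(bytes);

            MPI_Alltoallv(sendbuf, scounts, sdispls, unit,
                          recvbuf, rcounts, rdispls, unit, MPI_COMM_WORLD);

            if (rank == 0) printf("Alltoallv of %zu bytes per rank done\n", bytes);

            MPI_Type_free(&unit);
            free(sendbuf); free(recvbuf);
            free(scounts); free(sdispls); free(rcounts); free(rdispls);
            MPI_Finalize();
            return 0;
        }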

    Thanks for helping,

    Eric

--
    Eric Chamberland, ing., M. Ing

    Professionnel de recherche

    GIREF/Université Laval

    (418) 656-2131 poste 41 22 42



--
Josh Hursey
IBM Spectrum MPI Developer

--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
