[OMPI users] SLURM seems to ignore --output-filename option of OpenMPI

2019-09-30 Thread Eric Chamberland via users

Hi,

I am using OpenMPI 3.1.2 with Slurm 17.11.12 and it looks like the 
"--output-filename" option is not taken into account.  All my output 
goes into Slurm's output files.


Can this behavior be imposed, or the option ignored, by a Slurm configuration?

How can I bypass that?

Strangely, the "--timestamp-output" option seems to work fine...

Thanks,

Eric




Re: [OMPI users] SLURM seems to ignore --output-filename option of OpenMPI

2019-10-10 Thread Eric Chamberland via users

Hi,

OK, I think I just completely missed a default behavior change in 3.x 
and later:


--output-filename foo

now generates a *directory* named foo.

Before 3.x, it generated *files* named "foo.1.rank".

Is there a way to make --output-filename behave as it did in 2.x and earlier?

Thanks,

Eric


On 2019-09-30 3:34 p.m., Eric Chamberland via users wrote:

Hi,

I am using OpenMPI 3.1.2 with Slurm 17.11.12 and it looks like the 
"--output-filename" option is not taken into account.  All my 
output goes into Slurm's output files.


Can this behavior be imposed, or the option ignored, by a Slurm configuration?

How can I bypass that?

Strangely, the "--timestamp-output" option seems to work fine...

Thanks,

Eric



[OMPI users] Error code for I/O operations

2021-06-30 Thread Eric Chamberland via users

Hi,

I have a simple question about error codes returned by MPI_File_*_all* 
and MPI_File_open/close functions:


If an error is returned, will it be the same for *all* processes? In 
other words, are error codes communicated under the hood so that we, end 
users, can avoid adding a "reduce" on those error codes?
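
In case it helps frame the question, here is a minimal sketch (the helper 
name and structure are mine, not something prescribed by the standard) of 
the defensive pattern I would like to avoid: switching the default file 
error handler to MPI_ERRORS_RETURN and then agreeing on the outcome across 
ranks with an explicit reduction.

#include <mpi.h>

/* Hypothetical helper: agree across the communicator on whether any rank
 * got a non-MPI_SUCCESS code back from an MPI I/O call. MPI_SUCCESS is 0
 * and error codes are positive, so MPI_MAX reports any failure to all. */
static int agree_on_error(int local_err, MPI_Comm comm)
{
  int global_err = 0;
  MPI_Allreduce(&local_err, &global_err, 1, MPI_INT, MPI_MAX, comm);
  return global_err;
}

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  /* Return error codes instead of aborting, for files opened afterwards. */
  MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);

  MPI_File fh;
  int err = MPI_File_open(MPI_COMM_WORLD, "does_not_exist.dat",
                          MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

  if (agree_on_error(err, MPI_COMM_WORLD) != MPI_SUCCESS) {
    if (err == MPI_SUCCESS)
      MPI_File_close(&fh);  /* clean up on ranks where the open did succeed */
    /* all ranks take this branch together, so collective recovery is safe */
  } else {
    MPI_File_close(&fh);
  }

  MPI_Finalize();
  return 0;
}

If the error codes are already guaranteed to be consistent across ranks, the 
MPI_Allreduce above is of course redundant.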


Thanks,

Eric

--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42



[OMPI users] Status of pNFS, CephFS and MPI I/O

2021-09-23 Thread Eric Chamberland via users

Hi,

I am looking around for information about parallel filesystems supported 
for MPI I/O.


Clearly, GPFS and Lustre are fully supported, but what about others?

- CephFS

- pNFS

- Other?

when I "grep" for "pnfs\|cephfs" into ompi source code, I found nothing...

Otherwise, I found this in ompi/mca/common/ompio/common_ompio.h:

enum ompio_fs_type
{
    NONE = 0,
    UFS = 1,
    PVFS2 = 2,
    LUSTRE = 3,
    PLFS = 4,
    IME = 5,
    GPFS = 6
};

Does that mean that other fs types (pNFS, CephFS) do not need special 
treatment, or that they are not supported, or not optimally supported?


Thanks,

Eric

--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42



Re: [OMPI users] Status of pNFS, CephFS and MPI I/O

2021-09-23 Thread Eric Chamberland via users

Thanks for your answer, Edgar!

In fact, we are able to use NFS, and certainly any POSIX file system, on a 
single-node basis.


I should have asked: which file systems are supported for 
*multi-node* read/write access to files?


MPI I/O is known *not* to work on NFS when using multiple nodes 
... except for NFS v3 with the "noac" mount option (we are about to test 
the "actimeo=0" option to see if it works).


Btw, does OpenMPI's MPI I/O have some "hidden" (MCA?) options to make a 
multi-node NFS cluster work?


Thanks,

Eric

On 2021-09-23 1:57 p.m., Gabriel, Edgar wrote:

Eric,

generally speaking, ompio should be able to operate correctly on all file 
systems that have support for POSIX functions.  The generic ufs component is, 
for example, being used on BeeGFS parallel file systems without problems; we 
are using that on a daily basis. For GPFS, the only reason we handle that file 
system separately is because of some custom info objects that can be used to 
configure the file during file_open. If one did not use these info objects, 
the generic ufs component would be as good as the GPFS-specific component.
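
For reference, a minimal sketch of the user-side pattern for such info 
objects (the keys shown, "cb_buffer_size" and "striping_factor", are generic 
reserved MPI hints used as placeholders here, not the GPFS-specific ones):

#include <mpi.h>

/* Minimal sketch (not GPFS-specific): hints are attached to an MPI_Info
 * object and handed to MPI_File_open; components that do not understand
 * a given key simply ignore it. */
int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "cb_buffer_size", "16777216");  /* reserved MPI hint */
  MPI_Info_set(info, "striping_factor", "4");        /* reserved MPI hint */

  MPI_File fh;
  if (MPI_File_open(MPI_COMM_WORLD, "out.dat",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh) == MPI_SUCCESS)
    MPI_File_close(&fh);

  MPI_Info_free(&info);
  MPI_Finalize();
  return 0;
}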

Note that the generic ufs component is also used for NFS; it has logic built 
in to recognize an NFS file system and handle some operations slightly 
differently (but still relying on POSIX functions). The one big exception is 
Lustre: due to its different file locking strategy we are required to use a 
different collective I/O component (dynamic_gen2 vs. vulcan). Generic ufs would 
work on Lustre, too, but it would be horribly slow.

I cannot comment on CephFS and pNFS since I do not have access to those file 
systems; it would come down to testing them.

Thanks
Edgar


-----Original Message-----
From: users  On Behalf Of Eric Chamberland 
via users
Sent: Thursday, September 23, 2021 9:28 AM
To: Open MPI Users 
Cc: Eric Chamberland ; Vivien Clauzon 

Subject: [OMPI users] Status of pNFS, CephFS and MPI I/O

Hi,

I am looking around for information about parallel filesystems supported for 
MPI I/O.

Clearly, GPFS and Lustre are fully supported, but what about others?

- CephFS

- pNFS

- Other?

when I "grep" for "pnfs\|cephfs" into ompi source code, I found nothing...

Otherwise, I found this in ompi/mca/common/ompio/common_ompio.h:

enum ompio_fs_type
{
      NONE = 0,
      UFS = 1,
      PVFS2 = 2,
      LUSTRE = 3,
      PLFS = 4,
      IME = 5,
      GPFS = 6
};

Does that mean that other fs types (pNFS, CephFS) do not need special 
treatment, or that they are not supported, or not optimally supported?

Thanks,

Eric

--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42


--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42



[OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-01 Thread Eric Chamberland via users

Hi,

In the past, we have successfully launched large (finite element) 
computations using ParMETIS as the mesh partitioner.


We first succeeded in 2012 with OpenMPI (v2.?) and again in March 2019 with 
OpenMPI 3.1.2.


Today, we have a bunch of nightly (small) tests running nicely and 
testing all of OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and IntelMPI 
2021.6.


Preparing to launch the same computation we did in 2012, and even 
larger ones, we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and OpenMPI 
4.1.2+ucx-1.11.2 and launched computations from small to large problems 
(meshes).


For small meshes, it goes fine.

But when we reach nearly 2^31 faces in the 3D mesh we are using and call 
ParMETIS_V3_PartMeshKway, we always get a segfault with the same 
backtrace pointing into the UCX library:


Wed Jun  1 23:04:54 
2022:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut 
VmSize: 1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764 VmHWM: 
359012 
Wed Jun  1 23:07:07 2022:Erreur    :  MEF++ Signal recu : 11 : 
 segmentation violation

Wed Jun  1 23:07:07 2022:Erreur    :
Wed Jun  1 23:07:07 2022:-- (Début 
des informations destinées aux développeurs C++) 
--

Wed Jun  1 23:07:07 2022:La pile d'appels contient 27 symboles.
Wed Jun  1 23:07:07 2022:# 000: 
reqBacktrace(std::__cxx11::basic_string, 
std::allocator >&)  >>>  probGD.opt 
(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71) 
[0x4119f1])
Wed Jun  1 23:07:07 2022:# 001: attacheDebugger()  >>> 
 probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
Wed Jun  1 23:07:07 2022:# 002: 
/gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x1f9f) 
[0x2ab3aef0e5cf]
Wed Jun  1 23:07:07 2022:# 003: /lib64/libc.so.6(+0x36400) 
[0x2ab3bd59a400]
Wed Jun  1 23:07:07 2022:# 004: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_dt_pack+0x123) 
[0x2ab3c966e353]
Wed Jun  1 23:07:07 2022:# 005: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x536b7) 
[0x2ab3c968d6b7]
Wed Jun  1 23:07:07 2022:# 006: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd7) 
[0x2ab3ca712137]
Wed Jun  1 23:07:07 2022:# 007: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x52d3c) 
[0x2ab3c968cd3c]
Wed Jun  1 23:07:07 2022:# 008: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_tag_send_nbx+0x5ad) 
[0x2ab3c9696dcd]
Wed Jun  1 23:07:07 2022:# 009: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf2) 
[0x2ab3c922e0b2]
Wed Jun  1 23:07:07 2022:# 010: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x92) 
[0x2ab3bbca5a32]
Wed Jun  1 23:07:07 2022:# 011: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x141) 
[0x2ab3bbcad941]
Wed Jun  1 23:07:07 2022:# 012: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42) 
[0x2ab3d4836da2]
Wed Jun  1 23:07:07 2022:# 013: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(PMPI_Alltoallv+0x29) 
[0x2ab3bbc7bdf9]
Wed Jun  1 23:07:07 2022:# 014: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(libparmetis__gkMPI_Alltoallv+0x106) 
[0x2ab3bb0e1c06]
Wed Jun  1 23:07:07 2022:# 015: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_Mesh2Dual+0xdd6) 
[0x2ab3bb0f10b6]
Wed Jun  1 23:07:07 2022:# 016: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_PartMeshKway+0x100) 
[0x2ab3bb0f1ac0]


ParMETIS is compiled as part of PETSc 3.17.1 with 64-bit indices.  Here 
are the PETSc configure options:


--prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1
COPTFLAGS=\"-O2 -march=native\"
CXXOPTFLAGS=\"-O2 -march=native\"
FOPTFLAGS=\"-O2 -march=native\"
--download-fftw=1
--download-hdf5=1
--download-hypre=1
--download-metis=1
--download-mumps=1
--download-parmetis=1
--download-plapack=1
--download-prometheus=1
--download-ptscotch=1
--download-scotch=1
--download-sprng=1
--download-superlu_dist=1
--download-triangle=1
--with-avx512-kernels=1
--with-blaslapack-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
--with-cc=mpicc
--with-cxx=mpicxx
--with-cxx-dialect=C++11
--with-debugging=0
--with-fc=mpifort
--with-mkl_pardiso-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
--with-scalapack=1

Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-02 Thread Eric Chamberland via users

Hi Josh,

OK, thanks for the suggestion.  We are in the process of testing with IntelMPI 
right now.  I hope to do it with a newer version of OpenMPI too.


Do you suggest a minimum version for the UCX library?

Thanks,

Eric

On 2022-06-02 04:05, Josh Hursey via users wrote:

I would suggest trying OMPI v4.1.4 (or the v5 snapshot)
 * https://www.open-mpi.org/software/ompi/v4.1/
 * https://www.mail-archive.com/announce@lists.open-mpi.org//msg00152.html

We fixed some large payload collective issues in that release which 
might be what you are seeing here with MPI_Alltoallv with the tuned 
collective component.




On Thu, Jun 2, 2022 at 1:54 AM Mikhail Brinskii via users 
 wrote:


Hi Eric,

Yes, UCX is supposed to be stable for large sized problems.

Did you see the same crash with both OMPI-4.0.3 + UCX 1.8.0 and
OMPI-4.1.2 + UCX1.11.2?

Have you also tried to run large-sized problem tests with OMPI-5.0.x?

Regarding the application, at some point it invokes MPI_Alltoallv
sending more than 2GB to some of the ranks (using derived dt), right?

//WBR, Mikhail

*From:* users  *On Behalf Of
*Eric Chamberland via users
*Sent:* Thursday, June 2, 2022 5:31 AM
*To:* Open MPI Users 
*Cc:* Eric Chamberland ; Thomas
Briffard ; Vivien Clauzon
; dave.mar...@giref.ulaval.ca; Ramses
van Zon ; charles.coulomb...@ulaval.ca
*Subject:* [OMPI users] Segfault in ucp_dt_pack function from UCX
library 1.8.0 and 1.11.2 for large sized communications using both
OpenMPI 4.0.3 and 4.1.2

Hi,

In the past, we have successfully launched large sized (finite
elements) computations using PARMetis as mesh partitioner.

It was first in 2012 with OpenMPI (v2.?) and secondly in March
2019 with OpenMPI 3.1.2 that we succeeded.

Today, we have a bunch of nightly (small) tests running nicely and
testing all of OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and
IntelMPI 2021.6.

Preparing for launching the same computation we did in 2012, and
even larger ones, we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and
OpenMPI 4.1.2+ucx-1.11.2 and launched computation from small to
large problems (meshes).

For small meshes, it goes fine.

But when we reach near 2^31 faces into the 3D mesh we are using
and call ParMETIS_V3_PartMeshKway, we always get a segfault with
the same backtrace pointing into ucx library:

Wed Jun  1 23:04:54
2022:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut
VmSize: 1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764
VmHWM: 359012 
Wed Jun  1 23:07:07 2022:Erreur    :  MEF++ Signal recu :
11 :  segmentation violation
Wed Jun  1 23:07:07 2022:Erreur    :
Wed Jun  1 23:07:07 2022:--
(Début des informations destinées aux développeurs C++)
--
Wed Jun  1 23:07:07 2022:La pile d'appels contient 27
symboles.
Wed Jun  1 23:07:07 2022:# 000:
reqBacktrace(std::__cxx11::basic_string, std::allocator >&)  >>>  probGD.opt

(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71)
[0x4119f1])
Wed Jun  1 23:07:07 2022:# 001: attacheDebugger()  >>>
 probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
Wed Jun  1 23:07:07 2022:# 002:

/gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x1f9f)
[0x2ab3aef0e5cf]
Wed Jun  1 23:07:07 2022:# 003: /lib64/libc.so.6(+0x36400)
[0x2ab3bd59a400]
Wed Jun  1 23:07:07 2022:# 004:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_dt_pack+0x123)
[0x2ab3c966e353]
Wed Jun  1 23:07:07 2022:# 005:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x536b7)
[0x2ab3c968d6b7]
Wed Jun  1 23:07:07 2022:# 006:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd7)
[0x2ab3ca712137]
Wed Jun  1 23:07:07 2022:# 007:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x52d3c)
[0x2ab3c968cd3c]
Wed Jun  1 23:07:07 2022:# 008:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_tag_send_nbx+0x5ad)
[0x2ab3c9696dcd]
Wed Jun  1 23:07:07 2022:# 009:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf2)
[0x2ab3c922e0b2]
Wed Jun  1 23:07:07 2022:# 010:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x92)
[0x2ab3bbca5a32]
Wed Jun  1 23:07:07 2022:# 011:

/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x141)
[0x2ab3bbcad941]
Wed Jun  1 23:07:07 2022:# 012:

/scinet/niagara/software/2022a/opt

Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-02 Thread Eric Chamberland via users
) 
[0x2aaeb98e2980]
Wed May 25 21:34:02 2022:# 004: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.8.0/lib/libucp.so.0(ucp_dt_pack+0x13b) 
[0x2aaebd3d407b]
Wed May 25 21:34:02 2022:# 005: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.8.0/lib/libucp.so.0(+0x3872a) 
[0x2aaebd3e472a]
Wed May 25 21:34:02 2022:# 006: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.8.0/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd3) 
[0x2aaebd6a4713]
Wed May 25 21:34:02 2022:# 007: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.8.0/lib/libucp.so.0(+0x38ffc) 
[0x2aaebd3e4ffc]
Wed May 25 21:34:02 2022:# 008: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.8.0/lib/libucp.so.0(ucp_tag_send_nbr+0x511) 
[0x2aaebd3f7b91]
Wed May 25 21:34:02 2022:# 009: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/openmpi/4.0.3/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xbb) 
[0x2aaea87132eb]
Wed May 25 21:34:02 2022:# 010: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/openmpi/4.0.3/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x8c) 
[0x2aaeb955d90c]
Wed May 25 21:34:02 2022:# 011: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/openmpi/4.0.3/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x13f) 
[0x2aaeb9562eff]
Wed May 25 21:34:02 2022:# 012: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/openmpi/4.0.3/lib/libmpi.so.40(MPI_Alltoallv+0x1a3) 
[0x2aaeb9511be3]
Wed May 25 21:34:02 2022:# 013: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/petsc-pardiso-64bits/3.17.1/lib/libstrumpack.so(libparmetis__gkMPI_Alltoallv+0x108) 
[0x2aaeb15b1ca8]
Wed May 25 21:34:02 2022:# 014: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/MPI/gcc9/openmpi4/petsc-pardiso-64bits/3.17.1/lib/libpetsc.so.3.17(ParMETIS_V3_Mesh2Dual+0x10af) 
[0x2aaeb081c98f]
Wed May 25 21:34:02 2022:# 015: 
probGD.opt(ParMETIS_V3_PartMeshKway+0x100) [0x432680]



Have you also tried to run large-sized problem tests with OMPI-5.0.x?


Not for large problems, only small ones without UCX...  I am not 
compiling my own MPI on Compute Canada clusters, but I do in our lab for 
our nightly validation tests.


Regarding the application, at some point it invokes MPI_Alltoallv 
sending more than 2GB to some of the ranks (using derived dt), right?


I have to track down the specific call, but I am not sure it is sending 
2GB to a single rank; it may be 2GB divided among many ranks.  The fact 
is that this part of the code, when it works, does not create such a 
bump in memory usage...  But I have to dig a bit more...
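
For what it is worth, here is the kind of small check (the helper name is 
mine) I plan to use to see whether any single destination would receive 
more than INT_MAX bytes through MPI_Alltoallv:

#include <mpi.h>
#include <stdio.h>
#include <limits.h>

/* Hypothetical helper: warn if any per-destination message of an
 * MPI_Alltoallv would exceed INT_MAX bytes (the limit of int counts). */
static void check_alltoallv_sizes(const int *sendcounts, MPI_Datatype dt,
                                  MPI_Comm comm)
{
  int me, nranks;
  MPI_Comm_rank(comm, &me);
  MPI_Comm_size(comm, &nranks);

  MPI_Aint lb, extent;
  MPI_Type_get_extent(dt, &lb, &extent);

  for (int r = 0; r < nranks; ++r) {
    long long bytes = (long long)sendcounts[r] * (long long)extent;
    if (bytes > (long long)INT_MAX)
      fprintf(stderr, "rank %d -> rank %d: %lld bytes exceeds INT_MAX\n",
              me, r, bytes);
  }
}

Of course, the tricky part is that the MPI_Alltoallv in question happens 
inside libparmetis, so the counts would have to be captured there or in a 
wrapped MPI_Alltoallv.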


Regards,

Eric


//WBR, Mikhail

*From:* users  *On Behalf Of *Eric 
Chamberland via users

*Sent:* Thursday, June 2, 2022 5:31 AM
*To:* Open MPI Users 
*Cc:* Eric Chamberland ; Thomas 
Briffard ; Vivien Clauzon 
; dave.mar...@giref.ulaval.ca; Ramses van 
Zon ; charles.coulomb...@ulaval.ca
*Subject:* [OMPI users] Segfault in ucp_dt_pack function from UCX 
library 1.8.0 and 1.11.2 for large sized communications using both 
OpenMPI 4.0.3 and 4.1.2


Hi,

In the past, we have successfully launched large sized (finite 
elements) computations using PARMetis as mesh partitioner.


It was first in 2012 with OpenMPI (v2.?) and secondly in March 2019 
with OpenMPI 3.1.2 that we succeeded.


Today, we have a bunch of nightly (small) tests running nicely and 
testing all of OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and 
IntelMPI 2021.6.


Preparing for launching the same computation we did in 2012, and even 
larger ones, we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and OpenMPI 
4.1.2+ucx-1.11.2 and launched computation from small to large problems 
(meshes).


For small meshes, it goes fine.

But when we reach near 2^31 faces into the 3D mesh we are using and 
call ParMETIS_V3_PartMeshKway, we always get a segfault with the same 
backtrace pointing into ucx library:


Wed Jun  1 23:04:54 
2022:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut 
VmSize: 1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764 VmHWM: 
359012 
Wed Jun  1 23:07:07 2022:Erreur    :  MEF++ Signal recu : 11 : 
 segmentation violation

Wed Jun  1 23:07:07 2022:Erreur    :
Wed Jun  1 23:07:07 2022:-- (Début 
des informations destinées aux développeurs C++) 
--

Wed Jun  1 23:07:07 2022:La pile d'appels contient 27 symboles.
Wed Jun  1 23:07:07 2022:# 000: 
reqBacktrace(std::__cxx11::basic_string, 
std::allocator >&)  >>>  probGD.opt 
(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71) 
[0x4119f1])
Wed Jun  1 23:07:07 2022:# 001: attacheDebugger()  >>> 
 probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
Wed Jun  1 23:07:07 2022:# 002: 
/gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal

Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-10 Thread Eric Chamberland via users

Hi,

to give further information about this problem... it seems not to be related 
to MPI or UCX at all, but rather to come from ParMETIS itself...


With ParMETIS installed from Spack with the "+int64" option, I have been 
able to use both OpenMPI 4.1.2 and IntelMPI 2021.6 successfully!


With ParMETIS installed by PETSc with the "--with-64-bit-indices=1" option, 
none of the MPI implementations listed work.


I've opened an issue with PETSc here: 
https://gitlab.com/petsc/petsc/-/issues/1204#note_980344101


So, sorry for disturbing MPI guys here...

Thanks for all suggestions!

Eric

On 2022-06-01 23:31, Eric Chamberland via users wrote:


Hi,

In the past, we have successfully launched large sized (finite 
elements) computations using PARMetis as mesh partitioner.


It was first in 2012 with OpenMPI (v2.?) and secondly in March 2019 
with OpenMPI 3.1.2 that we succeeded.


Today, we have a bunch of nightly (small) tests running nicely and 
testing all of OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and 
IntelMPI 2021.6.


Preparing for launching the same computation we did in 2012, and even 
larger ones, we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and OpenMPI 
4.1.2+ucx-1.11.2 and launched computation from small to large problems 
(meshes).


For small meshes, it goes fine.

But when we reach near 2^31 faces into the 3D mesh we are using and 
call ParMETIS_V3_PartMeshKway, we always get a segfault with the same 
backtrace pointing into ucx library:


Wed Jun  1 23:04:54 
2022:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut 
VmSize: 1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764 VmHWM: 
359012 
Wed Jun  1 23:07:07 2022:Erreur    :  MEF++ Signal recu : 11 : 
 segmentation violation

Wed Jun  1 23:07:07 2022:Erreur    :
Wed Jun  1 23:07:07 2022:-- (Début 
des informations destinées aux développeurs C++) 
--

Wed Jun  1 23:07:07 2022:La pile d'appels contient 27 symboles.
Wed Jun  1 23:07:07 2022:# 000: 
reqBacktrace(std::__cxx11::basic_string, 
std::allocator >&)  >>>  probGD.opt 
(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71) 
[0x4119f1])
Wed Jun  1 23:07:07 2022:# 001: attacheDebugger()  >>> 
 probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
Wed Jun  1 23:07:07 2022:# 002: 
/gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x1f9f) 
[0x2ab3aef0e5cf]
Wed Jun  1 23:07:07 2022:# 003: /lib64/libc.so.6(+0x36400) 
[0x2ab3bd59a400]
Wed Jun  1 23:07:07 2022:# 004: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_dt_pack+0x123) 
[0x2ab3c966e353]
Wed Jun  1 23:07:07 2022:# 005: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x536b7) 
[0x2ab3c968d6b7]
Wed Jun  1 23:07:07 2022:# 006: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd7) 
[0x2ab3ca712137]
Wed Jun  1 23:07:07 2022:# 007: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x52d3c) 
[0x2ab3c968cd3c]
Wed Jun  1 23:07:07 2022:# 008: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_tag_send_nbx+0x5ad) 
[0x2ab3c9696dcd]
Wed Jun  1 23:07:07 2022:# 009: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf2) 
[0x2ab3c922e0b2]
Wed Jun  1 23:07:07 2022:# 010: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x92) 
[0x2ab3bbca5a32]
Wed Jun  1 23:07:07 2022:# 011: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x141) 
[0x2ab3bbcad941]
Wed Jun  1 23:07:07 2022:# 012: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42) 
[0x2ab3d4836da2]
Wed Jun  1 23:07:07 2022:# 013: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(PMPI_Alltoallv+0x29) 
[0x2ab3bbc7bdf9]
Wed Jun  1 23:07:07 2022:# 014: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(libparmetis__gkMPI_Alltoallv+0x106) 
[0x2ab3bb0e1c06]
Wed Jun  1 23:07:07 2022:# 015: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_Mesh2Dual+0xdd6) 
[0x2ab3bb0f10b6]
Wed Jun  1 23:07:07 2022:# 016: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_PartMeshKway+0x100) 
[0x2ab3bb0f1ac0]


PARMetis is compiled as part of PETSc-3.17.1 with 64bit indices.  Here 
are PETSc configure options:


--prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1
COPTFLAGS=\"-O2 -march=na

Re: [OMPI users] MPI I/O, Romio vs Ompio on GPFS

2022-06-11 Thread Eric Chamberland via users

Hi,

I just about found what I wanted with "--mca io_base_verbose 100".

Now I am looking at performance on GPFS, and I must say OpenMPI 4.1.2 
performs very poorly when it comes time to write.


I am launching 512 processes that read+compute (ghost components of a 
mesh) and then later write a 79 GB file.


Here are the timings (all in seconds):



IO module ; reading + ghost computing ; writing

ompio     ; 24.9                      ; 2040+ (job got killed before completion)

romio321  ; 20.8                      ; 15.6



I have run the job many times with the ompio module (the default) and with 
romio321, and the timings are always similar to those given.


I also activated maximum debug output with "--mca mca_base_verbose 
stdout,level:9 --mca mpi_show_mca_params all --mca io_base_verbose 100" 
and got a few lines, but nothing relevant for debugging:


Sat Jun 11 20:08:28 2022:chrono::ecritMaillageMPI::debut VmSize: 
6530408 VmRSS: 5599604 VmPeak: 7706396 VmData: 5734408 VmHWM: 5699324 

Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:delete: deleting file: resultat01_-2.mail
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:delete: Checking all available modules
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:delete: component available: ompio, priority: 30
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:delete: component available: romio321, priority: 10
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:delete: Selected io component ompio
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: new file: resultat01_-2.mail
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: Checking all available modules
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: component available: ompio, priority: 30
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: component available: romio321, priority: 10
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: Selected io module ompio


What else can I do to dig into this?

Are there any ompio parameters specific to GPFS?

Thanks,

Eric

--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42

On 2022-06-10 16:23, Eric Chamberland via users wrote:

Hi,

I want to try ROMIO with OpenMPI 4.1.2 because I am observing a big 
performance difference compared to IntelMPI on GPFS.


I want to see, at *runtime*, all parameters (default values, names) 
used by MPI (at least for the "io" framework).


I would like to have all the same output as "ompi_info --all" gives me...

I have tried this:

mpiexec --mca io romio321  --mca mca_verbose 1  --mca 
mpi_show_mca_params 1 --mca io_base_verbose 1 ...


But I cannot see anything about io coming out...

With "ompi_info" I do...

Is it possible?

Thanks,

Eric



--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42


[OMPI users] MPI I/O, ROMIO and showing io mca parameters at run-time

2022-06-10 Thread Eric Chamberland via users

Hi,

I want to try ROMIO with OpenMPI 4.1.2 because I am observing a big 
performance difference compared to IntelMPI on GPFS.


I want to see, at *runtime*, all parameters (names and default values) used 
by MPI (at least for the "io" framework).


I would like to get the same output that "ompi_info --all" gives me...

I have tried this:

mpiexec --mca io romio321  --mca mca_verbose 1  --mca 
mpi_show_mca_params 1 --mca io_base_verbose 1 ...


But I cannot see anything about io coming out...

With "ompi_info" I do...

Is it possible?
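
One avenue I am considering (a sketch only; whether Open MPI exposes all of 
its io MCA parameters through the MPI_T tool interface is exactly what I 
would need to verify) is to enumerate control variables at runtime and 
filter on "io":

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Sketch: list MPI control variables whose name contains "io" via the
 * MPI_T tool interface (MPI-3). */
int main(void)
{
  int provided, ncvar;
  MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
  MPI_T_cvar_get_num(&ncvar);

  for (int i = 0; i < ncvar; ++i) {
    char name[256], desc[1024];
    int name_len = sizeof(name), desc_len = sizeof(desc);
    int verbosity, bind, scope;
    MPI_Datatype dt;
    MPI_T_enum en;
    if (MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dt, &en,
                            desc, &desc_len, &bind, &scope) == MPI_SUCCESS
        && strstr(name, "io") != NULL)
      printf("%s : %s\n", name, desc);
  }

  MPI_T_finalize();
  return 0;
}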

Thanks,

Eric


--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42



[OMPI users] CephFS and striping_factor

2022-11-28 Thread Eric Chamberland via users

Hi,

I would like to know whether OpenMPI supports file creation with a 
"striping_factor" for CephFS.


According to the CephFS library, I *think* it would be possible to do this at 
file creation with "ceph_open_layout".


https://github.com/ceph/ceph/blob/main/src/include/cephfs/libcephfs.h

Is this a possible future enhancement?

Thanks,

Eric

--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42



[OMPI users] How to force striping_factor (on lustre or other FS)?

2022-11-25 Thread Eric Chamberland via users

Hi,

In 2012 we wrote and tested our functions that use MPI I/O to get good 
performance while doing I/O on a Lustre filesystem. Everything was fine 
with the "striping_factor" we passed at file creation.


Now I am trying to verify some performance degradation we observed, and I 
am surprised because it looks like I am unable to create a new file with 
a given "striping_factor" with any MPI flavor.


I attached a simple example for file creation with hints, and tried it 
the following way with OpenMPI:


OpenMPI-4.0.3:

which mpicc
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/openmpi/4.0.3/bin/mpicc

mpicc -o s simple_file_create_with_hint.c

rm -f foo && mpiexec -n 1 --mca io romio321 ./s foo && lfs getstripe foo

Creating the file by MPI_file_open : foo
Informations on file:
Key is 'striping_factor' and worth: '2'
 ...closing the file foo
Informations on file:
foo
lmm_stripe_count:  1
lmm_stripe_size:   1048576
lmm_pattern:   raid0
lmm_layout_gen:    0
lmm_stripe_offset: 49
   obdidx   objid   objid   group
   49   485863591 0x1cf5b0a7    0

#Not forcing romio321:

rm -f foo && mpiexec -n 1 ./s foo && lfs getstripe foo


Creating the file by MPI_file_open : foo
Informations on file:
Key is 'striping_factor' and worth: '2'
 ...closing the file foo
Informations on file:
foo
lmm_stripe_count:  1
lmm_stripe_size:   1048576
lmm_pattern:   raid0
lmm_layout_gen:    0
lmm_stripe_offset: 19
   obdidx   objid   objid   group
   19   482813449 0x1cc72609    0

First, as you can see, even if I ask for a striping_factor of 2, I only 
get 1!  I tried writing some data too, but it changed nothing...


What am I doing wrong?

Second, I was expecting that when I re-open the file read-only, I would get 
some information back in the "MPI_Info", but it is empty... is that normal?
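
For completeness, here is roughly the re-open + query step I am describing 
(a sketch; as far as I understand, MPI_File_get_info is only required to 
return the hints the implementation actually uses, which might explain an 
empty result):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  if (argc < 2) { MPI_Finalize(); return 1; }

  MPI_File fh;
  MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

  /* Ask the implementation which hints it is actually using for this file. */
  MPI_Info info;
  MPI_File_get_info(fh, &info);

  int nkeys;
  MPI_Info_get_nkeys(info, &nkeys);
  for (int i = 0; i < nkeys; ++i) {
    char key[MPI_MAX_INFO_KEY + 1], value[MPI_MAX_INFO_VAL + 1];
    int flag;
    MPI_Info_get_nthkey(info, i, key);
    MPI_Info_get(info, key, MPI_MAX_INFO_VAL, value, &flag);
    if (flag) printf("Key is '%s' and worth: '%s'\n", key, value);
  }

  MPI_Info_free(&info);
  MPI_File_close(&fh);
  MPI_Finalize();
  return 0;
}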


For example, using mpich-3.2.1 I have the following output:

which mpicc
/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/gcc7.3/mpich/3.2.1/bin/mpicc 



mpicc -o s simple_file_create_with_hint.c


rm -f foo && mpiexec -n 1 ./s foo && lfs getstripe foo

Creating the file by MPI_file_open : foo
Informations on file:
Key is 'cb_buffer_size' and worth: '16777216'
Key is 'romio_cb_read' and worth: 'automatic'
Key is 'romio_cb_write' and worth: 'automatic'
Key is 'cb_nodes' and worth: '1'
Key is 'romio_no_indep_rw' and worth: 'false'
Key is 'romio_cb_pfr' and worth: 'disable'
Key is 'romio_cb_fr_types' and worth: 'aar'
Key is 'romio_cb_fr_alignment' and worth: '1'
Key is 'romio_cb_ds_threshold' and worth: '0'
Key is 'romio_cb_alltoall' and worth: 'automatic'
Key is 'ind_rd_buffer_size' and worth: '4194304'
Key is 'ind_wr_buffer_size' and worth: '524288'
Key is 'romio_ds_read' and worth: 'automatic'
Key is 'romio_ds_write' and worth: 'automatic'
Key is 'cb_config_list' and worth: '*:1'
Key is 'romio_filesystem_type' and worth: 'UFS: Generic ROMIO driver for 
all UNIX-like file systems'

Key is 'romio_aggregator_list' and worth: '0 '
 ...closing the file foo
Informations on file:
Key is 'cb_buffer_size' and worth: '16777216'
Key is 'romio_cb_read' and worth: 'automatic'
Key is 'romio_cb_write' and worth: 'automatic'
Key is 'cb_nodes' and worth: '1'
Key is 'romio_no_indep_rw' and worth: 'false'
Key is 'romio_cb_pfr' and worth: 'disable'
Key is 'romio_cb_fr_types' and worth: 'aar'
Key is 'romio_cb_fr_alignment' and worth: '1'
Key is 'romio_cb_ds_threshold' and worth: '0'
Key is 'romio_cb_alltoall' and worth: 'automatic'
Key is 'ind_rd_buffer_size' and worth: '4194304'
Key is 'ind_wr_buffer_size' and worth: '524288'
Key is 'romio_ds_read' and worth: 'automatic'
Key is 'romio_ds_write' and worth: 'automatic'
Key is 'cb_config_list' and worth: '*:1'
Key is 'romio_filesystem_type' and worth: 'UFS: Generic ROMIO driver for 
all UNIX-like file systems'

Key is 'romio_aggregator_list' and worth: '0 '
foo
lmm_stripe_count:  1
lmm_stripe_size:   1048576
lmm_pattern:   raid0
lmm_layout_gen:    0
lmm_stripe_offset: 67
   obdidx   objid   objid   group
   67   357367195 0x154cfd9b    0

but I still get only a striping_factor of 1 on the created file...

Thanks,

Eric

--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
#include "mpi.h"
#include 
#include 

/*
 * Simple function to abort execution and print an error based on MPI return values:
*/
void abortOnError(int ierr) {
  if (ierr != MPI_SUCCESS) {
printf("ERROR Returned by MPI: %d\n",ierr);
char* lCharPtr = (char*) malloc(sizeof(char)*MPI_MAX_ERROR_STRING);
int lLongueur = 0;
MPI_Error_string(ierr,lCharPtr, );
printf("ERROR_string Returned by MPI: %s\n",lCharPtr);
free(lCharPtr);
MPI_Abort( MPI_COMM_WORLD, ierr );
  }
}

/*
 * Here I use only the first hint for now, but you can try a few to see the difference: