Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-10 Thread Eric Chamberland via users

Hi,

to give further information about this problem... it does not seem to be 
related to MPI or UCX at all, but rather to come from ParMETIS itself...


With ParMETIS installed from Spack with the "+int64" option, I have been 
able to use both OpenMPI 4.1.2 and IntelMPI 2021.6 successfully!


With ParMETIS installed by PETSc with the "--with-64-bit-indices=1" option, 
none of the MPI implementations listed later work.
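
As a quick sanity check (a minimal sketch, not part of the original report), 
the index width a given ParMETIS installation was actually built with can be 
confirmed through the IDXTYPEWIDTH macro and the idx_t typedef that metis.h 
provides and ParMETIS reuses; compiling this against each installation 
(the Spack "+int64" build vs. the PETSc-built one) should show 64 in both cases:

/* check_idx_width.c -- print the index width of the METIS/ParMETIS
   headers found on the include path. */
#include <metis.h>   /* defines idx_t and IDXTYPEWIDTH; ParMETIS uses the same idx_t */
#include <stdio.h>

int main(void)
{
    printf("IDXTYPEWIDTH = %d, sizeof(idx_t) = %zu bytes\n",
           (int)IDXTYPEWIDTH, sizeof(idx_t));
    return 0;
}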


I've opened an issue with PETSc here: 
https://gitlab.com/petsc/petsc/-/issues/1204#note_980344101


So, sorry for disturbing MPI guys here...

Thanks for all suggestions!

Eric

Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-05 Thread Jeff Hammond via users
Alltoallv has both a large-count and a large-displacement problem in its API.
You can work around the latter by using a neighborhood alltoall on a
duplicate of your original communicator that is neighborhood compatible: the
alltoallw flavor of the neighborhood collectives takes MPI_Aint displacements
instead of int.
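
As an illustration only (a minimal sketch, not from BigMPI or this thread; the
helper name and byte-based interface are assumptions), the workaround can look
like this: build a fully connected distributed-graph communicator so the
neighborhood collective reaches every rank, then call MPI_Neighbor_alltoallw,
whose displacements are MPI_Aint byte offsets and can therefore exceed 2 GiB.
Per-neighbor counts remain int, so payloads above 2 GiB per rank would still
need a contiguous derived datatype.

/* big_alltoallv.c -- sketch of emulating MPI_Alltoallv with
   MPI_Neighbor_alltoallw so that displacements are MPI_Aint (bytes)
   rather than int.  Illustrative only; error handling omitted. */
#include <mpi.h>
#include <stdlib.h>

/* Exchange raw bytes; sdispls/rdispls are byte offsets and may exceed 2 GiB. */
int big_alltoallv_bytes(const void *sendbuf, const int *sendcounts,
                        const MPI_Aint *sdispls,
                        void *recvbuf, const int *recvcounts,
                        const MPI_Aint *rdispls, MPI_Comm comm)
{
    int nranks, i, err;
    MPI_Comm nbr_comm;

    MPI_Comm_size(comm, &nranks);

    /* Every rank lists every rank as both source and destination, which
       makes the neighborhood collective equivalent to a full all-to-all. */
    int *ranks = malloc(nranks * sizeof *ranks);
    MPI_Datatype *types = malloc(nranks * sizeof *types);
    for (i = 0; i < nranks; ++i) {
        ranks[i] = i;
        types[i] = MPI_BYTE;
    }
    MPI_Dist_graph_create_adjacent(comm, nranks, ranks, MPI_UNWEIGHTED,
                                   nranks, ranks, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0 /* no reorder */, &nbr_comm);

    /* The w-variant takes MPI_Aint displacements, so the buffers themselves
       may be larger than 2 GiB without overflowing an int. */
    err = MPI_Neighbor_alltoallw(sendbuf, sendcounts, sdispls, types,
                                 recvbuf, recvcounts, rdispls, types, nbr_comm);

    MPI_Comm_free(&nbr_comm);
    free(types);
    free(ranks);
    return err;
}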

If you need tests, the https://github.com/jeffhammond/BigMPI test suite is
nothing but large-count MPI calls using derived data types.

Jeff


Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-02 Thread Eric Chamberland via users

Hi Josh,

Ok, thanks for the suggestion.  We are in the process of testing with IntelMPI 
right now.  I hope to do it with a newer version of OpenMPI too.


Do you suggest a minimum version for UCX lib?

Thanks,

Eric

On 2022-06-02 04:05, Josh Hursey via users wrote:

I would suggest trying OMPI v4.1.4 (or the v5 snapshot)
 * https://www.open-mpi.org/software/ompi/v4.1/
 * https://www.mail-archive.com/announce@lists.open-mpi.org//msg00152.html

We fixed some large payload collective issues in that release which 
might be what you are seeing here with MPI_Alltoallv with the tuned 
collective component.





Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-02 Thread Eric Chamberland via users

Hi Mikhail,


On 2022-06-02 02:51, Mikhail Brinskii wrote:


Hi Eric,

Yes, UCX is supposed to be stable for large sized problems.

Since I am well aware of what it takes to deliver and deploy tested and 
verified software, I have headaches here: the validation toolchain required 
for an MPI library (or a component of one) to verify large-scale use cases 
requires large-scale hardware and large data sets... which is not available 
to everybody...


So, how can very large use cases be tested, nightly or in CI, for 
libraries like UCX or MPI itself?  And out of curiosity, how is it done for UCX?


Did you see the same crash with both OMPI-4.0.3 + UCX 1.8.0 and 
OMPI-4.1.2 + UCX 1.11.2?


Yep!  At exactly the same place; here is the stack for UCX 1.9.0 and 
OpenMPI 4.1.1:



Fri May 27 21:23:44 2022:Erreur    :  MEF++ Signal recu : 11 : 
 segmentation violation

Fri May 27 21:23:44 2022:Erreur    :
Fri May 27 21:23:44 2022:-- (Début 
des informations destinées aux développeurs C++) 
--

Fri May 27 21:23:44 2022:La pile d'appels contient 27 symboles.
Fri May 27 21:23:44 2022:# 000: 
reqBacktrace(std::__cxx11::basic_string, 
std::allocator >&)  >>>  probGD.opt 
(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x42) 
[0x411942])
Fri May 27 21:23:44 2022:# 001: attacheDebugger()  >>> 
 probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x2a1) [0x4137b1])
Fri May 27 21:23:44 2022:# 002: 
/gpfs/fs0/project/d/deteix/MEF++_petscGIREF_64bits/avx2/bin/../lib/libgiref_opt_Util.so(traitementSignal+0x1fef) 
[0x2b1c7a27017f]
Fri May 27 21:23:44 2022:# 003: 
/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libc.so.6(+0x38980) 
[0x2b1c872fb980]
Fri May 27 21:23:44 2022:# 004: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.9.0/lib/libucp.so.0(ucp_dt_pack+0x13e) 
[0x2b1c8ae558fe]
Fri May 27 21:23:44 2022:# 005: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.9.0/lib/libucp.so.0(+0x2cc10) 
[0x2b1c8ae5ec10]
Fri May 27 21:23:44 2022:# 006: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.9.0/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xbd) 
[0x2b1c8b0c163d]
Fri May 27 21:23:44 2022:# 007: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.9.0/lib/libucp.so.0(+0x2c557) 
[0x2b1c8ae5e557]
Fri May 27 21:23:44 2022:# 008: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.9.0/lib/libucp.so.0(ucp_tag_send_nbx+0x34d) 
[0x2b1c8ae696ad]
Fri May 27 21:23:44 2022:# 009: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc10/openmpi/4.1.1/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xea) 
[0x2b1c8ae2489a]
Fri May 27 21:23:44 2022:# 010: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc10/openmpi/4.1.1/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x94) 
[0x2b1c86c86cb4]
Fri May 27 21:23:44 2022:# 011: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc10/openmpi/4.1.1/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x145) 
[0x2b1c86c8ccd5]
Fri May 27 21:23:44 2022:# 012: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc10/openmpi/4.1.1/lib/libmpi.so.40(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42) 
[0x2b1c86c976e2]
Fri May 27 21:23:44 2022:# 013: 
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc10/openmpi/4.1.1/lib/libmpi.so.40(MPI_Alltoallv+0x1a3) 
[0x2b1c86c3a043]
Fri May 27 21:23:44 2022:# 014: 
/gpfs/fs0/project/d/deteix/petsc-3.17.1_ompi-4.1.1/lib/libparmetis.so(libparmetis__gkMPI_Alltoallv+0x111) 
[0x2b1c86661211]
Fri May 27 21:23:44 2022:# 015: 
/gpfs/fs0/project/d/deteix/petsc-3.17.1_ompi-4.1.1/lib/libparmetis.so(ParMETIS_V3_Mesh2Dual+0x10b9) 
[0x2b1c86674399]
Fri May 27 21:23:44 2022:# 016: 
/gpfs/fs0/project/d/deteix/petsc-3.17.1_ompi-4.1.1/lib/libparmetis.so(ParMETIS_V3_PartMeshKway+0x100) 
[0x2b1c86674e10]


And for OpenMPI 4.0.3 with UCX 1.9.0:

Wed May 25 21:34:02 2022:Erreur    :  MEF++ Signal recu : 11 : 
segmentation violation

Wed May 25 21:34:02 2022:Erreur    :
Wed May 25 21:34:02 2022:-- (Début 
des informations destinées aux développeurs C++) 
--

Wed May 25 21:34:02 2022:La pile d'appels contient 26 symboles.
Wed May 25 21:34:02 2022:# 000: 
reqBacktrace(std::__cxx11::basic_string, 
std::allocator >&)  >>>  probGD.opt 
(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x42) 
[0x411a42])
Wed May 25 21:34:02 2022:# 001: attacheDebugger()  >>>  
probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x287) [0x4137b7])
Wed May 25 21:34:02 2022:# 002: 
/gpfs/fs0/project/d/deteix/MEF++_64bits/avx2/bin/../lib/libgiref_opt_Util.so(traitementSignal+0x1e07) 
[0x2aaeaea82cb7]
Wed May 25 21:34:02 2022:# 003: 
/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libc.so.6(+0x38980) 
[0x2aaeb98e2980]
Wed May 25 21:34:02 2022:# 004: 

Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-02 Thread Josh Hursey via users
I would suggest trying OMPI v4.1.4 (or the v5 snapshot)
 * https://www.open-mpi.org/software/ompi/v4.1/
 * https://www.mail-archive.com/announce@lists.open-mpi.org//msg00152.html

We fixed some large payload collective issues in that release which might be 
what you are seeing here with MPI_Alltoallv with the tuned collective component.




Re: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-02 Thread Mikhail Brinskii via users
Hi Eric,

Yes, UCX is supposed to be stable for large-sized problems.
Did you see the same crash with both OMPI-4.0.3 + UCX 1.8.0 and OMPI-4.1.2 + 
UCX 1.11.2?
Have you also tried to run large-sized problem tests with OMPI-5.0.x?
Regarding the application, at some point it invokes MPI_Alltoallv sending more 
than 2GB to some of the ranks (using a derived datatype), right?
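
(For readers following along, here is a minimal sketch of the kind of call
being asked about; the chunk size, rank numbers, and tag are illustrative,
not taken from the application.  The payload exceeds 2 GiB, but the count
passed to MPI stays small because each element of the derived datatype
covers a 1 MiB chunk.)

/* big_send.c -- send >2 GiB with an int count by describing the payload
   with a contiguous derived datatype (1 MiB per element).  Run with at
   least two ranks.  Sketch only; error handling omitted. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const size_t chunk   = 1 << 20;   /* 1 MiB per type element */
    const int    nchunks = 4096;      /* 4096 x 1 MiB = 4 GiB   */
    int rank;
    MPI_Datatype chunk_t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(chunk * (size_t)nchunks);   /* 4 GiB buffer */

    MPI_Type_contiguous((int)chunk, MPI_BYTE, &chunk_t);
    MPI_Type_commit(&chunk_t);

    if (rank == 0)        /* sends 4 GiB, yet the count is only 4096 */
        MPI_Send(buf, nchunks, chunk_t, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, nchunks, chunk_t, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&chunk_t);
    free(buf);
    MPI_Finalize();
    return 0;
}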

//WBR, Mikhail

From: users  On Behalf Of Eric Chamberland 
via users
Sent: Thursday, June 2, 2022 5:31 AM
To: Open MPI Users 
Cc: Eric Chamberland ; Thomas Briffard 
; Vivien Clauzon ; 
dave.mar...@giref.ulaval.ca; Ramses van Zon ; 
charles.coulomb...@ulaval.ca
Subject: [OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 
and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2



[OMPI users] Segfault in ucp_dt_pack function from UCX library 1.8.0 and 1.11.2 for large sized communications using both OpenMPI 4.0.3 and 4.1.2

2022-06-01 Thread Eric Chamberland via users

Hi,

In the past, we have successfully launched large-sized (finite element) 
computations using ParMETIS as the mesh partitioner.


We first succeeded in 2012 with OpenMPI (v2.?), and again in March 2019 with 
OpenMPI 3.1.2.


Today, we have a bunch of nightly (small) tests running nicely, 
testing all of OpenMPI (4.0.x, 4.1.x and 5.0.x), MPICH-3.3.2 and IntelMPI 
2021.6.


Preparing to launch the same computation we did in 2012, and even 
larger ones, we compiled with both OpenMPI 4.0.3+ucx-1.8.0 and OpenMPI 
4.1.2+ucx-1.11.2 and ran computations from small to large problems 
(meshes).


For small meshes, it goes fine.

But when we get near 2^31 faces in the 3D mesh we are using and call 
ParMETIS_V3_PartMeshKway, we always get a segfault with the same 
backtrace, pointing into the UCX library:


Wed Jun  1 23:04:54 
2022:chrono::InterfaceParMetis::ParMETIS_V3_PartMeshKway::debut 
VmSize: 1202304 VmRSS: 349456 VmPeak: 1211736 VmData: 500764 VmHWM: 
359012 
Wed Jun  1 23:07:07 2022:Erreur    :  MEF++ Signal recu : 11 : 
 segmentation violation

Wed Jun  1 23:07:07 2022:Erreur    :
Wed Jun  1 23:07:07 2022:-- (Début 
des informations destinées aux développeurs C++) 
--

Wed Jun  1 23:07:07 2022:La pile d'appels contient 27 symboles.
Wed Jun  1 23:07:07 2022:# 000: 
reqBacktrace(std::__cxx11::basic_string, 
std::allocator >&)  >>>  probGD.opt 
(probGD.opt(_Z12reqBacktraceRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x71) 
[0x4119f1])
Wed Jun  1 23:07:07 2022:# 001: attacheDebugger()  >>> 
 probGD.opt (probGD.opt(_Z15attacheDebuggerv+0x29a) [0x41386a])
Wed Jun  1 23:07:07 2022:# 002: 
/gpfs/fs0/project/d/deteix/ericc/GIREF/lib/libgiref_opt_Util.so(traitementSignal+0x1f9f) 
[0x2ab3aef0e5cf]
Wed Jun  1 23:07:07 2022:# 003: /lib64/libc.so.6(+0x36400) 
[0x2ab3bd59a400]
Wed Jun  1 23:07:07 2022:# 004: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_dt_pack+0x123) 
[0x2ab3c966e353]
Wed Jun  1 23:07:07 2022:# 005: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x536b7) 
[0x2ab3c968d6b7]
Wed Jun  1 23:07:07 2022:# 006: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/ucx/libuct_ib.so.0(uct_dc_mlx5_ep_am_bcopy+0xd7) 
[0x2ab3ca712137]
Wed Jun  1 23:07:07 2022:# 007: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(+0x52d3c) 
[0x2ab3c968cd3c]
Wed Jun  1 23:07:07 2022:# 008: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/ucx/1.11.2/lib/libucp.so.0(ucp_tag_send_nbx+0x5ad) 
[0x2ab3c9696dcd]
Wed Jun  1 23:07:07 2022:# 009: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0xf2) 
[0x2ab3c922e0b2]
Wed Jun  1 23:07:07 2022:# 010: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0x92) 
[0x2ab3bbca5a32]
Wed Jun  1 23:07:07 2022:# 011: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(ompi_coll_base_alltoallv_intra_pairwise+0x141) 
[0x2ab3bbcad941]
Wed Jun  1 23:07:07 2022:# 012: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42) 
[0x2ab3d4836da2]
Wed Jun  1 23:07:07 2022:# 013: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0/openmpi/4.1.2+ucx-1.11.2/lib/libmpi.so.40(PMPI_Alltoallv+0x29) 
[0x2ab3bbc7bdf9]
Wed Jun  1 23:07:07 2022:# 014: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(libparmetis__gkMPI_Alltoallv+0x106) 
[0x2ab3bb0e1c06]
Wed Jun  1 23:07:07 2022:# 015: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_Mesh2Dual+0xdd6) 
[0x2ab3bb0f10b6]
Wed Jun  1 23:07:07 2022:# 016: 
/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1/lib/libparmetis.so(ParMETIS_V3_PartMeshKway+0x100) 
[0x2ab3bb0f1ac0]


ParMETIS is compiled as part of PETSc-3.17.1 with 64-bit indices.  Here 
are the PETSc configure options:


--prefix=/scinet/niagara/software/2022a/opt/gcc-11.2.0-openmpi-4.1.2+ucx-1.11.2/petsc-64bits/3.17.1
COPTFLAGS=\"-O2 -march=native\"
CXXOPTFLAGS=\"-O2 -march=native\"
FOPTFLAGS=\"-O2 -march=native\"
--download-fftw=1
--download-hdf5=1
--download-hypre=1
--download-metis=1
--download-mumps=1
--download-parmetis=1
--download-plapack=1
--download-prometheus=1
--download-ptscotch=1
--download-scotch=1
--download-sprng=1
--download-superlu_dist=1
--download-triangle=1
--with-avx512-kernels=1
--with-blaslapack-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
--with-cc=mpicc
--with-cxx=mpicxx
--with-cxx-dialect=C++11
--with-debugging=0
--with-fc=mpifort
--with-mkl_pardiso-dir=/scinet/intel/oneapi/2021u4/mkl/2021.4.0
--with-scalapack=1