Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Patrick Begou via users
Hi Howard and Michael, thanks for your feedback. I did not want to write too long a mail with irrelevant information, so I just showed how the two different builds give different results. I'm using a small test case based on my large code, the same one used to show the memory leak with mpi_Alltoallv

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Patrick Begou via users
t expect 4007 but it fails too. Patrick On 25/01/2021 at 19:34, Ralph Castain via users wrote: > I think you mean add "--mca mtl ofi" to the mpirun cmd line > > >> On Jan 25, 2021, at 10:18 AM, Heinz, Michael William via users >> wrote: >> >> What

[OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-25 Thread Patrick Begou via users
Hi, I'm trying to deploy OpenMPI 4.0.5 on the university's supercomputer: * Debian GNU/Linux 9 (stretch) * Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11) and for several days I have been seeing a bug (wrong results using MPI_AllToAllW) on this server when using Omni-Path.

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Patrick Begou via users
e if it's supposed to stop at some point > > i'm running rhel7, gcc 10.1, openmpi 4.0.5rc2, with-ofi, > without-{psm,ucx,verbs} > > On Tue, Jan 26, 2021 at 3:44 PM Patrick Begou via users > wrote: > > > > Hi Michael > >

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Patrick Begou via users
at reproduces > the problem? I can’t think of another way I can give you more help > without being able to see what’s going on. It’s always possible > there’s a bug in the PSM2 MTL but it would be surprising at this point. > > Sent from my iPad > >> On Jan 26, 2021, at 1:1

Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path

2021-01-26 Thread Patrick Begou via users
Hi all, I ran many tests today. I saw that an older 4.0.2 version of OpenMPI packaged with Nix was running using openib. So I added the --with-verbs option to build this module. What I can see now is that: mpirun -hostfile $OAR_NODEFILE *--mca mtl psm -mca btl_openib_allow_ib true* -

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-02-08 Thread Patrick Begou via users
help > > if i had to guess totally pulling junk from the air, there's probably > something incompatible with PSM and OPA when running specifically on debian > (likely due to library versioning). i don't know how common that is, so it's > not clear how fleshed out and tested it is

[OMPI users] Need help for troubleshooting OpenMPI performances

2022-02-28 Thread Patrick Begou via users
Hi, I have run into a performance problem with OpenMPI on my cluster. In some situations my parallel code is really slow (same binary running on a different mesh). To investigate, the Fortran code is built with the profiling option (mpifort -p -O3.) and launched on 91 cores. One mon.out file

Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-03-24 Thread Patrick Begou via users
On 28/02/2022 at 17:56, Patrick Begou via users wrote: Hi, I have run into a performance problem with OpenMPI on my cluster. In some situations my parallel code is really slow (same binary running on a different mesh). To investigate, the Fortran code is built with the profiling option (mpifort

Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-04-07 Thread Patrick Begou via users
btl? If the former, is it built with multi threading support? If the latter, I suggest you give UCX - built with multi threading support - a try and see how it goes Cheers, Gilles On Thu, Mar 24, 2022 at 5:43 PM Patrick Begou via users wrote: On 28/02/2022 at 17:56, Patrick Begou via

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-21 Thread Patrick Begou via users
G6VadQJ@univ-grenoble-alpes.fr] On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote: What exactly is the error that is occurring? -- Jeff Squyres jsquy...@cisco.com From: users<mailto:users-boun...@lists.open-mp

[OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users
Hi all, we are facing a serious problem with OpenMPI (4.0.2) that we have deployed on a cluster. We do not manage this large cluster and the names of the nodes do not comply with Internet standards for protocols: they contain a "_" (underscore) character. So OpenMPI complains about this and
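
For readers unfamiliar with the hostname rule mentioned in this message, here is a minimal sketch of the usual RFC 952 / RFC 1123 convention (labels made of letters, digits and hyphens only, so no underscore). The function name and the sample node names are hypothetical, chosen for illustration; this is not the check Open MPI itself performs.

```python
import re

# One hostname label under the RFC 952 / RFC 1123 convention:
# 1 to 63 characters, letters, digits and hyphens only,
# and no hyphen in first or last position.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label of `name` is a valid label."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing dot (fully qualified form)
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    # Hypothetical node names, not taken from the thread.
    for node in ("node-01.example.org", "node_01", "-node"):
        print(node, is_valid_hostname(node))
```

A name such as the hypothetical "node_01" is rejected only because of the underscore, which is the situation the message describes.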

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users
that is occurring? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Patrick Begou via users Sent: Thursday, June 16, 2022 3:21 AM To: Open MPI Users Cc: Patrick Begou Subject: [OMPI users] OpenMPI and names of the nodes in a cluster Hi all, we