[OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-03 Thread Angel de Vicente via users
Hi, in one of our codes, we want to create a log of events that happen in the MPI processes, where the number of these events and their timing are unpredictable. So I implemented a simple test code, where process 0 creates a thread that is just busy-waiting for messages from any process, and
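A minimal C sketch of that pattern, not the original code: rank 0 starts a listener thread that collects one event per rank; the tag, counts and event payload are made up for illustration, and the code aborts unless MPI_THREAD_MULTIPLE is actually granted.

  #include <mpi.h>
  #include <pthread.h>
  #include <stdio.h>

  #define LOG_TAG 42              /* hypothetical event tag */

  static int nranks;              /* set before the listener starts */

  static void *listener(void *arg)
  {
      /* For the sketch, expect exactly one event from every rank. */
      for (int i = 0; i < nranks; i++) {
          int event;
          MPI_Status st;
          MPI_Recv(&event, 1, MPI_INT, MPI_ANY_SOURCE, LOG_TAG,
                   MPI_COMM_WORLD, &st);
          printf("event %d from rank %d\n", event, st.MPI_SOURCE);
      }
      return NULL;
  }

  int main(int argc, char **argv)
  {
      int provided, rank;
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      if (provided < MPI_THREAD_MULTIPLE)
          MPI_Abort(MPI_COMM_WORLD, 1);   /* listener needs MULTIPLE */

      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nranks);

      pthread_t t;
      if (rank == 0)
          pthread_create(&t, NULL, listener, NULL);

      int event = 100 + rank;             /* stand-in for a real log event */
      MPI_Send(&event, 1, MPI_INT, 0, LOG_TAG, MPI_COMM_WORLD);

      if (rank == 0)
          pthread_join(t, NULL);
      MPI_Finalize();
      return 0;
  }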

Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-06 Thread Angel de Vicente via users
Hi, Joshua Ladd writes: > This is an ancient version of HCOLL. Please upgrade to the latest > version (you can do this by installing HPC-X > https://www.mellanox.com/products/hpc-x-toolkit) Just to close the circle and report that all seems OK now. I don't have root permission in this
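For the archives: HPC-X indeed needs no root. A sketch of the usual user-space procedure from Mellanox's documentation, with an illustrative tarball name:

  $ tar xjf hpcx-v2.x-gcc-MLNX_OFED_LINUX-x86_64.tbz -C $HOME/opt
  $ cd $HOME/opt/hpcx-v2.x-*
  $ source hpcx-init.sh
  $ hpcx_load                  # puts HPC-X's OpenMPI/UCX/HCOLL in the environment
  $ ompi_info | grep -i hcoll  # sanity check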

Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-05 Thread Angel de Vicente via users
Hi, Joshua Ladd writes: > We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it > takes exactly the same 19 secs (80 ranks).  > > What version of HCOLL are you using? Command line? Thanks for having a look at this. According to ompi_info, our OpenMPI (version 3.0.1) was
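For anyone reproducing the comparison: the with/without-hcoll runs need only standard mpirun options; the binary name and layout below are illustrative.

  $ ompi_info | grep -i hcoll                                     # component present?
  $ mpirun -np 80 --map-by ppr:20:node ./a.out                    # hcoll on (default)
  $ mpirun -np 80 --map-by ppr:20:node --mca coll ^hcoll ./a.out  # hcoll off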

Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-04 Thread Angel de Vicente via users
Hi, George Bosilca writes: > If I'm not mistaken, hcoll is playing with the opal_progress in a way > that conflicts with the blessed usage of progress in OMPI and prevents > other components from advancing and timely completing requests. The > impact is minimal for sequential applications using
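If hcoll's progress hook is indeed the culprit, the usual run-time workaround, without rebuilding anything, is to exclude the component:

  $ mpirun --mca coll ^hcoll ./threaded_app
  $ export OMPI_MCA_coll=^hcoll    # equivalent, via the environment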

Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-03-11 Thread Angel de Vicente via users
Hello, Joshua Ladd writes: > These are very, very old versions of UCX and HCOLL installed in your > environment. Also, MXM was deprecated years ago in favor of UCX. What > version of MOFED is installed (run ofed_info -s)? What HCA generation > is present (run ibstat). MOFED is:
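For reference, the survey Joshua asks for is:

  $ ofed_info -s    # MOFED version string
  $ ibstat          # HCA model, firmware, port state and rate
  $ ucx_info -v     # UCX version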

Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-18 Thread Angel de Vicente via users
Hello, Gilles Gouaillardet via users writes: > Infiniband detection likely fails before checking expanded verbs. Thanks for this. In the end, after playing a bit with different options, I managed to install OpenMPI 3.1.0 OK in our cluster using UCX (I wanted 4.1.1, but that would not compile
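The by-hand equivalent of that build, for anyone not using Spack; prefix and UCX path are illustrative:

  $ ./configure --prefix=$HOME/opt/openmpi-3.1.0 \
                --with-ucx=$HOME/opt/ucx \
                --without-verbs
  $ make -j8 && make install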

[OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-17 Thread Angel de Vicente via users
Hi, I'm trying to compile the latest OpenMPI version with Infiniband support on our local cluster, but didn't get very far (since I'm installing this via Spack, I also asked in their support group). Spack is issuing the following configure step (see the
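In Spack terms the request is a one-liner; the version and variant below are an example, not the exact spec from the post:

  $ spack spec openmpi@4.1.1 fabrics=ucx      # inspect the concretized spec first
  $ spack install openmpi@4.1.1 fabrics=ucx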

Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-28 Thread Angel de Vicente via users
Hello, "Jeff Squyres (jsquyres)" writes: > I'd recommend against using Open MPI v3.1.0 -- it's quite old. If you > have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which > has all the rolled-up bug fixes on the v3.1.x series. > > That being said, Open MPI v4.1.2 is the most

Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-03-01 Thread Angel de Vicente via users
Hello, John Hearns via users writes: > Stupid answer from me. If latency/bandwidth numbers are bad then check > that you are really running over the interface that you think you > should be. You could be falling back to running over Ethernet. I'm quite out of my depth here, so all answers are
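One way to make John's check concrete: pin UCX to the IB device so the job fails loudly instead of silently falling back to TCP over Ethernet. The device name is illustrative (list yours with ucx_info -d); osu_latency is from the OSU micro-benchmarks.

  $ mpirun -np 2 -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc,sm,self ./osu_latency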

[OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello, I'm running out of ideas, and wonder if someone here might have some tips on how to debug a segmentation fault I'm having with my application [due to the nature of the problem I wonder whether the issue is with OpenMPI itself rather than my app, though at this point I'm not leaning
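For context, the two invocations differ only in the binding policy; counts and binary name are illustrative:

  $ export OMP_NUM_THREADS=4
  $ mpirun -np 8 --bind-to socket ./hybrid_app   # runs fine
  $ mpirun -np 8 --bind-to none   ./hybrid_app   # segfaults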

Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Thanks Gilles, Gilles Gouaillardet via users writes: > You can first double check your > MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...) my code uses "mpi_thread_funneled" and OpenMPI was compiled with MPI_THREAD_MULTIPLE support; ompi_info | grep -i thread reports: Thread support:
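The granted level can also be queried at run time, independently of what was requested at init; a C sketch (the original code is Fortran), using the standard MPI_Query_thread:

  #include <mpi.h>
  #include <stdio.h>

  /* Print the thread support level the library actually granted. */
  void report_thread_level(void)
  {
      int level;
      MPI_Query_thread(&level);
      printf("granted: %s\n",
             level == MPI_THREAD_MULTIPLE   ? "MPI_THREAD_MULTIPLE"   :
             level == MPI_THREAD_SERIALIZED ? "MPI_THREAD_SERIALIZED" :
             level == MPI_THREAD_FUNNELED   ? "MPI_THREAD_FUNNELED"   :
                                              "MPI_THREAD_SINGLE");
  }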

Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello Jeff, "Jeff Squyres (jsquyres)" writes: > With THREAD_FUNNELED, it means that there can only be one thread in > MPI at a time -- and it needs to be the same thread as the one that > called MPI_INIT_THREAD. > > Is that the case in your app? the master rank (i.e. 0) never creates threads,

Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello, "Keller, Rainer" writes: > You’re using MPI_Probe() with Threads; that’s not safe. > Please consider using MPI_Mprobe() together with MPI_Mrecv(). many thanks for the suggestion. I will try with the M variants, though I was under the impression that mpi_probe() was OK as far as one made

Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-26 Thread Angel de Vicente via users
Hello, thanks for your help and suggestions. In the end it was not an issue with OpenMPI or with any other system component, but a single line in our code. I thought I was doing the tests with the -fbounds-check option, but it turns out I was not, arrrghh!! At some point I was writing outside one
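A reminder for fellow Fortran users on the flag spellings, and it is worth verifying they really reach the compile line:

  $ gfortran -fcheck=bounds ...   # current spelling; -fbounds-check is the older alias
  $ ifort -check bounds ...       # Intel's equivalent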

Re: [OMPI users] Location of the file pmix-mca-params.conf?

2023-06-14 Thread Angel de Vicente via users
Hi, Angel de Vicente via users writes: > I have tried: > + /etc/pmix-mca-params.conf > + /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf > but no luck. Never mind, /etc/openmpi/pmix-mca-params.conf was the right one. Cheers, -- Ángel de Vicente
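For the archives: the file takes plain "name = value" lines. Assuming the gds workaround discussed in the GitHub issue mentioned in this thread (the exact value is best double-checked there), it would look like:

  # /etc/openmpi/pmix-mca-params.conf
  gds = hash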

[OMPI users] Location of the file pmix-mca-params.conf?

2023-06-14 Thread Angel de Vicente via users
Hello, with our current setup of OpenMPI and Slurm on an Ubuntu 22.04 server, when we submit MPI jobs I get the message: PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_ds12_lock_pthread.c at line 169 Following https://github.com/open-mpi/ompi/issues/7516, I tried setting