[OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-03 Thread Angel de Vicente via users
Hi,

in one of our codes, we want to create a log of events that happen in
the MPI processes, where the number of these events and their timing
are unpredictable.

So I implemented a simple test code in which process 0 creates a
thread that just busy-waits for messages from any process and writes
them to stdout/stderr/a log file as they arrive. The test code is at
https://github.com/angel-devicente/thread_io
and the same idea went into our "real" code.
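
Just to make the idea concrete, below is a minimal sketch of the scheme
(this is not the actual code from the repository; the tag value, message
length and termination protocol are arbitrary choices for the sketch):

,
| /* Sketch of the "threaded I/O" idea: rank 0 spawns a logger thread that
|  * busy-waits (MPI_Iprobe) for messages with a dedicated tag and writes them
|  * out; any rank can then send log lines at unpredictable times.  Requires
|  * MPI_THREAD_MULTIPLE.  Compile with: mpicc -pthread sketch.c */
| #include <mpi.h>
| #include <pthread.h>
| #include <stdio.h>
| #include <string.h>
| 
| #define LOG_TAG 777
| #define LOG_LEN 256
| 
| static int nranks;
| 
| static void *logger(void *arg)
| {
|     char buf[LOG_LEN];
|     int done = 0, flag;
|     MPI_Status st;
| 
|     while (done < nranks) {                /* one "END" message per rank */
|         MPI_Iprobe(MPI_ANY_SOURCE, LOG_TAG, MPI_COMM_WORLD, &flag, &st);
|         if (!flag) continue;               /* busy-wait */
|         MPI_Recv(buf, LOG_LEN, MPI_CHAR, st.MPI_SOURCE, LOG_TAG,
|                  MPI_COMM_WORLD, &st);
|         if (strcmp(buf, "END") == 0) done++;
|         else fprintf(stderr, "[rank %d] %s\n", st.MPI_SOURCE, buf);
|     }
|     return NULL;
| }
| 
| int main(int argc, char **argv)
| {
|     int provided, rank;
|     char msg[LOG_LEN];
|     pthread_t tid;
| 
|     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
|     if (provided < MPI_THREAD_MULTIPLE) MPI_Abort(MPI_COMM_WORLD, 1);
| 
|     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
|     MPI_Comm_size(MPI_COMM_WORLD, &nranks);
| 
|     if (rank == 0) pthread_create(&tid, NULL, logger, NULL);
| 
|     /* ... application work; any rank logs an event like this ... */
|     snprintf(msg, LOG_LEN, "event from rank %d", rank);
|     MPI_Send(msg, LOG_LEN, MPI_CHAR, 0, LOG_TAG, MPI_COMM_WORLD);
| 
|     snprintf(msg, LOG_LEN, "END");         /* tell the logger we are done */
|     MPI_Send(msg, LOG_LEN, MPI_CHAR, 0, LOG_TAG, MPI_COMM_WORLD);
| 
|     if (rank == 0) pthread_join(tid, NULL);
|     MPI_Finalize();
|     return 0;
| }
`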

As far as I could see, this behaves very nicely: there are no deadlocks,
no lost messages, and the performance penalty is minimal for the real
application it is intended for.

But then I found that on a local cluster the performance was very bad
with the locally installed OpenMPI compared to my own OpenMPI
installation (same gcc and OpenMPI versions): ~5min 50s versus ~5s for
one test. Checking the OpenMPI configuration details, I found that the
locally installed OpenMPI was configured to use the Mellanox IB driver,
and in particular that the hcoll component was somehow killing
performance:

running with

mpirun  --mca coll_hcoll_enable 0 -np 51 ./test_t

was taking ~5s, while enabling coll_hcoll was killing performance, as
stated above (when run on a single node the performance also goes down,
but only by about a factor of 2).

Has anyone seen anything like this? Perhaps a newer Mellanox driver
would solve the problem?

We were planning to make our code public, but before we do so, I want
to understand under which conditions we could hit this problem with the
"Threaded I/O" approach and, if possible, how to get rid of it completely.

Any help/pointers appreciated.
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-04 Thread Angel de Vicente via users
Hi,

George Bosilca  writes:

> If I'm not mistaken, hcoll is playing with the opal_progress in a way
> that conflicts with the blessed usage of progress in OMPI and prevents
> other components from advancing and timely completing requests. The
> impact is minimal for sequential applications using only blocking
> calls, but is jeopardizing performance when multiple types of
> communications are simultaneously executing or when multiple threads
> are active.
>
> The solution might be very simple: hcoll is a module providing support
> for collective communications so as long as you don't use collectives,
> or the tuned module provides collective performance similar to hcoll
> on your cluster, just go ahead and disable hcoll. You can also reach
> out to Mellanox folks asking them to fix the hcoll usage of
> opal_progress.

until we find a more robust solution I was thinking of trying to just
query the MPI implementation at run time and use the threaded version
if hcoll is not present, and go for the unthreaded version if it is.
Looking at the coll.h file I see that some functions there might be
useful, for example mca_coll_base_component_comm_query_2_0_0_fn_t, but
I have never delved into this. Would this be an appropriate approach?
Any examples of how to query in code for a particular component?
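
Something along these lines (untested) is what I had in mind, using the
standard MPI_T tools interface rather than the coll framework internals;
it assumes Open MPI exposes the coll_hcoll_enable MCA parameter as an
MPI_T control variable with that exact name, which I have not verified:

,
| /* Sketch: look for an MPI_T control variable named "coll_hcoll_enable" and
|  * read it.  Returns 1 if found and non-zero, 0 if found and zero, -1 if the
|  * variable is not exposed (in which case we cannot tell). */
| #include <mpi.h>
| #include <string.h>
| 
| static int hcoll_enabled(void)
| {
|     int provided, ncvars, result = -1;
| 
|     MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
|     MPI_T_cvar_get_num(&ncvars);
| 
|     for (int i = 0; i < ncvars; i++) {
|         char name[256];
|         int name_len = sizeof(name), verbosity, bind, scope;
|         MPI_Datatype dtype;
|         MPI_T_enum enumtype;
| 
|         if (MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
|                                 &enumtype, NULL, NULL, &bind, &scope)
|             != MPI_SUCCESS)
|             continue;
| 
|         if (strcmp(name, "coll_hcoll_enable") == 0 && dtype == MPI_INT) {
|             MPI_T_cvar_handle handle;
|             int count, value;
| 
|             MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
|             MPI_T_cvar_read(handle, &value);
|             MPI_T_cvar_handle_free(&handle);
|             result = (value != 0);
|             break;
|         }
|     }
|     MPI_T_finalize();
|     return result;
| }
`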

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-05 Thread Angel de Vicente via users
Hi,

Joshua Ladd  writes:

> We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it
> takes exactly the same 19 secs (80 ranks).  
>
> What version of HCOLL are you using? Command line? 

Thanks for having a look at this.

According to ompi_info, our OpenMPI (version 3.0.1, built with gcc
7.2.0) was configured with:

,
|   Configure command line: 'CFLAGS=-I/apps/OPENMPI/SRC/PMI/include'
|   '--prefix=/storage/apps/OPENMPI/3.0.1/gnu'
|   '--with-mxm=/opt/mellanox/mxm'
|   '--with-hcoll=/opt/mellanox/hcoll'
|   '--with-knem=/opt/knem-1.1.2.90mlnx2'
|   '--with-slurm' '--with-pmi=/usr'
|   '--with-pmi-libdir=/usr/lib64'
|   '--with-platform=../contrib/platform/mellanox/optimized'
`

Not sure if there is a better way to find out the HCOLL version, but
the file hcoll_version.h in /opt/mellanox/hcoll/include/hcoll/api/ says
we have version 3.8.1649.

Code compiled as:

,
| $ mpicc -o test_t thread_io.c test.c
`

To run the tests, I just submit the job to Slurm with the following
script (changing the coll_hcoll_enable param accordingly):

,
| #!/bin/bash
| #
| #SBATCH -J test
| #SBATCH -N 5
| #SBATCH -n 51
| #SBATCH -t 00:07:00
| #SBATCH -o test-%j.out
| #SBATCH -e test-%j.err
| #SBATCH -D .
| 
| module purge
| module load openmpi/gnu/3.0.1
| 
| time mpirun --mca coll_hcoll_enable 1 -np 51 ./test_t
`

In the latest test I managed to squeeze into our queuing system, the
hcoll-disabled run took ~3.5s and the hcoll-enabled one ~43.5s (in this
one I actually commented out all the fprintf statements just in case, so
the code was pure communication).

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-06 Thread Angel de Vicente via users
Hi,

Joshua Ladd  writes:

> This is an ancient version of HCOLL. Please upgrade to the latest
> version (you can do this by installing HPC-X
> https://www.mellanox.com/products/hpc-x-toolkit) 

Just to close the circle and report that all seems OK now.

I don't have root permission on this machine, so I could not change the
Mellanox drivers (4.1.1.0.2), but I downloaded the latest (as far as I
can tell) HPC-X version compatible with that driver (v2.1.0), which
comes with hcoll version 4.0.2127. I compiled the OpenMPI version that
comes with the HPC-X toolkit, so that I was using the same compiler
(gcc-7.2.0) used for the cluster version of OpenMPI, and then HDF5 as well.

And with that, both the thread_io test code and our real app seem to
behave very nicely, and I get basically the same timings whether I use
the MPI_THREAD_MULTIPLE or the single-threaded MPI version.

All sorted. Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



[OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-17 Thread Angel de Vicente via users
Hi,

I'm trying to compile the latest OpenMPI version with Infiniband support
on our local cluster, but I didn't get very far (since I'm installing
this via Spack, I also asked in their support group).

Spack issues the following configure step (note the options given for
--with-knem, --with-hcoll and --with-mxm):

,
| configure'
| '--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/openmpi-4.1.1-jsvbusyjgthr2d6oyny5klt62gm6ma2u'
| '--enable-shared' '--disable-silent-rules' '--disable-builtin-atomics'
| '--enable-static' '--without-pmi'
| '--with-zlib=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/zlib-1.2.11-hrstx5ffrg4f4k3xc2anyxed3mmgdcoz'
| '--enable-mpi1-compatibility' '--with-knem=/opt/knem-1.1.2.90mlnx2'
| '--with-hcoll=/opt/mellanox/hcoll' '--without-psm' '--without-ofi'
| '--without-cma' '--without-ucx' '--without-fca'
| '--with-mxm=/opt/mellanox/mxm' '--without-verbs' '--without-xpmem'
| '--without-psm2' '--without-alps' '--without-lsf' '--without-sge'
| '--without-slurm' '--without-tm' '--without-loadleveler'
| '--disable-memchecker'
| '--with-libevent=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/libevent-2.1.12-yd5l4tjmnigv6dqlv5afpn4zc6ekdchc'
| '--with-hwloc=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/hwloc-2.6.0-bfnt4g3givflydpe5d2iglyupgbzxbfn'
| '--disable-java' '--disable-mpi-java' '--without-cuda'
| '--enable-wrapper-rpath' '--disable-wrapper-runpath' '--disable-mpi-cxx'
| '--disable-cxx-exceptions'
| '--with-wrapper-ldflags=-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib/gcc/x86_64-pc-linux-gnu/9.3.0
| -Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib64'
`

Later on in the configuration phase I see:

,
| --- MCA component btl:openib (m4 configuration macro)
| checking for MCA component btl:openib compile mode... static
| checking whether expanded verbs are available... yes
| checking whether IBV_EXP_ATOMIC_HCA_REPLY_BE is declared... yes
| checking whether IBV_EXP_QP_CREATE_ATOMIC_BE_REPLY is declared... yes
| checking whether ibv_exp_create_qp is declared... yes
| checking whether ibv_exp_query_device is declared... yes
| checking whether IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG is declared... yes
| checking for struct ibv_exp_device_attr.ext_atom... yes
| checking for struct ibv_exp_device_attr.exp_atomic_cap... yes
| checking if MCA component btl:openib can compile... no
`

This is the first time I have tried to compile OpenMPI this way, and I
get a bit confused about what each bit is doing, but it looks like it
goes through the motions of building btl:openib and then for some
reason decides it cannot compile it.

Any suggestions/pointers?

Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-18 Thread Angel de Vicente via users
Hello,

Gilles Gouaillardet via users  writes:

> Infiniband detection likely fails before checking expanded verbs.

thanks for this. In the end, after playing a bit with different options,
I managed to install OpenMPI 3.1.0 OK on our cluster using UCX (I wanted
4.1.1, but that would not compile cleanly with the old version of UCX
that is installed on the cluster). The configure command line (as
reported by ompi_info) was:

,
|   Configure command line: '--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/openmpi-3.1.0-g5a7szwxcsgmyibqvwwavfkz5b4i2ym7'
|   '--enable-shared' '--disable-silent-rules'
|   '--disable-builtin-atomics' '--with-pmi=/usr'
|   '--with-zlib=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/zlib-1.2.11-hrstx5ffrg4f4k3xc2anyxed3mmgdcoz'
|   '--without-knem' '--with-hcoll=/opt/mellanox/hcoll'
|   '--without-psm' '--without-ofi' '--without-cma'
|   '--with-ucx=/opt/ucx' '--without-fca'
|   '--without-mxm' '--without-verbs' '--without-xpmem'
|   '--without-psm2' '--without-alps' '--without-lsf'
|   '--without-sge' '--with-slurm' '--without-tm'
|   '--without-loadleveler' '--disable-memchecker'
|   '--with-hwloc=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/hwloc-1.11.13-kpjkidab37wn25h2oyh3eva43ycjb6c5'
|   '--disable-java' '--disable-mpi-java'
|   '--without-cuda' '--enable-wrapper-rpath'
|   '--disable-wrapper-runpath' '--disable-mpi-cxx'
|   '--disable-cxx-exceptions'
|   '--with-wrapper-ldflags=-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib/gcc/x86_64-pc-linux-gnu/9.3.0
|   -Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib64'
`


The versions that I'm using are:

gcc:   9.3.0
mxm:   3.6.3102  (though I configure OpenMPI --without-mxm)
hcoll: 3.8.1649
knem:  1.1.2.90mlnx2 (though I configure OpenMPI --without-knem)
ucx:   1.2.2947
slurm: 18.08.7


It looks like everything executes fine, but I have a couple of warnings,
and I'm not sure how much I should worry and what I could do about them:

1) Conflicting CPU frequencies detected:

[1645221586.038838] [s01r3b78:11041:0] sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3151.41
[1645221585.740595] [s01r3b79:11484:0] sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2998.76

2) Won't use knem. In a previous try I was specifying --with-knem, but
I was getting this warning about not being able to open /dev/knem. I
guess our cluster is not properly configured w.r.t. knem, so I built
OpenMPI again --without-knem, but I still get this message:

[1645221587.091122] [s01r3b74:9054 :0] shm.c:65   MXM  WARN  Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem.
[1645221587.104807] [s01r3b76:8610 :0] shm.c:65   MXM  WARN  Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem.


Any help/pointers appreciated. Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-28 Thread Angel de Vicente via users
Hello,

"Jeff Squyres (jsquyres)"  writes:

> I'd recommend against using Open MPI v3.1.0 -- it's quite old.  If you
> have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which
> has all the rolled-up bug fixes on the v3.1.x series.
>
> That being said, Open MPI v4.1.2 is the most current.  Open MPI v4.1.2 does
> restrict which versions of UCX it uses because there are bugs in the older
> versions of UCX.  I am not intimately familiar with UCX -- you'll need to ask
> Nvidia for support there -- but I was under the impression that it's just a
> user-level library, and you could certainly install your own copy of UCX to use
> with your compilation of Open MPI.  I.e., you're not restricted to whatever UCX
> is installed in the cluster system-default locations.

I did follow your advice, so I compiled my own version of UCX (1.11.2)
and OpenMPI v4.1.1, but for some reason the latency / bandwidth numbers
are really bad compared to the previous ones, so something is wrong, but
I'm not sure how to debug it.

> I don't know why you're getting MXM-specific error messages; those don't appear
> to be coming from Open MPI (especially since you configured Open MPI with
> --without-mxm).  If you can upgrade to Open MPI v4.1.2 and the latest UCX, see
> if you are still getting those MXM error messages.

In this latest attempt, yes, the MXM error messages are still there.

Cheers,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-03-01 Thread Angel de Vicente via users
Hello,

John Hearns via users  writes:

> Stupid answer from me. If latency/bandwidth numbers are bad then check
> that you are really running over the interface that you think you
> should be. You could be falling back to running over Ethernet.

I'm quite out of my depth here, so all answers are helpful, as I might have
skipped something very obvious.

In order to try and avoid the possibility of falling back to running
over Ethernet, I submitted the job with:

mpirun -n 2 --mca btl ^tcp osu_latency

which gives me the following error:

,
| At least one pair of MPI processes are unable to reach each other for
| MPI communications.  This means that no Open MPI device has indicated
| that it can be used to communicate between these processes.  This is
| an error; Open MPI requires that all MPI processes be able to reach
| each other.  This error can sometimes be the result of forgetting to
| specify the "self" BTL.
| 
|   Process 1 ([[37380,1],1]) is on host: s01r1b20
|   Process 2 ([[37380,1],0]) is on host: s01r1b19
|   BTLs attempted: self
| 
| Your MPI job is now going to abort; sorry.
`

This is certainly not happening when I use the "native" OpenMPI,
etc. provided on the cluster. I have not knowingly specified anywhere
not to support "self", so I have no clue what might be going on, as I
assumed that "self" was always built into OpenMPI.

Any hints on what (and where) I should look for?

Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-03-11 Thread Angel de Vicente via users
Hello,


Joshua Ladd  writes:

> These are very, very old versions of UCX and HCOLL installed in your
> environment. Also, MXM was deprecated years ago in favor of UCX. What
> version of MOFED is installed (run ofed_info -s)? What HCA generation
> is present (run ibstat).

MOFED is: MLNX_OFED_LINUX-4.1-1.0.2.0

As for the HCA generation, we don't seem to have the ibstat command
installed; is there any other way to get this info? But I *think* they
are ConnectX-3.


> > Stupid answer from me. If latency/bandwidth numbers are bad then check
> > that you are really running over the interface that you think you
> > should be. You could be falling back to running over Ethernet.

apparently the problem with my first attempt was that I was installing a
very bare version of UCX. I re-did the installation with the following
configuration:

,
| '--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/ucx-1.11.2-67aihiwsolnad6aqt2ei6j6iaptqgecf'
| '--enable-mt' '--enable-cma' '--disable-params-check' '--with-avx'
| '--enable-optimizations' '--disable-assertions' '--disable-logging'
| '--with-pic' '--with-rc' '--with-ud' '--with-dc' '--without-mlx5-dv'
| '--with-ib-hw-tm' '--with-dm' '--with-cm' '--without-rocm'
| '--without-java' '--without-cuda' '--without-gdrcopy' '--with-knem'
| '--without-xpmem'
`


and now the numbers are very good, most of the time better than with
the "native" OpenMPI provided on the cluster.


So now I wanted to try another combination, using the Intel compiler
instead of the GNU one. Apparently everything compiled OK, and when I
try to run the OSU Micro-Benchmarks I have no problems with the
point-to-point benchmarks, but I get segmentation faults:

,
| load intel/2018.2 Set Intel compilers (LICENSE NEEDED! Please, contact support if you have any issue with license)
| /scratch/slurm/job1182830/slurm_script: line 59: unalias: despacktivate: not found
| [s01r2b22:26669] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.]
| [s01r2b23:20286] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.]
| [s01r2b22:26681:0] Caught signal 11 (Segmentation fault)
| [s01r2b23:20292:0] Caught signal 11 (Segmentation fault)
|  backtrace 
|  2 0x001c mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
|  3 0x0010055c mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
|  4 0x00034950 killpg()  ??:0
|  5 0x000a7d41 PMPI_Comm_rank()  ??:0
|  6 0x00402e56 main()  ??:0
|  7 0x000206e5 __libc_start_main()  ??:0
|  8 0x00402ca9 _start()  /home/abuild/rpmbuild/BUILD/glibc-2.22/csu/../sysdeps/x86_64/start.S:118
| ===
|  backtrace 
|  2 0x001c mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
|  3 0x0010055c mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
|  4 0x00034950 killpg()  ??:0
|  5 0x000a7d41 PMPI_Comm_rank()  ??:0
|  6 0x00402e56 main()  ??:0
|  7 0x000206e5 __libc_start_main()  ??:0
|  8 0x00402ca9 _start()  /home/abuild/rpmbuild/BUILD/glibc-2.22/csu/../sysdeps/x86_64/start.S:118
| ===
`


Any idea how I could try to debug/solve this?

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


[OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello,

I'm running out of ideas, and wonder if someone here might have some
tips on how to debug a segmentation fault I'm having with my application
[due to the nature of the problem I wonder whether the problem is with
OpenMPI itself rather than my app, though at this point I'm not leaning
strongly either way].

The code is hybrid MPI+OpenMP and I compile it with gcc 10.3.0 and
OpenMPI 4.1.3.

Usually I was running the code with "mpirun -np X --bind-to none [...]"
so that the threads created by OpenMP don't get bound to a single core
and I actually get proper speedup out of OpenMP.

Since I introduced some changes to the code this week (though I have
read the changes carefully a number of times, and I don't see anything
suspicious), I now get a segmentation fault sometimes, but only when I
run with "--bind-to none" and only on my workstation. It is not always
with the same running configuration, but I can see some pattern, and the
problem shows up only if I run the version compiled with OpenMP support
and most of the time only when the number of ranks*threads goes above 4
or so. If I run it with "--bind-to socket" all looks good all the time.

If I run it on another server, "--bind-to none" doesn't seem to be any
issue (I submitted the jobs many, many times and not a single
segmentation fault), but on my workstation it fails almost every time if
using MPI+OpenMP with a handful of threads and with "--bind-to none". On
that other server I'm running gcc 9.3.0 and OpenMPI 4.1.3.

For example, setting OMP_NUM_THREADS to 1, I run the code as follows
and get the segmentation fault shown below:

,
| angelv@sieladon:~/.../Fe13_NL3/t~gauss+isat+istim$ mpirun -np 4 --bind-to none  ../../../../../pcorona+openmp~gauss Fe13_NL3.params
|  Reading control file: Fe13_NL3.params
|   ... Control file parameters broadcasted
| 
| [...]
|  
|  Starting calculation loop on the line of sight
|  Receiving results from:2
|  Receiving results from:1
| 
| Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
| 
| Backtrace for this error:
|  Receiving results from:3
| #0  0x7fd747e7555f in ???
| #1  0x7fd7488778e1 in ???
| #2  0x7fd7488667a4 in ???
| #3  0x7fd7486fe84c in ???
| #4  0x7fd7489aa9ce in ???
| #5  0x414959 in __pcorona_main_MOD_main_loop._omp_fn.0
| at src/pcorona_main.f90:627
| #6  0x7fd74813ec75 in ???
| #7  0x412bb0 in pcorona
| at src/pcorona.f90:49
| #8  0x40361c in main
| at src/pcorona.f90:17
| 
| [...]
| 
| --
| mpirun noticed that process rank 3 with PID 0 on node sieladon exited on signal 11 (Segmentation fault).
| ---
`

I cannot see inside the MPI library (I don't really know if that would
be helpful) but line 627 in pcorona_main.f90 is:

,
|  call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
`

Any ideas/suggestions on what could be going on or how to try and get
some more clues about the possible causes of this?

Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Thanks Gilles,

Gilles Gouaillardet via users  writes:

> You can first double check you
> MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...)

my code uses "mpi_thread_funneled" and OpenMPI was compiled with
MPI_THREAD_MULTIPLE support:

,
| ompi_info | grep  -i thread
|   Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, 
OMPI progress: no, ORTE progress: yes, Event lib: yes)
|FT Checkpoint support: no (checkpoint thread: no)
`
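
For what it's worth, this is the kind of check I understand you mean (a
minimal sketch in C; my code is Fortran, but the idea is the same):

,
| /* Minimal sketch: request the thread level the code needs and verify what
|  * the MPI library actually grants, instead of assuming it. */
| #include <mpi.h>
| #include <stdio.h>
| 
| int main(int argc, char **argv)
| {
|     int provided;
| 
|     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
|     if (provided < MPI_THREAD_FUNNELED) {
|         fprintf(stderr, "MPI_THREAD_FUNNELED not available (got %d)\n", provided);
|         MPI_Abort(MPI_COMM_WORLD, 1);
|     }
|     /* ... rest of the program ... */
|     MPI_Finalize();
|     return 0;
| }
`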

Cheers,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello Jeff,

"Jeff Squyres (jsquyres)"  writes:

> With THREAD_FUNNELED, it means that there can only be one thread in
> MPI at a time -- and it needs to be the same thread as the one that
> called MPI_INIT_THREAD.
>
> Is that the case in your app?


the master rank (i.e. 0) never creates threads, while the other ranks go
through the following to communicate with it, so I check that it is
indeed only the master thread that communicates:

,
|tid = 0  
| #ifdef _OPENMP  
|tid = omp_get_thread_num()   
| #endif  
| 
|do   
|   if (tid == 0) then
|  call mpi_send(my_rank, 1, mpi_integer, master, ask_job, &  
|   mpi_comm_world, mpierror) 
|  call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
| 
|  if (stat(mpi_tag) == stop_signal) then 
| call mpi_recv(b_,1,mpi_integer,master,stop_signal, &
|  mpi_comm_world,stat,mpierror)  
|  else   
| call mpi_recv(iyax,1,mpi_integer,master,give_job, & 
|  mpi_comm_world,stat,mpierror)  
|  end if 
|   end if
| 
|   !$omp barrier
| 
|   [... actual work...]
`


> Also, what is your app doing at src/pcorona_main.f90:627?

It is the mpi_probe call above.


In case it clarifies things, my app follows a master-worker paradigm,
where rank 0 hands out jobs, and all MPI ranks > 0 just do the following:

,
| !$OMP PARALLEL DEFAULT(NONE)
| do
|   !  (the code above) 
|   if (tid == 0) then receive job number | stop signal
|  
|   !$OMP DO schedule(dynamic)
|   loop_izax: do izax=sol_nz_min,sol_nz_max
| 
|  [big computing loop body]
| 
|   end do loop_izax  
|   !$OMP END DO  
| 
|   if (tid == 0) then 
|   call mpi_send(iyax,1,mpi_integer,master,results_tag, & 
|mpi_comm_world,mpierror)  
|   call mpi_send(stokes_buf_y,nz*8,mpi_double_precision, &
|master,results_tag,mpi_comm_world,mpierror)   
|   end if 
|  
|   !$omp barrier
|  
| end do   
| !$OMP END PARALLEL  
`



Following Gilles' suggestion, I also tried changing MPI_THREAD_FUNNELED
to MPI_THREAD_MULTIPLE just in case, but I get the same segmentation
fault in the same line (mind you, the segmentation fault doesn't happen
all the time). But again, no issues if running with --bind-to socket
(and no apparent issues at all on the other computer even with --bind-to
none).

Many thanks for any suggestions,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello,

"Keller, Rainer"  writes:

> You’re using MPI_Probe() with Threads; that’s not safe.
> Please consider using MPI_Mprobe() together with MPI_Mrecv().

many thanks for the suggestion. I will try the M variants, though I was
under the impression that mpi_probe() was OK as long as one made sure
that the source and tag matched between the mpi_probe() and the
mpi_recv() calls.

As you can see below, I'm careful with that (in any case I'm not sure
the problem lies there, since the error I get is about an invalid memory
reference in the mpi_probe call itself).

,
|tid = 0  
| #ifdef _OPENMP  
|tid = omp_get_thread_num()   
| #endif  
| 
|do   
|   if (tid == 0) then
|  call mpi_send(my_rank, 1, mpi_integer, master, ask_job, &  
|   mpi_comm_world, mpierror) 
|  call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
| 
|  if (stat(mpi_tag) == stop_signal) then 
| call mpi_recv(b_,1,mpi_integer,master,stop_signal, &
|  mpi_comm_world,stat,mpierror)  
|  else   
| call mpi_recv(iyax,1,mpi_integer,master,give_job, & 
|  mpi_comm_world,stat,mpierror)  
|  end if 
|   end if
| 
|   !$omp barrier
| 
|   [... actual work...]
`
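
For reference, this is how I understand the suggested M-variant pattern
would look (sketched in C; "master", "stop_signal" and "give_job" are
just placeholders mirroring my Fortran code):

,
| /* Sketch of the MPI_Mprobe/MPI_Mrecv pattern: the message handle returned by
|  * MPI_Mprobe can only be received (via MPI_Mrecv) by the thread that probed
|  * it, which removes the probe/receive race MPI_Probe can have when several
|  * threads receive on the same communicator. */
| #include <mpi.h>
| 
| enum { master = 0, stop_signal = 1, give_job = 2 };   /* placeholder tags */
| 
| /* Returns 1 if a stop message was received, 0 if a job number was received. */
| static int wait_for_job(int *job)
| {
|     MPI_Message msg;
|     MPI_Status  stat;
| 
|     MPI_Mprobe(master, MPI_ANY_TAG, MPI_COMM_WORLD, &msg, &stat);
| 
|     if (stat.MPI_TAG == stop_signal) {
|         int dummy;
|         MPI_Mrecv(&dummy, 1, MPI_INT, &msg, &stat);   /* consume stop message */
|         return 1;
|     }
|     MPI_Mrecv(job, 1, MPI_INT, &msg, &stat);          /* consume job number */
|     return 0;
| }
`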


> So getting into valgrind may be of help, possibly recompiling Open MPI
> enabling valgrind-checking together with debugging options.

I was hoping to avoid this route, but it certainly is looking like I'll
have to bite the bullet...

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-26 Thread Angel de Vicente via users
Hello,

thanks for your help and suggestions.

In the end it was not an issue with OpenMPI or with any other system
stuff, but rather a single line in our code. I thought I was doing the
tests with the -fbounds-check option, but it turns out I was not,
arrrghh!! At some point I was writing outside one of our arrays, and you
can imagine the rest... That it was happening only when I was running
with '--bind-to none' and only on my workstation sent me down all the
wrong debugging paths. Once I realized -fbounds-check was not being
used, figuring out the issue was a matter of seconds...

Our code is now happily performing the 3000+ tests without a hitch.

Cheers,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


[OMPI users] Location of the file pmix-mca-params.conf?

2023-06-14 Thread Angel de Vicente via users
Hello,

with our current setup of OpenMPI and Slurm on an Ubuntu 22.04 server,
when we submit MPI jobs I get the message:

PMIX ERROR: ERROR in file ../../../../../../src/mca/gds/ds12/gds_ds12_lock_pthread.c at line 169

Following https://github.com/open-mpi/ompi/issues/7516, I tried setting
PMIX_MCA_gds=hash.

Setting it as an environment variable or setting it in the file
~/.pmix/mca-params.conf does work. I now want to set it system-wide, but
I cannot find which file that should go in.

I have tried: 
+ /etc/pmix-mca-params.conf
+ /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf
but no luck.
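
(For reference, the setting I'm trying to make system-wide is just the
following; I'm assuming the system-wide file takes the same "name =
value" syntax, with the PMIX_MCA_ prefix dropped, as the per-user
~/.pmix/mca-params.conf does:)

,
| # in the pmix-mca-params.conf file:
| gds = hash
|
| # or, equivalently, as an environment variable:
| export PMIX_MCA_gds=hash
`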

Do you know which file is the system-wide configuration or how to find
it?

Thanks,
-- 
Ángel de Vicente -- (GPG: 0x64D9FDAE7CD5E939)
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)



Re: [OMPI users] Location of the file pmix-mca-params.conf?

2023-06-14 Thread Angel de Vicente via users
Hi,

Angel de Vicente via users  writes:

> I have tried: 
> + /etc/pmix-mca-params.conf
> + /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf
> but no luck.

Never mind, /etc/openmpi/pmix-mca-params.conf was the right one.

Cheers,
-- 
Ángel de Vicente -- (GPG: 0x64D9FDAE7CD5E939)
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)