Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-03-01 Thread Joshua Ladd via users
These are very, very old versions of UCX and HCOLL installed in your
environment. Also, MXM was deprecated years ago in favor of UCX. What
version of MOFED is installed (run ofed_info -s)? What HCA generation is
present (run ibstat)?

Josh
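
For reference, the requested information can be gathered with a few one-liners; this is a minimal sketch that assumes MLNX_OFED, UCX and Open MPI are already in the environment:

    ofed_info -s                          # MOFED release string
    ibstat                                # HCA model, firmware and port state
    ucx_info -v                           # UCX version and configure options
    ompi_info | grep -i -e ucx -e hcoll   # what Open MPI was built against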

On Tue, Mar 1, 2022 at 6:42 AM Angel de Vicente via users <
users@lists.open-mpi.org> wrote:

> Hello,
>
> John Hearns via users  writes:
>
> > Stupid answer from me. If latency/bandwidth numbers are bad then check
> > that you are really running over the interface that you think you
> > should be. You could be falling back to running over Ethernet.
>
> I'm quite out of my depth here, so all answers are helpful, as I might have
> skipped something very obvious.
>
> In order to try and avoid the possibility of falling back to running
> over Ethernet, I submitted the job with:
>
> mpirun -n 2 --mca btl ^tcp osu_latency
>
> which gives me the following error:
>
> ,
> | At least one pair of MPI processes are unable to reach each other for
> | MPI communications.  This means that no Open MPI device has indicated
> | that it can be used to communicate between these processes.  This is
> | an error; Open MPI requires that all MPI processes be able to reach
> | each other.  This error can sometimes be the result of forgetting to
> | specify the "self" BTL.
> |
> |   Process 1 ([[37380,1],1]) is on host: s01r1b20
> |   Process 2 ([[37380,1],0]) is on host: s01r1b19
> |   BTLs attempted: self
> |
> | Your MPI job is now going to abort; sorry.
> `
>
> This is certainly not happening when I use the "native" OpenMPI,
> etc. provided in the cluster. I have not knowingly specified anywhere
> not to support "self", so I have no clue what might be going on, as I
> assumed that "self" was always built for OpenMPI.
>
> Any hints on what (and where) I should look for?
>
> Many thanks,
> --
> Ángel de Vicente
>
> Tel.: +34 922 605 747
> Web.: http://research.iac.es/proyecto/polmag/
>
>
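
One way to narrow this down is to check which transports this particular build actually contains, and then select them explicitly rather than by exclusion. A rough sketch (the BTL list below is only an example; adjust it to what ompi_info reports for this build):

    ompi_info | grep -E 'btl|pml'                          # components compiled into this build
    mpirun -n 2 --mca pml ucx osu_latency                  # force the UCX path, if ucx was built in
    mpirun -n 2 --mca btl self,vader,openib osu_latency    # or name the BTLs explicitly

If openib/ucx do not show up in the ompi_info output, the build never picked up InfiniBand support at all, which would explain why only the self BTL was attempted.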


Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-05 Thread Joshua Ladd via users
This is an ancient version of HCOLL. Please upgrade to the latest version
(you can do this by installing HPC-X
https://www.mellanox.com/products/hpc-x-toolkit)

Josh
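
For anyone wanting to follow this advice, a rough sketch of the steps, assuming HPC-X is unpacked under /opt/hpcx (the path and the HPCX_* variable names come from HPC-X's init script; check your own install):

    source /opt/hpcx/hpcx-init.sh && hpcx_load    # sets HPCX_HCOLL_DIR, HPCX_UCX_DIR, etc.
    ./configure --with-hcoll=$HPCX_HCOLL_DIR --with-ucx=$HPCX_UCX_DIR ...   # rebuild Open MPI against them

HPC-X also ships its own Open MPI build, so simply using the bundled mpirun may be the quicker route.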

On Wed, Feb 5, 2020 at 4:35 AM Angel de Vicente 
wrote:

> Hi,
>
> Joshua Ladd  writes:
>
> > We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it
> > takes exactly the same 19 secs (80 ranks).
> >
> > What version of HCOLL are you using? Command line?
>
> Thanks for having a look at this.
>
> According to ompi_info, our OpenMPI (version 3.0.1, built with gcc 7.2.0)
> was configured with:
>
> ,
> |   Configure command line: 'CFLAGS=-I/apps/OPENMPI/SRC/PMI/include'
> |   '--prefix=/storage/apps/OPENMPI/3.0.1/gnu'
> |   '--with-mxm=/opt/mellanox/mxm'
> |   '--with-hcoll=/opt/mellanox/hcoll'
> |   '--with-knem=/opt/knem-1.1.2.90mlnx2'
> |   '--with-slurm' '--with-pmi=/usr'
> |   '--with-pmi-libdir=/usr/lib64'
> |
>  '--with-platform=../contrib/platform/mellanox/optimized'
> `
>
> Not sure if there is a better way to find out the HCOLL version, but the
> file hcoll_version.h in /opt/mellanox/hcoll/include/hcoll/api/ says we
> have version 3.8.1649
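
As an aside, two quick ways to read off the installed HCOLL version (the header path is the one quoted above; the rpm query is a guess that assumes a package-based install):

    grep -i version /opt/mellanox/hcoll/include/hcoll/api/hcoll_version.h
    rpm -qi hcoll 2>/dev/null | grep -i version
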
>
> Code compiled as:
>
> ,
> | $ mpicc -o test_t thread_io.c test.c
> `
>
> To run the tests, I just submit the job to Slurm with the following
> script (changing the coll_hcoll_enable param accordingly):
>
> ,
> | #!/bin/bash
> | #
> | #SBATCH -J test
> | #SBATCH -N 5
> | #SBATCH -n 51
> | #SBATCH -t 00:07:00
> | #SBATCH -o test-%j.out
> | #SBATCH -e test-%j.err
> | #SBATCH -D .
> |
> | module purge
> | module load openmpi/gnu/3.0.1
> |
> | time mpirun --mca coll_hcoll_enable 1 -np 51 ./test_t
> `
>
> In the latest test I managed to squeeze into our queuing system, the
> hcoll-disabled run took ~3.5s, and the hcoll-enabled one ~43.5s (in this
> one I actually commented out all the fprintf statements just in case, so
> the code was pure communication).
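
For reference, the comparison boils down to two otherwise identical runs, toggling only the hcoll switch (a sketch using the same binary and rank count as the script above):

    time mpirun --mca coll_hcoll_enable 0 -np 51 ./test_t    # hcoll disabled
    time mpirun --mca coll_hcoll_enable 1 -np 51 ./test_t    # hcoll enabled
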
>
> Thanks,
> --
> Ángel de Vicente
>
> Tel.: +34 922 605 747
> Web.: http://research.iac.es/proyecto/polmag/
>
>
>


Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-04 Thread Joshua Ladd via users
We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it takes
exactly the same 19 secs (80 ranks).

What version of HCOLL are you using? Command line?

Josh

On Tue, Feb 4, 2020 at 8:44 AM George Bosilca via users <
users@lists.open-mpi.org> wrote:

> Hcoll will be present in many cases, you don’t really want to skip them
> all. I foresee 2 problems with the approach you propose:
> - collective components are selected per communicator, so even if they
> will not be used they are still loaded.
> - from outside the MPI library you have little access to internal
> information, especially to components that are loaded and active.
>
> I’m afraid the best solution is to prevent OMPI from loading the hcoll
> component if you want to use threading, by adding '--mca coll ^hcoll' to
> your mpirun.
>
>   George.
>
>
> On Tue, Feb 4, 2020 at 8:32 AM Angel de Vicente 
> wrote:
>
>> Hi,
>>
>> George Bosilca  writes:
>>
>> > If I'm not mistaken, hcoll is playing with the opal_progress in a way
>> > that conflicts with the blessed usage of progress in OMPI and prevents
>> > other components from advancing and timely completing requests. The
>> > impact is minimal for sequential applications using only blocking
>> > calls, but is jeopardizing performance when multiple types of
>> > communications are simultaneously executing or when multiple threads
>> > are active.
>> >
>> > The solution might be very simple: hcoll is a module providing support
>> > for collective communications so as long as you don't use collectives,
>> > or the tuned module provides collective performance similar to hcoll
>> > on your cluster, just go ahead and disable hcoll. You can also reach
>> > out to Mellanox folks asking them to fix the hcoll usage of
>> > opal_progress.
>>
>> until we find a more robust solution I was thinking of trying to just
>> query the MPI implementation at run time and use the threaded version
>> if hcoll is not present, and fall back to the unthreaded version if it
>> is. Looking at the coll.h file I see that some functions there might be
>> useful, for example mca_coll_base_component_comm_query_2_0_0_fn_t, but
>> I have never delved into this before. Would this be an appropriate
>> approach? Any examples of how to query in code for a particular component?
>>
>> Thanks,
>> --
>> Ángel de Vicente
>>
>> Tel.: +34 922 605 747
>> Web.: http://research.iac.es/proyecto/polmag/
>>
>>
>>
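
For completeness, the workaround George describes can be applied either per-invocation or through the environment, so the application code itself does not need to know about hcoll (a sketch; setting any MCA parameter via an OMPI_MCA_* environment variable is standard Open MPI behaviour):

    mpirun --mca coll ^hcoll -np 51 ./test_t    # per run, on the command line
    export OMPI_MCA_coll="^hcoll"               # or per job/session, via the environment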


Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-28 Thread Joshua Ladd via users
OK. Please try:

mpirun -np 128 --debug-daemons  --map-by ppr:64:socket  hostname

Josh
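
Taken together, the debugging sequence suggested in this thread amounts to launching plain hostname with progressively more diagnostics (all of these command lines appear in the exchange below):

    mpirun -np 128 hostname                                          # bare launch
    mpirun -np 128 --debug-daemons hostname                          # show daemon activity
    mpirun -np 128 --debug-daemons -mca plm rsh hostname             # bypass the PBS/tm launcher
    mpirun -np 128 --debug-daemons --map-by ppr:64:socket hostname   # explicit mapping, 64 per socket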

On Tue, Jan 28, 2020 at 12:49 PM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:

> Input: mpirun -np 128 --debug-daemons -mca plm rsh hostname
>
>
>
> Output:
>
> [Gen2Node3:54039] [[16643,0],0] orted_cmd: received add_local_procs
>
> [Gen2Node3:54039] [[16643,0],0] orted_cmd: received exit cmd
>
> [Gen2Node3:54039] [[16643,0],0] orted_cmd: all routes and children gone -
> exiting
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node3
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
> Collin
>
>
>
> *From:* Joshua Ladd 
> *Sent:* Tuesday, January 28, 2020 12:48 PM
> *To:* Collin Strassburger 
> *Cc:* Open MPI Users ; Ralph Castain <
> r...@open-mpi.org>
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Sorry, typo, try:
>
>
>
> mpirun -np 128 --debug-daemons -mca plm rsh hostname
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 12:45 PM Joshua Ladd  wrote:
>
> And if you try:
>
> mpirun -np 128 --debug-daemons -plm rsh hostname
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 12:34 PM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
> Input:   mpirun -np 128 --debug-daemons hostname
>
>
>
> Output:
>
> [Gen2Node3:54023] [[16659,0],0] orted_cmd: received add_local_procs
>
> [Gen2Node3:54023] [[16659,0],0] orted_cmd: received exit cmd
>
> [Gen2Node3:54023] [[16659,0],0] orted_cmd: all routes and children gone -
> exiting
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node3
>
>
>
> when attempting to start process rank 0.
>
> --
>
>
>
> Collin
>
>
>
> *From:* Joshua Ladd 
> *Sent:* Tuesday, January 28, 2020 12:31 PM
> *To:* Collin Strassburger 
> *Cc:* Open MPI Users ; Ralph Castain <
> r...@open-mpi.org>
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Interesting. Can you try:
>
>
>
> mpirun -np 128 --debug-daemons hostname
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 12:14 PM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
> In relation to the multi-node attempt, I haven’t set that up yet, as
> the per-node configuration doesn’t pass its tests (full node utilization,
> etc).
>
>
>
> Here are the results for the hostname test:
>
> Input: mpirun -np 128 hostname
>
>
>
> Output:
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node3
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
>
>
> Collin
>
>
>
>
>
> *From:* users  *On Behalf Of *Ralph
> Castain via users
> *Sent:* Tuesday, January 28, 2020 12:06 PM
> *To:* Joshua Ladd 
> *Cc:* Ralph Castain ; Open MPI Users <
> users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Josh - if you read thru the thread, you will see that disabling
> Mellanox/IB drivers allows the program to run. It only fails when they are
> enabled.
>
>
>
>
>
> On Jan 28, 2020, at 8:49 AM, Joshua Ladd  wrote:
>
>
>
> I don't see how this can be diagnosed as a "problem with the Mellanox
> Software". This is on a single node. What happens when you try to launch on
> more than one node?
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 11:43 AM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
> Here’s the I/O for these high local core count runs. (“xhpcg” is the
> standard hpcg benchmark)
>
>
>
> Run command: mpirun -np 128 bin/xhpcg
>
> Output:
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node4
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
>
>
> Collin
>
>
>
> *From:* Joshua Ladd 
> *Sent:* Tuesday, January 28, 2020 11:39 AM
> *To:* Open MPI Users 
> *Cc:* Collin Strassburger ; Ralph Castain <
> r...@open-mpi.org>; Artem Polyakov 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-28 Thread Joshua Ladd via users
Sorry, typo, try:

mpirun -np 128 --debug-daemons -mca plm rsh hostname

Josh

On Tue, Jan 28, 2020 at 12:45 PM Joshua Ladd  wrote:

> And if you try:
> mpirun -np 128 --debug-daemons -plm rsh hostname
>
> Josh
>
> On Tue, Jan 28, 2020 at 12:34 PM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
>> Input:   mpirun -np 128 --debug-daemons hostname
>>
>>
>>
>> Output:
>>
>> [Gen2Node3:54023] [[16659,0],0] orted_cmd: received add_local_procs
>>
>> [Gen2Node3:54023] [[16659,0],0] orted_cmd: received exit cmd
>>
>> [Gen2Node3:54023] [[16659,0],0] orted_cmd: all routes and children gone -
>> exiting
>>
>> --
>>
>> mpirun was unable to start the specified application as it encountered an
>>
>> error:
>>
>>
>>
>> Error code: 63
>>
>> Error name: (null)
>>
>> Node: Gen2Node3
>>
>>
>>
>> when attempting to start process rank 0.
>>
>> --
>>
>>
>>
>> Collin
>>
>>
>>
>> *From:* Joshua Ladd 
>> *Sent:* Tuesday, January 28, 2020 12:31 PM
>> *To:* Collin Strassburger 
>> *Cc:* Open MPI Users ; Ralph Castain <
>> r...@open-mpi.org>
>> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
>> 7742 when utilizing 100+ processors per node
>>
>>
>>
>> Interesting. Can you try:
>>
>>
>>
>> mpirun -np 128 --debug-daemons hostname
>>
>>
>>
>> Josh
>>
>>
>>
>> On Tue, Jan 28, 2020 at 12:14 PM Collin Strassburger <
>> cstrassbur...@bihrle.com> wrote:
>>
>> In relation to the multi-node attempt, I haven’t set that up yet, as
>> the per-node configuration doesn’t pass its tests (full node utilization,
>> etc).
>>
>>
>>
>> Here are the results for the hostname test:
>>
>> Input: mpirun -np 128 hostname
>>
>>
>>
>> Output:
>>
>> --
>>
>> mpirun was unable to start the specified application as it encountered an
>>
>> error:
>>
>>
>>
>> Error code: 63
>>
>> Error name: (null)
>>
>> Node: Gen2Node3
>>
>>
>>
>> when attempting to start process rank 0.
>>
>> --
>>
>> 128 total processes failed to start
>>
>>
>>
>>
>>
>> Collin
>>
>>
>>
>>
>>
>> *From:* users  *On Behalf Of *Ralph
>> Castain via users
>> *Sent:* Tuesday, January 28, 2020 12:06 PM
>> *To:* Joshua Ladd 
>> *Cc:* Ralph Castain ; Open MPI Users <
>> users@lists.open-mpi.org>
>> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
>> 7742 when utilizing 100+ processors per node
>>
>>
>>
>> Josh - if you read thru the thread, you will see that disabling
>> Mellanox/IB drivers allows the program to run. It only fails when they are
>> enabled.
>>
>>
>>
>>
>>
>> On Jan 28, 2020, at 8:49 AM, Joshua Ladd  wrote:
>>
>>
>>
>> I don't see how this can be diagnosed as a "problem with the Mellanox
>> Software". This is on a single node. What happens when you try to launch on
>> more than one node?
>>
>>
>>
>> Josh
>>
>>
>>
>> On Tue, Jan 28, 2020 at 11:43 AM Collin Strassburger <
>> cstrassbur...@bihrle.com> wrote:
>>
>> Here’s the I/O for these high local core count runs. (“xhpcg” is the
>> standard hpcg benchmark)
>>
>>
>>
>> Run command: mpirun -np 128 bin/xhpcg
>>
>> Output:
>>
>> --
>>
>> mpirun was unable to start the specified application as it encountered an
>>
>> error:
>>
>>
>>
>> Error code: 63
>>
>> Error name: (null)
>>
>> Node: Gen2Node4
>>
>>
>>
>> when attempting to start process rank 0.
>>
>> --
>>
>> 128 total processes failed to start
>>
>>
>>
>>
>>
>> Collin
>>
>>
>>
>> *From:* Joshua Ladd 
>> *Sent:* Tuesday, January 28, 2020 11:39 AM
>> *To:* Open MPI Users 
>> *Cc:* Collin Strassburger ; Ralph Castain <
>> r...@open-mpi.org>; Artem Polyakov 
>> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
>> 7742 when utilizing 100+ processors per node
>>
>>
>>
>> Can you send the output of a failed run including your command line.
>>
>>
>>
>> Josh
>>
>>
>>
>> On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users <
>> users@lists.open-mpi.org> wrote:
>>
>> Okay, so this is a problem with the Mellanox software - copying Artem.
>>
>>
>>
>> On Jan 28, 2020, at 8:15 AM, Collin Strassburger <
>> cstrassbur...@bihrle.com> wrote:
>>
>>
>>
>> I just tried that and it does indeed work with pbs and without Mellanox
>> (until a reboot makes it complain about Mellanox/IB related defaults as no
>> drivers were installed, etc).
>>
>>
>>
>> After installing the Mellanox drivers, I used
>>
>> ./configure --prefix=/usr/ --with-tm=/opt/pbs/ --with-slurm=no --with-ucx
>> --with-platform=contrib/platform/mellanox/optimized
>>
>>
>>
>> With the new compile it fails on the higher core counts.
>>
>>
>>
>>
>>
>> Collin
>>
>>
>>
>> *From:* users  *On Behalf Of *Ralph
>> Castain via users
>> *Sent:* Tuesday, January 28, 2020 11:02 AM
>> *To:* O

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-28 Thread Joshua Ladd via users
And if you try:
mpirun -np 128 --debug-daemons -plm rsh hostname

Josh

On Tue, Jan 28, 2020 at 12:34 PM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:

> Input:   mpirun -np 128 --debug-daemons hostname
>
>
>
> Output:
>
> [Gen2Node3:54023] [[16659,0],0] orted_cmd: received add_local_procs
>
> [Gen2Node3:54023] [[16659,0],0] orted_cmd: received exit cmd
>
> [Gen2Node3:54023] [[16659,0],0] orted_cmd: all routes and children gone -
> exiting
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node3
>
>
>
> when attempting to start process rank 0.
>
> --
>
>
>
> Collin
>
>
>
> *From:* Joshua Ladd 
> *Sent:* Tuesday, January 28, 2020 12:31 PM
> *To:* Collin Strassburger 
> *Cc:* Open MPI Users ; Ralph Castain <
> r...@open-mpi.org>
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Interesting. Can you try:
>
>
>
> mpirun -np 128 --debug-daemons hostname
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 12:14 PM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
> In relation to the multi-node attempt, I haven’t set that up yet, as
> the per-node configuration doesn’t pass its tests (full node utilization,
> etc).
>
>
>
> Here are the results for the hostname test:
>
> Input: mpirun -np 128 hostname
>
>
>
> Output:
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node3
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
>
>
> Collin
>
>
>
>
>
> *From:* users  *On Behalf Of *Ralph
> Castain via users
> *Sent:* Tuesday, January 28, 2020 12:06 PM
> *To:* Joshua Ladd 
> *Cc:* Ralph Castain ; Open MPI Users <
> users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Josh - if you read thru the thread, you will see that disabling
> Mellanox/IB drivers allows the program to run. It only fails when they are
> enabled.
>
>
>
>
>
> On Jan 28, 2020, at 8:49 AM, Joshua Ladd  wrote:
>
>
>
> I don't see how this can be diagnosed as a "problem with the Mellanox
> Software". This is on a single node. What happens when you try to launch on
> more than one node?
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 11:43 AM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
> Here’s the I/O for these high local core count runs. (“xhpcg” is the
> standard hpcg benchmark)
>
>
>
> Run command: mpirun -np 128 bin/xhpcg
>
> Output:
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node4
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
>
>
> Collin
>
>
>
> *From:* Joshua Ladd 
> *Sent:* Tuesday, January 28, 2020 11:39 AM
> *To:* Open MPI Users 
> *Cc:* Collin Strassburger ; Ralph Castain <
> r...@open-mpi.org>; Artem Polyakov 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Can you send the output of a failed run including your command line.
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users <
> users@lists.open-mpi.org> wrote:
>
> Okay, so this is a problem with the Mellanox software - copying Artem.
>
>
>
> On Jan 28, 2020, at 8:15 AM, Collin Strassburger 
> wrote:
>
>
>
> I just tried that and it does indeed work with pbs and without Mellanox
> (until a reboot makes it complain about Mellanox/IB related defaults as no
> drivers were installed, etc).
>
>
>
> After installing the Mellanox drivers, I used
>
> ./configure --prefix=/usr/ --with-tm=/opt/pbs/ --with-slurm=no --with-ucx
> --with-platform=contrib/platform/mellanox/optimized
>
>
>
> With the new compile it fails on the higher core counts.
>
>
>
>
>
> Collin
>
>
>
> *From:* users  *On Behalf Of *Ralph
> Castain via users
> *Sent:* Tuesday, January 28, 2020 11:02 AM
> *To:* Open MPI Users 
> *Cc:* Ralph Castain 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Does it work with pbs but not Mellanox? Just trying to isolate the problem.
>
>
>
>
>
> On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users <
> users@lists.open-mpi.org> wrote:
>
>
>
> Hello,
>
>
>
> I have done some additional testing and I

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-28 Thread Joshua Ladd via users
Interesting. Can you try:

mpirun -np 128 --debug-daemons hostname

Josh

On Tue, Jan 28, 2020 at 12:14 PM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:

> In relation to the multi-node attempt, I haven’t set that up yet, as
> the per-node configuration doesn’t pass its tests (full node utilization,
> etc).
>
>
>
> Here are the results for the hostname test:
>
> Input: mpirun -np 128 hostname
>
>
>
> Output:
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node3
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
>
>
> Collin
>
>
>
>
>
> *From:* users  *On Behalf Of *Ralph
> Castain via users
> *Sent:* Tuesday, January 28, 2020 12:06 PM
> *To:* Joshua Ladd 
> *Cc:* Ralph Castain ; Open MPI Users <
> users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Josh - if you read thru the thread, you will see that disabling
> Mellanox/IB drivers allows the program to run. It only fails when they are
> enabled.
>
>
>
>
>
> On Jan 28, 2020, at 8:49 AM, Joshua Ladd  wrote:
>
>
>
> I don't see how this can be diagnosed as a "problem with the Mellanox
> Software". This is on a single node. What happens when you try to launch on
> more than one node?
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 11:43 AM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
> Here’s the I/O for these high local core count runs. (“xhpcg” is the
> standard hpcg benchmark)
>
>
>
> Run command: mpirun -np 128 bin/xhpcg
>
> Output:
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node4
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
>
>
> Collin
>
>
>
> *From:* Joshua Ladd 
> *Sent:* Tuesday, January 28, 2020 11:39 AM
> *To:* Open MPI Users 
> *Cc:* Collin Strassburger ; Ralph Castain <
> r...@open-mpi.org>; Artem Polyakov 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Can you send the output of a failed run including your command line.
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users <
> users@lists.open-mpi.org> wrote:
>
> Okay, so this is a problem with the Mellanox software - copying Artem.
>
>
>
> On Jan 28, 2020, at 8:15 AM, Collin Strassburger 
> wrote:
>
>
>
> I just tried that and it does indeed work with pbs and without Mellanox
> (until a reboot makes it complain about Mellanox/IB related defaults as no
> drivers were installed, etc).
>
>
>
> After installing the Mellanox drivers, I used
>
> ./configure --prefix=/usr/ --with-tm=/opt/pbs/ --with-slurm=no --with-ucx
> --with-platform=contrib/platform/mellanox/optimized
>
>
>
> With the new compile it fails on the higher core counts.
>
>
>
>
>
> Collin
>
>
>
> *From:* users  *On Behalf Of *Ralph
> Castain via users
> *Sent:* Tuesday, January 28, 2020 11:02 AM
> *To:* Open MPI Users 
> *Cc:* Ralph Castain 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Does it work with pbs but not Mellanox? Just trying to isolate the problem.
>
>
>
>
>
> On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users <
> users@lists.open-mpi.org> wrote:
>
>
>
> Hello,
>
>
>
> I have done some additional testing and I can say that it works correctly
> with gcc8 and no mellanox or pbs installed.
>
>
>
> I have done two runs with Mellanox and pbs installed.  One run includes
> the actual run options I will be using while the other includes a truncated
> set which still compiles but fails to execute correctly.  As the option
> with the actual run options results in a smaller config log, I am including
> it here.
>
>
>
> Version: 4.0.2
>
> The config log is available at
> https://gist.github.com/BTemp1282020/fedca1aeed3b57296b8f21688ccae31c and
> the ompi dump is available at https://pastebin.com/md3HwTUR.
>
>
>
> The IB network information (which is not being explicitly operated across):
>
> Packages: MLNX_OFED and Mellanox HPC-X, both are current versions
> (MLNX_OFED_LINUX-4.7-3.2.9.0-rhel8.1-x86_64 and
> hpcx-v2.5.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat8.1-x86_64)
>
> Ulimit -l = unlimited
>
> Ibv_devinfo:
>
> hca_id: mlx4_0
>
> transport:  InfiniBand (0)
>
> fw_ver: 2.42.5000
>
> …
>
> vendor_id:  0x02c9
>
> v

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-28 Thread Joshua Ladd via users
Also, can you try running:

mpirun -np 128 hostname

Josh

On Tue, Jan 28, 2020 at 11:49 AM Joshua Ladd  wrote:

> I don't see how this can be diagnosed as a "problem with the Mellanox
> Software". This is on a single node. What happens when you try to launch on
> more than one node?
>
> Josh
>
> On Tue, Jan 28, 2020 at 11:43 AM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
>
>> Here’s the I/O for these high local core count runs. (“xhpcg” is the
>> standard hpcg benchmark)
>>
>>
>>
>> Run command: mpirun -np 128 bin/xhpcg
>>
>> Output:
>>
>> --
>>
>> mpirun was unable to start the specified application as it encountered an
>>
>> error:
>>
>>
>>
>> Error code: 63
>>
>> Error name: (null)
>>
>> Node: Gen2Node4
>>
>>
>>
>> when attempting to start process rank 0.
>>
>> --
>>
>> 128 total processes failed to start
>>
>>
>>
>>
>>
>> Collin
>>
>>
>>
>> *From:* Joshua Ladd 
>> *Sent:* Tuesday, January 28, 2020 11:39 AM
>> *To:* Open MPI Users 
>> *Cc:* Collin Strassburger ; Ralph Castain <
>> r...@open-mpi.org>; Artem Polyakov 
>> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
>> 7742 when utilizing 100+ processors per node
>>
>>
>>
>> Can you send the output of a failed run including your command line.
>>
>>
>>
>> Josh
>>
>>
>>
>> On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users <
>> users@lists.open-mpi.org> wrote:
>>
>> Okay, so this is a problem with the Mellanox software - copying Artem.
>>
>>
>>
>> On Jan 28, 2020, at 8:15 AM, Collin Strassburger <
>> cstrassbur...@bihrle.com> wrote:
>>
>>
>>
>> I just tried that and it does indeed work with pbs and without Mellanox
>> (until a reboot makes it complain about Mellanox/IB related defaults as no
>> drivers were installed, etc).
>>
>>
>>
>> After installing the Mellanox drivers, I used
>>
>> ./configure --prefix=/usr/ --with-tm=/opt/pbs/ --with-slurm=no --with-ucx
>> --with-platform=contrib/platform/mellanox/optimized
>>
>>
>>
>> With the new compile it fails on the higher core counts.
>>
>>
>>
>>
>>
>> Collin
>>
>>
>>
>> *From:* users  *On Behalf Of *Ralph
>> Castain via users
>> *Sent:* Tuesday, January 28, 2020 11:02 AM
>> *To:* Open MPI Users 
>> *Cc:* Ralph Castain 
>> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
>> 7742 when utilizing 100+ processors per node
>>
>>
>>
>> Does it work with pbs but not Mellanox? Just trying to isolate the
>> problem.
>>
>>
>>
>>
>>
>> On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users <
>> users@lists.open-mpi.org> wrote:
>>
>>
>>
>> Hello,
>>
>>
>>
>> I have done some additional testing and I can say that it works correctly
>> with gcc8 and no mellanox or pbs installed.
>>
>>
>>
>> I have done two runs with Mellanox and pbs installed.  One run
>> includes the actual run options I will be using while the other includes a
>> truncated set which still compiles but fails to execute correctly.  As the
>> option with the actual run options results in a smaller config log, I am
>> including it here.
>>
>>
>>
>> Version: 4.0.2
>>
>> The config log is available at
>> https://gist.github.com/BTemp1282020/fedca1aeed3b57296b8f21688ccae31c and
>> the ompi dump is available at https://pastebin.com/md3HwTUR.
>>
>>
>>
>> The IB network information (which is not being explicitly operated
>> across):
>>
>> Packages: MLNX_OFED and Mellanox HPC-X, both are current versions
>> (MLNX_OFED_LINUX-4.7-3.2.9.0-rhel8.1-x86_64 and
>> hpcx-v2.5.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat8.1-x86_64)
>>
>> Ulimit -l = unlimited
>>
>> Ibv_devinfo:
>>
>> hca_id: mlx4_0
>>
>> transport:  InfiniBand (0)
>>
>> fw_ver: 2.42.5000
>>
>> …
>>
>> vendor_id:  0x02c9
>>
>> vendor_part_id: 4099
>>
>> hw_ver: 0x1
>>
>> board_id:   MT_1100120019
>>
>> phys_port_cnt:  1
>>
>> Device ports:
>>
>> port:   1
>>
>> state:  PORT_ACTIVE (4)
>>
>> max_mtu:4096 (5)
>>
>> active_mtu: 4096 (5)
>>
>> sm_lid: 1
>>
>> port_lid:   12
>>
>> port_lmc:   0x00
>>
>> link_layer: InfiniBand
>>
>> It looks like the rest of the IB information is in the config file.
>>
>>
>>
>> I hope this helps,
>>
>> Collin
>>
>>
>>
>>
>>
>>
>>
>> *From:* Jeff Squyres (jsquyres) 
>> *Sent:* Monday, January 27, 2020 3:40 PM
>> *To:* Open MPI User's List 
>> *Cc:* Collin Strassburger 
>> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
>> 7742 when utilizing 100+ processors per node
>>
>>
>>

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-28 Thread Joshua Ladd via users
I don't see how this can be diagnosed as a "problem with the Mellanox
Software". This is on a single node. What happens when you try to launch on
more than one node?

Josh

On Tue, Jan 28, 2020 at 11:43 AM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:

> Here’s the I/O for these high local core count runs. (“xhpcg” is the
> standard hpcg benchmark)
>
>
>
> Run command: mpirun -np 128 bin/xhpcg
>
> Output:
>
> --
>
> mpirun was unable to start the specified application as it encountered an
>
> error:
>
>
>
> Error code: 63
>
> Error name: (null)
>
> Node: Gen2Node4
>
>
>
> when attempting to start process rank 0.
>
> --
>
> 128 total processes failed to start
>
>
>
>
>
> Collin
>
>
>
> *From:* Joshua Ladd 
> *Sent:* Tuesday, January 28, 2020 11:39 AM
> *To:* Open MPI Users 
> *Cc:* Collin Strassburger ; Ralph Castain <
> r...@open-mpi.org>; Artem Polyakov 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Can you send the output of a failed run including your command line.
>
>
>
> Josh
>
>
>
> On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users <
> users@lists.open-mpi.org> wrote:
>
> Okay, so this is a problem with the Mellanox software - copying Artem.
>
>
>
> On Jan 28, 2020, at 8:15 AM, Collin Strassburger 
> wrote:
>
>
>
> I just tried that and it does indeed work with pbs and without Mellanox
> (until a reboot makes it complain about Mellanox/IB related defaults as no
> drivers were installed, etc).
>
>
>
> After installing the Mellanox drivers, I used
>
> ./configure --prefix=/usr/ --with-tm=/opt/pbs/ --with-slurm=no --with-ucx
> --with-platform=contrib/platform/mellanox/optimized
>
>
>
> With the new compile it fails on the higher core counts.
>
>
>
>
>
> Collin
>
>
>
> *From:* users  *On Behalf Of *Ralph
> Castain via users
> *Sent:* Tuesday, January 28, 2020 11:02 AM
> *To:* Open MPI Users 
> *Cc:* Ralph Castain 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Does it work with pbs but not Mellanox? Just trying to isolate the problem.
>
>
>
>
>
> On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users <
> users@lists.open-mpi.org> wrote:
>
>
>
> Hello,
>
>
>
> I have done some additional testing and I can say that it works correctly
> with gcc8 and no mellanox or pbs installed.
>
>
>
> I have done two runs with Mellanox and pbs installed.  One run includes
> the actual run options I will be using while the other includes a truncated
> set which still compiles but fails to execute correctly.  As the option
> with the actual run options results in a smaller config log, I am including
> it here.
>
>
>
> Version: 4.0.2
>
> The config log is available at
> https://gist.github.com/BTemp1282020/fedca1aeed3b57296b8f21688ccae31c and
> the ompi dump is available at https://pastebin.com/md3HwTUR.
>
>
>
> The IB network information (which is not being explicitly operated across):
>
> Packages: MLNX_OFED and Mellanox HPC-X, both are current versions
> (MLNX_OFED_LINUX-4.7-3.2.9.0-rhel8.1-x86_64 and
> hpcx-v2.5.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat8.1-x86_64)
>
> Ulimit -l = unlimited
>
> Ibv_devinfo:
>
> hca_id: mlx4_0
>
> transport:  InfiniBand (0)
>
> fw_ver: 2.42.5000
>
> …
>
> vendor_id:  0x02c9
>
> vendor_part_id: 4099
>
> hw_ver: 0x1
>
> board_id:   MT_1100120019
>
> phys_port_cnt:  1
>
> Device ports:
>
> port:   1
>
> state:  PORT_ACTIVE (4)
>
> max_mtu:4096 (5)
>
> active_mtu: 4096 (5)
>
> sm_lid: 1
>
> port_lid:   12
>
> port_lmc:   0x00
>
> link_layer: InfiniBand
>
> It looks like the rest of the IB information is in the config file.
>
>
>
> I hope this helps,
>
> Collin
>
>
>
>
>
>
>
> *From:* Jeff Squyres (jsquyres) 
> *Sent:* Monday, January 27, 2020 3:40 PM
> *To:* Open MPI User's List 
> *Cc:* Collin Strassburger 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
>
> Can you please send all the information listed here:
>
>
>
> https://www.open-mpi.org/community/help/
>
>
>
> Thanks!
>
>
>
>
>
> On Jan 27, 2020, at 12:00 PM, Collin Strassburger via users <
> users@lists.open-mpi.org> wrote:
>
>
>
> Hello,
>
>
>
> I had initially thought the same thing about the streams, but I have 2
> sockets with 64 cores each.  Additional

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-28 Thread Joshua Ladd via users
Can you send the output of a failed run including your command line.

Josh

On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users <
users@lists.open-mpi.org> wrote:

> Okay, so this is a problem with the Mellanox software - copying Artem.
>
> On Jan 28, 2020, at 8:15 AM, Collin Strassburger 
> wrote:
>
> I just tried that and it does indeed work with pbs and without Mellanox
> (until a reboot makes it complain about Mellanox/IB related defaults as no
> drivers were installed, etc).
>
> After installing the Mellanox drivers, I used
> ./configure --prefix=/usr/ --with-tm=/opt/pbs/ --with-slurm=no --with-ucx
> --with-platform=contrib/platform/mellanox/optimized
>
> With the new compile it fails on the higher core counts.
>
>
> Collin
>
> *From:* users  *On Behalf Of *Ralph
> Castain via users
> *Sent:* Tuesday, January 28, 2020 11:02 AM
> *To:* Open MPI Users 
> *Cc:* Ralph Castain 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
> Does it work with pbs but not Mellanox? Just trying to isolate the problem.
>
>
>
> On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users <
> users@lists.open-mpi.org> wrote:
>
> Hello,
>
> I have done some additional testing and I can say that it works correctly
> with gcc8 and no mellanox or pbs installed.
>
> I have done two runs with Mellanox and pbs installed.  One run includes
> the actual run options I will be using while the other includes a truncated
> set which still compiles but fails to execute correctly.  As the option
> with the actual run options results in a smaller config log, I am including
> it here.
>
> Version: 4.0.2
> The config log is available at
> https://gist.github.com/BTemp1282020/fedca1aeed3b57296b8f21688ccae31c and
> the ompi dump is available at https://pastebin.com/md3HwTUR.
>
> The IB network information (which is not being explicitly operated across):
> Packages: MLNX_OFED and Mellanox HPC-X, both are current versions
> (MLNX_OFED_LINUX-4.7-3.2.9.0-rhel8.1-x86_64 and
> hpcx-v2.5.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat8.1-x86_64)
> Ulimit -l = unlimited
> Ibv_devinfo:
> hca_id: mlx4_0
> transport:  InfiniBand (0)
> fw_ver: 2.42.5000
> …
> vendor_id:  0x02c9
> vendor_part_id: 4099
> hw_ver: 0x1
> board_id:   MT_1100120019
> phys_port_cnt:  1
> Device ports:
> port:   1
> state:  PORT_ACTIVE (4)
> max_mtu:4096 (5)
> active_mtu: 4096 (5)
> sm_lid: 1
> port_lid:   12
> port_lmc:   0x00
> link_layer: InfiniBand
> It looks like the rest of the IB information is in the config file.
>
> I hope this helps,
> Collin
>
>
>
> *From:* Jeff Squyres (jsquyres) 
> *Sent:* Monday, January 27, 2020 3:40 PM
> *To:* Open MPI User's List 
> *Cc:* Collin Strassburger 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
> Can you please send all the information listed here:
>
> https://www.open-mpi.org/community/help/
>
> Thanks!
>
>
>
>
> On Jan 27, 2020, at 12:00 PM, Collin Strassburger via users <
> users@lists.open-mpi.org> wrote:
>
> Hello,
>
> I had initially thought the same thing about the streams, but I have 2
> sockets with 64 cores each.  Additionally, I have not yet turned
> multithreading off, so lscpu reports a total of 256 logical cores and 128
> physical cores.  As such, I don’t see how it could be running out of
> streams unless something is being passed incorrectly.
>
> Collin
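
A quick way to double-check the topology and the bindings mpirun actually applies on such a node (lscpu and mpirun's --report-bindings option are standard tools; this is just a sketch):

    lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'
    mpirun -np 128 --report-bindings hostname
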
>
> *From:* users  *On Behalf Of *Ray
> Sheppard via users
> *Sent:* Monday, January 27, 2020 11:53 AM
> *To:* users@lists.open-mpi.org
> *Cc:* Ray Sheppard 
> *Subject:* Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD
> 7742 when utilizing 100+ processors per node
>
>
> Hi All,
>   Just my two cents, I think error code 63 is saying it is running out of
> streams to use.  I think you have only 64 cores, so at 100, you are
> overloading most of them.  It feels like you are running out of resources
> trying to swap in and out ranks on physical cores.
>Ray
> On 1/27/2020 11:29 AM, Collin Strassburger via users wrote:
>
>
> Hello Howard,
>
> To remove potential interactions, I have found that the issue persists
> without ucx and hcoll support.
>
> Run command: mpirun -np 128 bin/xhpcg
> Output:
> --
> mpirun was unable to start t

Re: [OMPI users] CUDA-aware codes not using GPU

2019-09-06 Thread Joshua Ladd via users
Did you build UCX with CUDA support (--with-cuda) ?

Josh
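
A few ways to answer that from the installed bits, plus the relevant rebuild flags if CUDA support turns out to be missing (a sketch; the /opt/ucx and /usr/local/cuda paths are only examples):

    ucx_info -v                    # shows the configure line UCX was built with
    ucx_info -d | grep -i cuda     # cuda_copy / gdr_copy transports appear only in CUDA builds
    # if absent, rebuild UCX and then Open MPI with CUDA enabled:
    ./configure --prefix=/opt/ucx --with-cuda=/usr/local/cuda --with-gdrcopy    # UCX
    ./configure --with-ucx=/opt/ucx --with-cuda=/usr/local/cuda                 # Open MPI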

On Thu, Sep 5, 2019 at 8:45 PM AFernandez via users <
users@lists.open-mpi.org> wrote:

> Hello OpenMPI Team,
>
> I'm trying to use CUDA-aware OpenMPI but the system simply ignores the GPU
> and the code runs on the CPUs. I've tried different software but will focus
> on the OSU benchmarks (collective and pt2pt communications). Let me provide
> some data about the configuration of the system:
>
> -OFED v4.17-1-rc2 (the NIC is virtualized but I also tried a Mellanox card
> with MOFED a few days ago and found the same issue)
>
> -CUDA v10.1
>
> -gdrcopy v1.3
>
> -UCX 1.6.0
>
> -OpenMPI 4.0.1
>
> Everything looks like good (CUDA programs work fine, MPI programs run on
> the CPUs without any problem), and the ompi_info outputs what I was
> expecting (but maybe I'm missing something):
>
>
> mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support
>
> mca:mpi:base:param:mpi_built_with_cuda_support:value:true
>
> mca:mpi:base:param:mpi_built_with_cuda_support:source:default
>
> mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only
>
> mca:mpi:base:param:mpi_built_with_cuda_support:level:4
>
> mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU
> buffer support is built into library or not
>
> mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false
>
> mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true
>
> mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no
>
> mca:mpi:base:param:mpi_built_with_cuda_support:type:bool
>
>
> mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support
>
> mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false
>
> The available btls are the usual self, openib, tcp & vader plus smcuda,
> uct & usnic. The full output from ompi_info is attached. If I try the flag
> '--mca opal_cuda_verbose 10,' it doesn't output anything, which seems to
> agree with the lack of GPU use. If I try with '--mca btl smcuda,' it makes
> no difference. I have also tried to specify the program to use host and
> device (e.g. mpirun -np 2 ./osu_latency D H) but the same result. I am
> probably missing something but not sure where else to look at or what else
> to try.
>
> Thank you,
>
> AFernandez
>
>

Re: [OMPI users] UCX and MPI_THREAD_MULTIPLE

2019-08-26 Thread Joshua Ladd via users
**apropos  :-)

On Mon, Aug 26, 2019 at 9:19 PM Joshua Ladd  wrote:

> Hi, Paul
>
> I must say, this is eerily appropo. I've just sent a request for Wombat
> last week as I was planning to have my group start looking at the
> performance of UCX OSC on IB. We are most interested in ensuring UCX OSC MT
> performs well on Wombat. The bitbucket you're referencing; is this the
> source code? Can we build and run it?
>
>
> Best,
>
> Josh
>
> On Fri, Aug 23, 2019 at 9:37 PM Paul Edmon via users <
> users@lists.open-mpi.org> wrote:
>
>> I forgot to include that we have not rebuilt this OpenMPI 4.0.1 against
>> 1.6.0 of UCX but rather 1.5.1.  When we upgraded to 1.6.0 everything seemed
>> to be working for OpenMPI when we swapped the UCX version without
>> recompiling (at least in normal rank level MPI as we had to do the upgrade
>> to UCX to get MPI_THREAD_MULTIPLE to work at all).
>>
>> -Paul Edmon-
>> On 8/23/2019 9:31 PM, Paul Edmon wrote:
>>
>> Sure.  The code I'm using is the latest version of Wombat (
>> https://bitbucket.org/pmendygral/wombat-public/wiki/Home , I'm using an
>> unreleased updated version as I know the devs).  I'm using
>> OMP_THREAD_NUM=12 and the command line is:
>>
>> mpirun -np 16 --hostfile hosts ./wombat
>>
>> Where the host file lists 4 machines, so 4 ranks per machine and 12
>> threads per rank.  Each node has 48 Intel Cascade Lake cores. I've also
>> tried using the Slurm scheduler version which is:
>>
>> srun -n 16 -c 12 --mpi=pmix ./wombat
>>
>> Which also hangs.  It works if I constrain to one or two nodes but any
>> greater than that hangs.  As for network hardware:
>>
>> [root@holy7c02101 ~]# ibstat
>> CA 'mlx5_0'
>> CA type: MT4119
>> Number of ports: 1
>> Firmware version: 16.25.6000
>> Hardware version: 0
>> Node GUID: 0xb8599f0300158f20
>> System image GUID: 0xb8599f0300158f20
>> Port 1:
>> State: Active
>> Physical state: LinkUp
>> Rate: 100
>> Base lid: 808
>> LMC: 1
>> SM lid: 584
>> Capability mask: 0x2651e848
>> Port GUID: 0xb8599f0300158f20
>> Link layer: InfiniBand
>>
>> [root@holy7c02101 ~]# lspci | grep Mellanox
>> 58:00.0 Infiniband controller: Mellanox Technologies MT27800 Family
>> [ConnectX-5]
>>
>> As for IB RDMA kernel stack we are using the default drivers that come
>> with CentOS 7.6.1810 which is rdma core 17.2-3.
>>
>> I will note that I successfully ran an old version of Wombat on all
>> 30,000 cores of this system using OpenMPI 3.1.3 and regular IB Verbs with
>> no problem earlier this week, though that was pure MPI ranks with no
>> threads.  Nonetheless the fabric itself is healthy and in good shape.  It
>> seems to be this edge case using the latest OpenMPI with UCX and threads
>> that is causing the hang ups.  To be sure the latest version of Wombat (as
>> I believe the public version does as well) uses many of the state of the
>> art MPI RMA direct calls, so its definitely pushing the envelope in ways
>> our typical user base here will not.  Still it would be good to iron out
>> this kink so if users do hit it we have a solution.  As noted UCX is very
>> new to us and thus it is entirely possible that we are missing something in
>> its interaction with OpenMPI.  Our MPI is compiled thusly:
>>
>>
>> https://github.com/fasrc/helmod/blob/master/rpmbuild/SPECS/centos7/openmpi-4.0.1-fasrc01.spec
>>
>> I will note that when I built this it was built using the default version
>> of UCX that comes with EPEL (1.5.1).  We only built 1.6.0 as the version
>> provided by EPEL did not build with MT enabled, which to me seems strange
>> as I don't see any reason not to build with MT enabled.  Anyways that's the
>> deeper context.
>>
>> -Paul Edmon-
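
For reference, a sketch of the rebuild path Paul describes (--enable-mt is UCX's multi-threading configure switch; the install prefix and the trailing '...' on the Open MPI configure line are placeholders):

    ./configure --prefix=/opt/ucx-1.6.0-mt --enable-mt && make -j && make install   # UCX with MT
    ./configure --with-ucx=/opt/ucx-1.6.0-mt ...                                    # rebuild Open MPI 4.0.1 against it
    mpirun --mca pml ucx -np 16 --hostfile hosts ./wombat                           # run with the UCX PML selected explicitly
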
>> On 8/23/2019 5:49 PM, Joshua Ladd via users wrote:
>>
>> Paul,
>>
>> Can you provide a repro and command line, please. Also, what network
>> hardware are you using?
>>
>> Josh
>>
>> On Fri, Aug 23, 2019 at 3:35 PM Paul Edmon via users <
>> users@lists.open-mpi.org> wrote:
>>
>>> I have a code using MPI_THREAD_MULTIPLE along with MPI-RMA that I'm
>>> using OpenMPI 4.0.1.  Since 4.0.1 requires UCX I have it installed with
>>> MT on (1.6.0 build).  The thing is that the code ke

Re: [OMPI users] UCX and MPI_THREAD_MULTIPLE

2019-08-26 Thread Joshua Ladd via users
Hi, Paul

I must say, this is eerily appropo. I've just sent a request for Wombat
last week as I was planning to have my group start looking at the
performance of UCX OSC on IB. We are most interested in ensuring UCX OSC MT
performs well on Wombat. The bitbucket you're referencing: is this the
source code? Can we build and run it?


Best,

Josh

On Fri, Aug 23, 2019 at 9:37 PM Paul Edmon via users <
users@lists.open-mpi.org> wrote:

> I forgot to include that we have not rebuilt this OpenMPI 4.0.1 against
> 1.6.0 of UCX but rather 1.5.1.  When we upgraded to 1.6.0 everything seemed
> to be working for OpenMPI when we swapped the UCX version without
> recompiling (at least in normal rank level MPI as we had to do the upgrade
> to UCX to get MPI_THREAD_MULTIPLE to work at all).
>
> -Paul Edmon-
> On 8/23/2019 9:31 PM, Paul Edmon wrote:
>
> Sure.  The code I'm using is the latest version of Wombat (
> https://bitbucket.org/pmendygral/wombat-public/wiki/Home , I'm using an
> unreleased updated version as I know the devs).  I'm using
> OMP_THREAD_NUM=12 and the command line is:
>
> mpirun -np 16 --hostfile hosts ./wombat
>
> Where the host file lists 4 machines, so 4 ranks per machine and 12
> threads per rank.  Each node has 48 Intel Cascade Lake cores. I've also
> tried using the Slurm scheduler version which is:
>
> srun -n 16 -c 12 --mpi=pmix ./wombat
>
> Which also hangs.  It works if I constrain to one or two nodes but any
> greater than that hangs.  As for network hardware:
>
> [root@holy7c02101 ~]# ibstat
> CA 'mlx5_0'
> CA type: MT4119
> Number of ports: 1
> Firmware version: 16.25.6000
> Hardware version: 0
> Node GUID: 0xb8599f0300158f20
> System image GUID: 0xb8599f0300158f20
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 100
> Base lid: 808
> LMC: 1
> SM lid: 584
> Capability mask: 0x2651e848
> Port GUID: 0xb8599f0300158f20
> Link layer: InfiniBand
>
> [root@holy7c02101 ~]# lspci | grep Mellanox
> 58:00.0 Infiniband controller: Mellanox Technologies MT27800 Family
> [ConnectX-5]
>
> As for IB RDMA kernel stack we are using the default drivers that come
> with CentOS 7.6.1810 which is rdma core 17.2-3.
>
> I will note that I successfully ran an old version of Wombat on all 30,000
> cores of this system using OpenMPI 3.1.3 and regular IB Verbs with no
> problem earlier this week, though that was pure MPI ranks with no threads.
> Nonetheless the fabric itself is healthy and in good shape.  It seems to be
> this edge case using the latest OpenMPI with UCX and threads that is
> causing the hang ups.  To be sure the latest version of Wombat (as I
> believe the public version does as well) uses many of the state of the art
> MPI RMA direct calls, so its definitely pushing the envelope in ways our
> typical user base here will not.  Still it would be good to iron out this
> kink so if users do hit it we have a solution.  As noted UCX is very new to
> us and thus it is entirely possible that we are missing something in its
> interaction with OpenMPI.  Our MPI is compiled thusly:
>
>
> https://github.com/fasrc/helmod/blob/master/rpmbuild/SPECS/centos7/openmpi-4.0.1-fasrc01.spec
>
> I will note that when I built this it was built using the default version
> of UCX that comes with EPEL (1.5.1).  We only built 1.6.0 as the version
> provided by EPEL did not build with MT enabled, which to me seems strange
> as I don't see any reason not to build with MT enabled.  Anyways that's the
> deeper context.
>
> -Paul Edmon-
> On 8/23/2019 5:49 PM, Joshua Ladd via users wrote:
>
> Paul,
>
> Can you provide a repro and command line, please. Also, what network
> hardware are you using?
>
> Josh
>
> On Fri, Aug 23, 2019 at 3:35 PM Paul Edmon via users <
> users@lists.open-mpi.org> wrote:
>
>> I have a code using MPI_THREAD_MULTIPLE along with MPI-RMA that I'm
>> using OpenMPI 4.0.1.  Since 4.0.1 requires UCX I have it installed with
>> MT on (1.6.0 build).  The thing is that the code keeps stalling out when
>> I go above a couple of nodes.  UCX is new to our environment as
>> previously we have just used the regular IB Verbs with no problem.  My
>> guess is that there is either some option in OpenMPI I am missing or
>> some variable in UCX I am not setting.  Any insight on what could be
>> causing the stalls?
>>
>> -Paul Edmon-
>>

Re: [OMPI users] UCX and MPI_THREAD_MULTIPLE

2019-08-23 Thread Joshua Ladd via users
Paul,

Can you provide a repro and command line, please. Also, what network
hardware are you using?

Josh

On Fri, Aug 23, 2019 at 3:35 PM Paul Edmon via users <
users@lists.open-mpi.org> wrote:

> I have a code using MPI_THREAD_MULTIPLE along with MPI-RMA that I'm
> using OpenMPI 4.0.1.  Since 4.0.1 requires UCX I have it installed with
> MT on (1.6.0 build).  The thing is that the code keeps stalling out when
> I go above a couple of nodes.  UCX is new to our environment as
> previously we have just used the regular IB Verbs with no problem.  My
> guess is that there is either some option in OpenMPI I am missing or
> some variable in UCX I am not setting.  Any insight on what could be
> causing the stalls?
>
> -Paul Edmon-
>

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread Joshua Ladd via users
Hi, Noam

Can you try your original command line with the following addition:

mpirun --mca pml ucx --mca btl ^vader,tcp,openib --mca osc ucx

I think we're seeing some conflict between UCX PML and UCT OSC.

Josh
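
Spelled out, the two configurations being compared are the following (a sketch; ./app stands in for the actual binary). The first line is the suggestion above, the second is the ob1/openib fallback George proposes in the quoted reply, which avoids UCX entirely and is useful for confirming that the memory growth is UCX-related:

    mpirun --mca pml ucx --mca btl ^vader,tcp,openib --mca osc ucx ./app
    mpirun --mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1 ./app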

On Wed, Jun 19, 2019 at 4:36 PM Noam Bernstein via users <
users@lists.open-mpi.org> wrote:

> On Jun 19, 2019, at 2:44 PM, George Bosilca  wrote:
>
> To completely disable UCX you need to disable the UCX MTL and not only the
> BTL. I would use "--mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1".
>
>
> Thanks for the pointer.  Disabling ucx this way _does_ seem to fix the
> memory issue.  That’s a very helpful workaround, if nothing else.
>
> Using ucx 1.5.1 downloaded from the ucx web site at runtime (just by
> inserting it into LD_LIBRARY_PATH, without recompiling openmpi) doesn’t
> seem to fix the problem.
>
>
> As you have a gdb session on the processes you can try to break on some of
> the memory allocations function (malloc, realloc, calloc).
>
>
> Good idea.  I set breakpoints on all 3 of those, then did “c” 3 times.
> Does this mean anything to anyone?  I’m investigating the upstream calls
> (not included below) that generate these calls to mpi_bcast, but given that
> it works on other types of nodes, I doubt those are problematic.
>
> #0  0x2b9e5303e160 in malloc () from /lib64/libc.so.6
> #1  0x2b9e651f358a in ucs_rcache_create_region
> (region_p=0x7fff82806da0, arg=0x7fff82806d9c, prot=3, length=131072,
> address=0x2b9e76102070, rcache=0xb341a50) at sys/rcache.c:500
> #2  ucs_rcache_get (rcache=0xb341a50, address=0x2b9e76102070,
> length=131072, prot=prot@entry=3, arg=arg@entry=0x7fff82806d9c,
> region_p=region_p@entry=0x7fff82806da0) at sys/rcache.c:612
> #3  0x2b9e64f7a3d4 in uct_ib_mem_rcache_reg (uct_md=,
> address=, length=, flags=96,
> memh_p=0xbc409b0) at ib/base/ib_md.c:990
> #4  0x2b9e64d245e2 in ucp_mem_rereg_mds (context=,
> reg_md_map=4, address=address@entry=0x2b9e76102070, length= out>, uct_flags=uct_flags@entry=96,
> alloc_md=alloc_md@entry=0x0, mem_type=mem_type@entry
> =UCT_MD_MEM_TYPE_HOST, alloc_md_memh_p=alloc_md_memh_p@entry=0x0,
> uct_memh=uct_memh@entry=0xbc409b0, md_map_p=md_map_p@entry=0xbc409a8)
> at core/ucp_mm.c:100
> #5  0x2b9e64d260f0 in ucp_request_memory_reg (context=0xb340800,
> md_map=4, buffer=0x2b9e76102070, length=131072, datatype=128,
> state=state@entry=0xbc409a0, mem_type=UCT_MD_MEM_TYPE_HOST,
> req_dbg=req_dbg@entry=0xbc40940, uct_flags=,
> uct_flags@entry=0) at core/ucp_request.c:218
> #6  0x2b9e64d3716b in ucp_request_send_buffer_reg (md_map= out>, req=0xbc40940)
> at 
> /home_tin/bernadm/configuration/330_OFED/ucx-1.5.1/src/ucp/core/ucp_request.inl:343
> #7  ucp_tag_send_start_rndv (sreq=sreq@entry=0xbc40940) at tag/rndv.c:153
> #8  0x2b9e64d3abb9 in ucp_tag_send_req (enable_zcopy=1,
> proto=0x2b9e64f569c0 , cb=0x2b9e64467350
> , rndv_am_thresh=,
> rndv_rma_thresh=, msg_config=0xb3ea278, dt_count=8192,
> req=) at tag/tag_send.c:78
> #9  ucp_tag_send_nb (ep=, buffer=,
> count=8192, datatype=, tag=,
> cb=0x2b9e64467350 ) at tag/tag_send.c:203
> #10 0x2b9e64465fa6 in mca_pml_ucx_isend ()
> from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_pml_ucx.so
> #11 0x2b9e52211900 in ompi_coll_base_bcast_intra_generic ()
> from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #12 0x2b9e52211d4b in ompi_coll_base_bcast_intra_pipeline ()
> from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #13 0x2b9e673bc384 in ompi_coll_tuned_bcast_intra_dec_fixed ()
> from /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/openmpi/mca_coll_tuned.so
> #14 0x2b9e521dbb79 in PMPI_Bcast () from
> /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi.so.40
> #15 0x2b9e51f623df in pmpi_bcast__ () from
> /share/apps/mpi/openmpi/4.0.1/ib/gnu/lib/libmpi_mpifh.so.40
>
>
> #0  0x2b9e5303e160 in malloc () from /lib64/libc.so.6
> #1  0x2b9e651ed684 in ucs_pgt_dir_alloc (pgtable=0xb341ab8) at
> datastruct/pgtable.c:69
> #2  ucs_pgtable_insert_page (region=0xc6919d0, order=12,
> address=47959585718272, pgtable=0xb341ab8) at datastruct/pgtable.c:299
> #3  ucs_pgtable_insert (pgtable=pgtable@entry=0xb341ab8,
> region=region@entry=0xc6919d0) at datastruct/pgtable.c:403
> #4  0x2b9e651f35bc in ucs_rcache_create_region
> (region_p=0x7fff82806da0, arg=0x7fff82806d9c, prot=3, length=131072,
> address=0x2b9e76102070, rcache=0xb341a50) at sys/rcache.c:511
> #5  ucs_rcache_get (rcache=0xb341a50, address=0x2b9e76102070,
> length=131072, prot=prot@entry=3, arg=arg@entry=0x7fff82806d9c,
> region_p=region_p@entry=0x7fff82806da0) at sys/rcache.c:612
> #6  0x2b9e64f7a3d4 in uct_ib_mem_rcache_reg (uct_md=,
> address=, length=, flags=96,
> memh_p=0xbc409b0) at ib/base/ib_md.c:990
> #7  0x2b9e64d245e2 in ucp_mem_rereg_mds (context=,
> reg_md_map=4, address=address@entry=0x2b9e76102070, length= out>, uct_flags=uct_flags@entry=96,
> alloc_md=alloc_md@entry=0x0, mem_type=mem_type@entry
> =UCT_MD_MEM_TYP