Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-03 Thread Shrader, David Lee via users
I opened an issue, and a fix looks like it went into the 4.1.2 release branch already. I tested the patch on my 4.1.1 release tarball, and the error no longer occurs.


Here is the link to the issue:


https://github.com/open-mpi/ompi/issues/9617


Thanks,

David



From: users  on behalf of Michael Di Domenico 
via users 
Sent: Wednesday, November 3, 2021 8:58 AM
Cc: Michael Di Domenico; Open MPI Users
Subject: Re: [OMPI users] [EXTERNAL] strange pml error

this seemed to help me as well, so far at least.  still have a lot
more testing to do
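
for reference, the workaround from David's quoted message below amounts to forcing the UCX PML. Assuming a typical Open MPI setup, it can be set either in the environment or on the mpirun command line (the "-np 4 ./xhpl" part is just a stand-in for the actual job):

export OMPI_MCA_pml=ucx
mpirun -np 4 ./xhpl

or equivalently:

mpirun --mca pml ucx -np 4 ./xhpl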

On Tue, Nov 2, 2021 at 4:15 PM Shrader, David Lee  wrote:
>
> As a workaround for now, I have found that setting OMPI_MCA_pml=ucx seems to 
> get around this issue. I'm not sure why this works, but perhaps there is 
> different initialization that happens such that the offending device search 
> problem doesn't occur?
>
>
> Thanks,
>
> David
>
>
>
> 
> From: Shrader, David Lee
> Sent: Tuesday, November 2, 2021 2:09 PM
> To: Open MPI Users
> Cc: Michael Di Domenico
> Subject: Re: [EXTERNAL] [OMPI users] strange pml error
>
>
> I too have been getting this using 4.1.1, but not with the master nightly 
> tarballs from mid-October. I still have it on my to-do list to open a github 
> issue. The problem seems to come from device detection in the ucx pml: on 
> some ranks, it fails to find a device and thus the ucx pml disqualifies
> itself, which then just leaves the ob1 pml.
>
>
> Thanks,
>
> David
>
>
>
> 
> From: users  on behalf of Michael Di 
> Domenico via users 
> Sent: Tuesday, November 2, 2021 1:35 PM
> To: Open MPI Users
> Cc: Michael Di Domenico
> Subject: [EXTERNAL] [OMPI users] strange pml error
>
> fairly frequently, but not every time when trying to run xhpl on a new
> machine i'm bumping into this.  it happens with a single node or
> multiple nodes
>
> node1 selected pml ob1, but peer on node1 selected pml ucx
>
> if i rerun the exact same command a few minutes later, it works fine.
> the machine is new and i'm the only one using it so there are no user
> conflicts
>
> the software stack is
>
> slurm 21.8.2.1
> ompi 4.1.1
> pmix 3.2.3
> ucx 1.9.0
>
> the hardware is HPE w/ mellanox edr cards (but i doubt that matters)
>
> any thoughts?


Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Ralph Castain via users
Could you please ensure it was configured with --enable-debug and then add 
"--mca rmaps_base_verbose 5" to the mpirun cmd line?


On Nov 3, 2021, at 9:10 AM, Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote:

Gilles and Ralph,

I did build with --with-tm. I tried Gilles' workaround, but the failure still occurred. What do I need to provide so that you can investigate this possible bug?

Thanks,
Kurt

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Wednesday, November 3, 2021 8:45 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mpi.org>
Subject: [EXTERNAL] Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

Sounds like a bug to me - regardless of configuration, if the hostfile contains an entry for each slot on a node, OMPI should have added those up.

On Nov 3, 2021, at 2:49 AM, Gilles Gouaillardet via users <users@lists.open-mpi.org> wrote:

Kurt,

Assuming you built Open MPI with tm support (default if tm is detected at configure time, but you can configure --with-tm to have it abort if tm support is not found), you should not need to use a hostfile.

As a workaround, I would suggest you try to
mpirun --map-by node -np 21 ...

Cheers,

Gilles

On Wed, Nov 3, 2021 at 6:06 PM Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote:
I’m using OpenMPI 4.1.1 compiled with Nvidia’s nvc++ 20.9, and compiled with Torque support.

I want to reserve multiple slots on each node, and then launch a single manager process on each node. The remaining slots would be filled up as the manager spawns new processes with MPI_Comm_spawn on its local node.

Here is the abbreviated mpiexec command, which I assume is the source of the problem described below (?). The hostfile was created by Torque and it contains many repeated node names, one for each slot that it reserved.

$ mpiexec --hostfile MyHostFile -np 21 -npernode 1  (etc.)

When MPI_Comm_spawn is called, MPI is reporting that “All nodes which are allocated for this job are already filled.” They don’t appear to be filled as it also reports that only one slot is in use for each node:

==   ALLOCATED NODES   ==
    n022: flags=0x11 slots=9 max_slots=0 slots_inuse=1 state=UP
    n021: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n020: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n018: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n017: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n016: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n015: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n014: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n013: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n012: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n011: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n010: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n009: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n008: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n007: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n006: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n005: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n004: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n003: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n002: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n001: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

Do you have any idea what I am doing wrong? My Torque qsub arguments are unchanged from when I successfully launched this kind of job structure under MPICH. The relevant argument to qsub is the resource list, which is “-l nodes=21:ppn=9”.



Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Mccall, Kurt E. (MSFC-EV41) via users
Gilles and Ralph,

I did build with --with-tm. I tried Gilles' workaround, but the failure still occurred. What do I need to provide so that you can investigate this possible bug?

Thanks,
Kurt

From: users  On Behalf Of Ralph Castain via 
users
Sent: Wednesday, November 3, 2021 8:45 AM
To: Open MPI Users 
Cc: Ralph Castain 
Subject: [EXTERNAL] Re: [OMPI users] Reserving slots and filling them after job 
launch with MPI_Comm_spawn

Sounds like a bug to me - regardless of configuration, if the hostfile contains 
an entry for each slot on a node, OMPI should have added those up.



On Nov 3, 2021, at 2:49 AM, Gilles Gouaillardet via users <users@lists.open-mpi.org> wrote:

Kurt,

Assuming you built Open MPI with tm support (default if tm is detected at 
configure time, but you can configure --with-tm to have it abort if tm support 
is not found), you should not need to use a hostfile.

As a workaround, I would suggest you try to
mpirun --map-by node -np 21 ...


Cheers,

Gilles

On Wed, Nov 3, 2021 at 6:06 PM Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote:
I’m using OpenMPI 4.1.1 compiled with Nvidia’s nvc++ 20.9, and compiled with 
Torque support.

I want to reserve multiple slots on each node, and then launch a single manager 
process on each node.   The remaining slots would be filled up as the manager 
spawns new processes with MPI_Comm_spawn on its local node.

Here is the abbreviated mpiexec command, which I assume is the source of the 
problem described below (?).   The hostfile was created by Torque and it 
contains many repeated node names, one for each slot that it reserved.

$ mpiexec --hostfile  MyHostFile  -np 21 -npernode 1  (etc.)


When MPI_Comm_spawn is called, MPI is reporting that “All nodes which are 
allocated for this job are already filled."   They don’t appear to be filled as 
it also reports that only one slot is in use for each node:

==   ALLOCATED NODES   ==
n022: flags=0x11 slots=9 max_slots=0 slots_inuse=1 state=UP
n021: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n020: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n018: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n017: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n016: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n015: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n014: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n013: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n012: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n011: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n010: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n009: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n008: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n007: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n006: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n005: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n004: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n003: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n002: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n001: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

Do you have any idea what I am doing wrong?   My Torque qsub arguments are 
unchanged from when I successfully launched this kind of job structure under 
MPICH.   The relevant argument to qsub is the resource list, which is “-l  
nodes=21:ppn=9”.




Re: [OMPI users] [EXTERNAL] strange pml error

2021-11-03 Thread Michael Di Domenico via users
this seemed to help me as well, so far at least.  still have a lot
more testing to do

On Tue, Nov 2, 2021 at 4:15 PM Shrader, David Lee  wrote:
>
> As a workaround for now, I have found that setting OMPI_MCA_pml=ucx seems to 
> get around this issue. I'm not sure why this works, but perhaps there is 
> different initialization that happens such that the offending device search 
> problem doesn't occur?
>
>
> Thanks,
>
> David
>
>
>
> 
> From: Shrader, David Lee
> Sent: Tuesday, November 2, 2021 2:09 PM
> To: Open MPI Users
> Cc: Michael Di Domenico
> Subject: Re: [EXTERNAL] [OMPI users] strange pml error
>
>
> I too have been getting this using 4.1.1, but not with the master nightly 
> tarballs from mid-October. I still have it on my to-do list to open a github 
> issue. The problem seems to come from device detection in the ucx pml: on 
> some ranks, it fails to find a device and thus the ucx pml disqualifies
> itself, which then just leaves the ob1 pml.
>
>
> Thanks,
>
> David
>
>
>
> 
> From: users  on behalf of Michael Di 
> Domenico via users 
> Sent: Tuesday, November 2, 2021 1:35 PM
> To: Open MPI Users
> Cc: Michael Di Domenico
> Subject: [EXTERNAL] [OMPI users] strange pml error
>
> fairly frequently, but not every time when trying to run xhpl on a new
> machine i'm bumping into this.  it happens with a single node or
> multiple nodes
>
> node1 selected pml ob1, but peer on node1 selected pml ucx
>
> if i rerun the exact same command a few minutes later, it works fine.
> the machine is new and i'm the only one using it so there are no user
> conflicts
>
> the software stack is
>
> slurm 21.8.2.1
> ompi 4.1.1
> pmix 3.2.3
> ucx 1.9.0
>
> the hardware is HPE w/ mellanox edr cards (but i doubt that matters)
>
> any thoughts?


Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Ralph Castain via users
Sounds like a bug to me - regardless of configuration, if the hostfile contains 
an entry for each slot on a node, OMPI should have added those up.


On Nov 3, 2021, at 2:49 AM, Gilles Gouaillardet via users <users@lists.open-mpi.org> wrote:

Kurt,

Assuming you built Open MPI with tm support (default if tm is detected at 
configure time, but you can configure --with-tm to have it abort if tm support 
is not found), you should not need to use a hostfile.

As a workaround, I would suggest you try to
mpirun --map-by node -np 21 ...


Cheers,

Gilles

On Wed, Nov 3, 2021 at 6:06 PM Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote:
I’m using OpenMPI 4.1.1 compiled with Nvidia’s nvc++ 20.9, and compiled with 
Torque support.

 
I want to reserve multiple slots on each node, and then launch a single manager 
process on each node.   The remaining slots would be filled up as the manager 
spawns new processes with MPI_Comm_spawn on its local node.

 
Here is the abbreviated mpiexec command, which I assume is the source of the 
problem described below (?).   The hostfile was created by Torque and it 
contains many repeated node names, one for each slot that it reserved.  

 
$ mpiexec --hostfile  MyHostFile  -np 21 -npernode 1  (etc.)

 
 
When MPI_Comm_spawn is called, MPI is reporting that “All nodes which are 
allocated for this job are already filled."   They don’t appear to be filled as 
it also reports that only one slot is in use for each node:

 
==   ALLOCATED NODES   ==

    n022: flags=0x11 slots=9 max_slots=0 slots_inuse=1 state=UP

    n021: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n020: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n018: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n017: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n016: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n015: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n014: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n013: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n012: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n011: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n010: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n009: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n008: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n007: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n006: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n005: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n004: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n003: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n002: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

    n001: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

 
Do you have any idea what I am doing wrong?   My Torque qsub arguments are 
unchanged from when I successfully launched this kind of job structure under 
MPICH.   The relevant argument to qsub is the resource list, which is “-l  
nodes=21:ppn=9”.

 



Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Gilles Gouaillardet via users
Kurt,

Assuming you built Open MPI with tm support (default if tm is detected at
configure time, but you can configure --with-tm to have it abort if tm
support is not found), you should not need to use a hostfile.

As a workaround, I would suggest you try to
mpirun --map-by node -np 21 ...


Cheers,

Gilles

On Wed, Nov 3, 2021 at 6:06 PM Mccall, Kurt E. (MSFC-EV41) via users <
users@lists.open-mpi.org> wrote:

> I’m using OpenMPI 4.1.1 compiled with Nvidia’s nvc++ 20.9, and compiled
> with Torque support.
>
>
>
> I want to reserve multiple slots on each node, and then launch a single
> manager process on each node.   The remaining slots would be filled up as
> the manager spawns new processes with MPI_Comm_spawn on its local node.
>
>
>
> Here is the abbreviated mpiexec command, which I assume is the source of
> the problem described below (?).   The hostfile was created by Torque and
> it contains many repeated node names, one for each slot that it reserved.
>
>
>
> $ mpiexec --hostfile  MyHostFile  -np 21 -npernode 1  (etc.)
>
>
>
>
>
> When MPI_Comm_spawn is called, MPI is reporting that “All nodes which are
> allocated for this job are already filled."   They don’t appear to be
> filled as it also reports that only one slot is in use for each node:
>
>
>
> ==   ALLOCATED NODES   ==
>
> n022: flags=0x11 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n021: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n020: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n018: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n017: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n016: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n015: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n014: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n013: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n012: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n011: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n010: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n009: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n008: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n007: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n006: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n005: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n004: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n003: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n002: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
> n001: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
>
>
>
> Do you have any idea what I am doing wrong?   My Torque qsub arguments are
> unchanged from when I successfully launched this kind of job structure
> under MPICH.   The relevant argument to qsub is the resource list, which is
> “-l  nodes=21:ppn=9”.
>
>
>


[OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Mccall, Kurt E. (MSFC-EV41) via users
I'm using OpenMPI 4.1.1 compiled with Nvidia's nvc++ 20.9, and compiled with 
Torque support.

I want to reserve multiple slots on each node, and then launch a single manager 
process on each node.   The remaining slots would be filled up as the manager 
spawns new processes with MPI_Comm_spawn on its local node.
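
For reference, a minimal sketch of the kind of spawn call I mean is below (the worker executable name and the use of the "host" info key are illustrative, not my actual code):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Name of the node this manager is running on. */
    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);

    /* Hint that the spawned worker should land on this same node. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", host);

    /* Spawn one worker into one of the remaining slots on this node. */
    MPI_Comm intercomm;
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}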

Here is the abbreviated mpiexec command, which I assume is the source of the 
problem described below (?).   The hostfile was created by Torque and it 
contains many repeated node names, one for each slot that it reserved.
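
(For illustration only: with "-l nodes=21:ppn=9", the Torque-generated node file, typically $PBS_NODEFILE, lists each node once per slot, along the lines of

n001
n001
n001
... nine entries for n001, then nine for n002, and so on for all 21 nodes.)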

$ mpiexec --hostfile  MyHostFile  -np 21 -npernode 1  (etc.)


When MPI_Comm_spawn is called, MPI is reporting that "All nodes which are 
allocated for this job are already filled."   They don't appear to be filled as 
it also reports that only one slot is in use for each node:

==   ALLOCATED NODES   ==
n022: flags=0x11 slots=9 max_slots=0 slots_inuse=1 state=UP
n021: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n020: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n018: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n017: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n016: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n015: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n014: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n013: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n012: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n011: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n010: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n009: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n008: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n007: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n006: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n005: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n004: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n003: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n002: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
n001: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

Do you have any idea what I am doing wrong?   My Torque qsub arguments are 
unchanged from when I successfully launched this kind of job structure under 
MPICH.   The relevant argument to qsub is the resource list, which is "-l  
nodes=21:ppn=9".