Re: [OMPI users] [Help] Must orted exit after all spawned processes exit

2021-05-19 Thread Ralph Castain via users
To answer your specific questions:

The backend daemons (orted) will not exit until all locally spawned procs exit. 
This is not configurable - for one thing, OMPI procs will suicide if they see 
their daemon depart, so it makes no sense to have the daemon exit just because 
one proc terminates. The logic behind this behavior spans multiple parts of the 
code base, I'm afraid.
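
That said, if the practical goal is simply to let mpirun return while long-running 
background children keep going, a commonly used workaround (a hedged sketch, 
assuming the launcher is waiting on the stdio descriptors those children inherit; 
untested here and not official Open MPI guidance) is to fully detach them in the 
script:

```
# sleep.sh, hedged variant: detach the background sleep so it no longer holds
# the forwarded stdout/stderr open; son_sleep.sh would need the same treatment.
setsid sleep 10001 > /dev/null 2>&1 < /dev/null &
/bin/sh son_sleep.sh
sleep 10002
```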

On May 17, 2021, at 7:03 AM, Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote:

FYI: general Open MPI questions are better sent to the user's mailing list.

Up through the v4.1.x series, the "orted" is a general helper process that Open 
MPI uses on the back-end.  It will not quit until all of its children have 
died.  Open MPI's run time is designed with the intent that some external 
helper will be there for the entire duration of the job; there is no option to 
run without one.

Two caveats:

1. In Open MPI v5.0.x, from the user's perspective, "orted" has been renamed to 
be "prted".  Since this is 99.999% behind the scenes, most users won't notice 
the difference.

2. You can run without "orted" (or "prted") if you use a different run-time 
environment (e.g., SLURM).  In this case, you'll use that environment's 
launcher (e.g., srun or sbatch in SLURM environments) to directly launch MPI 
processes -- you won't use "mpirun" at all.  Fittingly, this is called "direct 
launch" in Open MPI parlance (i.e., using another run-time's daemons to launch 
processes instead of first launching orteds or prteds).
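
For illustration, a minimal direct-launch invocation under Slurm might look like 
the following (a sketch only: it assumes Slurm was built with PMIx support, and 
the binary name is a placeholder):

```
# Slurm's own daemons start the MPI processes; no mpirun/orted/prted involved.
srun --mpi=pmix -n 2 ./my_mpi_app
```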



On May 16, 2021, at 8:34 AM, 叶安华 <yean...@sensetime.com> wrote:

Code snippet: 
 # sleep.sh
sleep 10001 &
/bin/sh son_sleep.sh
sleep 10002
 # son_sleep.sh
sleep 10003 &
sleep 10004 &
 thanks
Anhua
From: 叶安华 <yean...@sensetime.com>
Date: Sunday, May 16, 2021 at 20:31
To: "jsquy...@cisco.com" <jsquy...@cisco.com>
Subject: [Help] Must orted exit after all spawned processes exit
Dear Jeff, 

Sorry to bother you, but I am really curious about the conditions under which 
orted exits in the scenario below, and I am looking forward to hearing from you.
 Scenario description:
· Step 1: start a remote process via "mpirun -np 1 -host 10.211.55.4 sh 
sleep.sh"
· Step 2: check pstree in the remote host:

· Step 3: the mpirun process in step 1 does not exit until I kill all 
the sleeping processes, which are 15479, 15481, 15482, and 15483
To conclude, my questions are as follows:
1.  Must orted wait until all spawned processes exit?
2.  Is this behavior configurable? What if I want orted to exit immediately 
after any one of the spawned processes exits?
3.  I did not find the specific logic for orted waiting for spawned 
processes to exit; I hope I can get some hints from you.
 PS (scripts):

  thanks
Anhua
 

-- 
Jeff Squyres
jsquy...@cisco.com  






Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Peter Kjellström via users
On Wed, 19 May 2021 15:53:50 +0200
Pavel Mezentsev via users  wrote:

> It took some time but my colleague was able to build OpenMPI and get
> it working with OmniPath, however the performance is quite
> disappointing. The configuration line used was the
> following: ./configure --prefix=$INSTALL_PATH
> --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
> --enable-shared --with-hwloc=$EBROOTHWLOC --with-psm2
> --with-ofi=$EBROOTLIBFABRIC --with-libevent=$EBROOTLIBEVENT
> --without-orte --disable-oshmem --with-gpfs --with-slurm
> --with-pmix=external --with-libevent=external --with-ompi-pmix-rte
> 
> /usr/bin/srun --cpu-bind=none --mpi=pspmix --ntasks-per-node 1 -n 2
> xenv -L Architecture/KNL -L GCC -L OpenMPI env
> OMPI_MCA_btl_base_verbose="99" OMPI_MCA_mtl_base_verbose="99" numactl
> --physcpubind=1 ./osu_bw ...
...
> # OSU MPI Bandwidth Test v5.7
> # Size  Bandwidth (MB/s)
> 1   0.05
> 2   0.10
> 4   0.20
...
> 2097152   226.93
> 4194304   227.62

I don't know for sure what performance you should get on KNL with a single
rank (though I would expect at least several GB/s), but on non-KNL systems
OpenMPI gets wire speed on OmniPath (11 GB/s) in the above benchmark.

Note that ib_write_bw runs on Verbs (the same path as the OpenMPI openib btl).
Verbs is not fast on OmniPath (especially not on KNL).

As far as possible I would suggest using PSM2 and disabling all of the OFI
components. But that may not be possible with this specific OpenMPI version
(I've only used it with other versions).
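
For what it's worth, a minimal sketch of the MCA settings that would steer a
4.x build toward PSM2 and away from OFI/openib (hedged: it assumes the build
actually contains the psm2 MTL, and as seen elsewhere in this thread, forcing
psm2 under direct launch can fail for lack of the precondition transport key):

```
# Prefer the cm PML with the PSM2 MTL, and keep the openib btl out of the way.
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=psm2
export OMPI_MCA_btl=^openib
```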

/Peter K

> If I test directly with `ib_write_bw` I get
> #bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
> Conflicting CPU frequency values detected: 1498.727000 != 1559.017000.
> CPU Frequency is not max.
> 65536   5000         2421.04          2064.33              0.033029


Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
Right. There was a reference counting issue in OMPI that required a change to 
PSM2 to properly fix. There's a configuration option to disable the reference 
count check at build time, although I don't recall what the option is off the 
top of my head.

From: Carlson, Timothy S 
Sent: Wednesday, May 19, 2021 11:31 AM
To: Open MPI Users 
Cc: Heinz, Michael William 
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

Just some more data from my OmniPath-based cluster.

There certainly was a change from 4.0.x to 4.1.x

With 4.0.1 I would build OpenMPI with


./configure --with-psm2 --with-slurm --with-pmi=/usr



And while srun would spit out a warning, the performance was as expected.



srun -N 2 --ntasks-per-node=1 -A ops -p short mpi/pt2pt/osu_latency

-cut-

WARNING: There was an error initializing an OpenFabrics device.



  Local host:   n0005

  Local device: hfi1_0

--

# OSU MPI Latency Test v5.5

# Size  Latency (us)

0   1.13

1   1.13

2   1.13

4   1.13

8   1.13

16  1.49

32  1.49

64  1.39



-cut-



Similarly for bandwidth



32768        6730.96

65536        9801.56

131072  11887.62

262144  11959.18

524288  12062.57

1048576 12038.13

2097152 12048.90

4194304 12112.04
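
Incidentally, the OpenFabrics warning shown above can usually be silenced by 
keeping the openib btl out of the selection, as suggested elsewhere in this 
thread (a hedged sketch; account and partition options omitted):

```
# Exclude the openib btl so it never tries (and fails) to initialize.
export OMPI_MCA_btl=^openib
srun -N 2 --ntasks-per-node=1 mpi/pt2pt/osu_latency
```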



With 4.1.x it appears I need to upgrade my psm2 installation from what I have 
now



# rpm -qa | grep psm2

libpsm2-11.2.89-1.x86_64

libpsm2-devel-11.2.89-1.x86_64

libpsm2-compat-11.2.89-1.x86_64

libfabric-psm2-1.7.1-10.x86_64



because configure spit out this warning



WARNING: PSM2 needs to be version 11.2.173 or later. Disabling MTL



The above cluster is running IntelOPA 10.9.2



Tim

From: users <users-boun...@lists.open-mpi.org> on behalf of "Heinz, Michael 
William via users" <users@lists.open-mpi.org>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Wednesday, May 19, 2021 at 7:57 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: "Heinz, Michael William" <michael.william.he...@cornelisnetworks.com>
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath


After thinking about this for a few more minutes, it occurred to me that you 
might be able to "fake" the required UUID support by passing it as a shell 
variable. For example:

export OMPI_MCA_orte_precondition_transports="0123456789ABCDEF-0123456789ABCDEF"

would probably do it. However, note that the format of the string must be 16 
hex digits, a hyphen, then 16 more hex digits; anything else will be rejected. 
Also, I have never tried doing this, YMMV.
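
If you do try it, a throwaway key of the right shape can be generated instead 
of hard-coded; a hedged sketch (same caveat, untested):

```
# 16 hex digits, a hyphen, then 16 more hex digits, as described above.
export OMPI_MCA_orte_precondition_transports="$(openssl rand -hex 8)-$(openssl rand -hex 8)"
```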

From: Heinz, Michael William
Sent: Wednesday, May 19, 2021 10:35 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mpi.org>
Subject: RE: [OMPI users] unable to launch a job on a system with OmniPath

So, the bad news is that the PSM2 MTL requires ORTE - ORTE generates a UUID to 
identify the job across all nodes in the fabric, allowing processes to find 
each other over OPA at init time.

I believe the reason this works when you use OFI/libfabric is that libfabric 
generates its own UUIDs.

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Wednesday, May 19, 2021 10:19 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

The original configure line is correct ("--without-orte") - just a typo in the 
later text.

You may be running into some issues with Slurm's built-in support for OMPI. Try 
running it with OMPI's "mpirun" instead and see if you get better performance. 
You'll have to reconfigure to remove the "--without-orte" and 
"--with-ompi-pmix-rte" options. I would also recommend removing the 
"--with-pmix=external --with-libevent=external --with-hwloc=xxx 
--with-libevent=xxx" entries.

In other words, get down to a vanilla installation so we know what we are 
dealing with - otherwise, it gets very hard to help you.


On May 19, 2021, at 7:09 AM, Jorge D'Elia via users <users@lists.open-mpi.org> wrote:

----- Original Message -----
From: "Pavel Mezentsev via users" <users@lists.open-mpi.org>
To: users@lists.open-mpi.org
CC: "Pavel Mezentsev" <pavel.mezent...@gmail.com>
Sent: Wednesday, May 19, 2021 10:53:50
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

It took some time but my colleague was able to build OpenMPI and get it
working with OmniPath, however the performance is quite disappointing.
The 

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
After thinking about this for a few more minutes, it occurred to me that you 
might be able to "fake" the required UUID support by passing it as a shell 
variable. For example:

export OMPI_MCA_orte_precondition_transports="0123456789ABCDEF-0123456789ABCDEF"

would probably do it. However, note that the format of the string must be 16 
hex digits, a hyphen, then 16 more hex digits; anything else will be rejected. 
Also, I have never tried doing this, YMMV.

From: Heinz, Michael William
Sent: Wednesday, May 19, 2021 10:35 AM
To: Open MPI Users 
Cc: Ralph Castain 
Subject: RE: [OMPI users] unable to launch a job on a system with OmniPath

So, the bad news is that the PSM2 MTL requires ORTE - ORTE generates a UUID to 
identify the job across all nodes in the fabric, allowing processes to find 
each other over OPA at init time.

I believe the reason this works when you use OFI/libfabric is that libfabric 
generates its own UUIDs.

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Wednesday, May 19, 2021 10:19 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

The original configure line is correct ("--without-orte") - just a typo in the 
later text.

You may be running into some issues with Slurm's built-in support for OMPI. Try 
running it with OMPI's "mpirun" instead and see if you get better performance. 
You'll have to reconfigure to remove the "--without-orte" and 
"--with-ompi-pmix-rte" options. I would also recommend removing the 
"--with-pmix=external --with-libevent=external --with-hwloc=xxx 
--with-libevent=xxx" entries.

In other words, get down to a vanilla installation so we know what we are 
dealing with - otherwise, it gets very hard to help you.


On May 19, 2021, at 7:09 AM, Jorge D'Elia via users <users@lists.open-mpi.org> wrote:

----- Original Message -----
From: "Pavel Mezentsev via users" <users@lists.open-mpi.org>
To: users@lists.open-mpi.org
CC: "Pavel Mezentsev" <pavel.mezent...@gmail.com>
Sent: Wednesday, May 19, 2021 10:53:50
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

It took some time but my colleague was able to build OpenMPI and get it
working with OmniPath, however the performance is quite disappointing.
The configuration line used was the following: ./configure
--prefix=$INSTALL_PATH  --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --enable-shared --with-hwloc=$EBROOTHWLOC
--with-psm2 --with-ofi=$EBROOTLIBFABRIC --with-libevent=$EBROOTLIBEVENT
--without-orte --disable-oshmem --with-gpfs --with-slurm
--with-pmix=external --with-libevent=external --with-ompi-pmix-rte

/usr/bin/srun --cpu-bind=none --mpi=pspmix --ntasks-per-node 1 -n 2 xenv -L
Architecture/KNL -L GCC -L OpenMPI env OMPI_MCA_btl_base_verbose="99"
OMPI_MCA_mtl_base_verbose="99" numactl --physcpubind=1 ./osu_bw
...
[node:18318] select: init of component ofi returned success
[node:18318] mca: base: components_register: registering framework mtl
components
[node:18318] mca: base: components_register: found loaded component ofi

[node:18318] mca: base: components_register: component ofi register
function successful
[node:18318] mca: base: components_open: opening mtl components

[node:18318] mca: base: components_open: found loaded component ofi

[node:18318] mca: base: components_open: component ofi open function
successful
[node:18318] mca:base:select: Auto-selecting mtl components
[node:18318] mca:base:select:(  mtl) Querying component [ofi]

[node:18318] mca:base:select:(  mtl) Query of component [ofi] set priority
to 25
[node:18318] mca:base:select:(  mtl) Selected component [ofi]

[node:18318] select: initializing mtl component ofi
[node:18318] mtl_ofi_component.c:378: mtl:ofi:provider: hfi1_0
...
# OSU MPI Bandwidth Test v5.7
# Size  Bandwidth (MB/s)
1   0.05
2   0.10
4   0.20
8   0.41
16  0.77
32  1.54
64  3.10
128 6.09
256            12.39
512            24.23
1024   46.85
2048   87.99
4096  100.72
8192  139.91
16384 173.67
32768 197.82
65536 210.15
131072        215.76
262144        214.39
524288        219.23
1048576   223.53
2097152   226.93
4194304   227.62

If I test directly with `ib_write_bw` I get
#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
Conflicting CPU frequency values detected: 1498.727000 != 1559.017000. CPU
Frequency is not max.
65536   5000         2421.04          2064.33              0.033029

I also tried adding `OMPI_MCA_mtl="psm2"` 

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
So, the bad news is that the PSM2 MTL requires ORTE - ORTE generates a UUID to 
identify the job across all nodes in the fabric, allowing processes to find 
each other over OPA at init time.

I believe the reason this works when you use OFI/libfabric is that libfabric 
generates its own UUIDs.

From: users  On Behalf Of Ralph Castain via 
users
Sent: Wednesday, May 19, 2021 10:19 AM
To: Open MPI Users 
Cc: Ralph Castain 
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

The original configure line is correct ("--without-orte") - just a typo in the 
later text.

You may be running into some issues with Slurm's built-in support for OMPI. Try 
running it with OMPI's "mpirun" instead and see if you get better performance. 
You'll have to reconfigure to remove the "--without-orte" and 
"--with-ompi-pmix-rte" options. I would also recommend removing the 
"--with-pmix=external --with-libevent=external --with-hwloc=xxx 
--with-libevent=xxx" entries.

In other words, get down to a vanilla installation so we know what we are 
dealing with - otherwise, it gets very hard to help you.



On May 19, 2021, at 7:09 AM, Jorge D'Elia via users <users@lists.open-mpi.org> wrote:

----- Original Message -----

From: "Pavel Mezentsev via users" <users@lists.open-mpi.org>
To: users@lists.open-mpi.org
CC: "Pavel Mezentsev" <pavel.mezent...@gmail.com>
Sent: Wednesday, May 19, 2021 10:53:50
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

It took some time but my colleague was able to build OpenMPI and get it
working with OmniPath, however the performance is quite disappointing.
The configuration line used was the following: ./configure
--prefix=$INSTALL_PATH  --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --enable-shared --with-hwloc=$EBROOTHWLOC
--with-psm2 --with-ofi=$EBROOTLIBFABRIC --with-libevent=$EBROOTLIBEVENT
--without-orte --disable-oshmem --with-gpfs --with-slurm
--with-pmix=external --with-libevent=external --with-ompi-pmix-rte

/usr/bin/srun --cpu-bind=none --mpi=pspmix --ntasks-per-node 1 -n 2 xenv -L
Architecture/KNL -L GCC -L OpenMPI env OMPI_MCA_btl_base_verbose="99"
OMPI_MCA_mtl_base_verbose="99" numactl --physcpubind=1 ./osu_bw
...
[node:18318] select: init of component ofi returned success
[node:18318] mca: base: components_register: registering framework mtl
components
[node:18318] mca: base: components_register: found loaded component ofi

[node:18318] mca: base: components_register: component ofi register
function successful
[node:18318] mca: base: components_open: opening mtl components

[node:18318] mca: base: components_open: found loaded component ofi

[node:18318] mca: base: components_open: component ofi open function
successful
[node:18318] mca:base:select: Auto-selecting mtl components
[node:18318] mca:base:select:(  mtl) Querying component [ofi]

[node:18318] mca:base:select:(  mtl) Query of component [ofi] set priority
to 25
[node:18318] mca:base:select:(  mtl) Selected component [ofi]

[node:18318] select: initializing mtl component ofi
[node:18318] mtl_ofi_component.c:378: mtl:ofi:provider: hfi1_0
...
# OSU MPI Bandwidth Test v5.7
# Size  Bandwidth (MB/s)
1   0.05
2   0.10
4   0.20
8   0.41
16  0.77
32  1.54
64  3.10
128 6.09
256            12.39
512            24.23
1024   46.85
2048   87.99
4096  100.72
8192  139.91
16384 173.67
32768 197.82
65536 210.15
131072        215.76
262144        214.39
524288        219.23
1048576   223.53
2097152   226.93
4194304   227.62

If I test directly with `ib_write_bw` I get
#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
Conflicting CPU frequency values detected: 1498.727000 != 1559.017000. CPU
Frequency is not max.
65536   5000         2421.04          2064.33              0.033029

I also tried adding `OMPI_MCA_mtl="psm2"` however the job crashes in that
case:
```
Error obtaining unique transport key from ORTE
(orte_precondition_transports not present in

the environment).
```
Which is a bit puzzling considering that OpenMPI was built with
`--witout-orte`

Dear Pavel, I can't help you, but just in case, regarding this text:


Which is a bit puzzling considering that OpenMPI was built with
`--witout-orte`

it should be `--without-orte` ??


Regards, Jorge D' Elia.
--
CIMEC (UNL-CONICET), 

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Ralph Castain via users
The original configure line is correct ("--without-orte") - just a typo in the 
later text.

You may be running into some issues with Slurm's built-in support for OMPI. Try 
running it with OMPI's "mpirun" instead and see if you get better performance. 
You'll have to reconfigure to remove the "--without-orte" and 
"--with-ompi-pmix-rte" options. I would also recommend removing the 
"--with-pmix=external --with-libevent=external --with-hwloc=xxx 
--with-libevent=xxx" entries.

In other words, get down to a vanilla installation so we know what we are 
dealing with - otherwise, it gets very hard to help you.
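
As a concrete starting point, a stripped-down rebuild and an mpirun-launched 
test along those lines might look like the following (a hedged sketch; the 
prefix, build parallelism, and process count are placeholders, not taken from 
this thread):

```
# Vanilla-ish build: keep PSM2 and Slurm support, drop the external/ompi-pmix-rte bits.
./configure --prefix=$HOME/ompi-vanilla --with-psm2 --with-slurm
make -j 8 && make install
# Launch with OMPI's own mpirun instead of srun:
$HOME/ompi-vanilla/bin/mpirun -np 2 --map-by node ./osu_bw
```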


On May 19, 2021, at 7:09 AM, Jorge D'Elia via users <users@lists.open-mpi.org> wrote:

----- Original Message -----
From: "Pavel Mezentsev via users" <users@lists.open-mpi.org>
To: users@lists.open-mpi.org
CC: "Pavel Mezentsev" <pavel.mezent...@gmail.com>
Sent: Wednesday, May 19, 2021 10:53:50
Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath

It took some time but my colleague was able to build OpenMPI and get it
working with OmniPath, however the performance is quite disappointing.
The configuration line used was the following: ./configure
--prefix=$INSTALL_PATH  --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --enable-shared --with-hwloc=$EBROOTHWLOC

--with-psm2 --with-ofi=$EBROOTLIBFABRIC --with-libevent=$EBROOTLIBEVENT
--without-orte --disable-oshmem --with-gpfs --with-slurm
--with-pmix=external --with-libevent=external --with-ompi-pmix-rte

/usr/bin/srun --cpu-bind=none --mpi=pspmix --ntasks-per-node 1 -n 2 xenv -L
Architecture/KNL -L GCC -L OpenMPI env OMPI_MCA_btl_base_verbose="99"
OMPI_MCA_mtl_base_verbose="99" numactl --physcpubind=1 ./osu_bw
...
[node:18318] select: init of component ofi returned success
[node:18318] mca: base: components_register: registering framework mtl
components
[node:18318] mca: base: components_register: found loaded component ofi

[node:18318] mca: base: components_register: component ofi register
function successful
[node:18318] mca: base: components_open: opening mtl components

[node:18318] mca: base: components_open: found loaded component ofi

[node:18318] mca: base: components_open: component ofi open function
successful
[node:18318] mca:base:select: Auto-selecting mtl components
[node:18318] mca:base:select:(  mtl) Querying component [ofi]

[node:18318] mca:base:select:(  mtl) Query of component [ofi] set priority
to 25
[node:18318] mca:base:select:(  mtl) Selected component [ofi]

[node:18318] select: initializing mtl component ofi
[node:18318] mtl_ofi_component.c:378: mtl:ofi:provider: hfi1_0
...
# OSU MPI Bandwidth Test v5.7
# Size  Bandwidth (MB/s)
1   0.05
2   0.10
4   0.20
8   0.41
16  0.77
32  1.54
64  3.10
128 6.09
256    12.39
512    24.23
1024   46.85
2048   87.99
4096  100.72
8192  139.91
16384 173.67
32768 197.82
65536 210.15
131072    215.76
262144    214.39
524288    219.23
1048576   223.53
2097152   226.93
4194304   227.62

If I test directly with `ib_write_bw` I get
#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
Conflicting CPU frequency values detected: 1498.727000 != 1559.017000. CPU
Frequency is not max.
65536  5000 2421.04    2064.33    0.033029

I also tried adding `OMPI_MCA_mtl="psm2"` however the job crashes in that
case:
```
Error obtaining unique transport key from ORTE
(orte_precondition_transports not present in

the environment).
```
Which is a bit puzzling considering that OpenMPI was built with
`--witout-orte`

Dear Pavel, I can't help you, but just in case, regarding this text:

Which is a bit puzzling considering that OpenMPI was built with
`--witout-orte`

it should be `--without-orte` ??


Regards, Jorge D' Elia.
--
CIMEC (UNL-CONICET), http://www.cimec.org.ar/  
Predio CONICET-Santa Fe, Colec. Ruta Nac. 168, 
Paraje El Pozo, 3000, Santa Fe, ARGENTINA. 
Tel +54-342-4511594/95 ext 7062, fax: +54-342-4511169


What am I missing and how can I improve the performance?

Regards, Pavel Mezentsev.

On Mon, May 10, 2021 at 6:20 PM Heinz, Michael William 
<michael.william.he...@cornelisnetworks.com> wrote:

*That warning is an annoying bit of cruft from the openib / verbs provider
that can be ignored. (Actually, I recommend using "--btl ^openib" to
suppress the warning.)*



*That said, there is a known issue with selecting PSM2 and OMPI 4.1.0. I’m
not sure that that’s the problem you’re hitting, though, because you really
haven’t 

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Jorge D'Elia via users
----- Original Message -----
> From: "Pavel Mezentsev via users" 
> To: users@lists.open-mpi.org
> CC: "Pavel Mezentsev" 
> Sent: Wednesday, May 19, 2021 10:53:50
> Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath
>
> It took some time but my colleague was able to build OpenMPI and get it
> working with OmniPath, however the performance is quite disappointing.
> The configuration line used was the following: ./configure
> --prefix=$INSTALL_PATH  --build=x86_64-pc-linux-gnu
> --host=x86_64-pc-linux-gnu --enable-shared --with-hwloc=$EBROOTHWLOC
> --with-psm2 --with-ofi=$EBROOTLIBFABRIC --with-libevent=$EBROOTLIBEVENT
> --without-orte --disable-oshmem --with-gpfs --with-slurm
> --with-pmix=external --with-libevent=external --with-ompi-pmix-rte
> 
> /usr/bin/srun --cpu-bind=none --mpi=pspmix --ntasks-per-node 1 -n 2 xenv -L
> Architecture/KNL -L GCC -L OpenMPI env OMPI_MCA_btl_base_verbose="99"
> OMPI_MCA_mtl_base_verbose="99" numactl --physcpubind=1 ./osu_bw
> ...
> [node:18318] select: init of component ofi returned success
> [node:18318] mca: base: components_register: registering framework mtl
> components
> [node:18318] mca: base: components_register: found loaded component ofi
> 
> [node:18318] mca: base: components_register: component ofi register
> function successful
> [node:18318] mca: base: components_open: opening mtl components
> 
> [node:18318] mca: base: components_open: found loaded component ofi
> 
> [node:18318] mca: base: components_open: component ofi open function
> successful
> [node:18318] mca:base:select: Auto-selecting mtl components
> [node:18318] mca:base:select:(  mtl) Querying component [ofi]
> 
> [node:18318] mca:base:select:(  mtl) Query of component [ofi] set priority
> to 25
> [node:18318] mca:base:select:(  mtl) Selected component [ofi]
> 
> [node:18318] select: initializing mtl component ofi
> [node:18318] mtl_ofi_component.c:378: mtl:ofi:provider: hfi1_0
> ...
> # OSU MPI Bandwidth Test v5.7
> # Size  Bandwidth (MB/s)
> 1   0.05
> 2   0.10
> 4   0.20
> 8   0.41
> 16  0.77
> 32  1.54
> 64  3.10
> 128 6.09
> 256            12.39
> 512            24.23
> 1024   46.85
> 2048   87.99
> 4096  100.72
> 8192  139.91
> 16384 173.67
> 32768 197.82
> 65536 210.15
> 131072        215.76
> 262144        214.39
> 524288        219.23
> 1048576   223.53
> 2097152   226.93
> 4194304   227.62
> 
> If I test directly with `ib_write_bw` I get
> #bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
> Conflicting CPU frequency values detected: 1498.727000 != 1559.017000. CPU
> Frequency is not max.
> 65536   5000         2421.04          2064.33              0.033029
> 
> I also tried adding `OMPI_MCA_mtl="psm2"` however the job crashes in that
> case:
> ```
> Error obtaining unique transport key from ORTE
> (orte_precondition_transports not present in
> 
> the environment).
> ```
> Which is a bit puzzling considering that OpenMPI was built with
> `--witout-orte`

Dear Pavel, I can't help you, but just in case, regarding this text:

> Which is a bit puzzling considering that OpenMPI was built with
> `--witout-orte`

it should be `--without-orte` ??


Regards, Jorge D' Elia.
--
CIMEC (UNL-CONICET), http://www.cimec.org.ar/
Predio CONICET-Santa Fe, Colec. Ruta Nac. 168, 
Paraje El Pozo, 3000, Santa Fe, ARGENTINA. 
Tel +54-342-4511594/95 ext 7062, fax: +54-342-4511169


> What am I missing and how can I improve the performance?
> 
> Regards, Pavel Mezentsev.
> 
> On Mon, May 10, 2021 at 6:20 PM Heinz, Michael William <
> michael.william.he...@cornelisnetworks.com> wrote:
> 
>> *That warning is an annoying bit of cruft from the openib / verbs provider
>> that can be ignored. (Actually, I recommend using "--btl ^openib" to
>> suppress the warning.)*
>>
>>
>>
>> *That said, there is a known issue with selecting PSM2 and OMPI 4.1.0. I’m
>> not sure that that’s the problem you’re hitting, though, because you really
>> haven’t provided a lot of information.*
>>
>>
>>
>> *I would suggest trying the following to see what happens:*
>>
>>
>>
>> *${PATH_TO_OMPI}/mpirun -mca mtl psm2 -mca btl ^openib -mca
>> mtl_base_verbose 99 -mca btl_base_verbose 99 -n ${N} -H ${HOSTS}
>> my_application*
>>
>>
>>
>> *This should give you detailed information on what transports were
>> selected and what happened next.*
>>
>>
>>
>> *Oh – and make sure your fabric is up with an opainfo or opareport
>> command, just to make sure.*
>>
>>
>>
>> *From:* users  *On Behalf Of *Pavel
>> Mezentsev via users
>> *Sent:* Monday, May 10, 2021 8:41 AM
>> *To:* users@lists.open-mpi.org
>> *Cc:* Pavel Mezentsev 
>> *Subject:* [OMPI users] 

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Pavel Mezentsev via users
It took some time but my colleague was able to build OpenMPI and get it
working with OmniPath, however the performance is quite disappointing.
The configuration line used was the following: ./configure
--prefix=$INSTALL_PATH  --build=x86_64-pc-linux-gnu
 --host=x86_64-pc-linux-gnu --enable-shared --with-hwloc=$EBROOTHWLOC
--with-psm2 --with-ofi=$EBROOTLIBFABRIC --with-libevent=$EBROOTLIBEVENT
--without-orte --disable-oshmem --with-gpfs --with-slurm
--with-pmix=external --with-libevent=external --with-ompi-pmix-rte

/usr/bin/srun --cpu-bind=none --mpi=pspmix --ntasks-per-node 1 -n 2 xenv -L
Architecture/KNL -L GCC -L OpenMPI env OMPI_MCA_btl_base_verbose="99"
OMPI_MCA_mtl_base_verbose="99" numactl --physcpubind=1 ./osu_bw
...
[node:18318] select: init of component ofi returned success
[node:18318] mca: base: components_register: registering framework mtl
components
[node:18318] mca: base: components_register: found loaded component ofi

[node:18318] mca: base: components_register: component ofi register
function successful
[node:18318] mca: base: components_open: opening mtl components

[node:18318] mca: base: components_open: found loaded component ofi

[node:18318] mca: base: components_open: component ofi open function
successful
[node:18318] mca:base:select: Auto-selecting mtl components
[node:18318] mca:base:select:(  mtl) Querying component [ofi]

[node:18318] mca:base:select:(  mtl) Query of component [ofi] set priority
to 25
[node:18318] mca:base:select:(  mtl) Selected component [ofi]

[node:18318] select: initializing mtl component ofi
[node:18318] mtl_ofi_component.c:378: mtl:ofi:provider: hfi1_0
...
# OSU MPI Bandwidth Test v5.7
# Size  Bandwidth (MB/s)
1   0.05
2   0.10
4   0.20
8   0.41
16  0.77
32  1.54
64  3.10
128 6.09
256            12.39
512            24.23
1024   46.85
2048   87.99
4096  100.72
8192  139.91
16384 173.67
32768 197.82
65536 210.15
131072        215.76
262144        214.39
524288        219.23
1048576   223.53
2097152   226.93
4194304   227.62

If I test directly with `ib_write_bw` I get
#bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
Conflicting CPU frequency values detected: 1498.727000 != 1559.017000. CPU
Frequency is not max.
65536   5000         2421.04          2064.33              0.033029

I also tried adding `OMPI_MCA_mtl="psm2"` however the job crashes in that
case:
```
Error obtaining unique transport key from ORTE
(orte_precondition_transports not present in

the environment).
```
Which is a bit puzzling considering that OpenMPI was built with
`--witout-orte`

What am I missing and how can I improve the performance?

Regards, Pavel Mezentsev.

On Mon, May 10, 2021 at 6:20 PM Heinz, Michael William <
michael.william.he...@cornelisnetworks.com> wrote:

> *That warning is an annoying bit of cruft from the openib / verbs provider
> that can be ignored. (Actually, I recommend using "--btl ^openib" to
> suppress the warning.)*
>
>
>
> *That said, there is a known issue with selecting PSM2 and OMPI 4.1.0. I’m
> not sure that that’s the problem you’re hitting, though, because you really
> haven’t provided a lot of information.*
>
>
>
> *I would suggest trying the following to see what happens:*
>
>
>
> *${PATH_TO_OMPI}/mpirun -mca mtl psm2 -mca btl ^openib -mca
> mtl_base_verbose 99 -mca btl_base_verbose 99 -n ${N} -H ${HOSTS}
> my_application*
>
>
>
> *This should give you detailed information on what transports were
> selected and what happened next.*
>
>
>
> *Oh – and make sure your fabric is up with an opainfo or opareport
> command, just to make sure.*
>
>
>
> *From:* users  *On Behalf Of *Pavel
> Mezentsev via users
> *Sent:* Monday, May 10, 2021 8:41 AM
> *To:* users@lists.open-mpi.org
> *Cc:* Pavel Mezentsev 
> *Subject:* [OMPI users] unable to launch a job on a system with OmniPath
>
>
>
> Hi!
>
> I'm working on a system with KNL and OmniPath and I'm trying to launch a
> job but it fails. Could someone please advise what parameters I need to add
> to make it work properly? At first I need to make it work within one node,
> however later I need to use multiple nodes and eventually I may need to
> switch to TCP to run a hybrid job where some nodes are connected via
> Infiniband and some nodes are connected via OmniPath.
>
>
>
> So far without any extra parameters I get:
> ```
> By default, for Open MPI 4.0 and later, infiniband ports on a device
> are not used by default.  The intent is to use UCX for these devices.
> You can override this policy by setting the btl_openib_allow_ib MCA
> parameter
> to true.
>
>   Local host:  XX
>   Local adapter:   hfi1_0