Thank you very much for all the suggestions.

1) Sadly, setting
`OMPI_MCA_orte_precondition_transports="0123456789ABCDEF-0123456789ABCDEF"`
did not help; I still got the same error about the transport key not being
available from ORTE.
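For completeness, this is essentially how I set it; the `od`/`awk` line below
is just an untested way of generating a random key in the 16-hex-digits,
hyphen, 16-hex-digits format you described, instead of the fixed string:

```
# Untested sketch: generate a random key in the required
# <16 hex digits>-<16 hex digits> format and export it before launching.
export OMPI_MCA_orte_precondition_transports=$(
  od -An -N16 -tx8 /dev/urandom | awk '{print $1 "-" $2}')
# then launch as before (srun line abbreviated here)
srun --mpi=pspmix --ntasks-per-node 1 -n 2 ./osu_bw
```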

2) I rebuilt Open MPI without Slurm support. I don't remember the exact
message, but `configure` told me that libevent is required for PSM2, so I had
to add it and ended up with the following options:
Configure command line: '--prefix=${BUILD_PATH}'
                          '--build=x86_64-pc-linux-gnu'
                          '--host=x86_64-pc-linux-gnu' '--enable-shared'
                          '--with-hwloc=${HWLOC_PATH}'
                          '--with-psm2' '--disable-oshmem' '--with-gpfs'
                          '--with-libevent=${LIBEVENT_PATH}'
With this build I was able to get the expected results:
mpirun -np 2 --hostfile hostfile --map-by node ./osu_bw

--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node01
  Local device: hfi1_0
--------------------------------------------------------------------------

# OSU MPI Bandwidth Test v5.7
# Size      Bandwidth (MB/s)
1                       1.12
2                       2.24
4                       4.59
8                       8.92
16                     16.54
32                     33.78
64                     71.79
128                   138.35
256                   249.74
512                   421.12
1024                  719.52
2048                 1258.79
4096                 2034.16
8192                 2021.40
16384                2269.95
32768                2573.85
65536                2749.05
131072               3178.84
262144               7150.88
524288               9027.82
1048576             10586.48
2097152             11828.10
4194304             11910.87

[jrlogin01.jureca:11482] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[jrlogin01.jureca:11482] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Latency also improved, from 29.69 us to 2.63 us.
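As a side note, I assume the leftover OpenFabrics warning can be silenced by
excluding the openib BTL as Michael suggested earlier; something like the
following (not yet tried on my side):

```
# Same benchmark run with the openib BTL excluded, which should
# suppress the warning (per Michael's earlier suggestion; untested).
mpirun -np 2 --hostfile hostfile --map-by node \
       --mca btl ^openib ./osu_bw
```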

3) It would be nice to get a build working that supports both Slurm and PSM2,
but for the time being a build without Slurm support is also an option for
me.

4) This one is slightly off the original topic. I now need to run an
application across two partitions: one has an InfiniBand fabric and the other
has OmniPath. They are connected via gateways that have HCAs of both types,
so they can route IP traffic from one network to the other. What would be the
proper way to launch a job across nodes that are connected to different types
of networks?
In the old days I would just set `OMPI_MCA_btl=tcp,sm,self`. These days it is
a bit more complicated. For each partition I have a different build of
Open MPI: for IB I have one with UCX, and for OmniPath I have the one that I
built recently. For UCX I could set `UCX_TLS="tcp,sm"` and something
equivalent on the OmniPath side (a rough sketch of what I have in mind is at
the end of this point). Would the processes launched by these two builds of
Open MPI be able to communicate with each other?
Another idea that came to mind was to make an Open MPI build with no
high-performance fabric support at all, so that it only works via TCP. Any
advice on how to accomplish my goal would be appreciated.
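To be concrete, this is roughly what I had in mind for forcing TCP on both
sides (an untested sketch; the transport/BTL names, in particular `vader` for
shared memory in the 4.x series, are my best guess):

```
# Per-partition settings (not meant to be set in the same environment).

# InfiniBand partition (UCX build): keep the UCX PML but restrict it to
# TCP and shared memory.
export OMPI_MCA_pml=ucx
export UCX_TLS=tcp,sm,self

# OmniPath partition (the PSM2 build): bypass PSM2 and use the ob1 PML
# over the TCP and shared-memory BTLs.
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=tcp,self,vader
```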

I realize that, performance-wise, this is going to be quite... sad. But
currently that's not the main concern.

Regards, Pavel Mezentsev.

On Wed, May 19, 2021 at 5:40 PM Heinz, Michael William via users <
users@lists.open-mpi.org> wrote:

> *Right. there was a reference counting issue in OMPI that required a
> change to PSM2 to properly fix. There’s a configuration option to disable
> the reference count check at build time, although  I don’t recall what the
> option is off the top of my head.*
>
>
>
> *From:* Carlson, Timothy S <timothy.carl...@pnnl.gov>
> *Sent:* Wednesday, May 19, 2021 11:31 AM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Cc:* Heinz, Michael William <michael.william.he...@cornelisnetworks.com>
> *Subject:* Re: [OMPI users] unable to launch a job on a system with
> OmniPath
>
>
>
> Just some more data from my OminPath based cluster.
>
>
>
> There certainly was a change from 4.0.x to 4.1.x
>
>
>
> With 4.0.1 I would build OpenMPI with
>
>
>
> ./configure --with-psm2 --with-slurm --with-pmi=/usr
>
>
>
> And while srun would spit out a warning, the performance was as expected.
>
>
>
> srun -N 2 --ntasks-per-node=1 -A ops -p short mpi/pt2pt/osu_latency
> -cut-
> WARNING: There was an error initializing an OpenFabrics device.
>
>   Local host:   n0005
>   Local device: hfi1_0
> --------------------------------------------------------------------------
> # OSU MPI Latency Test v5.5
> # Size          Latency (us)
> 0                       1.13
> 1                       1.13
> 2                       1.13
> 4                       1.13
> 8                       1.13
> 16                      1.49
> 32                      1.49
> 64                      1.39
> -cut-
>
>
>
> Similarly for bandwidth
>
> 32768                6730.96
> 65536                9801.56
> 131072              11887.62
> 262144              11959.18
> 524288              12062.57
> 1048576             12038.13
> 2097152             12048.90
> 4194304             12112.04
>
>
>
> With 4.1.x it appears I need to upgrade my psm2 installation from what I
> have now
>
>
>
> # rpm -qa | grep psm2
> libpsm2-11.2.89-1.x86_64
> libpsm2-devel-11.2.89-1.x86_64
> libpsm2-compat-11.2.89-1.x86_64
> libfabric-psm2-1.7.1-10.x86_64
>
>
>
> because configure spit out this warning
>
>
>
> WARNING: PSM2 needs to be version 11.2.173 or later. Disabling MTL
>
>
>
> The above cluster is running IntelOPA 10.9.2
>
>
>
> Tim
>
>
>
> *From: *users <users-boun...@lists.open-mpi.org> on behalf of "Heinz,
> Michael William via users" <users@lists.open-mpi.org>
> *Reply-To: *Open MPI Users <users@lists.open-mpi.org>
> *Date: *Wednesday, May 19, 2021 at 7:57 AM
> *To: *Open MPI Users <users@lists.open-mpi.org>
> *Cc: *"Heinz, Michael William" <michael.william.he...@cornelisnetworks.com
> >
> *Subject: *Re: [OMPI users] unable to launch a job on a system with
> OmniPath
>
>
>
>
>
>
> *After thinking about this for a few more minutes, it occurred to me that
> you might be able to “fake” the required UUID support by passing it as a
> shell variable. For example:*
>
>
>
> *export
> OMPI_MCA_orte_precondition_transports=”0123456789ABCDEF-0123456789ABCDEF” *
>
>
>
> *would probably do it. However, note that the format of the string must be
> 16 hex digits, a hyphen, then 16 more hex digits. anything else will be
> rejected. Also, I have never tried doing this, YMMV.*
>
>
>
> *From:* Heinz, Michael William
> *Sent:* Wednesday, May 19, 2021 10:35 AM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Cc:* Ralph Castain <r...@open-mpi.org>
> *Subject:* RE: [OMPI users] unable to launch a job on a system with
> OmniPath
>
>
>
> *So, the bad news is that the PSM2 MTL requires ORTE – ORTE generates a
> UUID to identify the job across all nodes in the fabric, allowing processes
> to find each other over OPA at init time.*
>
>
>
> *I believe the reason this works when you use OFI/libfabric is that
> libfabric generates its own UUIDs.*
>
>
>
> *From:* users <users-boun...@lists.open-mpi.org> *On Behalf Of *Ralph
> Castain via users
> *Sent:* Wednesday, May 19, 2021 10:19 AM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Cc:* Ralph Castain <r...@open-mpi.org>
> *Subject:* Re: [OMPI users] unable to launch a job on a system with
> OmniPath
>
>
>
> The original configure line is correct ("--without-orte") - just a typo in
> the later text.
>
>
>
> You may be running into some issues with Slurm's built-in support for
> OMPI. Try running it with OMPI's "mpirun" instead and see if you get better
> performance. You'll have to reconfigure to remove the "--without-orte" and
> "--with-ompi-pmix-rte" options. I would also recommend removing the
> "--with-pmix=external --with-libevent=external --with-hwloc=xxx
> --with-libevent=xxx" entries.
>
>
>
> In other words, get down to a vanilla installation so we know what we are
> dealing with - otherwise, it gets very hard to help you.
>
>
>
>
>
> On May 19, 2021, at 7:09 AM, Jorge D'Elia via users <
> users@lists.open-mpi.org> wrote:
>
>
>
> ----- Mensaje original -----
>
> De: "Pavel Mezentsev via users" <users@lists.open-mpi.org>
> Para: users@lists.open-mpi.org
> CC: "Pavel Mezentsev" <pavel.mezent...@gmail.com>
> Enviado: Miércoles, 19 de Mayo 2021 10:53:50
> Asunto: Re: [OMPI users] unable to launch a job on a system with OmniPath
>
> It took some time but my colleague was able to build OpenMPI and get it
> working with OmniPath, however the performance is quite disappointing.
> The configuration line used was the following: ./configure
> --prefix=$INSTALL_PATH  --build=x86_64-pc-linux-gnu
> --host=x86_64-pc-linux-gnu --enable-shared --with-hwloc=$EBROOTHWLOC
> --with-psm2 --with-ofi=$EBROOTLIBFABRIC --with-libevent=$EBROOTLIBEVENT
> --without-orte --disable-oshmem --with-gpfs --with-slurm
> --with-pmix=external --with-libevent=external --with-ompi-pmix-rte
>
> /usr/bin/srun --cpu-bind=none --mpi=pspmix --ntasks-per-node 1 -n 2 xenv -L
> Architecture/KNL -L GCC -L OpenMPI env OMPI_MCA_btl_base_verbose="99"
> OMPI_MCA_mtl_base_verbose="99" numactl --physcpubind=1 ./osu_bw
> ...
> [node:18318] select: init of component ofi returned success
> [node:18318] mca: base: components_register: registering framework mtl
> components
> [node:18318] mca: base: components_register: found loaded component ofi
>
> [node:18318] mca: base: components_register: component ofi register
> function successful
> [node:18318] mca: base: components_open: opening mtl components
>
> [node:18318] mca: base: components_open: found loaded component ofi
>
> [node:18318] mca: base: components_open: component ofi open function
> successful
> [node:18318] mca:base:select: Auto-selecting mtl components
> [node:18318] mca:base:select:(  mtl) Querying component [ofi]
>
> [node:18318] mca:base:select:(  mtl) Query of component [ofi] set priority
> to 25
> [node:18318] mca:base:select:(  mtl) Selected component [ofi]
>
> [node:18318] select: initializing mtl component ofi
> [node:18318] mtl_ofi_component.c:378: mtl:ofi:provider: hfi1_0
> ...
> # OSU MPI Bandwidth Test v5.7
> # Size      Bandwidth (MB/s)
> 1                       0.05
> 2                       0.10
> 4                       0.20
> 8                       0.41
> 16                      0.77
> 32                      1.54
> 64                      3.10
> 128                     6.09
> 256                    12.39
> 512                    24.23
> 1024                   46.85
> 2048                   87.99
> 4096                  100.72
> 8192                  139.91
> 16384                 173.67
> 32768                 197.82
> 65536                 210.15
> 131072                215.76
> 262144                214.39
> 524288                219.23
> 1048576               223.53
> 2097152               226.93
> 4194304               227.62
>
> If I test directly with `ib_write_bw` I get
> #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
> MsgRate[Mpps]
> Conflicting CPU frequency values detected: 1498.727000 != 1559.017000. CPU
> Frequency is not max.
> 65536      5000             2421.04            2064.33            0.033029
>
> I also tried adding `OMPI_MCA_mtl="psm2"` however the job crashes in that
> case:
> ```
> Error obtaining unique transport key from ORTE
> (orte_precondition_transports not present in
>
> the environment).
> ```
> Which is a bit puzzling considering that OpenMPI was build with
> `--witout-orte`
>
>
> Dear Pavel, I can't help you but just in case in the text:
>
> Which is a bit puzzling considering that OpenMPI was build with
> `--witout-orte`
>
>
> it should be `--without-orte` ??
>
>
> Regards, Jorge D' Elia.
> --
> CIMEC (UNL-CONICET), http://www.cimec.org.ar/
> Predio CONICET-Santa Fe, Colec. Ruta Nac. 168,
> Paraje El Pozo, 3000, Santa Fe, ARGENTINA.
> Tel +54-342-4511594/95 ext 7062, fax: +54-342-4511169
>
>
> What am I missing and how can I improve the performance?
>
> Regards, Pavel Mezentsev.
>
> On Mon, May 10, 2021 at 6:20 PM Heinz, Michael William <
> michael.william.he...@cornelisnetworks.com> wrote:
>
> *That warning is an annoying bit of cruft from the openib / verbs provider
> that can be ignored. (Actually, I recommend using "--btl ^openib" to
> suppress the warning.)*
>
>
>
> *That said, there is a known issue with selecting PSM2 and OMPI 4.1.0. I’m
> not sure that that’s the problem you’re hitting, though, because you really
> haven’t provided a lot of information.*
>
>
>
> *I would suggest trying the following to see what happens:*
>
>
>
> *${PATH_TO_OMPI}/mpirun -mca mtl psm2 -mca btl ^openib -mca
> mtl_base_verbose 99 -mca btl_base_verbose 99 -n ${N} -H ${HOSTS}
> my_application*
>
>
>
> *This should give you detailed information on what transports were
> selected and what happened next.*
>
>
>
> *Oh – and make sure your fabric is up with an opainfo or opareport
> command, just to make sure.*
>
>
>
> *From:* users <users-boun...@lists.open-mpi.org> *On Behalf Of *Pavel
> Mezentsev via users
> *Sent:* Monday, May 10, 2021 8:41 AM
> *To:* users@lists.open-mpi.org
> *Cc:* Pavel Mezentsev <pavel.mezent...@gmail.com>
> *Subject:* [OMPI users] unable to launch a job on a system with OmniPath
>
>
>
> Hi!
>
> I'm working on a system with KNL and OmniPath and I'm trying to launch a
> job but it fails. Could someone please advise what parameters I need to add
> to make it work properly? At first I need to make it work within one node,
> however later I need to use multiple nodes and eventually I may need to
> switch to TCP to run a hybrid job where some nodes are connected via
> Infiniband and some nodes are connected via OmniPath.
>
>
>
> So far without any extra parameters I get:
> ```
> By default, for Open MPI 4.0 and later, infiniband ports on a device
> are not used by default.  The intent is to use UCX for these devices.
> You can override this policy by setting the btl_openib_allow_ib MCA
> parameter
> to true.
>
>  Local host:              XXXXXX
>  Local adapter:           hfi1_0
>  Local port:              1
> ```
>
> If I add `OMPI_MCA_btl_openib_allow_ib="true"` then I get:
> ```
> Error obtaining unique transport key from ORTE
> (orte_precondition_transports not present in
> the environment).
>
>  Local host: XXXXXX
>
> ```
> Then I tried adding OMPI_MCA_mtl="psm2" or OMPI_MCA_mtl="ofi" to make it
> use omnipath or OMPI_MCA_btl="sm,self" to make it use only shared memory.
> But these parameters did not make any difference.
> There does not seem to be much omni-path related documentation, at least I
> was not able to find anything that would help me but perhaps I missed
> something:
> https://www.open-mpi.org/faq/?category=running#opa-support
> https://www.open-mpi.org/faq/?category=opa
>
>
>
> This is the `configure` line:
>
> ```
> ./configure --prefix=XXXXX --build=x86_64-pc-linux-gnu
> --host=x86_64-pc-linux-gnu --enable-shared --with-hwloc=$EBROOTHWLOC
> --with-psm2 --with-libevent=$EBROOTLIBEVENT --without-orte --disable-oshmem
> --with-cuda=$EBROOTCUDA --with-gpfs --with-slurm --with-pmix=external
> --with-libevent=external --with-ompi-pmix-rte
>
> ```
>
> Which also raises another question: if it was built with `--without-orte`
> then why do I get an error about failing to get something from ORTE.
>
> The OpenMPI version is `4.1.0rc1` built with `gcc-9.3.0`.
>
>
>
> Thank you in advance!
>
> Regards, Pavel Mezentsev.
>
>
>
