[OMPI users] Special Hybrid placement in OpenMPI 4.0.5

2021-04-07 Thread Koren, Gabriel via users
Hello,
I'd like to do a special placement of a hybrid code on a cluster:

  1.  Select specific cores (0-5, 7-12, 14-16, 18-21, etc.)
  2.  Scatter the MPI ranks evenly
  3.  Attach the OpenMP threads to each rank.
Is using -cpu-set $cplist -npernode XX -cpus-per-rank YY the right way? Will 
-cpus-per-rank place the OpenMP threads close to the corresponding MPI rank?
Thank you very much.
Gabriel
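
To make the question concrete, here is a minimal sketch of the arithmetic involved: it expands a core list like the one above and groups the allowed cores into consecutive blocks of YY cores, one block per rank. This is only an illustration, not Open MPI's mapping algorithm; the function names expand_cpulist and group_by_rank are hypothetical, and cores left over after the last full block are simply dropped.

```shell
#!/bin/sh
# Expand a core list such as "0-5,7-12" into one core ID per line.
# The body runs in a subshell (parentheses) so the IFS change does not leak.
expand_cpulist() (
    list=$(printf '%s' "$1" | tr -d ' ')   # tolerate "0-5, 7-12"
    IFS=','
    for part in $list; do
        lo=${part%-*}    # text before the dash (or the whole item)
        hi=${part#*-}    # text after the dash (or the whole item)
        seq "$lo" "$hi"
    done
)

# Group the expanded cores into blocks of $2 cores and print one line
# per block as "rank N: c0,c1,...". Incomplete trailing blocks are dropped.
group_by_rank() {
    expand_cpulist "$1" | awk -v n="$2" '
        { buf = (buf == "" ? $0 : buf "," $0) }
        NR % n == 0 { printf "rank %d: %s\n", NR / n - 1, buf; buf = "" }'
}

# Example call with the core list from the question and 4 cores per rank.
group_by_rank "0-5,7-12,14-16,18-21" 4
```

Whatever options are used, the bindings that mpirun actually applies can be checked by adding --report-bindings to the command line and comparing the report with the intended layout.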


Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Heinz, Michael William via users
Sorry – I did actually send a thank you to Gilles and John @ 8:48 local time 
but it looks like at some point in my conversation with Gilles we stopped 
CC’ing the list – which means John never saw my thank you.

So, “Thanks for the help, John!”

From: users On Behalf Of Jeff Squyres (jsquyres) via users
Sent: Wednesday, April 7, 2021 10:28 AM
To: John Hearns
Cc: Jeff Squyres (jsquyres); Open MPI User's List
Subject: Re: [OMPI users] Building Open-MPI with Intel C

:-)

For the web archives: Mike confirmed to me off-list that the non-interactive 
login setup was, indeed, the issue, and he's now good to go.




Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Jeff Squyres (jsquyres) via users
:-)

For the web archives: Mike confirmed to me off-list that the non-interactive 
login setup was, indeed, the issue, and he's now good to go.


On Apr 7, 2021, at 10:09 AM, John Hearns <hear...@gmail.com> wrote:

Jeff, you know as well as I do that EVERYTHING is in the path at Cornelis 
Networks.

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread John Hearns via users
Jeff, you know as well as I do that EVERYTHING is in the path at Cornelis
Networks.

On Wed, 7 Apr 2021 at 14:59, Jeff Squyres (jsquyres) 
wrote:

> Check the output from ldd in a non-interactive login: your LD_LIBRARY_PATH
> probably doesn't include the location of the Intel runtime.
>
> E.g.
>
> ssh othernode ldd /path/to/orted
>
> Your shell startup files may well differentiate between interactive and
> non-interactive logins (i.e., it may set PATH / LD_LIBRARY_PATH / etc.
> differently).
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>


Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Jeff Squyres (jsquyres) via users
Check the output from ldd in a non-interactive login: your LD_LIBRARY_PATH 
probably doesn't include the location of the Intel runtime.

E.g.

ssh othernode ldd /path/to/orted

Your shell startup files may well differentiate between interactive and 
non-interactive logins (i.e., they may set PATH / LD_LIBRARY_PATH / etc. 
differently).
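
One way to see such a difference is to compare the variable between the two kinds of login. The sketch below is a hedged illustration: missing_entries is a hypothetical helper that prints the entries of one colon-separated list that are absent from another, and othernode is the same placeholder host name used above.

```shell
#!/bin/sh
# Print each entry of colon-separated list $1 that does not appear in
# colon-separated list $2. Runs in a subshell so IFS does not leak.
missing_entries() (
    IFS=':'
    for entry in $1; do
        found=no
        for other in $2; do
            if [ "$entry" = "$other" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then printf '%s\n' "$entry"; fi
    done
)

# Usage sketch (not run here): compare the interactive value with the
# value seen by a non-interactive ssh command on the remote node.
#   missing_entries "$LD_LIBRARY_PATH" \
#       "$(ssh othernode 'echo $LD_LIBRARY_PATH')"
```

Any directory printed is present in the interactive shell but not in the non-interactive one, which makes it a candidate for the location of the missing runtime libraries.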


On Apr 7, 2021, at 7:21 AM, John Hearns via users 
<users@lists.open-mpi.org> wrote:

Manually log into one of your nodes. Load the modules you use in a batch job. 
Run 'ldd' on your executable.
Start at the bottom and work upwards...

By the way, have you looked at using Easybuild? Would be good to have your 
input there maybe.


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread John Hearns via users
Manually log into one of your nodes. Load the modules you use in a batch
job. Run 'ldd' on your executable.
Start at the bottom and work upwards...
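
As a small sketch of that check, the filter below reads ldd output and prints only the libraries the loader could not resolve. It assumes the usual glibc ldd format, in which an unresolved library is reported as "name => not found"; unresolved_libs is a hypothetical helper name.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the name of every library that
# the dynamic loader reported as "not found".
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage sketch (not run here), on a compute node after loading the same
# modules as the batch job:
#   ldd /usr/mpi/icc/openmpi-icc/bin/orted | unresolved_libs
```

An empty result means every dependency resolved in that environment; any name printed is a library whose directory is missing from the search path there.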

By the way, have you looked at using Easybuild? Would be good to have your
input there maybe.




Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Heinz, Michael William via users
Gilles,

I’ll double check, but the Intel runtime is installed on all machines in the 
fabric.

-
Michael Heinz
michael.william.he...@cornelisnetworks.com

On Apr 7, 2021, at 2:42 AM, Gilles Gouaillardet via users 
<users@lists.open-mpi.org> wrote:

Michael,

orted is able to find its Intel runtime dependencies on the
host where you sourced the environment.
However, it is unlikely to be able to do so on a remote host.
For example,
ssh ... ldd `which orted`
will likely fail.

An option is to use -rpath (and add the path to the Intel runtime).
IIRC, there is also an option in the Intel compiler to statically link
to the runtime.

Cheers,

Gilles




Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Gilles Gouaillardet via users
Michael,

orted is able to find its Intel runtime dependencies on the
host where you sourced the environment.
However, it is unlikely to be able to do so on a remote host.
For example,
ssh ... ldd `which orted`
will likely fail.

An option is to use -rpath (and add the path to the Intel runtime).
IIRC, there is also an option in the Intel compiler to statically link
to the runtime.
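
A minimal sketch of the -rpath idea follows. It assumes the compiler driver passes -Wl, options through to the linker (the GCC-style convention), and rpath_flags is a hypothetical helper that turns a list of directories into such flags.

```shell
#!/bin/sh
# Build linker flags that embed each given directory as an rpath entry,
# so the resulting binaries can find those libraries without relying on
# LD_LIBRARY_PATH at run time.
rpath_flags() (
    flags=""
    for dir in "$@"; do
        flags="${flags:+$flags }-Wl,-rpath,$dir"
    done
    printf '%s\n' "$flags"
)

# Usage sketch (not run here); INTEL_LIB is a hypothetical variable set
# to the directory that holds libimf.so on the build host:
#   ./configure CC=icc LDFLAGS="$(rpath_flags "$INTEL_LIB")"
```

For the static-link alternative, the option I have in mind for icc is -static-intel, but check the compiler documentation for your version before relying on it.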

Cheers,

Gilles

On Wed, Apr 7, 2021 at 9:00 AM Heinz, Michael William via users
 wrote:
>
> I’m having a heck of a time building OMPI with Intel C. Compilation goes 
> fine, installation goes fine, compiling test apps (the OSU benchmarks) goes 
> fine…
>
> but when I go to actually run an MPI app I get:
>
> [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/mpirun -np 2 -H awbp025,awbp026,awbp027,awbp028 -x FI_PROVIDER=opa1x -x LD_LIBRARY_PATH=/usr/mpi/icc/openmpi-icc/lib64:/lib hostname
> /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
> /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
>
> Looking at orted, it does seem like the binary is linking correctly:
>
> [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/orted
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 135
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 107
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 346
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 264
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_session_dir failed
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
> --
>
> and…
>
> [awbp025:~/work/osu-icc](N/A)$ ldd /usr/mpi/icc/openmpi-icc/bin/orted
> linux-vdso.so.1 (0x7fffc2ebf000)
> libopen-rte.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-rte.so.40 (0x7fdaa6404000)
> libopen-pal.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-pal.so.40 (0x7fdaa60bd000)
> libopen-orted-mpir.so => /usr/mpi/icc/openmpi-icc/lib/libopen-orted-mpir.so (0x7fdaa5ebb000)
> libm.so.6 => /lib64/libm.so.6 (0x7fdaa5b39000)
> librt.so.1 => /lib64/librt.so.1 (0x7fdaa5931000)
> libutil.so.1 => /lib64/libutil.so.1 (0x7fdaa572d000)
> libz.so.1 => /lib64/libz.so.1 (0x7fdaa5516000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fdaa52fe000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7fdaa50de000)
> libc.so.6 => /lib64/libc.so.6 (0x7fdaa4d1b000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7fdaa4b17000)
> libimf.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libimf.so (0x7fdaa4494000)
> libsvml.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libsvml.so (0x7fdaa29c4000)
> libirng.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libirng.so (0x7fdaa2659000)
> libintlc.so.5 => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x7fdaa23e1000)
> /lib64/ld-linux-x86-64.so.2 (0x7fdaa66d6000)
>
> Can anyone suggest what I’m forgetting to do?
>
> ---
> Michael Heinz
> Fabric Software Engineer, Cornelis Networks