Re: [OMPI users] libnuma.so error

2023-07-20 Thread Gus Correa via users
Hi Luis

That's odd, because if the numa/libnuma packages were properly installed,
the softlink should have been created.
Maybe check with "yum list | grep numa", then if something is missing use
"yum install ...".
[Anyway, maybe the compute nodes use a different mechanism to pull their
system image, separate from yum/dnf/apt.]
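
A quick way to confirm the missing link on a compute node is sketched
below. The helper name check_numa_link and the default directory are
assumptions for illustration only. Note that on RHEL-family systems the
unversioned libnuma.so link is normally shipped by the development package
(numactl-devel), and on Debian/Ubuntu by libnuma-dev, so a node with only
the runtime library installed would have libnuma.so.1 but no libnuma.so.

```shell
#!/bin/sh
# Report whether the unversioned libnuma.so link exists in a directory.
# check_numa_link is a hypothetical helper; /usr/lib64 is the location
# mentioned in this thread (Debian/Ubuntu use a different library path).
check_numa_link() {
    dir="${1:-/usr/lib64}"
    if [ -e "$dir/libnuma.so" ]; then
        # readlink -f resolves the link to the real versioned file
        echo "found: $dir/libnuma.so -> $(readlink -f "$dir/libnuma.so")"
    elif ls "$dir"/libnuma.so.* >/dev/null 2>&1; then
        echo "versioned library present, but libnuma.so link is missing in $dir"
    else
        echo "no libnuma found in $dir"
    fi
}

check_numa_link /usr/lib64
```

Running the same script on the login node and on a compute node should
show whether the two images differ.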

Gus

On Thu, Jul 20, 2023 at 4:00 AM Luis Cebamanos via users <
users@lists.open-mpi.org> wrote:

> Hi Gus,
>
> Yep, I can see the softlink is missing on the compute nodes.
>
> Thanks!
> Luis
>
> On 19/07/2023 17:42, Gus Correa via users wrote:
>
> If it is installed, libnuma should be in:
> /usr/lib64/libnuma.so
> as a softlink to the actual number-versioned library.
> In general the loader is configured to search for shared libraries
> in /usr/lib64 ("ldd " may shed some light here).
>
> You can check if the numa packages are installed with:
> yum list | grep numa (CentOS 7, RHEL 7)
> dnf list | grep numa (CentOS 8, RHEL 8, RockyLinux 8, Fedora, etc)
> apt list | grep numa (Debian, Ubuntu)
>
> If not, you can install them (or ask the system administrator to do it).
>
> I hope this helps,
> Gus Correa
>
>
> On Wed, Jul 19, 2023 at 11:55 AM Jeff Squyres (jsquyres) via users <
> users@lists.open-mpi.org> wrote:
>
>> It's not clear if that message is being emitted by Open MPI.
>>
>> It does say it's falling back to a different behavior if libnuma.so is
>> not found, so it appears to be treating it as a warning, not an error.
>> --
>> *From:* users  on behalf of Luis
>> Cebamanos via users 
>> *Sent:* Wednesday, July 19, 2023 10:09 AM
>> *To:* users@lists.open-mpi.org 
>> *Cc:* Luis Cebamanos 
>> *Subject:* [OMPI users] libnuma.so error
>>
>> Hello,
>>
>> I was wondering if anyone has ever seen the following runtime error:
>>
>> mpirun -np 32 ./hello
>> .
>> [LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file
>> or directory
>> [LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET
>> manual.
>> .
>>
>> The funny thing is that the binary is executed despite the errors.
>> What could be causing it?
>>
>> Regards,
>> Luis
>>
>
>


Re: [OMPI users] libnuma.so error

2023-07-20 Thread Luis Cebamanos via users

Hi Gus,

Yep, I can see the softlink is missing on the compute nodes.

Thanks!
Luis



Re: [OMPI users] libnuma.so error

2023-07-19 Thread Gus Correa via users
If it is installed, libnuma should be in:
/usr/lib64/libnuma.so
as a softlink to the actual number-versioned library.
In general the loader is configured to search for shared libraries
in /usr/lib64 ("ldd " may shed some light here).

You can check if the numa packages are installed with:
yum list | grep numa (CentOS 7, RHEL 7)
dnf list | grep numa (CentOS 8, RHEL 8, RockyLinux 8, Fedora, etc)
apt list | grep numa (Debian, Ubuntu)

If not, you can install them (or ask the system administrator to do it).
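
As a small sketch of the check above, the script below prints whichever
of the three listing commands matches the first package manager found on
PATH. The helper name numa_list_command is hypothetical, and the script
only suggests the command rather than running it.

```shell
#!/bin/sh
# Print the package-listing command from this thread that matches the
# first package manager found on PATH. numa_list_command is a
# hypothetical helper name, not part of any distribution.
numa_list_command() {
    for pm in dnf yum apt; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm list | grep numa"
            return 0
        fi
    done
    echo "no supported package manager found" >&2
    return 1
}

numa_list_command || echo "check the installed packages manually"
```

On a node where yum is only an alias for dnf, the dnf form is printed
first, which lists the same packages.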

I hope this helps,
Gus Correa


On Wed, Jul 19, 2023 at 11:55 AM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:

> It's not clear if that message is being emitted by Open MPI.
>
> It does say it's falling back to a different behavior if libnuma.so is not
> found, so it appears to be treating it as a warning, not an error.
> --
> *From:* users  on behalf of Luis
> Cebamanos via users 
> *Sent:* Wednesday, July 19, 2023 10:09 AM
> *To:* users@lists.open-mpi.org 
> *Cc:* Luis Cebamanos 
> *Subject:* [OMPI users] libnuma.so error
>
> Hello,
>
> I was wondering if anyone has ever seen the following runtime error:
>
> mpirun -np 32 ./hello
> .
> [LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file
> or directory
> [LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET
> manual.
> .
>
> The funny thing is that the binary is executed despite the errors.
> What could be causing it?
>
> Regards,
> Luis
>


Re: [OMPI users] libnuma.so error

2023-07-19 Thread Jeff Squyres (jsquyres) via users
It's not clear if that message is being emitted by Open MPI.

It does say it's falling back to a different behavior if libnuma.so is not
found, so it appears to be treating it as a warning, not an error.

From: users  on behalf of Luis Cebamanos via 
users 
Sent: Wednesday, July 19, 2023 10:09 AM
To: users@lists.open-mpi.org 
Cc: Luis Cebamanos 
Subject: [OMPI users] libnuma.so error

Hello,

I was wondering if anyone has ever seen the following runtime error:

mpirun -np 32 ./hello
.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file
or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET
manual.
.

The funny thing is that the binary is executed despite the errors.
What could be causing it?

Regards,
Luis


Re: [OMPI users] libnuma.so error

2023-07-19 Thread Gilles Gouaillardet via users
Luis,

That can happen if a component is linked with libnuma.so:
Open MPI will fail to open it and try to fall back on another one.

You can run ldd on the mca_*.so components in the /.../lib/openmpi directory
to figure out which is using libnuma.so and assess if it is needed or not.
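
The ldd check above could be scripted roughly as follows. The helper name
components_using_numa is hypothetical, and the component directory has to
be supplied by you since it depends on where Open MPI is installed. Keep
in mind that ldd reports only link-time dependencies: a library that is
loaded at run time with dlopen, as the "Failed to dlopen" message
suggests, will not show up in its output.

```shell
#!/bin/sh
# List Open MPI components in a directory whose link-time dependencies
# mention libnuma. components_using_numa is a hypothetical helper; pass
# the lib/openmpi directory of your own installation as the argument.
components_using_numa() {
    dir="$1"
    for comp in "$dir"/mca_*.so; do
        [ -e "$comp" ] || continue      # the glob matched nothing
        if ldd "$comp" 2>/dev/null | grep -q 'libnuma'; then
            echo "$comp"
        fi
    done
}

# Run only when a directory is given on the command line.
if [ "$#" -gt 0 ]; then
    components_using_numa "$1"
fi
```

An empty result means no component is linked against libnuma directly,
which would point towards a run-time dlopen by some other library.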


Cheers,

Gilles

On Wed, Jul 19, 2023 at 11:36 PM Luis Cebamanos via users <
users@lists.open-mpi.org> wrote:

> Hello,
>
> I was wondering if anyone has ever seen the following runtime error:
>
> mpirun -np 32 ./hello
> .
> [LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file
> or directory
> [LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET
> manual.
> .
>
> The funny thing is that the binary is executed despite the errors.
> What could be causing it?
>
> Regards,
> Luis
>


[OMPI users] libnuma.so error

2023-07-19 Thread Luis Cebamanos via users

Hello,

I was wondering if anyone has ever seen the following runtime error:

mpirun -np 32 ./hello
.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
.

The funny thing is that the binary is executed despite the errors.
What could be causing it?

Regards,
Luis