Re: [OMPI users] libnuma.so error
Hi Luis

That's awkward, because if the numa/libnuma packages were properly installed, the softlink should have been created.
Maybe check with "yum list | grep numa", then if something is missing use "yum install ...".
[Anyway, maybe the compute nodes use a different mechanism to pull their system image, separate from yum/dnf/apt.]

Gus

On Thu, Jul 20, 2023 at 4:00 AM Luis Cebamanos via users <users@lists.open-mpi.org> wrote:

> Hi Gus,
>
> Yep, I can see the softlink is missing on the compute nodes.
>
> Thanks!
> Luis
Re: [OMPI users] libnuma.so error
Hi Gus,

Yep, I can see the softlink is missing on the compute nodes.

Thanks!
Luis

On 19/07/2023 17:42, Gus Correa via users wrote:

> If it is installed, libnuma should be in:
> /usr/lib64/libnuma.so
> as a softlink to the actual number-versioned library.
> In general the loader is configured to search for shared libraries
> in /usr/lib64 ("ldd " may shed some light here).
>
> You can check if the numa packages are installed with:
> yum list | grep numa (CentOS 7, RHEL 7)
> dnf list | grep numa (CentOS 8, RHEL 8, RockyLinux 8, Fedora, etc)
> apt list | grep numa (Debian, Ubuntu)
>
> If not, you can install them (or ask the system administrator to do it).
>
> I hope this helps,
> Gus Correa
Re: [OMPI users] libnuma.so error
If it is installed, libnuma should be in:
/usr/lib64/libnuma.so
as a softlink to the actual number-versioned library.
In general the loader is configured to search for shared libraries
in /usr/lib64 ("ldd " may shed some light here).

You can check if the numa packages are installed with:
yum list | grep numa (CentOS 7, RHEL 7)
dnf list | grep numa (CentOS 8, RHEL 8, RockyLinux 8, Fedora, etc)
apt list | grep numa (Debian, Ubuntu)

If not, you can install them (or ask the system administrator to do it).

I hope this helps,
Gus Correa

On Wed, Jul 19, 2023 at 11:55 AM Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote:

> It's not clear if that message is being emitted by Open MPI.
>
> It does say it's falling back to a different behavior if libnuma.so is not
> found, so it appears to be treating it as a warning, not an error.
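The softlink check described above can be sketched as a small shell function. This is only an illustration: `numa_link_status` is a hypothetical helper name, and the default directory `/usr/lib64` is taken from the message (Debian-style systems typically use a multiarch directory instead, so pass the directory explicitly there). It separates the case where only the versioned runtime library is present from the case where nothing is installed at all.

```shell
# numa_link_status DIR
# Hypothetical helper: reports whether DIR contains the unversioned name
# "libnuma.so" (the name the failing dlopen() asks for) and, if not,
# whether a versioned runtime library such as libnuma.so.1 is there.
# Return codes: 0 = unversioned name present, 1 = runtime only, 2 = nothing.
numa_link_status() {
    dir=${1:-/usr/lib64}

    # [ -e ] follows symlinks, so a dangling softlink counts as missing.
    if [ -e "$dir/libnuma.so" ]; then
        echo "ok: $dir/libnuma.so is present"
        return 0
    fi

    # No unversioned name; look for versioned files (libnuma.so.1, ...).
    for f in "$dir"/libnuma.so.*; do
        if [ -e "$f" ]; then
            echo "runtime only: found $f but no $dir/libnuma.so"
            return 1
        fi
    done

    echo "missing: no libnuma library in $dir"
    return 2
}
```

Calling `numa_link_status /usr/lib64` on a login node and on a compute node would show whether the two images differ. If the result is "runtime only", one likely explanation is that the unversioned link is typically shipped by the development package (numactl-devel on RHEL-family systems, libnuma-dev on Debian/Ubuntu) rather than by the runtime package, so `yum list | grep numa` can show a numa package as installed while the link is still absent.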
Re: [OMPI users] libnuma.so error
It's not clear if that message is being emitted by Open MPI.

It does say it's falling back to a different behavior if libnuma.so is not found, so it appears to be treating it as a warning, not an error.

From: users on behalf of Luis Cebamanos via users
Sent: Wednesday, July 19, 2023 10:09 AM
To: users@lists.open-mpi.org
Cc: Luis Cebamanos
Subject: [OMPI users] libnuma.so error

Hello,

I was wondering if anyone has ever seen the following runtime error:

mpirun -np 32 ./hello
.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
.

The funny thing is that the binary is executed despite the errors.
What could be causing it?

Regards,
Luis
Re: [OMPI users] libnuma.so error
Luis,

That can happen if a component is linked with libnuma.so: Open MPI will fail to open it and try to fall back on another one.

You can run ldd on the mca_*.so components in the /.../lib/openmpi directory to figure out which one is using libnuma.so and assess whether it is needed or not.

Cheers,

Gilles

On Wed, Jul 19, 2023 at 11:36 PM Luis Cebamanos via users <users@lists.open-mpi.org> wrote:

> Hello,
>
> I was wondering if anyone has ever seen the following runtime error:
>
> mpirun -np 32 ./hello
> .
> [LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file
> or directory
> [LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET
> manual.
> .
>
> The funny thing is that the binary is executed despite the errors.
> What could be causing it?
>
> Regards,
> Luis
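The ldd check suggested above could look roughly like the loop below. It is a sketch under assumptions: `components_using_numa` is a hypothetical helper name, and the component directory has to be supplied by the caller because the path in the message is elided.

```shell
# components_using_numa DIR
# Hypothetical helper: prints every mca_*.so file in DIR whose ldd output
# mentions libnuma. DIR is the Open MPI component directory (lib/openmpi
# under the installation prefix).
components_using_numa() {
    dir=$1
    for comp in "$dir"/mca_*.so; do
        # Skip the unexpanded pattern when no component matches.
        [ -e "$comp" ] || continue
        # ldd lists the shared libraries the component was linked against.
        if ldd "$comp" 2>/dev/null | grep -q 'libnuma'; then
            echo "$comp"
        fi
    done
}
```

Running `components_using_numa` on the component directory of the installation prints the components linked against libnuma, if any. One caveat: ldd reports only link-time dependencies. A library that is loaded at run time with dlopen(), as the "Failed to dlopen libnuma.so" message suggests, does not necessarily appear in ldd output, so an empty result does not rule out that some component or third-party library tries to load it.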
[OMPI users] libnuma.so error
Hello,

I was wondering if anyone has ever seen the following runtime error:

mpirun -np 32 ./hello
.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
.

The funny thing is that the binary is executed despite the errors.
What could be causing it?

Regards,
Luis