Re: [R] dyn.load(now = FALSE) not actually lazy?

2023-02-02 Thread Ivan Krylov
On Thu, 2 Feb 2023 13:35:39 +1100
Michael Milton wrote:

> The gist of it is that, indeed, R starts to look for dependent
> libraries even if now=FALSE.

I don't think that R does that. Here's a program that should do the
same thing that R does when you call dyn.load(now = FALSE):

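#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char ** argv) {
    if (argc != 2) {
        printf("Usage: %s path/to/library.so\n", argv[0]);
        return -1;
    }

    /* RTLD_LAZY: defer resolution of function symbols until first call */
    void * ptr = dlopen(argv[1], RTLD_LAZY);
    if (!ptr) {
        /* dlerror() describes why the load failed */
        printf("%s\n", dlerror());
        return 1;
    }

    dlclose(ptr);
    return 0;
}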

Link it with -ldl. If you launch it with LD_DEBUG=all (see also: the
output when you launch something with LD_DEBUG=help), it should give you
even more information about the shared object loading process.

> /stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so:
> error: symbol lookup error: undefined symbol: work_mem (fatal)

I think I have an explanation for this. work_mem turns out to be a
global variable, not a function:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/miscadmin.h;h=96b3a1e1a0770bdbc25701b5a41e2001a7222b3b;hb=117d2604c2a59eb853e894410b94b4c453f8bd43#l261
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/init/globals.c;h=1b1d8142548aaf1234fcf1fc92753b9a8b8b3537;hb=117d2604c2a59eb853e894410b94b4c453f8bd43#l125

Indeed, libpqwalreceiver includes the header where the variable is
declared and accesses it:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/libpqwalreceiver/libpqwalreceiver.c;h=560ec974fa71444f1cb90e5098cef3c712e5b758;hb=117d2604c2a59eb853e894410b94b4c453f8bd43#l968

I guess that there's no way for the dynamic loader to postpone the
resolution of an import that's a variable: the code reads and writes
the data directly through its address, so that address has to be valid
as soon as the library is loaded. For functions, there are stubs in the
procedure linkage table that can resolve the symbol on the first call,
because a stub is code and can do the lookup before jumping to the real
function.
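
To make the difference concrete, here is a small single-file analogy
in C. It is only a sketch of the idea and not what the dynamic loader
literally does: the names add_slot, add_stub, real_add and the value
4096 are made up for illustration, and only work_mem comes from the
PostgreSQL example above. A function call can go through a slot that
initially points at a stub, so the lookup can wait until the first
call. A variable access is a plain load through an address, so the
address has to be filled in before any code runs.

```c
#include <stddef.h>

/* Stand-in for a function that lives in another library (hypothetical). */
int real_add(int a, int b) { return a + b; }

/* Counts how often the "resolver" ran, so the laziness is observable. */
int resolver_calls = 0;

int add_stub(int a, int b);

/* Analogue of the slot a PLT entry jumps through: starts at the stub. */
int (*add_slot)(int, int) = add_stub;

/* Analogue of the lazy resolver: runs on the first call only,
   patches the slot, then forwards the call to the real function. */
int add_stub(int a, int b)
{
    resolver_calls++;
    add_slot = real_add;
    return add_slot(a, b);
}

/* Callers always go through the slot, never directly to real_add. */
int call_add(int a, int b) { return add_slot(a, b); }

/* A variable has no code that could run on first access: the slot
   must already hold a valid address when the first read happens. */
int real_work_mem = 4096;               /* illustrative value only */
int *work_mem_slot = &real_work_mem;    /* filled in "at load time" */

int read_work_mem(void) { return *work_mem_slot; }
```

The first call_add() triggers the stub once; later calls go straight
to real_add. There is no equivalent hook for read_work_mem(), which is
why a missing variable cannot be tolerated until first use.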

I could be wrong about my explanation. If you'd like to learn more
about this topic, one way to do that would be to start with Ulrich
Drepper's articles: https://akkadia.org/drepper/dsohowto.pdf

-- 
Best regards,
Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dyn.load(now = FALSE) not actually lazy?

2023-02-01 Thread Michael Milton
Hi Ivan,

Thanks for the suggestion.

I found that libpqwalreceiver.so, which is part of pgsql-15.1, made for a
simpler example of my issue. So, using your advice, I ran the following in
my shell:
LD_DEBUG=libs R -e 'dyn.load("/stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so", now=FALSE)'

I have attached the relevant part of the output from this command below.
The gist of it is that, indeed, R starts to look for dependent libraries
even if now=FALSE. Maybe it is the case that, as Duncan suggests, this is
intended behaviour? But I don't think it's ideal, because it forces you to
topologically sort all library dependencies and load them in the right
order, which is very difficult to do. Is there someone involved in R
development who can explain this behaviour to me?

Cheers.

***

> dyn.load("/stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so", now=FALSE)
  1655: find library=libpq.so.5 [0]; searching
  1655:  search path=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64:/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib:/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib:/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib:/usr/local/lib64:/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server:/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib (LD_LIBRARY_PATH)
  1655:   trying file=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64/libpq.so.5
  1655:   trying file=/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib/libpq.so.5
  1655:   trying file=/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib/libpq.so.5
  1655:   trying file=/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libpq.so.5
  1655:   trying file=/usr/local/lib64/libpq.so.5
  1655:   trying file=/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server/libpq.so.5
  1655:   trying file=/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib/libpq.so.5
  1655:  search path=/usr/pgsql-15/lib/tls/x86_64:/usr/pgsql-15/lib/tls:/usr/pgsql-15/lib/x86_64:/usr/pgsql-15/lib (RUNPATH from file /stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so)
  1655:   trying file=/usr/pgsql-15/lib/tls/x86_64/libpq.so.5
  1655:   trying file=/usr/pgsql-15/lib/tls/libpq.so.5
  1655:   trying file=/usr/pgsql-15/lib/x86_64/libpq.so.5
  1655:   trying file=/usr/pgsql-15/lib/libpq.so.5
  1655:  search cache=/etc/ld.so.cache
  1655:   trying file=/lib64/libpq.so.5
  1655:
  1655: find library=libssl.so.10 [0]; searching
  1655:  search path=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64:/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib:/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib:/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib:/usr/local/lib64:/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server:/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib (LD_LIBRARY_PATH)
  1655:   trying file=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64/libssl.so.10
  1655:   trying file=/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib/libssl.so.10
  1655:   trying file=/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib/libssl.so.10
  1655:   trying file=/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libssl.so.10
  1655:   trying file=/usr/local/lib64/libssl.so.10
  1655:   trying file=/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server/libssl.so.10
  1655:   trying file=/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib/libssl.so.10
  1655:  search cache=/etc/ld.so.cache
  1655:   trying file=/lib64/libssl.so.10
  1655:
  1655: find library=libcrypto.so.10 [0]; searching
  1655:  search path=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64:/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib:/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib:/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib:/usr/local/lib64:/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server:/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib (LD_LIBRARY_PATH)
  1655:   trying file=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64/libcrypto.so.10
  1655:   trying file=/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib/libcrypto.so.10
  1655:   trying file=/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib/libcrypto.so.10
  1655:   trying file=/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libcrypto.so.10
  1655:   trying file=/usr/local/lib64/libcrypto.so.10
  1655:   trying file=/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server/libcrypto.so.10
  1655:   trying file=/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib/libcrypto.so.10
  1655:  search cache=/etc/ld.so.cache
  1655:   trying 

Re: [R] dyn.load(now = FALSE) not actually lazy?

2023-02-01 Thread Ivan Krylov
On Wed, 1 Feb 2023 14:16:54 +1100
Michael Milton wrote:

> Is this a bug in the `dyn.load` implementation for R? If not, why is
> it behaving like this? What should I do about it?

On Unix-like systems, dyn.load forwards its arguments to dlopen(). It
should be possible to confirm with a debugger that R passes RTLD_NOW to
dlopen() when calling dyn.load(now = TRUE) and RTLD_LAZY when calling
dyn.load(now = FALSE).
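
As a rough illustration of that mapping (a sketch only, not R's actual
source; the helper name dlopen_flags is made up here), the two logical
arguments of dyn.load() could be turned into dlopen() flags like this:

```c
#include <dlfcn.h>

/* Hypothetical helper: translate dyn.load()-style logical arguments
   into the flag word that would be passed to dlopen(). */
int dlopen_flags(int local, int now)
{
    /* now = TRUE asks for eager binding, now = FALSE for lazy binding */
    int flags = now ? RTLD_NOW : RTLD_LAZY;

    /* local = TRUE keeps the library's symbols out of the global scope */
    flags |= local ? RTLD_LOCAL : RTLD_GLOBAL;

    return flags;
}
```

Setting a breakpoint on dlopen in a debugger and inspecting its second
argument is one way to check which flag word R really passes.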

I don't know for sure why the symbols are being resolved even though
you asked the linker not to. Did something in the system set the
LD_BIND_NOW environment variable? Do any of the libraries in the
dependency tree have any constructors (C++ or
__attribute__((constructor)) or otherwise mentioned in .init* sections)
that rely on MKL being available at initialisation time?
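
Both possibilities can be sketched in a few lines of C. The names
on_load, constructor_ran and bind_now_requested are made up for this
illustration, and __attribute__((constructor)) is a GCC/Clang
extension. The constructor runs as soon as the object is loaded, so
anything it calls has to be resolvable at that moment regardless of
RTLD_LAZY. The second function only reports whether the current
process would see LD_BIND_NOW set to a non-empty string, which is the
condition under which the loader binds everything eagerly.

```c
#include <stdlib.h>

/* Set by the constructor below; remains 0 if it never ran. */
int constructor_ran = 0;

/* Runs at load time: before main() for an executable, or before
   dlopen() returns for a shared object. Functions called from here
   are resolved immediately, even under lazy binding. */
__attribute__((constructor))
static void on_load(void)
{
    constructor_ran = 1;
}

/* Returns 1 if LD_BIND_NOW is set to a non-empty string in the
   environment of this process, 0 otherwise. */
int bind_now_requested(void)
{
    const char *value = getenv("LD_BIND_NOW");
    return value != NULL && value[0] != '\0';
}
```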

If you launch R with the environment variable LD_DEBUG=libs set, the
debugging output may shed some light on the problem.

-- 
Best regards,
Ivan



Re: [R] dyn.load(now = FALSE) not actually lazy?

2023-02-01 Thread Duncan Murdoch
According to the help page for dyn.load, I think this is within the 
allowed behaviour:


"now:
a logical controlling whether all symbols are resolved (and relocated) 
immediately the library is loaded or deferred until they are used. This 
control is useful for developers testing whether a library is complete 
and has all the necessary symbols, and for users to ignore missing 
symbols. Whether this has any effect is system-dependent."


It appears to be intended for a DLL that doesn't define all the symbols 
that your program will use, not for a DLL that has external references 
that can't be resolved.  And there's that last sentence.


I think for what you want, you'd have to write the DLL (i.e. libtorch) 
in such a way that it does delayed loading of its dependencies.
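
A minimal sketch of what delayed loading could look like inside such a
library, assuming a POSIX dlopen()/dlsym() interface. The function
name call_optional and its return codes are invented for this example
and are not part of libtorch or R. Instead of linking against the
dependency, the library opens it only when the feature is first
needed, and reports failure instead of refusing to load.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: call a double -> double function from an
   optional dependency that is opened only at the moment of use.
   Returns 0 on success, -1 if the library cannot be opened,
   -2 if the symbol is missing. */
int call_optional(const char *library, const char *symbol,
                  double x, double *result)
{
    void *handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL)
        return -1;   /* dependency absent: caller can fall back */

    /* POSIX allows converting the dlsym() result to a function pointer */
    double (*fn)(double) = (double (*)(double)) dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -2;
    }

    *result = fn(x);
    dlclose(handle);
    return 0;
}
```

On a glibc system, something like call_optional("libm.so.6", "cos",
0.0, &y) would be one way to try it; older glibc versions need -ldl at
link time.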


Duncan Murdoch



[R] dyn.load(now = FALSE) not actually lazy?

2023-01-31 Thread Michael Milton
On Linux, if I have a .so file that has a dependency on another .so, and I
`dyn.load(now=FALSE)` the first one, R seems to try to resolve the symbols
immediately, causing the load to fail.

For example, I have `libtorch` installed on my HPC. Note that it links to
various libs such as `libcudart.so` and `libmkl_intel_lp64.so.2` which
aren't currently in my library path:

➜  ~ ldd /stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so
    linux-vdso.so.1 =>  (0x7ffcab58c000)
    libgomp.so.1 => /stornext/System/data/apps/gcc/gcc-11.2.0/lib64/libgomp.so.1 (0x7f8cb22bf000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x7f8cb20a3000)
    libc10.so => /stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libc10.so (0x7f8cb1e2d000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x7f8cb1c21000)
    librt.so.1 => /lib64/librt.so.1 (0x7f8cb1a19000)
    libgcc_s.so.1 => /stornext/System/data/apps/gcc/gcc-11.2.0/lib64/libgcc_s.so.1 (0x7f8cb1801000)
    libdl.so.2 => /lib64/libdl.so.2 (0x7f8cb15fd000)
    libmkl_intel_lp64.so.2 => not found
    libmkl_gnu_thread.so.2 => not found
    libmkl_core.so.2 => not found
    libm.so.6 => /lib64/libm.so.6 (0x7f8cb12fb000)
    libcudart.so.11.0 => not found

Then in R, if I try to load that same file:

> dyn.load("/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so", now=FALSE)
Error in dyn.load("/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so",  :
  unable to load shared object '/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so':
  libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory

Is this a bug in the `dyn.load` implementation for R? If not, why is it
behaving like this? What should I do about it?

For reference, I'm on CentOS 7, with Linux
kernel 3.10.0-1160.81.1.el7.x86_64.
