Re: [R] dyn.load(now = FALSE) not actually lazy?
On Thu, 2 Feb 2023 13:35:39 +1100 Michael Milton wrote:

> The gist of it is that, indeed, R starts to look for dependent
> libraries even if now=FALSE.

I don't think that R does that. Here's a program that should do the same thing that R does when you call dyn.load(now = FALSE):

    #include <stdio.h>
    #include <dlfcn.h>

    int main(int argc, char ** argv) {
        if (argc != 2) {
            printf("Usage: %s path/to/library.so\n", argv[0]);
            return -1;
        }

        /* RTLD_LAZY: defer resolving function symbols until first call */
        void * ptr = dlopen(argv[1], RTLD_LAZY);
        if (!ptr) {
            /* dlerror() describes why the load failed */
            printf("%s\n", dlerror());
            return 1;
        }

        dlclose(ptr);
        return 0;
    }

Link it with -ldl. If you launch it with LD_DEBUG=all (see also: the output when you launch something with LD_DEBUG=help), it should give you even more information about the shared object loading process.

> /stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so:
> error: symbol lookup error: undefined symbol: work_mem (fatal)

I think I have an explanation for this. work_mem turns out to be a global variable, not a function:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/miscadmin.h;h=96b3a1e1a0770bdbc25701b5a41e2001a7222b3b;hb=117d2604c2a59eb853e894410b94b4c453f8bd43#l261
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/init/globals.c;h=1b1d8142548aaf1234fcf1fc92753b9a8b8b3537;hb=117d2604c2a59eb853e894410b94b4c453f8bd43#l125

Indeed, libpqwalreceiver includes the header where the variable is declared and accesses it:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/libpqwalreceiver/libpqwalreceiver.c;h=560ec974fa71444f1cb90e5098cef3c712e5b758;hb=117d2604c2a59eb853e894410b94b4c453f8bd43#l968

I guess that there's no way for the dynamic loader to postpone the resolution of an import that's a variable, because the code needs a real address that it can read from and write to. For functions, there are stubs in the procedure linkage table that can resolve the symbol later, because they are functions and can do anything.

I could be wrong about my explanation. If you'd like to learn more about this topic, one way to do that would be to start with Ulrich Drepper's articles: https://akkadia.org/drepper/dsohowto.pdf

--
Best regards,
Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
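
As an illustrative extension of the test program above (not part of the original message), the sketch below tries the same file with RTLD_LAZY and then with RTLD_NOW. The helper names try_load and compare_binding_modes are made up for this example. If the load fails in both modes, lazy function binding is probably not the cause, which would be consistent with a missing dependency or an unresolved variable such as work_mem.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Try to dlopen() `path` with the given binding mode.
 * Returns 0 on success, 1 on failure (after printing dlerror()). */
int try_load(const char *path, int mode, const char *label)
{
    assert(path != NULL);
    void *handle = dlopen(path, mode);
    if (!handle) {
        fprintf(stderr, "%s: %s\n", label, dlerror());
        return 1;
    }
    printf("%s: loaded %s\n", label, path);
    dlclose(handle);
    return 0;
}

/* Returns a bitmask: bit 0 is set if RTLD_LAZY failed,
 * bit 1 is set if RTLD_NOW failed. 0 means both modes worked. */
int compare_binding_modes(const char *path)
{
    int lazy_failed = try_load(path, RTLD_LAZY, "RTLD_LAZY");
    int now_failed = try_load(path, RTLD_NOW, "RTLD_NOW");
    return lazy_failed | (now_failed << 1);
}
```

A result of 2 (only RTLD_NOW failed) would point at an unresolved function symbol that lazy binding managed to postpone. As with the program above, link with -ldl where the C library requires it.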
Re: [R] dyn.load(now = FALSE) not actually lazy?
Hi Ivan,

Thanks for the suggestion. I found that libpqwalreceiver.so, which is part of pgsql-15.1, made for a simpler example of my issue. So, using your advice, I ran the following in my shell:

LD_DEBUG=libs R -e 'dyn.load("/stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so", now=FALSE)'

I have attached the relevant part of the output from this command below. The gist of it is that, indeed, R starts to look for dependent libraries even if now=FALSE. Maybe it is the case that, as Duncan suggests, this is intended behaviour? But I think it's not ideal behaviour, because it requires you to topologically sort all library dependencies and load them in the right order, which is very difficult to do.

Is there someone involved in R development who can explain this behaviour to me?

Cheers.

***

> dyn.load("/stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so", now=FALSE)
1655: find library=libpq.so.5 [0]; searching
1655: search path=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64:/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib:/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib:/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib:/usr/local/lib64:/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server:/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib (LD_LIBRARY_PATH)
1655: trying file=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64/libpq.so.5
1655: trying file=/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib/libpq.so.5
1655: trying file=/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib/libpq.so.5
1655: trying file=/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libpq.so.5
1655: trying file=/usr/local/lib64/libpq.so.5
1655: trying file=/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server/libpq.so.5
1655: trying file=/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib/libpq.so.5
1655: search path=/usr/pgsql-15/lib/tls/x86_64:/usr/pgsql-15/lib/tls:/usr/pgsql-15/lib/x86_64:/usr/pgsql-15/lib (RUNPATH from file /stornext/System/data/tools/pgsql/pgsql-15.1/lib/libpqwalreceiver.so)
1655: trying file=/usr/pgsql-15/lib/tls/x86_64/libpq.so.5
1655: trying file=/usr/pgsql-15/lib/tls/libpq.so.5
1655: trying file=/usr/pgsql-15/lib/x86_64/libpq.so.5
1655: trying file=/usr/pgsql-15/lib/libpq.so.5
1655: search cache=/etc/ld.so.cache
1655: trying file=/lib64/libpq.so.5
1655:
1655: find library=libssl.so.10 [0]; searching
1655: search path=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64:/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib:/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib:/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib:/usr/local/lib64:/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server:/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib (LD_LIBRARY_PATH)
1655: trying file=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64/libssl.so.10
1655: trying file=/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib/libssl.so.10
1655: trying file=/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib/libssl.so.10
1655: trying file=/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libssl.so.10
1655: trying file=/usr/local/lib64/libssl.so.10
1655: trying file=/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server/libssl.so.10
1655: trying file=/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib/libssl.so.10
1655: search cache=/etc/ld.so.cache
1655: trying file=/lib64/libssl.so.10
1655:
1655: find library=libcrypto.so.10 [0]; searching
1655: search path=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64:/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib:/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib:/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib:/usr/local/lib64:/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server:/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib (LD_LIBRARY_PATH)
1655: trying file=/stornext/System/data/apps/gcc/gcc-11.1.0/lib64/libcrypto.so.10
1655: trying file=/stornext/System/data/apps/pcre2/pcre2-10.36-gcc-11.1.0/lib/libcrypto.so.10
1655: trying file=/stornext/System/data/tools/openSSL/openSSL-1.1.1n/lib/libcrypto.so.10
1655: trying file=/stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libcrypto.so.10
1655: trying file=/usr/local/lib64/libcrypto.so.10
1655: trying file=/stornext/System/data/tools/openjdk/openjdk-13.0.2/lib/server/libcrypto.so.10
1655: trying file=/stornext/System/data/apps/hdf5/hdf5-1.8.20/lib/libcrypto.so.10
1655: search cache=/etc/ld.so.cache
1655: trying
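
The "load them in the right order" workaround mentioned in the message above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name load_in_order is hypothetical, the caller is assumed to already know a valid order (dependencies first), and it relies on the dynamic loader reusing an already loaded library when a later one names it as a dependency.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Load `n` libraries in the given order (dependencies first).
 * Returns how many were loaded; stops at the first failure.
 * Handles are deliberately not closed so the libraries stay resident. */
size_t load_in_order(const char *const *paths, size_t n)
{
    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* RTLD_GLOBAL makes the symbols visible to libraries loaded later */
        void *handle = dlopen(paths[i], RTLD_LAZY | RTLD_GLOBAL);
        if (!handle) {
            fprintf(stderr, "failed at %s: %s\n", paths[i], dlerror());
            return i;
        }
    }
    return n;
}
```

The return value tells the caller which entry failed, so the order (or a missing path) can be corrected one step at a time. It does not remove the need to work out the order in the first place.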
Re: [R] dyn.load(now = FALSE) not actually lazy?
On Wed, 1 Feb 2023 14:16:54 +1100 Michael Milton wrote:

> Is this a bug in the `dyn.load` implementation for R? If not, why is
> it behaving like this? What should I do about it?

On Unix-like systems, dyn.load forwards its arguments to dlopen(). It should be possible to confirm with a debugger that R passes RTLD_NOW to dlopen() when calling dyn.load(now = TRUE) and RTLD_LAZY when calling dyn.load(now = FALSE).

I don't know for sure why the symbols are being resolved even though you asked the linker not to. Did something in the system set the LD_BIND_NOW environment variable? Do any of the libraries in the dependency tree have any constructors (C++ or __attribute__((constructor)) or otherwise mentioned in .ini* sections) that rely on MKL being available at initialisation time?

If you launch R with the environment variable LD_DEBUG=libs set, the debugging output may shed some light on the problem.

--
Best regards,
Ivan
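
A minimal sketch of the two checks suggested above, assuming the mapping from `now` to dlopen() flags is as described in the message. This is not R's actual source, and the names flags_for_now, is_forcing_value and bind_now_forced are hypothetical. According to the dlopen(3) manual page, a non-empty LD_BIND_NOW makes the loader resolve all symbols immediately, as if RTLD_NOW had been requested.

```c
#include <dlfcn.h>
#include <stdlib.h>

/* Translate a dyn.load()-style `now` flag into a dlopen() binding mode. */
int flags_for_now(int now)
{
    return now ? RTLD_NOW : RTLD_LAZY;
}

/* An environment value forces immediate binding only if it is
 * set and non-empty. */
int is_forcing_value(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Returns 1 if LD_BIND_NOW in the current environment would override
 * a request for lazy binding, 0 otherwise. */
int bind_now_forced(void)
{
    return is_forcing_value(getenv("LD_BIND_NOW"));
}
```

Calling bind_now_forced() before dlopen(path, flags_for_now(0)) would show whether the environment is silently turning a lazy request into an immediate one.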
Re: [R] dyn.load(now = FALSE) not actually lazy?
According to the help page for dyn.load, I think this is within the allowed behaviour:

"now: a logical controlling whether all symbols are resolved (and relocated) immediately the library is loaded or deferred until they are used. This control is useful for developers testing whether a library is complete and has all the necessary symbols, and for users to ignore missing symbols. Whether this has any effect is system-dependent."

It appears to be intended for a DLL that doesn't define all the symbols that your program will use, not for a DLL that has external references that can't be resolved. And there's that last sentence.

I think for what you want, you'd have to write the DLL (i.e. libtorch) in such a way that it does delayed loading of its dependencies.

Duncan Murdoch

On 31/01/2023 10:16 p.m., Michael Milton wrote:
> On Linux, if I have a .so file that has a dependency on another .so, and I
> `dyn.load(now=FALSE)` the first one, R seems to try to resolve the symbols
> immediately, causing the load to fail.
>
> For example, I have `libtorch` installed on my HPC. Note that it links to
> various libs such as `libcudart.so` and `libmkl_intel_lp64.so.2` which
> aren't currently in my library path:
>
> ➜ ~ ldd /stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so
> linux-vdso.so.1 => (0x7ffcab58c000)
> libgomp.so.1 => /stornext/System/data/apps/gcc/gcc-11.2.0/lib64/libgomp.so.1 (0x7f8cb22bf000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f8cb20a3000)
> libc10.so => /stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libc10.so (0x7f8cb1e2d000)
> libnuma.so.1 => /lib64/libnuma.so.1 (0x7f8cb1c21000)
> librt.so.1 => /lib64/librt.so.1 (0x7f8cb1a19000)
> libgcc_s.so.1 => /stornext/System/data/apps/gcc/gcc-11.2.0/lib64/libgcc_s.so.1 (0x7f8cb1801000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7f8cb15fd000)
> libmkl_intel_lp64.so.2 => not found
> libmkl_gnu_thread.so.2 => not found
> libmkl_core.so.2 => not found
> libm.so.6 => /lib64/libm.so.6 (0x7f8cb12fb000)
> libcudart.so.11.0 => not found
>
> Then in R, if I try to load that same file:
>
> dyn.load("/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so", now=FALSE)
> Error in dyn.load("/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so", :
>   unable to load shared object '/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so':
>   libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
>
> Is this a bug in the `dyn.load` implementation for R? If not, why is it
> behaving like this? What should I do about it?
>
> For reference, I'm on CentOS 7, with Linux kernel
> 3.10.0-1160.81.1.el7.x86_64.
>
> [[alternative HTML version deleted]]
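
To make the "delayed loading of its dependencies" idea above concrete, here is a rough C sketch of the general pattern: the library does not link against the optional dependency at build time, and instead looks it up with dlopen() and dlsym() the first time it is needed. The names resolve_on_demand and unary_fn are hypothetical, and nothing here is taken from libtorch.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Signature of the function we expect to find in the dependency. */
typedef double (*unary_fn)(double);

/* Look up `symbol` in `library` at the point of use.
 * Returns NULL if either the library or the symbol is unavailable,
 * so the caller can fall back or report a clear error.
 * On success the handle is kept open so the pointer stays valid. */
unary_fn resolve_on_demand(const char *library, const char *symbol)
{
    assert(library != NULL && symbol != NULL);
    void *handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
    if (!handle)
        return NULL;

    void *address = dlsym(handle, symbol);
    if (!address) {
        dlclose(handle);
        return NULL;
    }

    /* Conversion idiom shown in the dlopen(3) manual page */
    unary_fn fn;
    *(void **)(&fn) = address;
    return fn;
}
```

With this pattern the dependency never appears in the library's own list of needed libraries, so loading the library cannot fail just because the dependency is absent. A real implementation would probably cache the returned pointer rather than repeat the lookup on every call.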
[R] dyn.load(now = FALSE) not actually lazy?
On Linux, if I have a .so file that has a dependency on another .so, and I `dyn.load(now=FALSE)` the first one, R seems to try to resolve the symbols immediately, causing the load to fail.

For example, I have `libtorch` installed on my HPC. Note that it links to various libs such as `libcudart.so` and `libmkl_intel_lp64.so.2` which aren't currently in my library path:

➜ ~ ldd /stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so
linux-vdso.so.1 => (0x7ffcab58c000)
libgomp.so.1 => /stornext/System/data/apps/gcc/gcc-11.2.0/lib64/libgomp.so.1 (0x7f8cb22bf000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7f8cb20a3000)
libc10.so => /stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libc10.so (0x7f8cb1e2d000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x7f8cb1c21000)
librt.so.1 => /lib64/librt.so.1 (0x7f8cb1a19000)
libgcc_s.so.1 => /stornext/System/data/apps/gcc/gcc-11.2.0/lib64/libgcc_s.so.1 (0x7f8cb1801000)
libdl.so.2 => /lib64/libdl.so.2 (0x7f8cb15fd000)
libmkl_intel_lp64.so.2 => not found
libmkl_gnu_thread.so.2 => not found
libmkl_core.so.2 => not found
libm.so.6 => /lib64/libm.so.6 (0x7f8cb12fb000)
libcudart.so.11.0 => not found

Then in R, if I try to load that same file:

> dyn.load("/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so", now=FALSE)
Error in dyn.load("/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so", :
  unable to load shared object '/stornext/System/data/nvidia/libtorch-gpu/libtorch-gpu-1.12.1/lib/libtorch_cpu.so':
  libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory

Is this a bug in the `dyn.load` implementation for R? If not, why is it behaving like this? What should I do about it?

For reference, I'm on CentOS 7, with Linux kernel 3.10.0-1160.81.1.el7.x86_64.