Re: Trivial program size inflation
> I'm sorta curious why linking against one of our small malloc
> implementations still pulls in jemalloc:

Using my shell on ftp.n.o (9.0_STABLE):

cc -v says the ld line for

    cc -o null null.c -lgnumalloc

is

    ld -plugin /usr/libexec/liblto_plugin.so -plugin-opt=/usr/libexec/lto-wrapper -plugin-opt=-fresolution=/tmp//ccCNGMXD.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -dc -dp -e _start -static -o null /usr/lib/crt0.o /usr/lib/crti.o /usr/lib/crtbeginT.o /tmp//ccU1uPbn.o -lgnumalloc -lgcc -lc -lgcc /usr/lib/crtend.o /usr/lib/crtn.o

Expecting that LTO would be unnecessary here, I scrapped all that and manually ran it with -Map= to generate a link map, which among other things describes why each archive member is brought in:

% ld -Map=null.map -dc -dp -e _start -static -o null /usr/lib/crt0.o /usr/lib/crti.o /usr/lib/crtbeginT.o null.o -lgnumalloc -lgcc -lc -lgcc /usr/lib/crtend.o /usr/lib/crtn.o

Looking at the resulting null.map, I see, among many other lines,

/usr/lib/libc.a(jemalloc.o)
                              /usr/lib/libc.a(tls.o) (calloc)

which makes sense: if nothing in null.o, crt0.o, crti.o, or crtbeginT.o refers to anything in the libgnumalloc file containing calloc, nothing will have brought it in from libgnumalloc. Then, when libc refers to it internally, -lgnumalloc is past and thus unavailable for resolving it, so it comes from the default malloc in libc.

So I tried, instead,

% ld -Map=null.map -dc -dp -e _start -static -o null /usr/lib/crt0.o /usr/lib/crti.o /usr/lib/crtbeginT.o null.o --whole-archive -lgnumalloc --no-whole-archive -lgcc -lc -lgcc /usr/lib/crtend.o /usr/lib/crtn.o

to force all of libgnumalloc to be brought in. Sure enough, this time, "je" does not appear in the link map, and the executable is significantly smaller.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
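The ordering effect described above can be reproduced with a toy model. This is only a sketch of single-pass archive resolution, not ld's implementation: the one-symbol-per-member layout, the symbol name tls_setup, and the function calloc_provider are all made up for illustration, and the model ignores the duplicate-definition errors a real linker would raise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: each archive member defines one symbol and needs at most one. */
struct member { const char *defines; const char *needs; };
struct archive {
    const char *name;
    const struct member *members;
    size_t nmembers;
    int whole;                      /* models --whole-archive */
};

#define MAX_SYMS 16
struct link_state {
    const char *sym[MAX_SYMS];      /* every symbol referenced or defined */
    const char *provider[MAX_SYMS]; /* defining archive, NULL if undefined */
    size_t nsyms;
};

static int find(const struct link_state *st, const char *name)
{
    for (size_t i = 0; i < st->nsyms; i++)
        if (strcmp(st->sym[i], name) == 0)
            return (int)i;
    return -1;
}

/* Record a reference; it stays undefined until some member defines it. */
static void reference(struct link_state *st, const char *name)
{
    if (find(st, name) < 0) {
        assert(st->nsyms < MAX_SYMS);
        st->sym[st->nsyms] = name;
        st->provider[st->nsyms] = NULL;
        st->nsyms++;
    }
}

/* Pull one member into the link: define its symbol, note what it needs. */
static void pull(struct link_state *st, const struct archive *ar,
                 const struct member *m)
{
    reference(st, m->defines);
    int i = find(st, m->defines);
    if (st->provider[i] == NULL)
        st->provider[i] = ar->name;
    if (m->needs != NULL)
        reference(st, m->needs);
}

/* Visit one archive once, rescanning it until nothing new is pulled. */
static void scan(struct link_state *st, const struct archive *ar)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t k = 0; k < ar->nmembers; k++) {
            const struct member *m = &ar->members[k];
            int i = find(st, m->defines);
            int wanted = (i >= 0 && st->provider[i] == NULL);
            if (wanted || (ar->whole && i < 0)) {
                pull(st, ar, m);
                changed = 1;
            }
        }
    }
}

/* Which archive ends up supplying calloc for "-lgnumalloc ... -lc"? */
const char *calloc_provider(int whole_gnumalloc)
{
    static const struct member gnumalloc[] = { { "calloc", NULL } };
    static const struct member libc[] = {
        { "tls_setup", "calloc" },  /* stands in for libc.a(tls.o) */
        { "calloc", NULL },         /* stands in for libc.a(jemalloc.o) */
    };
    const struct archive order[] = {
        { "libgnumalloc.a", gnumalloc, 1, whole_gnumalloc },
        { "libc.a", libc, 2, 0 },
    };
    struct link_state st = { .nsyms = 0 };

    reference(&st, "tls_setup");    /* the startup objects need this */
    for (size_t a = 0; a < 2; a++)
        scan(&st, &order[a]);
    return st.provider[find(&st, "calloc")];
}
```

In this model calloc_provider(0) names libc.a, because calloc is not yet wanted when libgnumalloc.a is visited, while calloc_provider(1) names libgnumalloc.a, which matches what the two link maps showed.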
Re: Trivial program size inflation
Taylor R Campbell wrote:
> A quicker way to address most of it is to just define your own malloc:
>
> $ cat null.c
> #include <stddef.h>
> void *malloc(size_t n) { return NULL; }
> void *realloc(void *p, size_t n) { return NULL; }
> void *calloc(size_t n, size_t sz) { return NULL; }
> void free(void *p) {}
> int main(void) { return 0; }
> $ cc -g -O2 -static -o null null.c
> $ size null
>    text    data     bss     dec     hex filename
>   26724    3208    3184   33116    815c null

I'm sorta curious why linking against one of our small malloc implementations still pulls in jemalloc:

thoreau 3633> echo 'int main(void) { return 0; }' > null.c
thoreau 3634> cc -g -O2 -static -o null null.c -lgnumalloc
thoreau 3635> size null
   text    data     bss     dec     hex filename
 582263   28928 2176553 2787744  2a89a0 null
thoreau 3636> nm null | grep -c je_malloc
30
thoreau 3637> cc -g -O2 -static -o null null.c -lbsdmalloc
thoreau 3638> size null
   text    data     bss     dec     hex filename
 582263   28928 2176553 2787744  2a89a0 null
thoreau 3639> cat >> malloc.c
#include <stddef.h>
void *malloc(size_t n) { return NULL; }
void *realloc(void *p, size_t n) { return NULL; }
void *calloc(size_t n, size_t sz) { return NULL; }
void free(void *p) {}
thoreau 3640> cc -g -O2 -static -o null null.c malloc.c
thoreau 3641> size null
   text    data     bss     dec     hex filename
  33096    3224    3152   39472    9a30 null

Is this something to do with the jemalloc constructor still getting called somehow? Using cc -v shows that libbsdmalloc is being linked against before libc:

thoreau 3648> cc -v -g -O2 -static -o null null.c -lbsdmalloc
...
ld -plugin ... -lbsdmalloc -lgcc -lc -lgcc ...

Cheers,
Simon.
Re: Trivial program size inflation
> Date: Sat, 1 Jul 2023 15:11:56 - (UTC)
> From: mlel...@serpens.de (Michael van Elst)
>
> crt0 pulls in
> - atexit
> - environment
> - static TLS
> - stack guard
>
> which all more or less pull in jemalloc, stdio and string functions.
>
> You need to replace these with dummies (compile with -fno-common)
> and of course, your program must not make use of the functionality...

A quicker way to address most of it is to just define your own malloc:

$ cat null.c
#include <stddef.h>
void *malloc(size_t n) { return NULL; }
void *realloc(void *p, size_t n) { return NULL; }
void *calloc(size_t n, size_t sz) { return NULL; }
void free(void *p) {}
int main(void) { return 0; }
$ cc -g -O2 -static -o null null.c
$ size null
   text    data     bss     dec     hex filename
  26724    3208    3184   33116    815c null

This still has printf, rbtree, string, atomic, etc., but not jemalloc, giving a ~20x size reduction from half a megabyte to 25 KB or so.

If someone really wants to do the work to reduce the overhead without providing an alternative malloc, or reduce more than you get with an alternative malloc, here are some avenues that might be worth pursuing without incurring too much overhead:

> int atexit(void) { return 0; };

The runtime startup logic, csu, relies on atexit. But perhaps csu could use an internal __atexit that reserves 4 or 5 static slots, and the libc atexit uses the last one to call handlers in slots that are dynamically allocated by malloc. As long as your program doesn't call atexit, this only uses a fixed amount of space from csu and won't bring in malloc.
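A minimal sketch of the static-slot idea, assuming hypothetical names (csu_atexit, csu_run_exit_handlers, CSU_ATEXIT_SLOTS); it is not NetBSD's csu or libc code. It shows only the fixed-storage half; the chaining from the last slot into a malloc-backed list for the public atexit is left out.

```c
#include <assert.h>
#include <stddef.h>

#define CSU_ATEXIT_SLOTS 4          /* hypothetical: "4 or 5 static slots" */

typedef void (*exit_fn)(void);

static exit_fn csu_slots[CSU_ATEXIT_SLOTS];
static size_t csu_nslots;

/* Hypothetical internal registration: fixed storage, never calls malloc. */
int csu_atexit(exit_fn fn)
{
    if (fn == NULL || csu_nslots == CSU_ATEXIT_SLOTS)
        return -1;                  /* bad handler or out of static slots */
    csu_slots[csu_nslots++] = fn;
    return 0;
}

/* Run handlers in reverse order of registration, as exit() does. */
void csu_run_exit_handlers(void)
{
    while (csu_nslots > 0)
        csu_slots[--csu_nslots]();
}

/* Small self-check: two handlers append a letter to a trace buffer. */
static char trace[CSU_ATEXIT_SLOTS + 1];
static size_t ntrace;

static void note(char c)
{
    assert(ntrace < CSU_ATEXIT_SLOTS);
    trace[ntrace++] = c;
}

static void handler_a(void) { note('a'); }
static void handler_b(void) { note('b'); }

const char *csu_atexit_demo(void)
{
    ntrace = 0;
    csu_atexit(handler_a);
    csu_atexit(handler_b);
    csu_run_exit_handlers();
    trace[ntrace] = '\0';
    return trace;                   /* "ba": last registered runs first */
}
```

Nothing here references malloc, so a static link of this path would not need an allocator unless something else asks for one.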
> char *__allocenvvar() { return 0; };
> bool __canoverwriteenvvar() { return true; };
> size_t __envvarnamelen() { return 0; };
> void *__findenv() { return 0; };
> void *__findenvvar() { return 0; };
> void __freeenvvar() { };
> ssize_t __getenvslot() { return 0; };
> void __libc_env_init() { };
> bool __readlockenv() { return true; };
> bool __unlockenv() { return true; };
> bool __writelockenv() { return false; };

Programs that use only getenv don't need any of the machinery to allocate environment slots. The logic that getenv uses could be isolated to its own .c file with no allocation. This more or less requires splitting up __getenvslot into two separate functions, one for the allocate=true case and the other for the allocate=false case, with a .h file to mediate the global state between the two .c files. __libc_env_init (which is what pulls all this in even if you don't use getenv, setenv, etc.) could perhaps be a weak symbol with a strong alias in the .c file that does allocation and modification.

> void __libc_rtld_tls_allocate() { };
> void __libc_rtld_tls_free() { };
> void __libc_static_tls_setup() { };
> void __libc_tls_get_addr() { };

I'm stumped about this one. In principle the linker has enough information to decide whether __libc_static_tls_setup is needed (which is what, in _libc_init, pulls all this in), but in practice I don't know of any path that would let us conditionalize its use on whether the object has any static TLS relocations. Maybe rtld could be responsible for mmapping the initial thread's static TLS space so libc is not, but I'm not sure if that will work without a lot of effort.

> void __chk_fail() { };
> void __guard_setup() { };
> void __stack_chk_fail() { };
> void __stack_chk_fail_local() { };
> int __stack_chk_guard;

This calls syslog_ss, which brings in xsyslog.c. Not sure if that brings in malloc or anything exciting beyond vsnprintf_ss (which itself shouldn't malloc or be exciting, since it has to be async-signal-safe).
But if it does, maybe the call in stack_protector.c __fail to syslog_ss could be defined in terms of some weak symbol __stack_chk_log which is defined by xsyslog.c using syslog machinery, with a fallback to write to STDERR_FILENO; that way it only even tries to use syslog if anything else in the program already uses syslog. (But I'm not going to do this work, and I'm not sure if there's going to be a good way to kick malloc out of the static TLS business without toolchain and/or rtld support.)
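Of the avenues above, the getenv split is the easiest to picture. The following is only a sketch of what an allocation-free lookup could look like; env_lookup is a made-up name, it takes the environment vector as a parameter instead of using libc's global state, and it leaves out the locking that the __readlockenv/__unlockenv symbols suggest the real code does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical read-only lookup: scans a NULL-terminated vector of
 * "NAME=value" strings and returns a pointer to the value, or NULL.
 * It never allocates, so linking it does not require malloc.
 */
const char *env_lookup(char *const *envp, const char *name)
{
    size_t len;

    if (envp == NULL || name == NULL)
        return NULL;
    len = strlen(name);
    if (len == 0 || strchr(name, '=') != NULL)
        return NULL;                /* not a valid variable name */

    for (; *envp != NULL; envp++) {
        /* Match the whole name, then require '=' right after it. */
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;
    }
    return NULL;
}
```

The allocating half (setenv, putenv, unsetenv) would then live in a separate file and only be linked when a program actually modifies its environment.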
Re: Trivial program size inflation
On Sun, Jul 02, 2023 at 06:52:55PM -0400, Mouse wrote: > > The CSU code has pretty much no idea on what the rest of the world is > > going to do. It does: [...] > > > There is no way with ELF to decide at link time which of those > > features are used by the program and therefore no way to remove any > > of them. > > I don't think that's true. > > If the program dynamically loads code, yes, it's true (or close > enough). But, for a statically linked program (as pointed out > upthread, dlopen doesn't work when linked static), it can be done, and > with quite low otherwise-useless overhead. > > Using pthreads as an example, consider this. All pthreads .o modules > refer to (not "call") a specific function. For concreteness, let's > call it _pthreads_csu_init. _pthreads_csu_init is off in its own file; > that routine does the CSU initialization for pthreads. The CSU code > makes a weak reference to _pthreads_csu_init, calling it only if the > weak reference was satisfied by a real definition. > > Of course the dynamically-linked form of pthreads would always define > _pthreads_csu_init - or, alternatively, dynamic linking would use a > different CSU, one which unconditionally calls _pthreads_csu_init as an > ordinary reference. There is no way to trigger GC of code based on the presence of certain sections or relocation types. That rules out dropping code to handle ifunc, global constructors, destructors as well as the TLS code. Making the PIE support conditional would be possible, but IMO is not worth the effort. The amount of code you get for pthread init again is so small, not worth the effort. Joerg
Re: Trivial program size inflation
On Sun, Jul 02, 2023 at 11:16:12PM +0200, Joerg Sonnenberger wrote:
> > At least in 9.3, dlopen() in a static binary does not work. Try using
> > a NSS module from a statically linked binary to check that.
>
> It does work in the sense that it always fails.

It fails silently; there is not even an unhelpful error message.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: Trivial program size inflation
> The CSU code has pretty much no idea on what the rest of the world is > going to do. It does: [...] > There is no way with ELF to decide at link time which of those > features are used by the program and therefore no way to remove any > of them. I don't think that's true. If the program dynamically loads code, yes, it's true (or close enough). But, for a statically linked program (as pointed out upthread, dlopen doesn't work when linked static), it can be done, and with quite low otherwise-useless overhead. Using pthreads as an example, consider this. All pthreads .o modules refer to (not "call") a specific function. For concreteness, let's call it _pthreads_csu_init. _pthreads_csu_init is off in its own file; that routine does the CSU initialization for pthreads. The CSU code makes a weak reference to _pthreads_csu_init, calling it only if the weak reference was satisfied by a real definition. Of course the dynamically-linked form of pthreads would always define _pthreads_csu_init - or, alternatively, dynamic linking would use a different CSU, one which unconditionally calls _pthreads_csu_init as an ordinary reference. Is it worth doing? That's a separate question. But I have no real doubt that it could be done if desired. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
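A minimal sketch of the weak-reference scheme described above. The name _pthreads_csu_init comes from the message; csu_init_threads is a made-up stand-in for the startup path, and __attribute__((weak)) is a GCC/Clang extension, so this assumes an ELF toolchain that supports it.

```c
#include <stddef.h>

/*
 * Weak reference: if no object in the link defines this symbol, the
 * reference resolves to a null address instead of causing a link error.
 */
extern void _pthreads_csu_init(void) __attribute__((weak));

/* Stand-in for the CSU code path; returns 1 if the pthreads hook ran. */
int csu_init_threads(void)
{
    if (_pthreads_csu_init != NULL) {
        _pthreads_csu_init();       /* some pthreads module was linked in */
        return 1;
    }
    return 0;                       /* nothing pulled the hook in; skip it */
}
```

Linked on its own, csu_init_threads() returns 0. In the scheme as described, a pthreads object would carry an ordinary (strong) reference to _pthreads_csu_init, which pulls the defining file out of the archive, and the same call would then run the initializer.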
Re: Trivial program size inflation
On Sun, Jul 02, 2023 at 05:04:10PM +0200, Martin Husemann wrote:
> The other things that we *might* look into (if someone volunteers) is to
> better modularize the CSU code, but it is not immediately clear how
> that could/should be done.

The CSU code has pretty much no idea on what the rest of the world is going to do. It does:
- support self-relocation for PIE on supported archs
- set up profiling
- handle ifunc support on supported architectures
- support global destructors via atexit
- call the libc init
- SSP init
- set up atomic emulation
- set up TLS on supported architectures
- initialize pthread (emulation)
- initialize environment

There is no way with ELF to decide at link time which of those features are used by the program and therefore no way to remove any of them.

If you care about static executable size enough and for some reason have programs that don't use stdio, you might get away with replacing jemalloc with either the older version (HAVE_JEMALLOC=100) or not using it at all (USE_JEMALLOC=no). Personally I consider that navel gazing as few programs actually qualify for that.

Joerg
Re: Trivial program size inflation
On Sun, Jul 02, 2023 at 01:20:51AM +, Emmanuel Dreyfus wrote:
> On Sat, Jul 01, 2023 at 02:25:03PM +, RVP wrote:
> > Not to forget the code for C++ support. And, of course even static
> > binaries may call dlopen() and friends. So that dl*() and the ELF bits
> > right there.
>
> At least in 9.3, dlopen() in a static binary does not work. Try using
> a NSS module from a statically linked binary to check that.

It does work in the sense that it always fails. I proposed a long time ago that we should support it in a limited form to be able to rip out a lot of complexity from NSS and PAM code: you can compile a list of modules in and have those register exported symbols for use with dlsym, but that's it. I certainly don't want to do the kind of dynamic linking emulation that glibc has in its static libc.

Joerg
Re: Trivial program size inflation
mo...@rodents-montreal.org (Mouse) writes: >> Way more interesting than useless tech demo sizes would size >> inflation of a real world minimal program, when linked statically. >Why? If I'm looking at overhead size, I am most interested in just the >overhead size, which is exactly what a no-op program gives. For a no-op program it's overhead, but for a real world minimal program probably not, because it actually requires that code.
Re: Trivial program size inflation
> With most real world programs (hopefully) nearly 100% of what you see > as overhead now is actually needed - and it still may be bigger than > what we hope for due to suboptimal modularization. True. But this is not always as fixable as that wording implies. For example, a program that calls printf but never uses any floating-point values at all will not, in theory, need floating point support. But we do not have any mechanism by which anything can discover that no floating-point printf formats are used and thus bring in a printf variant that doesn't actually support floating point; this means that a bunch of floating-point stuff will be brought in even though it will never actually get used. I'm not sure how fixable this is. (I'm also not sure whether it's worth fixing, but for the purposes of this discussion I'm inclined to say that doesn't matter.) /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
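Nothing selects an integer-only printf automatically, but a program that knows it only prints integers can sidestep printf by hand. The following is a sketch of that workaround, not a fix for the general problem; format_unsigned is a made-up helper, not a libc function.

```c
#include <stddef.h>

/*
 * Write value in decimal into buf, NUL-terminated. Returns the number
 * of digits written, or 0 if buf is too small. Uses no printf and no
 * floating point, so it adds no dependency on either.
 */
size_t format_unsigned(unsigned long value, char *buf, size_t bufsize)
{
    char tmp[3 * sizeof value];     /* more than enough decimal digits */
    size_t n = 0;

    /* Produce digits least significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    if (n + 1 > bufsize)
        return 0;

    /* Copy them out in reading order. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

The result can be handed to write(2) directly, in the same spirit as cutting a hello-world program back from printf to write.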
Re: Trivial program size inflation
On Sun, Jul 02, 2023 at 01:49:57PM -0400, Mouse wrote: > Why? If I'm looking at overhead size, I am most interested in just the > overhead size, which is exactly what a no-op program gives. If I want > to look at the overhead of printf, or malloc, or something, I'd use a > program that just calls those. That might be interesting, but it's not > what I was doing here. I would define overhead as "anything that is not needed for what this concrete program does" and that is always ~100% in your test, but gets blurry quickly for all real programs. With most real world programs (hopefully) nearly 100% of what you see as overhead now is actually needed - and it still may be bigger than what we hope for due to suboptimal modularization. I am all for reducing sizes for real world things (linked either dynamic or static) wherever possible, and I'm pretty sure NetBSD will gladly accept patches to make them smaller if they are technically sound and don't impact standards compliance. Martin
Re: Trivial program size inflation
> Note that a "do nothing" binary is a useless tech demo

I chose it because, to a good approximation, the resulting size is nothing but overhead. It is useless as far as its own functionality goes; it is not useless in that it very clearly measures pure overhead.

I actually started out on Ubuntu, with a program

#include <stdio.h>

int main(void)
{
 printf("Hello\n");
 return(0);
}

but, when my investigations took me to NetBSD, I started cutting it back (largely because the 1.4T/sparc version of that weighs in at 39836/440/3000 and I wanted to see how much of that was printf - about two-thirds, as it turns out). I first cut it back to <unistd.h> and write(1,"Hello\n",6); and then finally all the way back to the one I first posted here about.

> If you really want it, you could avoid libc and csu, and be down to a
> few bytes.

Yes. Writing it in assembly I got it down to 12/0/0 on 1.4T/sparc (clr %o0; mov 1,%g1; ta 0); I don't know amd64 assembly enough to write the corresponding code there, but I would expect it to be comparable.

> Way more interesting than useless tech demo sizes would size
> inflation of a real world minimal program, when linked statically.

Why? If I'm looking at overhead size, I am most interested in just the overhead size, which is exactly what a no-op program gives. If I want to look at the overhead of printf, or malloc, or something, I'd use a program that just calls those. That might be interesting, but it's not what I was doing here.

Still, easy enough. I took 1.4T's /usr/src/games/random.c (picked by looking at executable file sizes in /usr/games and eliminating the shellscripts) - but I did remove the #ifdef lint / __COPYRIGHT / #endif lines, because the 1.4T's random.c's version of that produces an assembler-time error on 5.2/amd64 and 9.0_STABLE/amd64 (the \ns turn into literal newlines in the .s, causing the assembler to complain about unterminated strings). I'm using the resulting .c for all these tests, even on other OS versions.

I don't know how close that is to your "real world minimal program", but it seems reasonable to me.

1.4T/sparc:

[Sparkle] 252> cc -o main main.c; size main
   text    data     bss     dec     hex filename
   4565     248     280    5093    13e5 main
[Sparkle] 253> cc -static -o main main.c; size main
   text    data     bss     dec     hex filename
  49084     616    4068   53768    d208 main
[Sparkle] 254>

5.2/amd64:

[Backstop] 133> cc -o main main.c; size main
   text    data     bss     dec     hex filename
   4069     632     496    5197    144d main
[Backstop] 134> cc -static -o main main.c; size main
   text    data     bss     dec     hex filename
 175741    4760   19064  199565   30b8d main
[Backstop] 135>

9.0_STABLE/amd64 (ftp.n.o):

% cc -o main main.c; size main
   text    data     bss     dec     hex filename
   4419     666     520    5605    15e5 main
% cc -static -o main main.c; size main
   text    data     bss     dec     hex filename
 590110   29416 2178840 2798366  2ab31e main
%

> The other things that we *might* look into (if someone volunteers) is
> to better modularize the CSU code, but it is not immediately clear
> how that could/should be done.

I don't know. I'm going to be looking at it on my 1.4T and 5.2; if anyone is interested in my results, if-and-when I have any, I can post. But I don't know how applicable they will be to -current.

> However, I personally disagree with Jason on the static linking
> support

Me too. In my opinion, there is a very important place for executables that depend on nothing outside themselves except the kernel. I've got two chroots where it is very nice to have executables that _don't_ need any other files to be present in order to work.

All my own libraries, all the stuff in /local/src/lib*, I build .a libraries and no .so libraries. The only things I routinely link in dynamically are the ones that come with the system, and not always even those.

> It does not come for free though.

Nothing of value does.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: Trivial program size inflation
On Sun, Jul 02, 2023 at 12:34:25PM +0200, tlaro...@polynum.com wrote:
> It is curious that you react this way in a thread where, you as others,
> have had your jaw drop seeing the size of a literally do_nothing
> executable. This was unseen precisely because few use static linking.

I'm not sure that is what we see here. As Jason said: we have one platform (sun2) where only static linking is available, and we do notice huge size explosions, e.g. when install media does not fit any more. This platform is part of the daily builds.

Note that a "do nothing" binary is a useless tech demo. If you really want it, you could avoid libc and csu, and be down to a few bytes.

The "jemalloc is big" details may be the only surprising thing here, but some did know that and we provide a special build option to use an older (and much smaller) version instead.

Way more interesting than useless tech demo sizes would be the size inflation of a real world minimal program, when linked statically.

The other things that we *might* look into (if someone volunteers) is to better modularize the CSU code, but it is not immediately clear how that could/should be done.

However, I personally disagree with Jason on the static linking support and would prefer to keep it. I find it useful every now and then in special situations, and I really like that I can test boot NetBSD kernels w/o swapping libc to a matching version. It does not come for free though.

Martin
Re: Trivial program size inflation
> On Jul 1, 2023, at 6:20 PM, Emmanuel Dreyfus wrote:
>
> On Sat, Jul 01, 2023 at 02:25:03PM +, RVP wrote:
>> Not to forget the code for C++ support. And, of course even static
>> binaries may call dlopen() and friends. So that dl*() and the ELF bits
>> right there.
>
> At least in 9.3, dlopen() in a static binary does not work. Try using
> a NSS module from a statically linked binary to check that.

Right, because the dynamic linker is not loaded into static binaries. For dynamic binaries, the kernel maps it as the program’s interpreter.

-- thorpej
Re: style, sysexits(3), and man RETURN VALUES for sys programs
On Sat, Jul 01, 2023 at 06:39:32PM -, Christos Zoulas wrote:
> In article <20230603120221.0766b60...@jupiter.mumble.net>,
> Taylor R Campbell wrote:
> >> Date: Sat, 3 Jun 2023 13:45:44 +0200
> >> From: tlaro...@polynum.com
> >>
> >> So I suggest to add a mention of sysexits(7) to style.
> >
> >I don't think sysexits(7) is consistently used enough, or really
> >useful enough, to warrant being a part of the style guide. Very few
> >programs, even those in src, use it, and I don't think anything
> >_relies_ on it for semantics in calling programs.
>
> I agree; nothing really uses sysexits except inside sendmail perhaps...
> It has been around for more than 40 years:
>
> ^As 00062/0/0
> ^Ad D 1.1 81/10/15 20:29:54 eric 1 0
> ^Ac date and time created 81/10/15 20:29:54 by eric
>
> and one would think that if it was useful, it would have caught on by now.
>

Since you don't discuss anything particular to sysexits, your sentence is then a broad, general judgement. So let's see:

"NetBSD 1996 -- 2023
It has been around for 28 years. And one would think that if it was useful, it would have caught on by now."

If the former is true, the latter is true. Is the latter true? As far as I'm concerned, even if the latter is true, in numbers, it has absolutely no bearing on the usefulness of NetBSD. Only on the stupidity of the mob.

And since you mention init(8), the funny thing is that we are discussing a server that, generally, daemonizes and is hence reparented to init(8)... And when it does not daemonize, it is in debugging mode, and providing information that the program has, and that will be lost when casting all errors to EXIT_FAILURE, is a debugging feature...

Not to mention that if there was the user interface equivalent of strerror(3) (a sysstrerror(1)), one would not have to plague the programs with variable strings, more or less accurate.

And if a daemon were reparented to a daemon server that did not close stdin, stdout and stderr but redirected them, this would allow passing commands to the server via its stdin, solving the problem that was discussed, incidentally, in the course of the inetd(8) thread. And this super daemon could make something of standardized return statuses, if only for stats purposes.

sysexits(3) is a good idea. And this is probably why it hasn't caught on.

--
Thierry Laronde
http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: Trivial program size inflation
On Sat, Jul 01, 2023 at 12:04:25PM -0700, Jason Thorpe wrote:
>
> > On Jul 1, 2023, at 8:20 AM, tlaro...@polynum.com wrote:
> >
> > This is also what I meant by "static seems to be considered deprecated".
>
> Honestly, I find the obsession with static linking hilariously quaint.
> NetBSD already bends backwards to an extreme degree by ensuring that old
> version of *system calls* work correctly *in the kernel*. Some other systems
> provide the ABI contract at the libc boundary, and let the libc <-> kernel
> interface be fluid (keep the compatibility stuff in user-space where it
> belongs!)

If it comes to this particular case, the binary format is ELF, and ELF has an interpreter field; wouldn't it be possible to have versioned ld.elf_so providing emulation?

>
> Obviously this is not feasible to do with static binaries, and we have one
> platform that ONLY supports static, but for the rest, I honestly think we
> should officially deprecate static linking in the general case (obviously it
> has some uses in super-constrained environments, but in those cases we are
> often also using crunch'd binaries as well).

It is curious that you react this way in a thread where you, like others, had your jaw drop at the size of a literally do-nothing executable. This went unnoticed precisely because few use static linking.

Dshared encourages "inflation". Dshared is a road to hell---Windows DLL hell is not the only one.

Have you never had a third-party application from pkgsrc need to be updated and, because even trivial libraries of a few kilobytes are linked dynamically, had the thing decide that the previous version is not "good enough"---while there is no API nor ABI change---and force an upgrade, rendering all the other programs linked against the previous version not executable (while dshared is advertised as allowing concurrent versions of the same library, generally only one version is allowed, the other one being uninstalled), forcing one to upgrade absolutely everything for, in fact, a library that generally simply implements "Hello, world!"---but with a ton of fat?

Not to mention all the security problems implied by the search feature for an ELF dshared with rpath, so that without extra care one cannot be sure that what will be executed is what was expected.

Dshared is not exactly what I would call a panacea.

But as Henri Poincaré wrote: "To believe everything or to doubt everything are two different ways of being equally superficial." (Tout croire ou douter de tout sont deux façons différentes d'être également superficiel.)

I sometimes use totally static; frequently use partially static, hence frequently also use partially dshared; and sometimes use totally dshared. There are uses for both static and dshared (if there were shared static, I would also use it in some cases).

--
Thierry Laronde
http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: Trivial program size inflation
On Sun, 2 Jul 2023, Emmanuel Dreyfus wrote:

> On Sat, Jul 01, 2023 at 02:25:03PM +, RVP wrote:
> > Not to forget the code for C++ support. And, of course even static
> > binaries may call dlopen() and friends. So that dl*() and the ELF bits
> > right there.
>
> At least in 9.3, dlopen() in a static binary does not work. Try using
> a NSS module from a statically linked binary to check that.

Ah, that's right: dl*() + static binary doesn't work on -HEAD either (unlike in Glibc where this _does_ work). I should've remembered since I looked into this for pin@ some time back. iconv*(), unsurprisingly, errors out too, and those __dl*() in static binaries are just stubs put there to return an error instead of just crashing if the binary tries to do a dlopen().

Thanks!
-RVP