Hi,

On 2024-02-09 15:27:57 -0800, Noah Misch wrote:
> On Fri, Feb 09, 2024 at 10:24:32AM -0800, Andres Freund wrote:
> > On 2024-01-26 07:42:33 +0100, Alvaro Herrera wrote:
> > > This suggests that finding a way to make the ifunc stuff work (with good
> > > performance) is critical to this work.
> >
> > Ifuncs are effectively implemented as a function call via a pointer, they're
> > not magic, unfortunately. The sole trick they provide is that you don't
> > manually have to use the function pointer.
>
> The IFUNC creators introduced it so glibc could use arch-specific memcpy with
> the instruction sequence of a non-pointer, extern function call, not the
> instruction sequence of a function pointer call.

My understanding is that the ifunc mechanism just avoids the need for repeated indirect calls/jumps to implement a single function call, not the use of indirect function calls at all. Calls into shared libraries, like libc, are indirected via the GOT / PLT, i.e. an indirect function call/jump. Without ifuncs, the target of the function call would then have to dispatch to the resolved function. Ifuncs make it possible to avoid this repeated dispatch by moving the dispatch to the dynamic linker stage, modifying the contents of the GOT/PLT to point to the right function. Thus ifuncs are an optimization when calling a function in a shared library that's then dispatched depending on the cpu capabilities.

However, in our case the code is in the same binary, and calls to functions implemented in the main binary directly (possibly via a static library) don't go through the GOT/PLT. In such a case, use of ifuncs turns a normal direct function call into one going through the GOT/PLT, i.e. makes it indirect. The same is true for calls within a shared library if either explicit symbol visibility is used, or -symbolic, -Wl,-Bsymbolic or such is used. Therefore there's no efficiency gain of ifuncs over a call via function pointer.

This isn't because ifunc is implemented badly or something - the reason for this is that dynamic relocations aren't typically implemented by patching all callsites (".text relocations"), which is what you would need to avoid the need for an indirect call to something that fundamentally cannot be a constant address at link time. The reason text relocations are disfavored is that they can make program startup quite slow, that they require allowing modifications to executable pages, which is disliked due to the security implications, and that they make the code non-shareable, as the in-memory executable code has to differ from the on-disk code.

I actually think ifuncs within the same binary are a tad *slower* than plain function pointer calls, unless -fno-plt is used.
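
To make the comparison concrete, here is a minimal sketch of the two dispatch styles side by side. It assumes GCC or clang on an ELF/glibc target (elsewhere the ifunc variant simply falls back to the pointer variant). All names are hypothetical, and the "fast" implementation is only a stand-in: real code would build it with the matching target options.

```c
#include <assert.h>
#include <stdint.h>

/* Two interchangeable implementations (hypothetical names). */
static int popcount_portable(uint64_t x)
{
    int n = 0;

    while (x)
    {
        x &= x - 1;             /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Stand-in for a CPU-specific variant. */
static int popcount_builtin(uint64_t x)
{
    return __builtin_popcountll(x);
}

typedef int (*popcount_fn) (uint64_t);

/* Pick an implementation. Written so it can also run in an ifunc resolver. */
static popcount_fn choose_popcount(void)
{
    int have_fast = 0;

#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();       /* needed before cpu checks in a resolver */
    have_fast = __builtin_cpu_supports("popcnt") != 0;
#endif
    return have_fast ? popcount_builtin : popcount_portable;
}

/* Style 1: plain function pointer, resolved lazily on the first call. */
static int popcount_first_call(uint64_t x);
static popcount_fn popcount_ptr = popcount_first_call;

static int popcount_first_call(uint64_t x)
{
    popcount_ptr = choose_popcount();
    return popcount_ptr(x);
}

int popcount_via_pointer(uint64_t x)
{
    return popcount_ptr(x);     /* load from variable, indirect call */
}

/* Style 2: ifunc, resolved once by the dynamic linker / startup code. */
#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__)
static popcount_fn resolve_popcount(void)
{
    return choose_popcount();
}

int popcount_via_ifunc(uint64_t x) __attribute__((ifunc("resolve_popcount")));
#else
/* Assumption: no ifunc support here, so reuse the pointer path. */
int popcount_via_ifunc(uint64_t x)
{
    return popcount_via_pointer(x);
}
#endif

/* Both dispatch paths must agree with the portable implementation. */
void check_dispatch_agrees(uint64_t x)
{
    int expected = popcount_portable(x);

    assert(popcount_via_pointer(x) == expected);
    assert(popcount_via_ifunc(x) == expected);
}
```

Comparing the disassembly of a caller of popcount_via_ifunc() built with and without -fno-plt against the body of popcount_via_pointer() should show whether the call goes via the PLT or loads the target address at the callsite.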
Without -fno-plt, an ifunc is called by 1) a direct call into the PLT, 2) loading the target address from the GOT, 3) making an indirect jump to that address.

Whereas a "plain indirect function call" is just 1) loading the target address from a variable, 2) making an indirect jump to that address.

With -fno-plt the callsites themselves load the address from the GOT.

Greetings,

Andres Freund