[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-13 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #40 from nsz at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #22)
> BTW, does aarch64 dl-tlsdesc.S save SVE/SME register state (I only see fixed
> offsets in there), or are those call-saved?

call-saved.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #39 from Richard Biener  ---
(In reply to H.J. Lu from comment #32)
> (In reply to Michael Matz from comment #31)
> > (In reply to H.J. Lu from comment #30)
> > > (In reply to Michael Matz from comment #29)
> > > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > > clearly _does_ so :-)
> > > 
> > > ld.so can only call the malloc implementation internal to ld.so.
> > 
> > (And string functions for initializing that memory)  If that's ensured
> > already everywhere: super.  Because I agree, that this is the best thing
> > to do here.  From my perspective this is pure internal implementation
> > details and hence setting up thread-local areas should not be expected to
> > be interposable by users.
> > (a custom allocator that isn't malloc or doesn't interact with it also
> > would work)
> 
> Since ia32 ld.so in glibc is compiled with:
> 
> Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387
> 
> ia32 _dl_tlsdesc_dynamic is OK.

Maybe also use -minline-all-stringops to avoid using IFUNC accelerated
memset/memcpy?

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #38 from H.J. Lu  ---
The new glibc patch set covers both i386 and x86-64:

https://patchwork.sourceware.org/project/glibc/list/?series=30854

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #37 from Andreas Schwab  ---
No, it uses whatever __rtld_malloc points at, which will be the normal malloc
after bootstrap.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #36 from H.J. Lu  ---
(In reply to Andreas Schwab from comment #35)
> ld.so uses its internal malloc only during bootstrapping.

___tls_get_addr always uses the internal malloc.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #35 from Andreas Schwab  ---
ld.so uses its internal malloc only during bootstrapping.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #34 from H.J. Lu  ---
(In reply to H.J. Lu from comment #33)
> (In reply to H.J. Lu from comment #32)
> > (In reply to Michael Matz from comment #31)
> > > (In reply to H.J. Lu from comment #30)
> > > > (In reply to Michael Matz from comment #29)
> > > > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > > > clearly _does_ so :-)
> > > > 
> > > > ld.so can only call the malloc implementation internal to ld.so.
> > > 
> > > (And string functions for initializing that memory)  If that's ensured
> > > already everywhere: super.  Because I agree, that this is the best
> > > thing to do here.  From my perspective this is pure internal
> > > implementation details and hence setting up thread-local areas should
> > > not be expected to be interposable by users.
> > > (a custom allocator that isn't malloc or doesn't interact with it also
> > > would work)
> > 
> > Since ia32 ld.so in glibc is compiled with:
> > 
> > Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387
> > 
> > ia32 _dl_tlsdesc_dynamic is OK.
> 
> 387 registers may be an issue.

I checked ld.so.  It doesn't use 387 registers.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #33 from H.J. Lu  ---
(In reply to H.J. Lu from comment #32)
> (In reply to Michael Matz from comment #31)
> > (In reply to H.J. Lu from comment #30)
> > > (In reply to Michael Matz from comment #29)
> > > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > > clearly _does_ so :-)
> > > 
> > > ld.so can only call the malloc implementation internal to ld.so.
> > 
> > (And string functions for initializing that memory)  If that's ensured
> > already everywhere: super.  Because I agree, that this is the best thing
> > to do here.  From my perspective this is pure internal implementation
> > details and hence setting up thread-local areas should not be expected to
> > be interposable by users.
> > (a custom allocator that isn't malloc or doesn't interact with it also
> > would work)
> 
> Since ia32 ld.so in glibc is compiled with:
> 
> Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387
> 
> ia32 _dl_tlsdesc_dynamic is OK.

387 registers may be an issue.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #32 from H.J. Lu  ---
(In reply to Michael Matz from comment #31)
> (In reply to H.J. Lu from comment #30)
> > (In reply to Michael Matz from comment #29)
> > > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > > clearly _does_ so :-)
> > 
> > ld.so can only call the malloc implementation internal to ld.so.
> 
> (And string functions for initializing that memory)  If that's ensured
> already everywhere: super.  Because I agree, that this is the best thing to
> do here.  From my perspective this is pure internal implementation details
> and hence setting up thread-local areas should not be expected to be
> interposable by users.
> (a custom allocator that isn't malloc or doesn't interact with it also
> would work)

Since ia32 ld.so in glibc is compiled with:

Makefile:rtld-CFLAGS += -mno-sse -mno-mmx -mfpmath=387

ia32 _dl_tlsdesc_dynamic is OK.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread matz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #31 from Michael Matz  ---
(In reply to H.J. Lu from comment #30)
> (In reply to Michael Matz from comment #29)
> > It not only can call malloc.  As the backtrace of H.J. shows, it quite
> > clearly _does_ so :-)
> 
> ld.so can only call the malloc implementation internal to ld.so.

(And string functions for initializing that memory)  If that's ensured already
everywhere: super.  Because I agree, that this is the best thing to do here.
From my perspective this is pure internal implementation details and hence
setting up thread-local areas should not be expected to be interposable by
users.
(a custom allocator that isn't malloc or doesn't interact with it also would
work)
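
For illustration only, a "custom allocator that isn't malloc" could be as small as a bump allocator over an anonymous mapping. The sketch below is not glibc code: `toy_tls_alloc`, `TOY_ARENA_SIZE` and the two static variables are hypothetical names, the arena size is arbitrary, and the code assumes a Linux-style `mmap` with `MAP_ANONYMOUS`. It never frees memory, is not thread-safe, and only supports alignments up to the page size.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define TOY_ARENA_SIZE (64 * 1024)   /* hypothetical fixed arena size */

static unsigned char *toy_arena;     /* start of the mapping, NULL until mapped */
static size_t toy_arena_used;        /* bytes handed out so far */

/* Hand out 'size' bytes aligned to 'alignment' (a power of two) without
   calling malloc.  Returns NULL if the mapping fails or the arena is full. */
void *toy_tls_alloc(size_t size, size_t alignment)
{
    if (toy_arena == NULL) {
        void *region = mmap(NULL, TOY_ARENA_SIZE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED)
            return NULL;
        toy_arena = region;
    }

    /* Round the current offset up to the requested alignment. */
    size_t start = (toy_arena_used + alignment - 1) & ~(alignment - 1);
    if (start > TOY_ARENA_SIZE || size > TOY_ARENA_SIZE - start)
        return NULL;

    toy_arena_used = start + size;
    return toy_arena + start;
}
```

An allocator along these lines never enters a user-replaced malloc, which is the property being discussed; whether it would be acceptable for tools that track allocations is a separate question.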

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #30 from H.J. Lu  ---
(In reply to Michael Matz from comment #29)
> It not only can call malloc.  As the backtrace of H.J. shows, it quite
> clearly _does_ so :-)
> 

ld.so can only call the malloc implementation internal to ld.so.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread matz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #29 from Michael Matz  ---
It not only can call malloc.  As the backtrace of H.J. shows, it quite clearly
_does_ so :-)

That's why there is talk earlier in this report about potentially not using
malloc as one-time allocator for thread-local areas at all, or allocating the
memory at a different time than from __tls_get_addr.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #28 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #27)
> (In reply to H.J. Lu from comment #26)
> > Even if I compile ia32 glibc with -march=skylake, the _dl_tlsdesc_dynamic
> > slow path doesn't touch XMM registers at all.
> 
> I thought Florian said it can call malloc and malloc can be user provided
> and can use SSE2, 387/MMX or whatever other call clobbered registers ia32
> has.

[hjl@gnu-cfl-3 elf]$ readelf -rW ld.so

Relocation section '.rel.dyn' at offset 0x9f8 contains 3 entries:
 Offset     Info    Type                Sym. Value  Symbol's Name
00032fe0  1a06 R_386_GLOB_DAT 00031ac0   __rseq_offset@@GLIBC_2.35
00032fe4  1f06 R_386_GLOB_DAT 00031ac4   __rseq_size@@GLIBC_2.35
00032b20  002a R_386_IRELATIVE   

Relocation section '.relr.dyn' at offset 0xa10 contains 3 entries:
  12 offsets
00031a60
00032ed0
00032ed8
00032f04
00032f08
00032f0c
00032f10
00032f14
00032f18
00032f1c
00032f20
00032f24
[hjl@gnu-cfl-3 elf]$ 

You can't use another malloc for the ld.so internal usage of malloc/calloc.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #27 from Jakub Jelinek  ---
(In reply to H.J. Lu from comment #26)
> Even if I compile ia32 glibc with -march=skylake, the _dl_tlsdesc_dynamic
> slow path doesn't touch XMM registers at all.

I thought Florian said it can call malloc and malloc can be user provided and
can use SSE2, 387/MMX or whatever other call clobbered registers ia32 has.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #26 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #25)
> (In reply to H.J. Lu from comment #23)
> > > And i386/dl-tlsdesc.S needs to save/restore 387 and SSE regs?
> > 
> > i386 doesn't preserve them in _dl_runtime_resolve nor _dl_tlsdesc_dynamic.
> 
> That is different.  _dl_runtime_resolve happens only at the start of calls
> to functions, if in all supported ia32 ABIs all of i387 state is unsupported
> upon entering functions, then there is no need to save anything.
> While _dl_tlsdesc_dynamic can happen anywhere from within functions and
> doesn't clobber any registers except ax which gets the value, so I think it
> needs to be saved for that case.

I couldn't find a test to show it is needed on i386:

#0  __GI___libc_malloc (bytes=3200) at malloc.c:3294
#1  0xf7fdb771 in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
#2  allocate_dtv_entry (size=<optimized out>, alignment=4) at dl-tls.c:679
#3  allocate_and_init (map=0xf6e00670) at dl-tls.c:704
#4  tls_get_addr_tail (ti=0xf6e00a30, dtv=0x5655fcd8, the_map=0xf6e00670)
    at dl-tls.c:904
#5  0xf7fdf5d5 in _dl_tlsdesc_dynamic () at ../sysdeps/i386/dl-tlsdesc.S:129
#6  0xf7fb017b in apply_tls (p=0xf7a0037c) at tst-gnu2-tls2mod1.c:26
#7  0x5655769b in access_mod (i=1, sym=0x5655a026 "apply_tls")
    at ../sysdeps/i386/i686/tst-gnu2-tls2-i686.c:55
#8  start (arg=0x0) at ../sysdeps/i386/i686/tst-gnu2-tls2-i686.c:70
#9  0xf7c96207 in start_thread (arg=<optimized out>) at pthread_create.c:447
#10 0xf7d3dc08 in clone3 () at ../sysdeps/unix/sysv/linux/i386/clone3.S:111

Even if I compile ia32 glibc with -march=skylake, the _dl_tlsdesc_dynamic slow
path doesn't touch XMM registers at all.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #25 from Jakub Jelinek  ---
(In reply to H.J. Lu from comment #23)
> > And i386/dl-tlsdesc.S needs to save/restore 387 and SSE regs?
> 
> i386 doesn't preserve them in _dl_runtime_resolve nor _dl_tlsdesc_dynamic.

That is different.  _dl_runtime_resolve happens only at the start of calls to
functions, if in all supported ia32 ABIs all of i387 state is unsupported upon
entering functions, then there is no need to save anything.
While _dl_tlsdesc_dynamic can happen anywhere from within functions and doesn't
clobber any registers except ax which gets the value, so I think it needs to be
saved for that case.
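
To illustrate the "can happen anywhere from within functions" point, here is a small sketch with made-up names (`tls_scale`, `scaled_product`). It assumes the code is built as position-independent code with `-mtls-dialect=gnu2`; in that case the compiler is free to keep `product` in a floating-point or SSE register across the TLS descriptor call, because it treats that call as preserving everything except the result register.

```c
/* Hypothetical thread-local variable, for illustration only. */
__thread double tls_scale = 2.0;

/* 'product' may still be live in an FP/SSE register when the address of
   tls_scale is obtained, so the TLS access must not clobber that register. */
double scaled_product(double a, double b)
{
    double product = a * b;
    return product * tls_scale;
}
```

If the slow path behind the TLS access clobbered that register, the function would silently return a wrong result, which is the failure mode this report is about.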

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #24 from H.J. Lu  ---
Moved to glibc.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #23 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #22)
> BTW, does aarch64 dl-tlsdesc.S save SVE/SME register state (I only see fixed
> offsets in there), or are those call-saved?
> What about floating point registers in x86_64/dl-tlsdesc.S?

Floating point registers are preserved with my glibc patch.

> And i386/dl-tlsdesc.S needs to save/restore 387 and SSE regs?

i386 doesn't preserve them in _dl_runtime_resolve nor _dl_tlsdesc_dynamic.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #22 from Jakub Jelinek  ---
BTW, does aarch64 dl-tlsdesc.S save SVE/SME register state (I only see fixed
offsets in there), or are those call-saved?
What about floating point registers in x86_64/dl-tlsdesc.S?
And i386/dl-tlsdesc.S needs to save/restore 387 and SSE regs?

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #21 from H.J. Lu  ---
(In reply to Florian Weimer from comment #20)
> (In reply to H.J. Lu from comment #19)
> > (In reply to Florian Weimer from comment #9)
> > > (In reply to H.J. Lu from comment #7)
> > > > > The __tls_get_addr call with the default approach potentially needs
> > > > > to solve the same problem, doesn't it?
> > > > 
> > > > Isn't __tls_get_addr called via the PLT entry?
> > > 
> > > I'm not sure if that matters? Even if the lazy binding trampoline is
> > > active, it won't protect the actual call.
> > 
> > Non-GNU2 TLS has
> > 
> > 4000  00010007 R_X86_64_JUMP_SLOT 
> > __tls_get_addr + 1010
> > 
> > which calls _dl_runtime_resolve with lazy binding. _dl_runtime_resolve
> > preserves all caller-saved registers.
> 
> The dynamic linker preserves register contents during lazy binding and
> restores them before calling __tls_get_addr, so it doesn't help with
> __tls_get_addr register usage itself. And lazy binding happens only once per
> process and object, while we need to protect the first call on every thread.

Only the call from _dl_tlsdesc_dynamic isn't protected.  My glibc patch:

https://patchwork.sourceware.org/project/glibc/list/?series=30800

fixes it.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #20 from Florian Weimer  ---
(In reply to H.J. Lu from comment #19)
> (In reply to Florian Weimer from comment #9)
> > (In reply to H.J. Lu from comment #7)
> > > > The __tls_get_addr call with the default approach potentially needs
> > > > to solve the same problem, doesn't it?
> > > 
> > > Isn't __tls_get_addr called via the PLT entry?
> > 
> > I'm not sure if that matters? Even if the lazy binding trampoline is active,
> > it won't protect the actual call.
> 
> Non-GNU2 TLS has
> 
> 4000  00010007 R_X86_64_JUMP_SLOT 
> __tls_get_addr + 1010
> 
> which calls _dl_runtime_resolve with lazy binding. _dl_runtime_resolve
> preserves all caller-saved registers.

The dynamic linker preserves register contents during lazy binding and restores
them before calling __tls_get_addr, so it doesn't help with __tls_get_addr
register usage itself. And lazy binding happens only once per process and
object, while we need to protect the first call on every thread.
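
The "first call on every thread" part can be made concrete with a short sketch. The names below (`per_thread_value`, `report_tls_address`, `tls_copies_are_distinct`) are invented for this example. It only demonstrates that each thread has its own copy of a thread-local variable; in a real dlopen'ed module using dynamic TLS, it is the first access on each new thread that may reach the allocating slow path, independent of whether lazy binding already happened.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thread-local variable; every thread gets its own copy. */
static __thread int per_thread_value;

/* Thread body: touch the variable (the first access on this thread) and
   report where this thread's copy lives. */
static void *report_tls_address(void *out)
{
    per_thread_value = 1;
    *(uintptr_t *)out = (uintptr_t)&per_thread_value;
    return NULL;
}

/* Returns 1 if a new thread's copy is at a different address than the
   calling thread's copy, 0 if not, and -1 if the thread could not start. */
int tls_copies_are_distinct(void)
{
    pthread_t thread;
    uintptr_t child_address = 0;

    if (pthread_create(&thread, NULL, report_tls_address, &child_address) != 0)
        return -1;
    pthread_join(thread, NULL);

    return child_address != (uintptr_t)&per_thread_value;
}
```

The calling thread is still alive while the comparison is made, so the two copies cannot share storage.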

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #19 from H.J. Lu  ---
(In reply to Florian Weimer from comment #9)
> (In reply to H.J. Lu from comment #7)
> > > The __tls_get_addr call with the default approach potentially needs to
> > > solve the same problem, doesn't it?
> > 
> > Isn't __tls_get_addr called via the PLT entry?
> 
> I'm not sure if that matters? Even if the lazy binding trampoline is active,
> it won't protect the actual call.

Non-GNU2 TLS has

4000  00010007 R_X86_64_JUMP_SLOT 
__tls_get_addr + 1010

which calls _dl_runtime_resolve with lazy binding. _dl_runtime_resolve
preserves all caller-saved registers.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #18 from Florian Weimer  ---
(In reply to Richard Biener from comment #16)
> I do wonder why __tls_get_addr would have to call the overloaded malloc, can
> we just not force-bind it to the glibc local malloc (and make sure that's
> compiled with -mgeneral-regs-only)?

Using the glibc malloc just for some small TLS allocation is rather wasteful
because of its (mostly) per-thread data structures. Allocating from the main
arena potentially clashes with brk usage from the replacement malloc.

We'd need an alternative memory allocator (in addition to replacement string
functions), but that is known to break Thread Sanitizer and Leak Sanitizer.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #17 from Richard Biener  ---
(In reply to Richard Biener from comment #16)
> I do wonder why __tls_get_addr would have to call the overloaded malloc, can
> we just not force-bind it to the glibc local malloc (and make sure that's
> compiled with -mgeneral-regs-only)?

I realize we end up calling memset (but __mempcpy?) as well, that might
end up in an ifunc and thus using non-general regs as well (and be
overloaded of course).  So the whole __tls_get_addr path would need to
make sure it never goes out of glibc controlled sources.
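
As a sketch of what "never goes out of glibc controlled sources" could look like for the string functions, a fill routine can be written so that it neither dispatches through an IFUNC nor gets turned back into a `memset` call. `plain_fill` is a hypothetical name, not a glibc function; the volatile-qualified pointer is one simple way to force plain byte stores, at the cost of speed.

```c
#include <stddef.h>

/* Byte-at-a-time fill.  Each store goes through a volatile-qualified
   pointer, so the compiler has to emit the stores individually instead of
   vectorizing the loop or replacing it with a call to memset. */
void *plain_fill(void *dst, int value, size_t len)
{
    volatile unsigned char *p = dst;

    for (size_t i = 0; i < len; i++)
        p[i] = (unsigned char)value;
    return dst;
}
```

A routine like this would presumably still need to be built with options that restrict it to general-purpose registers, such as the -mgeneral-regs-only option mentioned in comment #16.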

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #16 from Richard Biener  ---
I do wonder why __tls_get_addr would have to call the overloaded malloc, can
we just not force-bind it to the glibc local malloc (and make sure that's
compiled with -mgeneral-regs-only)?

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #15 from Jakub Jelinek  ---
Because right now it also means it needs to save/restore the APX registers
because malloc could be -mapxf compiled even when glibc isn't.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu.org

--- Comment #14 from Richard Biener  ---
True.  Maybe the kernel VDSO should have a _save_all_regs (fnptr) and
"indirector" ...

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #12 from Jakub Jelinek  ---
(In reply to Florian Weimer from comment #11)
> (In reply to Richard Biener from comment #10)
> > I think a glibc fix would be very much preferred.
> 
> It's a bit of a maintenance nightmare because we have to update the code
> slightly each time new registers are added, and there isn't a good way for
> applications to detect whether they run on a compatible glibc.

But it is what the ABI of GNU2 TLS says, or even what dl-tlsdesc.S says:
/* Preserve call-clobbered registers that we modify.
Yeah, the fact that it can call user-overloaded malloc significantly
complicates stuff; otherwise it would be just a matter of new registers that
can be modified while running whatever __tls_get_addr needs, and that could
change only when glibc is rebuilt with some newer compiler which starts
modifying further call-clobbered registers.
But with overloaded malloc it can change just because the overloaded malloc is
rebuilt with a newer compiler...

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #13 from Jakub Jelinek  ---
BTW, isn't _mcount similar in this regard?

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #11 from Florian Weimer  ---
(In reply to Richard Biener from comment #10)
> I think a glibc fix would be very much preferred.

It's a bit of a maintenance nightmare because we have to update the code
slightly each time new registers are added, and there isn't a good way for
applications to detect whether they run on a compatible glibc.

> Is -mtls-dialect=gnu2
> supposed to work on a per-TU base or are all parts of an executable + loaded
> shlibs required to have the same setting?

It's possible to link various TLS variants together, and they should
interoperate.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #10 from Richard Biener  ---
I think a glibc fix would be very much preferred.  Is -mtls-dialect=gnu2
supposed to work on a per-TU base or are all parts of an executable + loaded
shlibs required to have the same setting?

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #9 from Florian Weimer  ---
(In reply to H.J. Lu from comment #7)
> > The __tls_get_addr call with the default approach potentially needs to solve
> > the same problem, doesn't it?
> 
> Isn't __tls_get_addr called via the PLT entry?

I'm not sure if that matters? Even if the lazy binding trampoline is active, it
won't protect the actual call.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #8 from Jakub Jelinek  ---
E.g.
https://sourceware.org/legacy-ml/binutils/2005-09/msg00184.html
says
The functions defined above use custom calling conventions that
require them to preserve any registers they modify.  This penalizes
the case that requires dynamic TLS, since it must preserve all
call-clobbered registers before calling __tls_get_addr(), but it is
optimized for the most common case of static TLS, and also for the
case in which the code generated by the compiler can be relaxed by the
linker to a more efficient access model: being able to assume no
registers are clobbered by the call tends to improve register
allocation.  Also, the function that handles the dynamic TLS case will
most often be able to avoid calling __tls_get_addr(), thus potentially
avoiding the need for preserving registers.
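
The split between the cheap common case and the expensive dynamic case described in that proposal can be modelled in plain C. This is a toy model only: `toy_tlsdesc`, `toy_resolve_static`, `toy_resolve_dynamic`, `block_ready` and `slow_path_calls` are invented names, and the real resolvers are assembly routines with a custom calling convention that ordinary C functions cannot express.

```c
#include <stddef.h>

/* Simplified, hypothetical model of a TLS descriptor: a resolver function
   plus an argument (here just an offset). */
struct toy_tlsdesc {
    ptrdiff_t (*entry)(struct toy_tlsdesc *desc);
    ptrdiff_t arg;
};

static int slow_path_calls;        /* how often the expensive path ran */
static __thread int block_ready;   /* has this thread's block been set up? */

/* Static TLS case: the offset is already known, so just return it. */
static ptrdiff_t toy_resolve_static(struct toy_tlsdesc *desc)
{
    return desc->arg;
}

/* Dynamic TLS case: usually as cheap as the static one; only the first
   use on a thread takes the slow path, which in a real implementation is
   where call-clobbered registers would have to be saved and restored. */
static ptrdiff_t toy_resolve_dynamic(struct toy_tlsdesc *desc)
{
    if (!block_ready) {
        slow_path_calls++;         /* stands in for allocation via __tls_get_addr */
        block_ready = 1;
    }
    return desc->arg;
}
```

Calling `desc.entry(&desc)` twice on the same thread through a descriptor set up with `toy_resolve_dynamic` runs the slow path once; a descriptor set up with `toy_resolve_static` never does.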

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #7 from H.J. Lu  ---
(In reply to Florian Weimer from comment #6)
> > (In reply to H.J. Lu from comment #4)
> > > (In reply to H.J. Lu from comment #3)
> > > > Created attachment 57385 [details]
> > > > A patch
> > > > 
> > > > Try this.
> > > 
> > > This doesn't work properly.  To work around in ld.so, _dl_tlsdesc_dynamic
> > > needs to save and restore ALL registers, which can be expensive.
> 
> Why doesn't this work properly? Is it possible to make it work with a
> different approach?

Clobber must be attached to TLS descriptor call insn.

> The __tls_get_addr call with the default approach potentially needs to solve
> the same problem, doesn't it?

Isn't __tls_get_addr called via the PLT entry?

> (In reply to Jakub Jelinek from comment #5)
> > Or it could be compiled with options to make sure it doesn't use vector
> > registers etc., and only save/restore if it needs to call into some code
> > where libc can't afford that (say allocate memory).
> 
> We currently call into malloc, which could be a replacement malloc. If GCC
> cannot be fixed, full context switch or elimination of the slow path are our
> best options for a glibc-side fix.

We should open a glibc bug.  I am working on the glibc fix.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #6 from Florian Weimer  ---
> (In reply to H.J. Lu from comment #4)
> > (In reply to H.J. Lu from comment #3)
> > > Created attachment 57385 [details]
> > > A patch
> > > 
> > > Try this.
> > 
> > This doesn't work properly.  To work around in ld.so, _dl_tlsdesc_dynamic
> > needs to save and restore ALL registers, which can be expensive.

Why doesn't this work properly? Is it possible to make it work with a different
approach?

The __tls_get_addr call with the default approach potentially needs to solve
the same problem, doesn't it?

(In reply to Jakub Jelinek from comment #5)
> Or it could be compiled with options to make sure it doesn't use vector
> registers etc., and only save/restore if it needs to call into some code
> where libc can't afford that (say allocate memory).

We currently call into malloc, which could be a replacement malloc. If GCC
cannot be fixed, full context switch or elimination of the slow path are our
best options for a glibc-side fix.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #5 from Jakub Jelinek  ---
(In reply to H.J. Lu from comment #4)
> (In reply to H.J. Lu from comment #3)
> > Created attachment 57385 [details]
> > A patch
> > 
> > Try this.
> 
> This doesn't work properly.  To work around in ld.so, _dl_tlsdesc_dynamic
> needs to save and restore ALL registers, which can be expensive.

Or it could be compiled with options to make sure it doesn't use vector
registers etc., and only save/restore if it needs to call into some code where
libc can't afford that (say allocate memory).

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-02-11

--- Comment #4 from H.J. Lu  ---
(In reply to H.J. Lu from comment #3)
> Created attachment 57385 [details]
> A patch
> 
> Try this.

This doesn't work properly.  To work around in ld.so, _dl_tlsdesc_dynamic needs
to save and restore ALL registers, which can be expensive.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #3 from H.J. Lu  ---
Created attachment 57385
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57385&action=edit
A patch

Try this.

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aoliva at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Alex, what was the original intention here?
It wouldn't surprise me if the intention was to clobber as few registers as
possible because in the common case the call doesn't really call much, just a
TLS load or so, and one can have hundreds of those in a single function.
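
For readers unfamiliar with the pattern, a function of the kind described here might look like the sketch below. The variable and function names are made up; the assumption is that the file is built as position-independent code with `-mtls-dialect=gnu2`, in which case each access to a `__thread` variable may be compiled into a TLS descriptor call.

```c
/* Hypothetical thread-local counters, for illustration only. */
__thread long hits;
__thread long misses;
__thread long total;

/* Record one lookup and return this thread's running total.  Every use of
   hits, misses and total is a separate thread-local access. */
long record_lookup(int was_hit)
{
    if (was_hit)
        hits++;
    else
        misses++;
    total = hits + misses;
    return total;
}
```

With many such accesses in one function, a call convention that clobbers almost nothing lets the compiler keep other values in registers between them, which is the register allocation benefit being discussed.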

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-11 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

--- Comment #1 from Florian Weimer  ---
Brought to the x86-64 ABI list:

GCC and the GNU2 TLS descriptor call ABI