On Fri, 11 Mar 2022 12:18:36 GMT, Florian Weimer <fwei...@openjdk.org> wrote:

> > According to 
> > https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654:
> >  "these accesses are just a move from a system register plus a load/store 
> > at a constant offset."
> 
> Ideally you'd still benchmark that. Some AArch64 implementations have really, 
> really slow moves from the system register used as the thread pointer. 
> Hopefully Apple's implementation isn't in that category.

In a tight loop, a load from a __thread variable takes 1ns. The code executed is this:


    0x18ea1c530 <+0>:   ldr    x16, [x0, #0x8]
    0x18ea1c534 <+4>:   mrs    x17, TPIDRRO_EL0 
    0x18ea1c538 <+8>:   and    x17, x17, #0xfffffffffffffff8
    0x18ea1c53c <+12>:  ldr    x17, [x17, x16, lsl #3]
    0x18ea1c540 <+16>:  cbz    x17, 0x18ea1c550          ; only executed first time
    0x18ea1c544 <+20>:  ldr    x16, [x0, #0x10]
    0x18ea1c548 <+24>:  add    x0, x17, x16
    0x18ea1c54c <+28>:  ret    


... which looks the same as what glibc does. Not bad, but quite a lot more to 
do than a simple load.
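
For anyone who wants to reproduce a number like that, here is a minimal sketch of a tight-loop measurement. It is not the harness I used. The names tls_counter, load_tls and measure_tls_load_ns are made up for illustration, and it assumes GCC or Clang (for __thread, noinline and the inline asm barrier) plus POSIX clock_gettime. The out-of-line accessor adds call overhead, so treat the result as an upper bound on the cost of the load itself.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical thread-local variable; stands in for any __thread variable. */
static __thread uint64_t tls_counter;

/* Kept out of line, with a compiler barrier, so the compiler cannot hoist
 * the TLS address computation or the load out of the timing loop. */
__attribute__((noinline)) static uint64_t load_tls(void)
{
    __asm__ volatile("" ::: "memory");
    return tls_counter;
}

/* Returns the average time per load_tls() call in nanoseconds.
 * The accumulated sum is written to *sink so the loop has a visible result. */
static double measure_tls_load_ns(uint64_t iterations, uint64_t *sink)
{
    struct timespec start, end;
    uint64_t sum = 0;

    assert(iterations > 0);
    tls_counter = 1;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (uint64_t i = 0; i < iterations; i++)
        sum += load_tls();
    clock_gettime(CLOCK_MONOTONIC, &end);

    *sink = sum;

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Call measure_tls_load_ns(100000000, &sink) from main and print the result. The figure includes loop and call overhead, so compare it against the same loop reading a plain static variable before drawing conclusions.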

I'd still use a static SafeFetch, with no W^X fiddling. It just seems to me 
much more reasonable.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7727
