Re: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

David Holmes Thu, 05 Nov 2015 19:57:32 -0800

Hi Jeremy,

I was going to ask you to elaborate :)


On 6/11/2015 12:24 PM, Jeremy Manson wrote:

I should probably elaborate on this.  With glibc + ELF, the first time a
thread accesses a variable declared __thread, if that variable is in a
shared library (as opposed to the main executable), the system calls
malloc() to allocate the space for it.  If that happens in a signal that
is being delivered during a call to malloc(), then you usually get a crash.

My understanding of the ELF ABI for thread-locals - which I read aboutin the Solaris 11.1 Linkers and libraries guide - does require use ofthe dynamic TLS model for any dynamically loaded shared object whichdefines a thread-local, but that is what we use as I understand it. Thedocs state:

"A shared object containing only dynamic TLS can be loaded followingprocess startup without limitations. The runtime linker extends the listof initialization records to include the initialization template of thenew object. The new object is given an index of m = M + 1. Thecounter M is incremented by 1. However, the allocation of new TLS blocksis deferred until the blocks are actually referenced."

Now I guess "extends the list" might be implemented using malloc ... butthis will only occur in the main thread (the one started by the launcherto load the JVM and become the main thread), at the time libjvm isloaded - which will all be over before any agent etc can run and doanything. But "allocation ... is deferred" suggests we may have aproblem until either the first call to Thread::current or the call toThread::initialize_thread_current. If it is the former then that shouldoccur well before any agent etc can be loaded. And I can easily injectan initial dummy call to initialize_thread_current(null) to force theTLS allocation.

This may bite you if AsyncGetCallTrace uses Thread::current(), and you
use system timers to do profiling.  If a thread doing a malloc() prior
to the first time it accesses Thread::current(), and it gets delivered a
signal, it might die.  This is especially likely for pure native threads
started by native code.

I believe that this is a use case you support, so you might want to make
sure it is okay.

For a VM embedded in a process, which already contains native threads,that will later attach to the VM, this may indeed be a problem. Onewould have hoped however that the implementation of TLS would becompletely robust, at least for something as simple as getting a signalwhilst in the allocator.

I'm unclear how to test for or check for this kind of problem. Arguablythere could be many things that are async-unsafe in this way.

Need to think more about this and do some more research. Would alsoappreciate any insight from any glibc and/or ELF gurus.


Thanks.
David

Jeremy

On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson <[email protected]
<mailto:[email protected]>> wrote:

    Something that's bitten me with __thread: it isn't async-safe when
    called from a shared object on Linux.  Have you vetted to make sure
    this doesn't make HS less async-safe?

    Jeremy

    On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
    <[email protected] <mailto:[email protected]>> wrote:

        bug: https://bugs.openjdk.java.net/browse/JDK-8132510

        Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/

        A simple (in principle) but wide-ranging change which should
        appeal to our Code Deletion Engineer's. We implement
        Thread::current() using a compiler/language-based thread-local
        variable eg:


          static __thread Thread *_thr_current;

          inline Thread* Thread::current() {
            return _thr_current;
          }

        with an appropriate setter of course. By doing this we can
        completely remove the platform-specific ThreadLocalStorage
        implementations, and the associated os::thread_local_storage*
        calls, plus all the uses of ThreadLocalStorage::thread() and
        ThreadLocalStorage::get_thread_slow(). This extends the previous
        work done on Solaris to implement ThreadLocalStorage::thread()
        using compiler-based thread-locals.

        We can also consolidate nearly all the os_cpu versions of
        MacroAssembler::get_thread on x86 into one cpu specific one ( a
        special variant is still needed for 32-bit Windows).

        As a result of this change we have further potential cleanups:
        - all the src/os/<os>/vm/thread_<os>.inline.hpp files are now
        completely empty and could also be removed
        - the MINIMIZE_RAM_USAGE define (which avoids use of the linux
        sp-map "cache" on 32-bit) now has no affect and so could be
        completely removed from the build system

        I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR,
        but could add the removal of the "inline" files to this CR if
        people think it worth removing them.

        I have one missing piece on Aarch64 - I need to change
        MacroAssembler::get_thread to simply call Thread::current() as
        on other platforms, but I don't know how to write that. I would
        appreciate it if someone could give me the right code for that.

        I would also appreciate comments/testing by the AIX and PPC64
        folk as well.

        A concern about memory-leaks had previously been raised, but
        experiments using simple C code on linux 86 and Solaris showed
        no issues. Also note that Aarch64 already uses this kind of
        thread-local.

        Thanks,
        David

Re: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Reply via email to