Re: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

David Holmes Thu, 05 Nov 2015 23:53:55 -0800

On 6/11/2015 4:55 PM, Jeremy Manson wrote:

FWIW, Google tried to convince the glibc maintainers to make this
async-safe, but they weren't biting:


https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html

Yes I read all that. I wouldn't say they weren't biting, more of adisagreement on the right direction for the patch. glibc weren'tprepared to take it directly as is, while google-folk didn't seem tothink it worth their while to do whatever glibc folk wanted. The actualpatch proposal just died out. :( Quite a pity.

Most of the things you can do are going to be mitigation rather than a
fix.  I did what you suggest to mitigate, and no one complained, until
someone at Google started a sidecar C++ thread that did a boatload of
malloc'ing.


Yes all mitigation. :(

My workaround was far sneakier, and fixes the problem entirely, but you
probably won't want to do it for code hygiene reasons.  I declare the
__thread variable in the java launcher itself, and then export a
function that provides a pointer to it.  In practice, in glibc, if it is
in the main executable, ELF is smart enough to declare it as part of the
stack, and is therefore async-safe.

But even that only works for our own launchers - not for embedded in theJVM. :(

Fortunately, while this is a fun thing to think about, I don't think
there are any async paths in the JVM that use Thread::current()
(although I could be wrong - there is a comment in there saying not to
call Thread::current() in a signal handler...).  I would check the call
paths in AsyncGetCallTrace to make sure.


So two things ...

First, using Thread::current() in a signal context was disallowed, butthe alternative was ThreadLocalStorage::get_thread_slow(). The formermay not work in a signal context due to the caching mechanisms layeredin on different platforms, while the latter used the platform TLS APIwhich, even if not guaranteed, worked well enough in a signal context.With __thread we don't have even a pretend signal-safe alternative :(

Second, AsyncGetCallTrace violates the first rule by usingJavaThread::current() in an assertion.

Also, the problem may not be limited to something likeAsyncGetCallTrace. Though agents may get the current thread from theJNIEnv rather than invoking some JVM function that results inThread::current() being used, I can't be sure of that.


Anyway more things to mull over on the weekedn. :)

Thanks,
David

Jeremy

On Thu, Nov 5, 2015 at 10:26 PM, David Holmes <david.hol...@oracle.com
<mailto:david.hol...@oracle.com>> wrote:

    Hi Jeremy,

    Okay I have read:

    https://sourceware.org/glibc/wiki/TLSandSignals

    and the tree of mail postings referenced therefrom - great reading! :)

    So basic problem: access to __thread variables is not async-signal-safe

    Exacerbator to problem: first attempt to even read a __thread
    variable can lead to allocation which is the real problem in
    relation to async-signal-safety

    I mention the exacerbator because pthread_getspecific and
    pthread_setSpecific are also not async-signal-safe but we already
    use them. However, pthread_getspecific is in fact (per email threads
    linked above) effectively async-signal-safe, and further a call to
    pthread_getspecific never results in a call to pthread_setspecific
    or an allocation. Hence the pthread functions are almost, if not
    completely, safe in practice with reasonable uses (ie only read from
    signal handler). Which explain this code in existing Thread::current()

    #ifdef PARANOID
       // Signal handler should call ThreadLocalStorage::get_thread_slow()
       Thread* t = ThreadLocalStorage::get_thread_slow();
       assert(t != NULL && !t->is_inside_signal_handler(),
              "Don't use Thread::current() inside signal handler");
    #endif

    So problem scenario is: use of __thread variable (that belongs to
    the shared-library) in a signal handler.

    Solution 0: don't do that. Seriously - like any other
    async-signal-unsafe stuff we should not be using it in real signal
    handlers. The crash handler is a different matter - we try all sorts
    there because it might work and you can't die twice.

    Otherwise: narrow the window of exposure.

    1. We ensure we initialize thread_current (even with a dummy value)
    as early as possible in the thread that loads libjvm. As we have no
    signal handlers installed at that point that might use the same
    variable, we can not hit the problem scenario.

    2. We ensure we initialize thread_current in a new thread with all
    signals blocked. This again avoids the problem scenario.

    3. We initialize thread_current in an attaching thread as soon as
    possible and we again first block all signals.

    That still leaves the problem of an unattached native thread taking
    a signal whilst in async-signal-unsafe code, and executing a signal
    handler which in turns tries to access thread_current for the first
    time. This signal handler need not be an actual JVM handler, but one
    attached by other native code eg an agent. I'm not clear in the
    latter case how reasonable it is for an agent's handler to try and
    do things from an unattached thread - and we don't claim any JNI
    interfaces can, or should, be called from a signal handler - but it
    is something you can probably get away with today.

    Let me also point out that we already effectively have this code in
    Solaris already (at the ThreadLocalStorage class level). So if there
    is something here that will prevent the current proposal we already
    have a problem on Solaris. :(

    Thoughts/comments/suggestions?

    Thanks,
    David


    On 6/11/2015 1:09 PM, David Holmes wrote:

        Hi Jeremy,

        I was going to ask you to elaborate :)

        On 6/11/2015 12:24 PM, Jeremy Manson wrote:

            I should probably elaborate on this.  With glibc + ELF, the
            first time a
            thread accesses a variable declared __thread, if that
            variable is in a
            shared library (as opposed to the main executable), the
            system calls
            malloc() to allocate the space for it.  If that happens in a
            signal that
            is being delivered during a call to malloc(), then you
            usually get a
            crash.


        My understanding of the ELF ABI for thread-locals - which I read
        about
        in the Solaris 11.1 Linkers and libraries guide - does require
        use of
        the dynamic TLS model for any dynamically loaded shared object which
        defines a thread-local, but that is what we use as I understand
        it. The
        docs state:

        "A shared object containing only dynamic TLS can be loaded following
        process startup without limitations. The runtime linker extends
        the list
        of initialization records to include the initialization template
        of the
        new object. The new object is given an index of m = M + 1. The
        counter M is incremented by 1. However, the allocation of new
        TLS blocks
        is deferred until the blocks are actually referenced."

        Now I guess "extends the list" might be implemented using malloc
        ... but
        this will only occur in the main thread (the one started by the
        launcher
        to load the JVM and become the main thread), at the time libjvm is
        loaded - which will all be over before any agent etc can run and do
        anything. But "allocation ... is deferred" suggests we may have a
        problem until either the first call to Thread::current or the
        call to
        Thread::initialize_thread_current. If it is the former then that
        should
        occur well before any agent etc can be loaded. And I can easily
        inject
        an initial dummy call to initialize_thread_current(null) to
        force the
        TLS allocation.

            This may bite you if AsyncGetCallTrace uses
            Thread::current(), and you
            use system timers to do profiling.  If a thread doing a
            malloc() prior
            to the first time it accesses Thread::current(), and it gets
            delivered a
            signal, it might die.  This is especially likely for pure
            native threads
            started by native code.

            I believe that this is a use case you support, so you might
            want to make
            sure it is okay.


        For a VM embedded in a process, which already contains native
        threads,
        that will later attach to the VM, this may indeed be a problem. One
        would have hoped however that the implementation of TLS would be
        completely robust, at least for something as simple as getting a
        signal
        whilst in the allocator.

        I'm unclear how to test for or check for this kind of problem.
        Arguably
        there could be many things that are async-unsafe in this way.

        Need to think more about this and do some more research. Would also
        appreciate any insight from any glibc and/or ELF gurus.

        Thanks.
        David

            Jeremy

            On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
            <jeremyman...@google.com <mailto:jeremyman...@google.com>
            <mailto:jeremyman...@google.com
            <mailto:jeremyman...@google.com>>> wrote:

                 Something that's bitten me with __thread: it isn't
            async-safe when
                 called from a shared object on Linux.  Have you vetted
            to make sure
                 this doesn't make HS less async-safe?

                 Jeremy

                 On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
                 <david.hol...@oracle.com
            <mailto:david.hol...@oracle.com>
            <mailto:david.hol...@oracle.com
            <mailto:david.hol...@oracle.com>>> wrote:

                     bug: https://bugs.openjdk.java.net/browse/JDK-8132510

                     Open webrev:
            http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/

                     A simple (in principle) but wide-ranging change
            which should
                     appeal to our Code Deletion Engineer's. We implement
                     Thread::current() using a compiler/language-based
            thread-local
                     variable eg:


                       static __thread Thread *_thr_current;

                       inline Thread* Thread::current() {
                         return _thr_current;
                       }

                     with an appropriate setter of course. By doing this
            we can
                     completely remove the platform-specific
            ThreadLocalStorage
                     implementations, and the associated
            os::thread_local_storage*
                     calls, plus all the uses of
            ThreadLocalStorage::thread() and
                     ThreadLocalStorage::get_thread_slow(). This extends
            the previous
                     work done on Solaris to implement
            ThreadLocalStorage::thread()
                     using compiler-based thread-locals.

                     We can also consolidate nearly all the os_cpu
            versions of
                     MacroAssembler::get_thread on x86 into one cpu
            specific one ( a
                     special variant is still needed for 32-bit Windows).

                     As a result of this change we have further
            potential cleanups:
                     - all the src/os/<os>/vm/thread_<os>.inline.hpp
            files are now
                     completely empty and could also be removed
                     - the MINIMIZE_RAM_USAGE define (which avoids use
            of the linux
                     sp-map "cache" on 32-bit) now has no affect and so
            could be
                     completely removed from the build system

                     I plan to do the MINIMIZE_RAM_USAGE removal as a
            follow up CR,
                     but could add the removal of the "inline" files to
            this CR if
                     people think it worth removing them.

                     I have one missing piece on Aarch64 - I need to change
                     MacroAssembler::get_thread to simply call
            Thread::current() as
                     on other platforms, but I don't know how to write
            that. I would
                     appreciate it if someone could give me the right
            code for that.

                     I would also appreciate comments/testing by the AIX
            and PPC64
                     folk as well.

                     A concern about memory-leaks had previously been
            raised, but
                     experiments using simple C code on linux 86 and
            Solaris showed
                     no issues. Also note that Aarch64 already uses this
            kind of
                     thread-local.

                     Thanks,
                     David

Re: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Reply via email to