Hi David, On Fri, Nov 6, 2015 at 7:26 AM, David Holmes <david.hol...@oracle.com> wrote:
> Hi Jeremy, > > Okay I have read: > > https://sourceware.org/glibc/wiki/TLSandSignals > > and the tree of mail postings referenced therefrom - great reading! :) > > So basic problem: access to __thread variables is not async-signal-safe > > Exacerbator to problem: first attempt to even read a __thread variable can > lead to allocation which is the real problem in relation to > async-signal-safety > > I mention the exacerbator because pthread_getspecific and > pthread_setSpecific are also not async-signal-safe but we already use them. > However, pthread_getspecific is in fact (per email threads linked above) > effectively async-signal-safe, and further a call to pthread_getspecific > never results in a call to pthread_setspecific or an allocation. Hence the > pthread functions are almost, if not completely, safe in practice with > reasonable uses (ie only read from signal handler). Which explain this code > in existing Thread::current() > > #ifdef PARANOID > // Signal handler should call ThreadLocalStorage::get_thread_slow() > Thread* t = ThreadLocalStorage::get_thread_slow(); > assert(t != NULL && !t->is_inside_signal_handler(), > "Don't use Thread::current() inside signal handler"); > #endif > > So problem scenario is: use of __thread variable (that belongs to the > shared-library) in a signal handler. > > Solution 0: don't do that. Seriously - like any other async-signal-unsafe > stuff we should not be using it in real signal handlers. The crash handler > is a different matter - we try all sorts there because it might work and > you can't die twice. > > Otherwise: narrow the window of exposure. > > 1. We ensure we initialize thread_current (even with a dummy value) as > early as possible in the thread that loads libjvm. As we have no signal > handlers installed at that point that might use the same variable, we can > not hit the problem scenario. > > 2. We ensure we initialize thread_current in a new thread with all signals > blocked. This again avoids the problem scenario. > > 3. We initialize thread_current in an attaching thread as soon as possible > and we again first block all signals. > > That still leaves the problem of an unattached native thread taking a > signal whilst in async-signal-unsafe code, and executing a signal handler > which in turns tries to access thread_current for the first time. This > signal handler need not be an actual JVM handler, but one attached by other > native code eg an agent. I'm not clear in the latter case how reasonable it > is for an agent's handler to try and do things from an unattached thread - > and we don't claim any JNI interfaces can, or should, be called from a > signal handler - but it is something you can probably get away with today. > > Let me also point out that we already effectively have this code in > Solaris already (at the ThreadLocalStorage class level). So if there is > something here that will prevent the current proposal we already have a > problem on Solaris. :( > > Thoughts/comments/suggestions? > > The first problem: thread initializes TLS variable, gets interrupted and accesses the half-initialized variable from within the signal handler. This could happen today too, or? but I think we never saw this. In theory, it could be mitigated by some careful testing before using the Thread::current() value in the signal handler. Like, put an eyecatcher at the beginning of the Thread structure and check that using SafeFetch. As for the second problem - recursive malloc() deadlocks - I am at a loss. I do not fully understand though why pthread_getspecific is different - does it not have to allocate place for the TLS variable too? Regards, Thomas > Thanks, > David > > > On 6/11/2015 1:09 PM, David Holmes wrote: > >> Hi Jeremy, >> >> I was going to ask you to elaborate :) >> >> On 6/11/2015 12:24 PM, Jeremy Manson wrote: >> >>> I should probably elaborate on this. With glibc + ELF, the first time a >>> thread accesses a variable declared __thread, if that variable is in a >>> shared library (as opposed to the main executable), the system calls >>> malloc() to allocate the space for it. If that happens in a signal that >>> is being delivered during a call to malloc(), then you usually get a >>> crash. >>> >> >> My understanding of the ELF ABI for thread-locals - which I read about >> in the Solaris 11.1 Linkers and libraries guide - does require use of >> the dynamic TLS model for any dynamically loaded shared object which >> defines a thread-local, but that is what we use as I understand it. The >> docs state: >> >> "A shared object containing only dynamic TLS can be loaded following >> process startup without limitations. The runtime linker extends the list >> of initialization records to include the initialization template of the >> new object. The new object is given an index of m = M + 1. The >> counter M is incremented by 1. However, the allocation of new TLS blocks >> is deferred until the blocks are actually referenced." >> >> Now I guess "extends the list" might be implemented using malloc ... but >> this will only occur in the main thread (the one started by the launcher >> to load the JVM and become the main thread), at the time libjvm is >> loaded - which will all be over before any agent etc can run and do >> anything. But "allocation ... is deferred" suggests we may have a >> problem until either the first call to Thread::current or the call to >> Thread::initialize_thread_current. If it is the former then that should >> occur well before any agent etc can be loaded. And I can easily inject >> an initial dummy call to initialize_thread_current(null) to force the >> TLS allocation. >> >> This may bite you if AsyncGetCallTrace uses Thread::current(), and you >>> use system timers to do profiling. If a thread doing a malloc() prior >>> to the first time it accesses Thread::current(), and it gets delivered a >>> signal, it might die. This is especially likely for pure native threads >>> started by native code. >>> >>> I believe that this is a use case you support, so you might want to make >>> sure it is okay. >>> >> >> For a VM embedded in a process, which already contains native threads, >> that will later attach to the VM, this may indeed be a problem. One >> would have hoped however that the implementation of TLS would be >> completely robust, at least for something as simple as getting a signal >> whilst in the allocator. >> >> I'm unclear how to test for or check for this kind of problem. Arguably >> there could be many things that are async-unsafe in this way. >> >> Need to think more about this and do some more research. Would also >> appreciate any insight from any glibc and/or ELF gurus. >> >> Thanks. >> David >> >> Jeremy >>> >>> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson <jeremyman...@google.com >>> <mailto:jeremyman...@google.com>> wrote: >>> >>> Something that's bitten me with __thread: it isn't async-safe when >>> called from a shared object on Linux. Have you vetted to make sure >>> this doesn't make HS less async-safe? >>> >>> Jeremy >>> >>> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes >>> <david.hol...@oracle.com <mailto:david.hol...@oracle.com>> wrote: >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510 >>> >>> Open webrev: >>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ >>> >>> A simple (in principle) but wide-ranging change which should >>> appeal to our Code Deletion Engineer's. We implement >>> Thread::current() using a compiler/language-based thread-local >>> variable eg: >>> >>> >>> static __thread Thread *_thr_current; >>> >>> inline Thread* Thread::current() { >>> return _thr_current; >>> } >>> >>> with an appropriate setter of course. By doing this we can >>> completely remove the platform-specific ThreadLocalStorage >>> implementations, and the associated os::thread_local_storage* >>> calls, plus all the uses of ThreadLocalStorage::thread() and >>> ThreadLocalStorage::get_thread_slow(). This extends the previous >>> work done on Solaris to implement ThreadLocalStorage::thread() >>> using compiler-based thread-locals. >>> >>> We can also consolidate nearly all the os_cpu versions of >>> MacroAssembler::get_thread on x86 into one cpu specific one ( a >>> special variant is still needed for 32-bit Windows). >>> >>> As a result of this change we have further potential cleanups: >>> - all the src/os/<os>/vm/thread_<os>.inline.hpp files are now >>> completely empty and could also be removed >>> - the MINIMIZE_RAM_USAGE define (which avoids use of the linux >>> sp-map "cache" on 32-bit) now has no affect and so could be >>> completely removed from the build system >>> >>> I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR, >>> but could add the removal of the "inline" files to this CR if >>> people think it worth removing them. >>> >>> I have one missing piece on Aarch64 - I need to change >>> MacroAssembler::get_thread to simply call Thread::current() as >>> on other platforms, but I don't know how to write that. I would >>> appreciate it if someone could give me the right code for that. >>> >>> I would also appreciate comments/testing by the AIX and PPC64 >>> folk as well. >>> >>> A concern about memory-leaks had previously been raised, but >>> experiments using simple C code on linux 86 and Solaris showed >>> no issues. Also note that Aarch64 already uses this kind of >>> thread-local. >>> >>> Thanks, >>> David >>> >>> >>> >>>