> On Nov 20, 2018, at 3:50 AM, David Holmes <david.hol...@oracle.com> wrote: > > After discussions with Kim I've decided to split out the NJT list update into > a separate RFE: > > https://bugs.openjdk.java.net/browse/JDK-8214097 > > So only the change in management.cpp needs reviewing and testing. > > Updated webrev: > > http://cr.openjdk.java.net/~dholmes/8212207/webrev.v2/
Looks good. > > Thanks, > David > > On 20/11/2018 10:01 am, David Holmes wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8212207 >> webrev: http://cr.openjdk.java.net/~dholmes/8212207/webrev/ >> There is an internal management API that reports CPU times for >> NonJavaThreads (NJTs). That functionality requires a valid/live target >> thread so that we can use its pthread_t identity to obtain its CPU clock via >> pthread_getcpuclockid(). >> There is an iteration mechanism for NJTs in which the NJT is registered >> during its constructor and de-registered during its destructor. A thread >> that has only been constructed has not yet executed and so is not a valid >> target for this management API. This seems to be the cause of failures >> reported in this bug (and JDK-8213434). Registering a NJT only when it >> starts executing is an appealing fix for this, but that impacts all current >> users of the NJT list and straight-away causes a problem with the BarrierSet >> initialization logic. So I don't attempt that. >> Instead the first part of the fix is for ThreadTimesClosure::do_thread to >> skip threads that have not yet executed - which we can recognize by seeing >> an uninitialized (i.e. zero) stackbase. >> A second part of the fix, which can be deferred to a separate RFE for NJT >> lifecycle management if desired, tackles the problem of encountering a >> terminated thread during iteration - which can also lead to SEGVs. This can >> arise because NJT's are not actually "destructed", even if they terminate, >> and so they never get removed from the NJT list. Calling destructors is >> problematic because the code using these NJTs assume they are always valid. >> So the fix in this case is to move the de-registering from the NJT list out >> of the destructor and into the Thread::call_run() method so it is done >> before a thread actually terminates. This can be considered a first step in >> cleaning up the NJT lifecycle, where the remaining steps touch on a lot of >> areas and so need to be handled separately e.g. see JDK-8087340 for shutting >> down WorkGang GC worker threads. >> Testing: tiers 1 -3 >> I should point out that I've been unable to reproduce this failure locally, >> even after thousands of runs. I'm hoping Zhengyu can test this in the >> conditions reported in JDK-8213434. >> Thanks, >> David