ThreadCpuTimesDeadlock.java crashes with SEGV in pthread_getcpuclockid+0x0

Kim Barrett Tue, 20 Nov 2018 21:05:07 -0800

> On Nov 20, 2018, at 3:50 AM, David Holmes <david.hol...@oracle.com> wrote:
> 
> After discussions with Kim I've decided to split out the NJT list update into 
> a separate RFE:
> 
> https://bugs.openjdk.java.net/browse/JDK-8214097
> 
> So only the change in management.cpp needs reviewing and testing.
> 
> Updated webrev:
> 
> http://cr.openjdk.java.net/~dholmes/8212207/webrev.v2/


Looks good.

> 
> Thanks,
> David
> 
> On 20/11/2018 10:01 am, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8212207
>> webrev: http://cr.openjdk.java.net/~dholmes/8212207/webrev/
>> There is an internal management API that reports CPU times for 
>> NonJavaThreads (NJTs). That functionality requires a valid/live target 
>> thread so that we can use its pthread_t identity to obtain its CPU clock via 
>> pthread_getcpuclockid().
>> There is an iteration mechanism for NJTs in which the NJT is registered 
>> during its constructor and de-registered during its destructor. A thread 
>> that has only been constructed has not yet executed and so is not a valid 
>> target for this management API. This seems to be the cause of failures 
>> reported in this bug (and JDK-8213434). Registering an NJT only when it 
>> starts executing is an appealing fix for this, but that impacts all current 
>> users of the NJT list and straight away causes a problem with the BarrierSet 
>> initialization logic. So I don't attempt that.
>> Instead the first part of the fix is for ThreadTimesClosure::do_thread to 
>> skip threads that have not yet executed - which we can recognize by seeing 
>> an uninitialized (i.e. zero) stackbase.
>> A second part of the fix, which can be deferred to a separate RFE for NJT 
>> lifecycle management if desired, tackles the problem of encountering a 
>> terminated thread during iteration - which can also lead to SEGVs. This can 
>> arise because NJTs are not actually "destructed", even if they terminate, 
>> and so they never get removed from the NJT list. Calling destructors is 
>> problematic because the code using these NJTs assumes they are always valid. 
>> So the fix in this case is to move the de-registering from the NJT list out 
>> of the destructor and into the Thread::call_run() method so it is done 
>> before a thread actually terminates. This can be considered a first step in 
>> cleaning up the NJT lifecycle, where the remaining steps touch on a lot of 
>> areas and so need to be handled separately e.g. see JDK-8087340 for shutting 
>> down WorkGang GC worker threads.
>> Testing: tiers 1-3
>> I should point out that I've been unable to reproduce this failure locally, 
>> even after thousands of runs. I'm hoping Zhengyu can test this in the 
>> conditions reported in JDK-8213434.
>> Thanks,
>> David
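
The first part of the fix, skipping threads that have not yet executed before asking pthread_getcpuclockid() for their CPU clock, can be illustrated with a small standalone sketch. This is not the HotSpot code. FakeNJT, cpu_time_ns and total_cpu_time_ns are hypothetical names. A null stack_base stands in for the "uninitialized (i.e. zero) stackbase" check described for ThreadTimesClosure::do_thread. A thread that has only been constructed is skipped, because its pthread_t is not yet a valid target.

```cpp
#include <cassert>
#include <ctime>
#include <pthread.h>
#include <vector>

// Hypothetical stand-in for a NonJavaThread: only what the sketch needs.
struct FakeNJT {
  pthread_t tid{};             // meaningful only once the thread has started
  void* stack_base = nullptr;  // stays null until the thread starts executing
};

// CPU time of one thread in nanoseconds, or -1 if the thread was skipped
// (never started) or its CPU clock could not be read.
long long cpu_time_ns(const FakeNJT& t) {
  if (t.stack_base == nullptr) {
    return -1;  // constructed but not yet executed: do not touch its pthread_t
  }
  clockid_t cid;
  if (pthread_getcpuclockid(t.tid, &cid) != 0) {
    return -1;
  }
  timespec ts;
  if (clock_gettime(cid, &ts) != 0) {
    return -1;
  }
  return static_cast<long long>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Shaped like a do_thread() closure applied to every registered thread.
// Threads that are skipped are counted instead of being queried.
long long total_cpu_time_ns(const std::vector<FakeNJT*>& registered, int* skipped) {
  long long total = 0;
  *skipped = 0;
  for (const FakeNJT* t : registered) {
    long long ns = cpu_time_ns(*t);
    if (ns < 0) {
      ++*skipped;
    } else {
      total += ns;
    }
  }
  return total;
}
```

This check only handles threads that have not started yet. A thread that has already terminated would still pass it, and that case is what the second part of the fix addresses.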

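The second part of the fix moves de-registration out of the destructor and into Thread::call_run(). It can likewise be sketched under simplifying assumptions:

- NJTList, g_njt_list and SketchNJT are hypothetical stand-ins for the real NJT list and NonJavaThread.
- A std::mutex replaces HotSpot's own synchronization.
- call_run() is invoked directly rather than on a newly created OS thread.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical registry standing in for the NJT list.
class NJTList {
 public:
  void add(const void* t) {
    std::lock_guard<std::mutex> guard(lock_);
    threads_.push_back(t);
  }
  void remove(const void* t) {
    std::lock_guard<std::mutex> guard(lock_);
    threads_.erase(std::remove(threads_.begin(), threads_.end(), t), threads_.end());
  }
  bool contains(const void* t) {
    std::lock_guard<std::mutex> guard(lock_);
    return std::find(threads_.begin(), threads_.end(), t) != threads_.end();
  }

 private:
  std::mutex lock_;
  std::vector<const void*> threads_;
};

NJTList g_njt_list;

// Hypothetical NonJavaThread analogue.
class SketchNJT {
 public:
  SketchNJT() { g_njt_list.add(this); }  // still registered in the constructor
  virtual ~SketchNJT() = default;        // no longer de-registers here

  // Analogue of Thread::call_run(): execute the thread body, then leave the
  // list before the OS thread would terminate. The object itself may live on.
  void call_run() {
    run();
    g_njt_list.remove(this);
  }

 protected:
  virtual void run() {}  // thread body; empty in this sketch
};
```

The object outlives its removal from the list, which matches the observation that NJTs are never actually destructed. What matters is that an iterating closure can no longer reach a thread whose OS thread has gone away.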
