On Wed, 13 Apr 2022 15:29:00 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:

>> Move the AsyncGetCallTrace method implementation into a separate method and 
>> wrap its call in non-assert compilation mode in `os::ThreadCrashProtection` 
>> like it is done in 
>> [JFR](https://github.com/openjdk/jdk/blob/965ea8d9cd29aee41ba2b1b0b0c67bb67eca22dd/src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp#L165).
>> This prevents AsyncGetCallTrace from crashing on segmentation faults (but 
>> not on `guarantee`s).
>> 
>> If a crash is observed, then the `num_frames` field of the trace is set to 
>> `ticks_unknown_state` (-7) to signal a state that cannot be properly 
>> handled. `ticks_unknown_state` is currently also used for signaling unknown 
>> thread states but this should not be a problem, as the semantic is the same. 
>> If `num_frames` already has an error code then this error code is not 
>> changed. This helps to distinguish between errors in walking threads in Java 
>> and non-Java mode, as `num_frames` is set there before the walking to the 
>> appropriate error code.
>> 
>> _Thanks for @tstuefe for suggesting this._
>
> I'm actually ambivalent about the "production only" thing.
> 
> I dislike having different behavior between debug and release, I prefer to 
> test what later runs in the field. Also, if we prevent crashes because we 
> want to ignore them, we should ignore them in debug too, otherwise, we burn 
> error analysis cycles needlessly. 
> 
> Optionally, I'd make the behavior of ThreadCrashProtection switchable at 
> runtime with a diagnostic XX switch. That switch would control the jumping 
> back in the signal handler. That way, if you encounter strange bugs while 
> using async profiler, one can disable the crash guard and enable asserts and 
> crashes. If you decide to do that, we should also do tests for it, so maybe 
> leave it for a future RFE.

What @tstuefe said about debug vs. production made me wonder whether the assert 
in `ThreadCrashProtection` is still useful at all. I mean, we are deliberately 
using it from threads other than the JFR sampler thread and if that usage is 
not breaking stuff left and right, probably we could remove it? (/cc @mgronlun)

-------------

PR: https://git.openjdk.java.net/jdk/pull/8225

Reply via email to