On Wed, 20 Jan 2021 03:36:52 GMT, Andrei Pangin <[email protected]> wrote:

>> The changes look ok to me. I think it would be good to get someone from the 
>> compiler team to verify your reasoning here.
>
> Hi Lutz,
> 
> Thank you for working on this issue.
> I'm not a formal OpenJDK Reviewer, but as an author of async-profiler project 
> (which heavily relies on AsyncGetCallTrace), I'm more or less familiar with 
> this code. Let me leave a few comments.
> 
> 1. The statement `if (candidate.pc() != NULL) return false;` looks a bit odd 
> to me. When a profiler calls AsyncGetCallTrace, pc of the initial frame is 
> obtained from a signal context, and it is never NULL. Returning false when pc 
> != NULL basically invalidates the entire `if (fr->cb() == NULL)` branch. If 
> this was intended, the branch could be simplified.
> 2. At the same time, returning false whenever cb == NULL, discards a fair 
> amount of valid stack traces, e.g. when a runtime function is called from 
> Java without switching to in_vm state. I guess, the original intention of 
> that loop was to handle such cases, but, as you noticed, it does not 
> currently work well because of the assertions. I'd rather prefer revising the 
> logic of that loop then instead of dropping all those valid samples.
> 3. What I don't really understand from the bug report is why `if (fr->cb() == 
> NULL)` branch is executed at all in this particular example. According to the 
> stack trace from the crash dump, the top frame (before a signal handler) is 
> `ThreadCritical::~ThreadCritical()`. This means, a thread is in_vm state, and 
> its last_Java_frame should have been set. In this case AsyncGetCallTrace 
> prefers to take last_Java_frame, and thus cb should be non-NULL. Another 
> suspicious thing is that a frame below `ThreadCritical::~ThreadCritical()` is 
> not a C frame. This cannot normally happen (the execution is in the middle of 
> `ThreadCritical::~ThreadCritical()`, where the stack frame is consistent). 
> These facts make me think something bad has already happened with the stack 
> earlier, and the failed guarantee is probably just one manifestation of a 
> bigger problem.

Spent a lot fo time and many thoughts. Finally, I found out it was all for 
nothing. The failure doesn't occur at the location I was sure it would. Will 
invest more time and more thoughts... I'll be back!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2032

Reply via email to