On Wed, 20 Jan 2021 03:36:52 GMT, Andrei Pangin <[email protected]> wrote:
>> The changes look ok to me. I think it would be good to get someone from the >> compiler team to verify your reasoning here. > > Hi Lutz, > > Thank you for working on this issue. > I'm not a formal OpenJDK Reviewer, but as an author of async-profiler project > (which heavily relies on AsyncGetCallTrace), I'm more or less familiar with > this code. Let me leave a few comments. > > 1. The statement `if (candidate.pc() != NULL) return false;` looks a bit odd > to me. When a profiler calls AsyncGetCallTrace, pc of the initial frame is > obtained from a signal context, and it is never NULL. Returning false when pc > != NULL basically invalidates the entire `if (fr->cb() == NULL)` branch. If > this was intended, the branch could be simplified. > 2. At the same time, returning false whenever cb == NULL, discards a fair > amount of valid stack traces, e.g. when a runtime function is called from > Java without switching to in_vm state. I guess, the original intention of > that loop was to handle such cases, but, as you noticed, it does not > currently work well because of the assertions. I'd rather prefer revising the > logic of that loop then instead of dropping all those valid samples. > 3. What I don't really understand from the bug report is why `if (fr->cb() == > NULL)` branch is executed at all in this particular example. According to the > stack trace from the crash dump, the top frame (before a signal handler) is > `ThreadCritical::~ThreadCritical()`. This means, a thread is in_vm state, and > its last_Java_frame should have been set. In this case AsyncGetCallTrace > prefers to take last_Java_frame, and thus cb should be non-NULL. Another > suspicious thing is that a frame below `ThreadCritical::~ThreadCritical()` is > not a C frame. This cannot normally happen (the execution is in the middle of > `ThreadCritical::~ThreadCritical()`, where the stack frame is consistent). > These facts make me think something bad has already happened with the stack > earlier, and the failed guarantee is probably just one manifestation of a > bigger problem. Spent a lot fo time and many thoughts. Finally, I found out it was all for nothing. The failure doesn't occur at the location I was sure it would. Will invest more time and more thoughts... I'll be back! ------------- PR: https://git.openjdk.java.net/jdk/pull/2032
