Hi Chris,
The fix looks reasonable.
Thanks,
Serguei
On 4/27/20 12:16, Chris Plummer wrote:
The fix looks reasonable.
Thanks,
Serguei
On 4/27/20 12:16, Chris Plummer wrote:
Ping! Please help review if you can. There's also an RFR out for 8243500, which is related.
thanks,
Chris
-------- Forwarded Message --------
Subject: RFR(S) 8231634: SA stack walking fails with "illegal bci" Date: Thu, 23 Apr 2020 14:35:49 -0700 From: Chris Plummer <chris.plum...@oracle.com> To: serviceability-dev <serviceability-dev@openjdk.java.net>
Hello,
Please help review the following:
https://bugs.openjdk.java.net/browse/JDK-8231634
http://cr.openjdk.java.net/~cjplummer/8231634/webrev.00/index.html
The fix is fairly simple: Don't use R13 as the BCP if it does not point into the methods actual bytecodes. Instead rely on the BCP value flushed to the frame. Details below are mostly there to help you understand the big picture of what is going on.
First a bit a background on what the live_bcp field is in the X86Frame. BCP is "bytecode pointer". It points to the current bytecode being executed. I'm mostly just talking about interpreter frames here, but I think for compiled frames it points to the current native instruction being executed. For all but the current frame, frame->bcp reliably contains the BCP. For the current frame you normally need to look in a register for an up to date BCP. On x86_64 that's R13. When trying to find and create the current frame for a thread, eventually you end up in LinuxAMD64JavaThreadPDAccess.getCurrentFrameGuess(), which does:
// pass the value of R13 which contains the bcp for the top level frame
Address bcp = context.getRegisterAsAddress(AMD64ThreadContext.R13);
return new X86Frame(guesser.getSP(), guesser.getFP(), guesser.getPC(), null, bcp <-- live_bcp field);
So in this case the X86Frame is constructed with R13 stored in the live_bcp field. For any frame other than the current frame, the live_bcp field will be NULL.
And one other bit of clarification before moving on to the bug. You'll see BCX in code. For example, INTERPRETER_FRAME_BCX_OFFSET. BCX == BCP. I'm not sure why BCX was ever chosen. I was tempted to rename it to BCP, but it's been cloned to all the other SA CPU ports, so I think it's best to just leave it.
Now on to the bug. Sometimes the interpreter uses R13 as a scratch register, so it does not always contain the BCP. This is why sometimes tests will see an "illegal bci" assert from SA. It's because r13 is currently not pointing to the BCP of the current frame, and thus can't be converted to a BCI.
What this fix does take advantage of the fact that when r13 is used as a scratch register, it is first flushed to frame->bcp. So with this fix R13 is only used as the BCP if it points within the Method's bytecodes. If it does not, then SA will use the BCP flushed to frame->bcp. The reason we don't always used frame->bcp is because it is not kept up to date as the interpreter moves between bytecodes, but we know it is up to date if R13 is currently being used as a scratch register. So in other words, an invalid R13 implies that frame->bcp is up to date.
And if you find yourself thinking "X86Frame.java supports is for both 32-bit and 64-bit x86, but R13 does not exists on 32-bit.", your right. A couple of years ago the concept of live_bcp was introduced to fix JDK-8214226 [1]. It only added support for fixing 64-bit and ignored 32-bit. So you'll never see live_bcp getting set for 32-bit, and thus JDK-8214226 is still broken on 32-bit, but it also means 32-bit is not impacted by the bug I'm fixing here. So the fix for JDK-8214226 is why you see R13 references in the X86Frame.java, but the code will still work for any platform that sets up live_bcd properly, even if it's not actually R13.
...and one more thing. 8214226 [1] actually introduced this "illegal bci" bug. However 8214226 [1] was only ever applied to LinuxAMD64. Never for BSD or Windows. This means two things: (1) the "illegal bci" bug never turned up on these platforms, and (2) 8214226 is still broken on these platform, meaning the bci returned by getInterpreterFrameBCI() is likely often stale, meaning stack traces will not show the proper line number for the current frame. It seems we don't have a test to catch this. I've filed 8243500 [2] to cover fixing 8214226 on BSD and Windows.
[1] https://bugs.openjdk.java.net/browse/JDK-8214226
[2] https://bugs.openjdk.java.net/browse/JDK-8243500
thanks,
Chris