Thanks Alex and Serguei!
Chris
On 4/27/20 6:23 PM, Alex Menkov wrote:
Hi Chris,
great analysis.
LGTM++
--alex
On 04/27/2020 14:52, serguei.spit...@oracle.com wrote:
Hi Chris,
The fix looks reasonable.
Thanks,
Serguei
On 4/27/20 12:16, Chris Plummer wrote:
Ping! Please help review if you can. There's also an RFR out for
8243500, which is related.
thanks,
Chris
-------- Forwarded Message --------
Subject: RFR(S) 8231634: SA stack walking fails with "illegal bci"
Date: Thu, 23 Apr 2020 14:35:49 -0700
From: Chris Plummer <chris.plum...@oracle.com>
To: serviceability-dev <serviceability-dev@openjdk.java.net>
Hello,
Please help review the following:
https://bugs.openjdk.java.net/browse/JDK-8231634
http://cr.openjdk.java.net/~cjplummer/8231634/webrev.00/index.html
The fix is fairly simple: Don't use R13 as the BCP if it does not
point into the methods actual bytecodes. Instead rely on the BCP
value flushed to the frame. Details below are mostly there to help
you understand the big picture of what is going on.
First a bit a background on what the live_bcp field is in the
X86Frame. BCP is "bytecode pointer". It points to the current
bytecode being executed. I'm mostly just talking about interpreter
frames here, but I think for compiled frames it points to the
current native instruction being executed. For all but the current
frame, frame->bcp reliably contains the BCP. For the current frame
you normally need to look in a register for an up to date BCP. On
x86_64 that's R13. When trying to find and create the current frame
for a thread, eventually you end up in
LinuxAMD64JavaThreadPDAccess.getCurrentFrameGuess(), which does:
// pass the value of R13 which contains the bcp for the top
level frame
Address bcp =
context.getRegisterAsAddress(AMD64ThreadContext.R13);
return new X86Frame(guesser.getSP(), guesser.getFP(),
guesser.getPC(), null, bcp <-- live_bcp field);
So in this case the X86Frame is constructed with R13 stored in the
live_bcp field. For any frame other than the current frame, the
live_bcp field will be NULL.
And one other bit of clarification before moving on to the bug.
You'll see BCX in code. For example, INTERPRETER_FRAME_BCX_OFFSET.
BCX == BCP. I'm not sure why BCX was ever chosen. I was tempted to
rename it to BCP, but it's been cloned to all the other SA CPU
ports, so I think it's best to just leave it.
Now on to the bug. Sometimes the interpreter uses R13 as a scratch
register, so it does not always contain the BCP. This is why
sometimes tests will see an "illegal bci" assert from SA. It's
because r13 is currently not pointing to the BCP of the current
frame, and thus can't be converted to a BCI.
What this fix does take advantage of the fact that when r13 is used
as a scratch register, it is first flushed to frame->bcp. So with
this fix R13 is only used as the BCP if it points within the
Method's bytecodes. If it does not, then SA will use the BCP flushed
to frame->bcp. The reason we don't always used frame->bcp is because
it is not kept up to date as the interpreter moves between
bytecodes, but we know it is up to date if R13 is currently being
used as a scratch register. So in other words, an invalid R13
implies that frame->bcp is up to date.
And if you find yourself thinking "X86Frame.java supports is for
both 32-bit and 64-bit x86, but R13 does not exists on 32-bit.",
your right. A couple of years ago the concept of live_bcp was
introduced to fix JDK-8214226 [1]. It only added support for fixing
64-bit and ignored 32-bit. So you'll never see live_bcp getting set
for 32-bit, and thus JDK-8214226 is still broken on 32-bit, but it
also means 32-bit is not impacted by the bug I'm fixing here. So the
fix for JDK-8214226 is why you see R13 references in the
X86Frame.java, but the code will still work for any platform that
sets up live_bcd properly, even if it's not actually R13.
...and one more thing. 8214226 [1] actually introduced this "illegal
bci" bug. However 8214226 [1] was only ever applied to LinuxAMD64.
Never for BSD or Windows. This means two things: (1) the "illegal
bci" bug never turned up on these platforms, and (2) 8214226 is
still broken on these platform, meaning the bci returned by
getInterpreterFrameBCI() is likely often stale, meaning stack traces
will not show the proper line number for the current frame. It seems
we don't have a test to catch this. I've filed 8243500 [2] to cover
fixing 8214226 on BSD and Windows.
[1] https://bugs.openjdk.java.net/browse/JDK-8214226
[2] https://bugs.openjdk.java.net/browse/JDK-8243500
thanks,
Chris