Ping!
If this fix is too complicated, there is a simplification I can make,
but at the cost of abandoning some attempts to determine the current
frame when this error condition pops up. At the start of
validateInterpreterFrame() it attempts to verify that the frame is valid
by verifying that frame->method and frame->bcp are valid. This part is
pretty simple. The complicated part is everything that follows if the
verification fails. It attempts to error correct the situation by
looking at various register contents and stack contents. I could just
abandon this complicated code and return false if frame->method and
frame->bcp don't check out. Upon return, the caller's code would be
simplified to:
if (validateInterpreterFrame(sp, fp, pc)) {
return true; // We're done. setValues() has been called
for valid interpreter frame.
} else {
return checkLastJavaSP();
}
So there's still a chance we can determine a valid current frame if
"last java frame" has been setup. However, if not setup we would not be
able to. This is where the complicated code in
validateInterpreterFrame() is useful because it can usually determine
the current frame, even if "last java frame" is not setup, but it's rare
enough that we run into this situation that I think failing to get the
current frame is ok.
So if I can get a couple promises for reviews if I make this change,
I'll go ahead and do it and send out a new RFR.
thanks,
Chris
On 6/18/20 5:54 PM, Chris Plummer wrote:
[I've added runtime-dev to this SA review since understanding
interpreter invokes (code generated by
TemplateInterpreterGenerator::generate_normal_entry()) and stack
walking is probably more important than understanding SA.]
Hello,
Please help review the following:
https://bugs.openjdk.java.net/browse/JDK-8244383
http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html
The crux of the bug is when doing stack walking the topmost frame is
in an inconsistent state because we are in the middle of pushing a new
interpreter frame. Basically we are executing code generated by
TemplateInterpreterGenerator::generate_normal_entry(). Since the PC
register is in this code, SA assumes the topmost frame is an
interpreter frame.
The first issue with this interpreter frame assumption is if we
haven't actually pushed the frame yet, then the current frame is the
caller's frame, and could be compiled. But since SA thinks it's
interpreted, later on it tries to convert the frame->bcp to a BCI, but
frame->bcp is only valid for interpreter frames. Thus the "illegal
BCI" failures. If the previous frame happened to be interpreted, then
the existing SA code works fine.
The other state of frame pushing that was problematic was when the new
frame had been pushed, but frame->method and frame->bcp were not setup
yet. This also would lead to "illegal BCI" later on because garbage
would be stored in these locations.
Fixing the above problems requires trying to determine the state of
the frame push through a series of checks, and then adapting what is
considered to be the current frame based on the outcome of the checks.
The first things checked is that frame->method is valid (we can
successfully instantiate a wrapper for the Method* without failure)
and that frame->bcp is within the method. If both these pass then we
can use the frame as-is.
If the above checks fail, then we try to determine whether the issue
is that the frame is not yet pushed and the current frame is actually
compiled, or the frame has been pushed but not yet initialized. This
is done by first getting the return address from the stack or RAX
(it's location depends on how far along we are in the entry code) and
comparing this to what is stored in frame->return_addr. If they are
the same, then we have pushed the frame but not yet initialized it. In
this case we use the previous frame (senderSP() and senderFP()) as the
current frame since the current frame is not yet initialized. If the
return address check fails, then we assume the new frame is not yet
pushed, and and treat the current frame as compiled, even though PC
points into the interpreter (we replace PC with RAX in this case).
Comments in the code pretty well explain all the above, so it is
probably easier to follow the logic in the code along with the
comments rather than apply my above description to the code.
I should add that it's very rare that we ever get into this special
error handling code. This bug was very hard to reproduce initially. I
was only able to make progress with reproducing and debugging by
inserting delay loops in various spots in the code generated by
TemplateInterpreterGenerator::generate_normal_entry(). By doing this I
was able to reproduce the issue quite easily and hit all the logic in
the new code I've added.
The fix is basically entirely contained within
AMD64CurrentFrameGuess.java. The rest of the changes are minor:
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java
-Main fix for CR
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java
-Added getInterpreterFrameBCP(), which is now needed by
AMD64CurrentFrameGuess.java
-I also simplified some code by using the existing
getInterpreterFrameMethod()
rather than replicating inline what it does.
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java
-I noticed the windows version of this code had some extra checks that
were missing
from the bsd version. I then looked at the linux version, but it had
been heavily modified
a short while back to leverage DWARF info to determine frames. So I
looked at the previous
rev and it too had these extra checks. I decided to add them to the
BSD port. I'm not sure
if it helps at all, but it certainly doesn't seem to do any harm.
thanks,
Chris