We are on aarch64.

When a function needs stack extension, we build a stack that has this shape:

https://github.com/openjdk/valhalla/blob/b1d14c658511cced2a860d9e2741ae89827e25ba/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L6040-L6073

Currently, when leaving the frame, we use LR §1 (I use § not to mess with 
github rendering that interpret `#` as PR references) as return address 
(because it can be patched for deoptimization), and FP §2 to restore x29 
(because when it contains an oop, the GC is only aware of this copy).

In our failing case, we have a C2-compiled frame that is being deoptimized when 
returning from a call to an interpreted method. During deoptimization, the 
function `frame::sender_for_compiled_frame(RegisterMap*) const` is used to 
locate the location on the stack where rfp (x29) is saved.

https://github.com/openjdk/valhalla/blob/b1d14c658511cced2a860d9e2741ae89827e25ba/src/hotspot/cpu/aarch64/frame_aarch64.inline.hpp#L446

Actually this function is a bit more general: it computes the sender frame of a 
compiled frame, and build the `RegisterMap`. The problem is that during 
deoptimization, this function locates the wrong save of rfp (FP §1) because the 
C2 frame is being modified by the deoptimization process and it's not anymore 
recognized as a C2-compiled method that needs stack repairs. In this modified 
frame the sender's sp is correctly known (or the deoptimization mechanism would 
not work), and the saved FP is taken just 2 words above: that is FP §1. On top 
of that, if rfp contained an oop and the GC moved the pointed object during the 
call we are returning from, the value we get for rfp is not valid anymore.

The good and bad news is that the GC also locates the saved location of rfp 
thanks to the same function. The bad news is that GC sees the C2 frame 
correctly, and so `sender_for_compiled_frame` can locate FP §2. We can follow a 
few ideas:
- make the deoptimized frame bottom under FP/LR §2. This is not possible, for 
many reasons: we need LR §1, we need to remove the whole frame to find the 
sender's frame...
- make `sender_for_compiled_frame` detects when the deoptimized frame is the 
one of a C2 compiled method that needs stack repair. No idea how to do that! 
Also, it seems brittle, and more complicated than the next solution.
- always pick FP §1: since the deoptimized frame will pick FP §1, in case it's 
a regular C2 frame, we can also make sure to use FP §1. It is the simplest 
solution and the one I explain after.

In [JDK-8365996](https://bugs.openjdk.org/browse/JDK-8365996), the problem was 
pretty much the opposite: remove_frame was using FP §1 to restore rfp but the 
GC only updates FP §2. So the solution was to restore from FP §2:

https://github.com/openjdk/valhalla/pull/1540/files#diff-0f4150a9c607ccd590bf256daa800c0276144682a92bc6bdced5e8bc1bb81f3aR6140-R6145

Here the solution is to revert this part (restore rfp from FP §1), and let GC 
knows about FP §1 only in `sender_for_compiled_frame`. Overall, let's never 
speak about FP/LR §2. This way, we always have the sender's sp, the saved LR 
and FP consecutively. FP/LR §2 is only needed to make space between the 
unpacked arguments and the locals, as there would be between regular arguments 
and locals. They could have a fictive value and we should probably implement 
that.

This make virtual thread tests fail massively. Surely because of mismatch 
between our choice of FP §1 or 2. Let's problem list this for now... To help 
with that, I introduced `frame::compiled_frame_details() const` to do all this 
little tricks and return the location of LR/FP§1 and the sender's sp at once, 
without letting the frame users have to figure out the internal structure.

I found the extraction of `compiled_frame_details` a bit risky, so I proceded 
in steps: first making the function, calling it from 
`sender_for_compiled_frame` and compare the results with the old way of getting 
everything. These temporary assert weren't triggered, so I actually used the 
returned value of `compiled_frame_details` and removed the newly useless code 
from `sender_for_compiled_frame`. So I'm rather confident it does as good as 
before.

I don't have much opinion about the names of `compiled_frame_details` and 
`CompiledFramePointers`, feel free to suggest better if you have a better idea.

Thanks,
Marc

-------------

Commit messages:
 - Actually use compiled_frame_details
 - ProblemList and comment
 - Factor compiled frame pointers computation
 - First fix

Changes: https://git.openjdk.org/valhalla/pull/1751/files
  Webrev: https://webrevs.openjdk.org/?repo=valhalla&pr=1751&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367151
  Stats: 99 lines in 5 files changed: 60 ins; 28 del; 11 mod
  Patch: https://git.openjdk.org/valhalla/pull/1751.diff
  Fetch: git fetch https://git.openjdk.org/valhalla.git pull/1751/head:pull/1751

PR: https://git.openjdk.org/valhalla/pull/1751

Reply via email to