On Tue, 2 Sep 2025 15:19:58 GMT, Marc Chevalier <[email protected]> wrote:

> That fixes a pair of issues. @TobiHartmann fixed one where some native 
> function call would mess up registers. Pushing and popping them solves it.
> 
> The second part is about using `x29`, aka `fp` (frame pointer), as a regular 
> register. Even when not used as the frame pointer register `x29` is 
> nevertheless saved on the stack when setting up the frame, for when it's used 
> as actually the frame pointer. On exit, `x29` and `x30` are restored from 
> there. But with Valhalla, they can be saved twice on the stack! When the 
> non-scalarized entry point of a C2-compiled method is used, one might need a 
> bit more space to scalarize the arguments. This leads to shape such as drawn 
> here.
> 
> https://github.com/openjdk/valhalla/blob/63314117aa0c30f1a5928b56d21d944de063b8c6/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L6107-L6126
> 
> Currently, `LR` and `FP` are restored from the copy `#1`. But the comment is 
> not quite exact: `FP #1` and `FP #2` won't be always the same in case it 
> contained an oop to an object that the GC moved between frame creation and 
> frame destruction: the GC knows only one spilled location for each register, 
> and in the case of `x29`, it's `FP #2.`
> 
> It seems difficult to adapt the GC mechanisms to handle more than one 
> location to update for a register. Likewise, we know that `LR #1` is update, 
> but `LR #2` doesn't seem to be: it's fine to update only one, if we know 
> which one is the trustworthy one. Updating only one `FP` won't mess with the 
> cases where we are actually using `x29` as frame pointer since then, the GC 
> won't update any of them as they won't contain oops.
> 
> The process is rather simple: since we need to load `sp_inc` to know how much 
> to pop from the stack, by changing the `ldr` into a `ldp`, we can load `FP 
> #2` into `x29` at the same time. Then, we reduce the stack by `sp_inc`, we 
> read LR#1 (always 1 word under the new top), and we pop the two last words 
> left. This works whether we need some stack extension or not, and makes sure 
> `sp` stays 16-byte aligned when reading `LR #1` (I've learned the hard way, 
> the SIGBUS way, it's a thing...). This requires only one more instruction 
> than the old way. Let's call that a lesser evil.
> 
> Thanks,
> Marc

Great job in deep diving right into this, identifying the root cause and 
fixing, Marc! Looks good to me.

src/hotspot/cpu/aarch64/gc/shared/cardTableBarrierSetAssembler_aarch64.cpp line 
93:

> 91:     if (!precise || (dst.index() == noreg && dst.offset() == 0)) {
> 92:       if (tmp3 != noreg) {
> 93:         // TODO This change is from before the 'tmp3' arg was added to 
> mainline, check if it's still needed. Same on x64. Also, this should be a __ 
> lea

Suggestion:

        // TODO 8366717 This change is from before the 'tmp3' arg was added to 
mainline, check if it's still needed. Same on x64. Also, this should be a __ lea

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 7408:

> 7406:   }
> 7407: 
> 7408:   // TODO This is probably okay but looks fishy because stream is reset 
> in the "Set null marker to zero" case just above. Same on x64.

Suggestion:

  // TODO 8366717 This is probably okay but looks fishy because stream is reset 
in the "Set null marker to zero" case just above. Same on x64.

src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 3129:

> 3127: 
> 3128:   int pack_fields_off = __ offset();
> 3129:   // TODO this would need to go to pack_fields_jobject_off as well ...

Suggestion:

-------------

Marked as reviewed by thartmann (Committer).

PR Review: 
https://git.openjdk.org/valhalla/pull/1540#pullrequestreview-3181030650
PR Review Comment: 
https://git.openjdk.org/valhalla/pull/1540#discussion_r2319278789
PR Review Comment: 
https://git.openjdk.org/valhalla/pull/1540#discussion_r2319279309
PR Review Comment: 
https://git.openjdk.org/valhalla/pull/1540#discussion_r2319221692

Reply via email to