On Mon, 1 Dec 2025 11:59:38 GMT, Stuart Monteith <[email protected]> wrote:
> MemorySegments allocated from shared Arena from
> java.lang.foreign.Arena.ofShared() have their lifecycle controlled by
> jdk.internal.foreign.SharedSession. This class ensures that the
> MemorySegments can't be freed until after a thread has called Arena.close().
> This is implemented using a counter that is atomically incremented when used,
> and decremented when not used, on every invocation of a downcall. While
> shared Arenas allow any thread to use it and to close it, this tracking has a
> cost when multiple threads are contended on it. This patch changes the
> implementation to use multiple counters to reduce contention.
> sun.nio.ch.IOUtil, java.nio.Buffer and
> sun.nio.ch.SimpleAsynchronousFileChannelImpl are modified as they have
> threads releasing the scope different from the ones that allocated them, so a
> ticket that tracks the counter has to be passed over.
>
> The microbenchmark org.openjdk.bench.java.lang.foreign.
> CallOverheadConstant.panama_identity_memory_address_shared_3 was used to
> generate the following results. The scalability was checked on a number of
> platforms with the JMH parameter "-t" specifying the number of threads.
> Measurements are in ns/op .
>
> The hardware are the Neoverse-N1, N2, V1 and V2, Intel Xeon 8375c and the AMD
> Epyc 9654.
>
> | Threads | N1 | N2 | V1 | V2 | Xeon |
> Epyc |
> |---------|-------|-------|-------|-------|-------|-------|
> | 1 | 30.88 | 32.15 | 33.54 | 32.82 | 27.46 |
> 8.45 |
> | 2 | 142.56 | 134.48 | 132.01 | 131.50 | 116.68 | 46.53
> |
> | 4 | 310.18 | 282.75 | 287.59 | 271.82 | 251.88 | 86.11
> |
> | 8 | 702.02 | 710.29 | 736.72 | 670.63 | 533.46 | 194.60
> |
> | 16 | 1,436.17 | 1,684.80 | 1,833.69 | 1,782.78 | 1,100.15 |
> 827.28 |
> | 24 | 2,185.55 | 2,508.86 | 2,732.22 | 2,815.26 | 1,646.09 | 1,530.28
> |
> | 32 | 2,942.48 | 3,432.84 | 3,643.64 | 3,782.23 | 2,236.81 |
> 2,278.52 |
> | 48 | 4,466.56 | 5,174.72 | 5,401.95 | 5,621.41 | 4,926.30 |
> 3,026.58 |
>
> After:
>
> | Threads | N1 | N2 | V1 | V2 | Xeon |
> Epyc |
> |---------|-------|-------|-------|-------|-------|-------|
> | 1 | 32.41 | 32.11 | 34.43 | 31.32 | 27.94 | 9.82 |
> | 2 | 32.64 | 33.72 | 35.11 | 31.30 | 28.02 | 9.81 |
> | 4 | 32.71 | 36.84 | 34.67 | 31.35 | 28.12 | 10.49 |
> | 8 | 58.22 | 31.60 | 36.87 | 31.72 | 47.09 |...
Thanks for looking at this.
Besides the code comments, there are few points that I'd like to bring up as
well:
1. I'm not sure I see using a ticket being worth it vs. just calling
`getCounter` again in `release0`. The JIT understands that the current thread
doesn't change in most situations, so it should be able to common out the load
of the hash code.
2. This adds significant footprint to SharedSession. Especially on server class
machines which can have as much as to 512 logical processors (or more), this
would add 8 pages of memory use per session instance, assuming the typical 64
byte cache line size, just for the counters.
3. This is already a tricky bit of code in which we've seen several hard to
diagnose issues. So, making changes here makes me a bit nervous, and I'm
honestly having a hard time judging the cost vs. benefit of this change,
considering also that the benchmark is testing the absolute worst case scenario
where the acquire/release are being hammered. (and now I _really_ wish I had
found the time to look at this earlier).
Given 2. and 3. taken together, and out of an abundance of caution, I suggest
keeping both the old and new implementations as separate MemorySessionImpl
sub-classes, and then using a system property to select one or the other in
`createShared`. Just in case someone wants to switch back to the single counter
implementation, or in case we want to use this to diagnose an issue. (Having a
single static property select between the impls should also avoid issues with
megamorphic call sites since the total number of loaded impls in a given
process should stay the same).
src/java.base/share/classes/jdk/internal/foreign/ConfinedSession.java line 54:
> 52: public int acquire0() {
> 53: checkValidState();
> 54: if (acquireCount == MAX_FORKS) {
This acquireCount field should be moved to ConfinedSession
src/java.base/share/classes/jdk/internal/foreign/SharedResourceList.java line
66:
> 64: if (FST.getAcquire(this) != ResourceCleanup.CLOSED_LIST) {
> 65: //ok now we're really closing down
> 66: ResourceCleanup prev = null;
My IDE tells me this `null` assignment is redundant.
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 51:
> 49:
> 50: final static private int numCounters;
> 51: final static private int mask;
Suggestion:
final static private int COUNTER_MASK;
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 54:
> 52:
> 53: // The number of ints per cacheline.
> 54: final static private int multiplier;
Suggestion:
final static private int INTS_PER_COUNTER;
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 56:
> 54: final static private int multiplier;
> 55: final static int CNT_CLOSING = -1;
> 56: final static int CNT_CLOSED = -2;
Please use the existing style for modifier order here: `private static final`,
and capitalize `static final` field names.
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 58:
> 56: final static int CNT_CLOSED = -2;
> 57:
> 58: private static final jdk.internal.misc.Unsafe UNSAFE =
> jdk.internal.misc.Unsafe.getUnsafe();
Could you use an import here?
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 78:
> 76:
> 77: // Each counter is an integer on its own cacheline.
> 78: multiplier = ((cacheLineSize < Integer.BYTES) ? 64 :
> cacheLineSize) / Integer.BYTES;
Why 64 and not 4? Or just `Math.max(cacheLineSize, Integer.BYTES)`.
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 101:
> 99: private int compareAndExchange(int index, int expected, int value) {
> 100: assert numCounters > index;
> 101: return counters.compareAndExchange(index * multiplier, expected,
> value);
Suggestion:
@ForceInline
private int getAcquireCounter(int index) {
assert numCounters > index;
return counters.getAcquire(index * multiplier);
}
@ForceInline
private int compareAndExchangeCounter(int index, int expected, int value) {
assert numCounters > index;
return counters.compareAndExchange(index * multiplier, expected, value);
Also, maybe you want to use these from `justClose`.
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 129:
> 127: old = getAcquire(ticket);
> 128: Thread.onSpinWait();
> 129: } while (old == CNT_CLOSING);
My worry with something like this is that we don't make monotonic progress
between the states, but we might back off and retry.
In other words, if a closing thread fails to close the session, we loop back
around, but might end up here again if another thread tries to close the
session again, so we could theoretically get 'stuck' here.
In this case I believe it's okay since failing to close a session is typically
an indication of an issue in the application logic, so we don't expect this to
happen over and over again. Although, we've seen this being used in the wild as
a fallback where close was called on a maybe-closed session from a cleaner.
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 151:
> 149: old = compareAndExchange(ticket, value, value - 1);
> 150: } else {
> 151: throw alreadyClosed();
Please keep the existing comment for this case
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 157:
> 155:
> 156: synchronized void justClose() {
> 157: int value;
Why is this declared here?
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 180:
> 178: // It is already closed - throw an exception.
> 179: throw alreadyClosed();
> 180: }
I don't see how this could occur. You've made `justClose` synchronized, which
means we will always see an up to date `state`, and threads can not race to
close the session anymore.
src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 220:
> 218: public boolean isCloseable() {
> 219: if (state == CLOSED) {
> 220: return true;
This doesn't look right to me. I don't think any override is needed here. It's
not a predicate intended to ask whether a session is closeable in it's current
state, but rather to ask whether the session kind can be explicitly closed at
all.
src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java
line 301:
> 299: sessionTickets[i] = ticketLocal;
> 300: cb.loadConstant(0)
> 301: .istore(ticketLocal);
Suggestion:
.istore(ticketLocal);
src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java
line 566:
> 564: int ticketLocal = sessionTickets[i];
> 565: cb.aload(scopeLocal)
> 566: .ifThen(Opcode.IFNONNULL, ifCb -> {
Suggestion:
.ifThen(Opcode.IFNONNULL, ifCb -> {
src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java
line 569:
> 567: ifCb.aload(scopeLocal)
> 568: .iload(ticketLocal)
> 569: .invokevirtual(CD_MemorySessionImpl, "release0",
> MTD_RELEASE0);
Suggestion:
.iload(ticketLocal)
.invokevirtual(CD_MemorySessionImpl, "release0",
MTD_RELEASE0);
test/jdk/java/foreign/TestMemorySession.java line 396:
> 394: int ticket = sessionImpl.acquire0();
> 395:
> 396: assertFalse(sessionImpl.isCloseable());
i.e. this should return `true`, shared session may be closed.
test/jdk/java/foreign/TestMemorySession.java line 431:
> 429: } catch (IllegalStateException e) {
> 430: fail();
> 431: }
Failure is implicit if an exception is thrown, so this try/catch shouldn't be
necessary.
test/jdk/java/foreign/TestMemorySession.java line 439:
> 437: } catch (IllegalStateException e) {
> 438: fail();
> 439: }
Same here
test/jdk/java/foreign/TestMemorySession.java line 466:
> 464: boolean closeLose = false;
> 465:
> 466: while (true) {
We can't just loop indefinitely. This will definitely result in test timeouts
eventually. Please add an upper bound for the number of iterations.
test/jdk/java/foreign/TestMemorySession.java line 481:
> 479: threads[i].start();
> 480: }
> 481: Thread.sleep(100);
Thread.sleep is not very reliable, especially in highly concurrent CI
environment. If the intent is to wait for the threads so start up, I suggest
using a CountDownLatch.
-------------
PR Review: https://git.openjdk.org/jdk/pull/28575#pullrequestreview-3544278113
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782215557
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782052307
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782237502
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782236426
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2676944433
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2592405598
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782072535
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782322487
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2635843460
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782220024
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782226537
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782264630
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782317235
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2774197899
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2592332912
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2592333337
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782350583
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782365127
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782370062
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782403652
PR Review Comment: https://git.openjdk.org/jdk/pull/28575#discussion_r2782390762