MemorySegments allocated from a shared Arena obtained from
java.lang.foreign.Arena.ofShared() have their lifecycle controlled by
jdk.internal.foreign.SharedSession. This class ensures that the MemorySegments
can't be freed until after a thread has called Arena.close(). This is
implemented using a counter that is atomically incremented when a segment
starts being used and decremented when that use ends, on every invocation of
a downcall. While a shared Arena can be used and closed by any thread, this
tracking has a cost when multiple threads contend on the counter. This patch
changes the implementation to use multiple counters to reduce contention.
sun.nio.ch.IOUtil, java.nio.Buffer and
sun.nio.ch.SimpleAsynchronousFileChannelImpl are modified because in these
classes the thread that releases a scope can differ from the thread that
acquired it, so a ticket that identifies the counter has to be passed along.
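
To illustrate the general idea of multiple counters plus a ticket, here is a
minimal sketch. It is not the code in this patch: the class StripedUseCounter,
its method names, the stripe selection by thread hash and the padding factor
are all hypothetical and chosen only for illustration. Each acquire increments
one stripe and returns a ticket naming that stripe, so a different thread can
later decrement the same stripe. Closing succeeds only if every stripe reads
zero.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch of a striped use counter; not the SharedSession code. */
final class StripedUseCounter {
    private static final int OPEN = 0, CLOSING = 1, CLOSED = 2;
    // Assumption: spacing stripes 8 longs (64 bytes) apart reduces false sharing.
    private static final int PAD = 8;

    private final int stripes;
    private final AtomicLongArray counts;
    private final AtomicInteger state = new AtomicInteger(OPEN);

    StripedUseCounter(int stripes) {
        this.stripes = stripes;
        this.counts = new AtomicLongArray(stripes * PAD);
    }

    /** Marks the resource as in use and returns a ticket naming the stripe used. */
    int acquire() {
        int ticket = Math.floorMod(Thread.currentThread().hashCode(), stripes) * PAD;
        while (true) {
            counts.incrementAndGet(ticket);
            int s = state.get();
            if (s == OPEN) {
                return ticket;
            }
            counts.decrementAndGet(ticket); // undo: a close is pending or done
            if (s == CLOSED) {
                throw new IllegalStateException("already closed");
            }
            Thread.onSpinWait(); // a close attempt is in flight; retry once it resolves
        }
    }

    /** Ends a use. May be called from any thread, as long as it holds the ticket. */
    void release(int ticket) {
        counts.decrementAndGet(ticket);
    }

    /** Returns true if closed now; false if still in use or already closing/closed. */
    boolean tryClose() {
        if (!state.compareAndSet(OPEN, CLOSING)) {
            return false;
        }
        for (int i = 0; i < stripes; i++) {
            if (counts.get(i * PAD) != 0) {
                state.set(OPEN); // someone is (or briefly was) using it; give up
                return false;
            }
        }
        state.set(CLOSED);
        return true;
    }
}
```

In this sketch, threads that map to different stripes no longer write to the
same memory location on every acquire and release, which is where a single
shared counter becomes a bottleneck. The price is a slower close, which has to
visit every stripe, and the need to carry the ticket to whichever thread ends
up calling release.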

The microbenchmark
org.openjdk.bench.java.lang.foreign.CallOverheadConstant.panama_identity_memory_address_shared_3
was used to generate the following results. Scalability was checked on a
number of platforms, with the JMH parameter "-t" specifying the number of
threads. Measurements are in ns/op.

The hardware platforms are Neoverse N1, N2, V1 and V2, Intel Xeon 8375C and
AMD EPYC 9654.

Before:

| Threads |       N1 |       N2 |       V1 |       V2 |     Xeon |     Epyc |
|---------|----------|----------|----------|----------|----------|----------|
|       1 |    30.88 |    32.15 |    33.54 |    32.82 |    27.46 |     8.45 |
|       2 |   142.56 |   134.48 |   132.01 |   131.50 |   116.68 |    46.53 |
|       4 |   310.18 |   282.75 |   287.59 |   271.82 |   251.88 |    86.11 |
|       8 |   702.02 |   710.29 |   736.72 |   670.63 |   533.46 |   194.60 |
|      16 | 1,436.17 | 1,684.80 | 1,833.69 | 1,782.78 | 1,100.15 |   827.28 |
|      24 | 2,185.55 | 2,508.86 | 2,732.22 | 2,815.26 | 1,646.09 | 1,530.28 |
|      32 | 2,942.48 | 3,432.84 | 3,643.64 | 3,782.23 | 2,236.81 | 2,278.52 |
|      48 | 4,466.56 | 5,174.72 | 5,401.95 | 5,621.41 | 4,926.30 | 3,026.58 |

After:

| Threads |       N1 |       N2 |       V1 |       V2 |     Xeon |     Epyc |
|---------|----------|----------|----------|----------|----------|----------|
|       1 |    32.41 |    32.11 |    34.43 |    31.32 |    27.94 |     9.82 |
|       2 |    32.64 |    33.72 |    35.11 |    31.30 |    28.02 |     9.81 |
|       4 |    32.71 |    36.84 |    34.67 |    31.35 |    28.12 |    10.49 |
|       8 |    58.22 |    31.60 |    36.87 |    31.72 |    47.09 |    16.52 |
|      16 |    70.15 |    47.76 |    52.37 |    47.26 |    70.91 |    14.53 |
|      24 |    77.38 |    78.14 |    81.67 |    71.98 |    87.20 |    21.70 |
|      32 |    87.54 |    98.01 |    84.73 |    86.79 |   109.25 |    22.65 |
|      48 |   121.54 |   128.14 |   120.51 |   104.35 |   175.08 |    26.85 |

-------------

Commit messages:
 - 8371260: Improve scaling of downcalls using MemorySegments allocated with 
shared arenas

Changes: https://git.openjdk.org/jdk/pull/28575/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28575&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371260
  Stats: 628 lines in 33 files changed: 402 ins; 38 del; 188 mod
  Patch: https://git.openjdk.org/jdk/pull/28575.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28575/head:pull/28575

PR: https://git.openjdk.org/jdk/pull/28575
