On Mon, 9 Feb 2026 12:50:14 GMT, Jorn Vernee <[email protected]> wrote:
> Thanks for looking at this. > > Besides the code comments, there are few points that I'd like to bring up as > well: > > 1. I'm not sure I see using a ticket being worth it vs. just calling > `getCounter` again in `release0`. The JIT understands that the current thread > doesn't change in most situations, so it should be able to common out the > load of the hash code. > > 2. This adds significant footprint to SharedSession. Especially on server > class machines which can have as much as to 512 logical processors (or more), > this would add 8 pages of memory use per session instance, assuming the > typical 64 byte cache line size, just for the counters. > > 3. This is already a tricky bit of code in which we've seen several hard > to diagnose issues. So, making changes here makes me a bit nervous, and I'm > honestly having a hard time judging the cost vs. benefit of this change, > considering also that the benchmark is testing the absolute worst case > scenario where the acquire/release are being hammered. (and now I _really_ > wish I had found the time to look at this earlier). > > > Given 2. and 3. taken together, and out of an abundance of caution, I suggest > keeping both the old and new implementations as separate MemorySessionImpl > sub-classes, and then using a system property to select one or the other in > `createShared`. Just in case someone wants to switch back to the single > counter implementation, or in case we want to use this to diagnose an issue. > (Having a single static property select between the impls should also avoid > issues with megamorphic call sites since the total number of loaded impls in > a given process should stay the same). Thank you both for looking at this in such detail, I know it is a lot of work. re: 1 - My original version had both release0() and release0(int ticket), as in most cases it was unnecessary. I decided to require a ticket to reduce the chance of a bug being introduced when the ticket actually did need to be passed between threads. The IOUtil Releasers are example of where the tickets are necessary. The side effect, of course, was many more changes. I benchmarked with both, and it didn't make a measurable difference. re: 2 - I was initially experimenting with a SharedSession and a MultiSharedSession, which was switchable in MemorySessionImp with a property. I didn't submit that, thinking that another option would be undesirable. To avoid the overhead, where it is unnecessary, I have been experimenting with a single counter that only expands to multiple counters when there is contention, but does add to the complexity - I'm not there yet. There doesn't have to be a 1:1 ratio between the number of cores and the number of counters - you can get much of the benefit with a 4:1 or even 8:1 ratio, which would reduce the memory overhead somewhat. While the microbenchmark does hammer the code, there was a real issue the patch was addressing. We had an issue where someone had to work the scalability issue in SharedSession by reimplementing their code with JNI. > Overall, I think that this is a good experiment (and I'm very grateful for > it), but I'm a bit skeptical in seeing it introduced in the JDK codebase. > This is a very sensitive area, with lots of multi-threaded gotchas (lots of > different failure modes to take into account). As I commented originally, I > still feel like separating the ref counter from the liveness bit (ignoring > everything else) brings even more failure modes to the table. > > After discussing all of this with @JornVernee, we agreed that we should > refrain from integrating this work (even if provided merely as an option), > and maybe keep the door open to explore other alternatives that might provide > some relief, while keeping the code more maintainable in the long run Thank you too for taking the time to review this. That is all fair. The circumstance where this was necessary might be quite rare as it only applies when Arena.ofShared() arenas don't scale when memory is shared across a number of threads with many calls to quickly executing native functions. Instead I'd imagine we'd recommend using Arena.ofAuto(), or perhaps confined without sharing the memory, or at worst JNI could be used instead. This issue was spotted using the async-profiler. With a large number of threads executing with the same SharedSession. It was observed that more time and more time is spent in SharedSession.acquire0() and SharedSession.release0() as the load on the server increased. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28575#issuecomment-3884203193
