> While working on improving the TLAB sizing code for ZGC @kstefanj ran into an 
> issue with the following tests failing:
> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorInterpreterObjectTest.java
> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java
> 
> The reason for seeing the problems now is that with the new sizing code ZGC 
> used smaller TLABs at first, before resizing them to a proper size (to lower 
> the waste). In the HeapMonitor tests we don't allocate enough to trigger GCs 
> that will resize the TLABs so most of the tests will now run with small 
> TLABs. This should not be a problem, but it turns out the current sampling 
> code is not working properly when you get a lot of outside TLAB allocations. 
> You get those when trying to allocate a fairly large object (~1400B) that 
> won't fit into the TLAB, but there are still quite a bit of room in the TLAB 
> so we don't want to throw it away and take a new one.
> 
> The problem in the current code is that we keep track of when to sample with 
> multiple variables and when getting out of TLAB allocations these get out of 
> sync.
> 
> The proposed patch is the result of a restructuring and fixing of the the 
> code that me and @kstefanj did to fix this issue.
> 
> The idea is to better account how much we have allocated in three different 
> buckets:
> * Outside of TLAB allocations
> * Accounted TLAB allocations
> * The last bit of TLAB allocations that hasn't been accounted yet
> 
> And then use the sum of that to compare against a *non-changing* threshold to 
> see if it is time to take a sample.
> 
> There are a few things to think about when reading this code:
> * The thread can allocate and retire multiple TLABs before we cross the 
> sample threshold.
> * The sampling can take multiple samples in a single TLAB
> * Any overshooting of the sample threshold triggers only one sample and the 
> extra bytes are ignored when checking for the next sample.
> 
> There are some restructuring in the patch to confine the ThreadHeapSampler 
> variables and code. For example:
> 
> 1) Moved accounting variables out of TLAB and into the ThreadHeapSampler
> 2) Moved thread allocation accounting and sampling code out of the TLAB
> 3) Moved retiring of TLABs out of make_parseable (needed to support (2))
> 
> Some of that could be extracted into a separate PR if that's deemed 
> worthwhile.
> 
> Tested with the HeapMonitor tests various TLAB sizes.
> 
> If there are anyone using these APIs it would be nice to get feedback if 
> these changes work well for you.

Stefan Karlsson has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains seven additional 
commits since the last revision:

 - Merge remote-tracking branch 'upstream/master' into 
8356372_thread_heap_sampler
 - Tweak names
 - Fix sample threshold setup
 - Apply suggestions from code review
   
   Co-authored-by: Stefan Johansson <54407259+kstef...@users.noreply.github.com>
 - Re-enable tests after merge
 - Merge remote-tracking branch 'upstream/master' into 
8356372_thread_heap_sampler
 - 8356372: JVMTI heap sampling not working properly with outside TLAB 
allocations

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25114/files
  - new: https://git.openjdk.org/jdk/pull/25114/files/a318053e..dd6d8b4d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25114&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25114&range=02-03

  Stats: 28120 lines in 704 files changed: 10509 ins; 13536 del; 4075 mod
  Patch: https://git.openjdk.org/jdk/pull/25114.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25114/head:pull/25114

PR: https://git.openjdk.org/jdk/pull/25114

Reply via email to