On Thu, 28 Aug 2025 01:30:39 GMT, pf0n <d...@openjdk.org> wrote:

> ### Summary
> 
> The new implementation of ObjectCountAfterGC for Shenandoah piggybacks off of 
> the existing marking phases and records strongly marked objects in a 
> histogram. If the event is disabled, the original marking closures are used. 
> When enabled new mark-and-count closures are used by the worker threads. Each 
> worker thread updates its local histogram as it marks an object. These local 
> histograms are merged at the conclusion of the marking phase under a mutex. 
> The event is emitted outside a safepoint. Because (most) Shenandoah's marking 
> is done concurrently, so is the object counting work.
> 
> ### Performance
> The performance test were ran using the Extremem benchmark on a default and 
> stress workload. (will edit this section to include data after average time 
> and test for GenShen)
> 
> #### Default workload:
> ObjectCountAfterGC disabled (master branch):
> `[807.216s][info][gc,stats    ] Pause Init Mark (G)            =    0.003 s 
> (a =      264 us)`
> `[807.216s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s 
> (a =       91 us)`
> `[807.216s][info][gc,stats    ] Concurrent Mark Roots          =    0.041 s 
> (a =     4099 us)`
> `[807.216s][info][gc,stats    ] Concurrent Marking             =    1.660 s 
> (a =   166035 us)`
> `[807.216s][info][gc,stats    ] Pause Final Mark (G)           =    0.004 s 
> (a =      446 us) `
> `[807.216s][info][gc,stats    ] Pause Final Mark (G)           =    0.004 s 
> (a =      446 us) `
> `[807.216s][info][gc,stats    ] Pause Final Mark (N)           =    0.004 s 
> (a =      357 us)`
> 
> ObjectCountAfterGC disabled (feature branch):
> `[807.104s][info][gc,stats    ] Pause Init Mark (G)            =    0.003 s 
> (a =      302 us)`
> `[807.104s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s 
> (a =       92 us) `
> `[807.104s][info][gc,stats    ] Concurrent Mark Roots          =    0.048 s 
> (a =     4827 us)`
> `[807.104s][info][gc,stats    ] Concurrent Marking             =    1.666 s 
> (a =   166638 us) `
> `[807.104s][info][gc,stats    ] Pause Final Mark (G)           =    0.006 s 
> (a =      603 us)`
> `[807.104s][info][gc,stats    ] Pause Final Mark (N)           =    0.005 s 
> (a =      516 us)`
> 
> ObjectCountAfterGC enabled (feature branch)
> `[807.299s][info][gc,stats    ] Pause Init Mark (G)            =    0.002 s 
> (a =      227 us)`
> `[807.299s][info][gc,stats    ] Pause Init Mark (N)            =    0.001 s 
> (a =       89 us) `
> `[807.299s][info][gc,stats    ] Concurrent Mark Roots          =    0.053 s 
> (a =     5279 us)`
> `[807.299s][info][gc,st...

Left a few superficial suggestions. My big question is what work remains for 
this event to also work with the generational mode?

src/hotspot/share/gc/shared/gcTrace.hpp line 136:

> 134:   void report_gc_reference_stats(const ReferenceProcessorStats& rp) 
> const;
> 135:   void report_object_count_after_gc(BoolObjectClosure* object_filter, 
> WorkerThreads* workers) NOT_SERVICES_RETURN;
> 136:   // Report object count by not performing a heap inspection. This 
> method will

s/by not/without

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 179:

> 177:   // Use object counting closure if ObjectCount or ObjectCountAfterGC 
> event is enabled.
> 178:   const bool object_count_enabled = 
> ObjectCountEventSender::should_send_event();
> 179:   if (object_count_enabled && 
> !ShenandoahHeap::heap()->mode()->is_generational()) {

Can the generational mode support this? It can also perform a _global_ 
collection.

src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 130:

> 128:       // Create the KlassInfoTable for Shenandoah only if JFR is enabled.
> 129: #if INCLUDE_JFR
> 130:       KlassInfoTable cit(false);

Might consider a `CADR` helper class here (Constructor Acquires, Destructor 
Releases). There are a lot of lines of code between the two `set_cit` calls 
(and the second one isn't guarded by `INCLUDE_JFR`).

src/hotspot/share/memory/heapInspection.hpp line 33:

> 31: #include "oops/objArrayOop.hpp"
> 32: #include "oops/oop.hpp"
> 33: #include "runtime/mutex.hpp"

Does this need to be in the header?

src/hotspot/share/runtime/mutexLocker.hpp line 90:

> 88: extern Monitor* Notify_lock;                     // a lock used to 
> synchronize the start-up of the vm
> 89: extern Mutex*   ExceptionCache_lock;             // a lock used to 
> synchronize exception cache updates
> 90: extern Mutex*   TableMerge_lock;                 // a lock used to 
> synchronize merging a thread-local table into a global table

Might call this `ObjectCountMerge_lock` and describe its usage as `merging a 
thread local object count`. There are many `Tables` in the JVM (SymbolTable, 
InternedStrings, etc.).

test/jdk/jdk/jfr/event/gc/objectcount/TestObjectCountAfterGCEventWithShenandoah.java
 line 11:

> 9:  *           & vm.opt.ExplicitGCInvokesConcurrent != true
> 10:  * @library /test/lib /test/jdk
> 11:  * @run main/othervm -XX:+UnlockExperimentalVMOptions 
> -XX:-UseFastUnorderedTimeStamps -XX:+UseShenandoahGC -XX:MarkSweepDeadRatio=0 
> -XX:-UseCompressedOops -XX:-UseCompressedClassPointers 
> -XX:+IgnoreUnrecognizedVMOptions 
> jdk.jfr.event.gc.objectcount.TestObjectCountAfterGCEventWithShenandoah

Are all of these flags necessary to run this test? Can we pare any unnecessary 
options?

-------------

Changes requested by wkemper (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26977#pullrequestreview-3178372422
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317339492
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317344826
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317348710
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317355707
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317358657
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317360739

Reply via email to