On Thu, 28 Aug 2025 01:30:39 GMT, pf0n <d...@openjdk.org> wrote: > ### Summary > > The new implementation of ObjectCountAfterGC for Shenandoah piggybacks off of > the existing marking phases and records strongly marked objects in a > histogram. If the event is disabled, the original marking closures are used. > When enabled new mark-and-count closures are used by the worker threads. Each > worker thread updates its local histogram as it marks an object. These local > histograms are merged at the conclusion of the marking phase under a mutex. > The event is emitted outside a safepoint. Because (most) Shenandoah's marking > is done concurrently, so is the object counting work. > > ### Performance > The performance test were ran using the Extremem benchmark on a default and > stress workload. (will edit this section to include data after average time > and test for GenShen) > > #### Default workload: > ObjectCountAfterGC disabled (master branch): > `[807.216s][info][gc,stats ] Pause Init Mark (G) = 0.003 s > (a = 264 us)` > `[807.216s][info][gc,stats ] Pause Init Mark (N) = 0.001 s > (a = 91 us)` > `[807.216s][info][gc,stats ] Concurrent Mark Roots = 0.041 s > (a = 4099 us)` > `[807.216s][info][gc,stats ] Concurrent Marking = 1.660 s > (a = 166035 us)` > `[807.216s][info][gc,stats ] Pause Final Mark (G) = 0.004 s > (a = 446 us) ` > `[807.216s][info][gc,stats ] Pause Final Mark (G) = 0.004 s > (a = 446 us) ` > `[807.216s][info][gc,stats ] Pause Final Mark (N) = 0.004 s > (a = 357 us)` > > ObjectCountAfterGC disabled (feature branch): > `[807.104s][info][gc,stats ] Pause Init Mark (G) = 0.003 s > (a = 302 us)` > `[807.104s][info][gc,stats ] Pause Init Mark (N) = 0.001 s > (a = 92 us) ` > `[807.104s][info][gc,stats ] Concurrent Mark Roots = 0.048 s > (a = 4827 us)` > `[807.104s][info][gc,stats ] Concurrent Marking = 1.666 s > (a = 166638 us) ` > `[807.104s][info][gc,stats ] Pause Final Mark (G) = 0.006 s > (a = 603 us)` > `[807.104s][info][gc,stats ] Pause Final Mark (N) = 0.005 s > (a = 516 us)` > > ObjectCountAfterGC enabled (feature branch) > `[807.299s][info][gc,stats ] Pause Init Mark (G) = 0.002 s > (a = 227 us)` > `[807.299s][info][gc,stats ] Pause Init Mark (N) = 0.001 s > (a = 89 us) ` > `[807.299s][info][gc,stats ] Concurrent Mark Roots = 0.053 s > (a = 5279 us)` > `[807.299s][info][gc,st...
Left a few superficial suggestions. My big question is what work remains for this event to also work with the generational mode? src/hotspot/share/gc/shared/gcTrace.hpp line 136: > 134: void report_gc_reference_stats(const ReferenceProcessorStats& rp) > const; > 135: void report_object_count_after_gc(BoolObjectClosure* object_filter, > WorkerThreads* workers) NOT_SERVICES_RETURN; > 136: // Report object count by not performing a heap inspection. This > method will s/by not/without src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 179: > 177: // Use object counting closure if ObjectCount or ObjectCountAfterGC > event is enabled. > 178: const bool object_count_enabled = > ObjectCountEventSender::should_send_event(); > 179: if (object_count_enabled && > !ShenandoahHeap::heap()->mode()->is_generational()) { Can the generational mode support this? It can also perform a _global_ collection. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 130: > 128: // Create the KlassInfoTable for Shenandoah only if JFR is enabled. > 129: #if INCLUDE_JFR > 130: KlassInfoTable cit(false); Might consider a `CADR` helper class here (Constructor Acquires, Destructor Releases). There are a lot of lines of code between the two `set_cit` calls (and the second one isn't guarded by `INCLUDE_JFR`). src/hotspot/share/memory/heapInspection.hpp line 33: > 31: #include "oops/objArrayOop.hpp" > 32: #include "oops/oop.hpp" > 33: #include "runtime/mutex.hpp" Does this need to be in the header? src/hotspot/share/runtime/mutexLocker.hpp line 90: > 88: extern Monitor* Notify_lock; // a lock used to > synchronize the start-up of the vm > 89: extern Mutex* ExceptionCache_lock; // a lock used to > synchronize exception cache updates > 90: extern Mutex* TableMerge_lock; // a lock used to > synchronize merging a thread-local table into a global table Might call this `ObjectCountMerge_lock` and describe its usage as `merging a thread local object count`. There are many `Tables` in the JVM (SymbolTable, InternedStrings, etc.). test/jdk/jdk/jfr/event/gc/objectcount/TestObjectCountAfterGCEventWithShenandoah.java line 11: > 9: * & vm.opt.ExplicitGCInvokesConcurrent != true > 10: * @library /test/lib /test/jdk > 11: * @run main/othervm -XX:+UnlockExperimentalVMOptions > -XX:-UseFastUnorderedTimeStamps -XX:+UseShenandoahGC -XX:MarkSweepDeadRatio=0 > -XX:-UseCompressedOops -XX:-UseCompressedClassPointers > -XX:+IgnoreUnrecognizedVMOptions > jdk.jfr.event.gc.objectcount.TestObjectCountAfterGCEventWithShenandoah Are all of these flags necessary to run this test? Can we pare any unnecessary options? ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26977#pullrequestreview-3178372422 PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317339492 PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317344826 PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317348710 PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317355707 PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317358657 PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317360739