I’d like to solicit opinions on reporting GC pause duration (stop-the-world 
pause time) via JMX. This info would be useful in figuring out whether GC 
pause times are a factor in failing to meet response time SLAs. The info is 
of course available directly from GC logs, but parsing logs is fraught and JMX 
doesn’t seem to report the equivalent info.

GcInfo

https://docs.oracle.com/javase/9/docs/api/com/sun/management/GcInfo.html

has a getDuration() method which works fine for the non-concurrent collectors 
(since they’re STW), but for CMS and G1 it appears to report the duration of an 
entire concurrent cycle, which isn’t what I want. The number of STW pauses 
during a concurrent cycle varies by collector, so ideally there would be a 
method that reports cause (as a string) and duration for each STW pause. If 
that’s too much, perhaps the minimum might be a getMaxPauseDuration() method 
that reports the maximum pause duration of all the STW pauses that happen 
during a concurrent cycle.
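
For reference, here is a minimal sketch of how the existing duration can be 
read today, by polling getLastGcInfo() on each collector bean. The class and 
method names (LastGcDurations, describeLastCollections) are made up for 
illustration. It only shows the current API; the per-pause breakdown and 
getMaxPauseDuration() proposed above do not exist.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import com.sun.management.GcInfo;

public class LastGcDurations {
    // Hypothetical helper: one line per collector that has completed a collection.
    static List<String> describeLastCollections() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getLastGcInfo() is only on the com.sun.management extension interface.
            if (bean instanceof com.sun.management.GarbageCollectorMXBean) {
                GcInfo info = ((com.sun.management.GarbageCollectorMXBean) bean).getLastGcInfo();
                if (info != null) { // null when no GC information is available yet
                    lines.add(bean.getName()
                            + ": id=" + info.getId()
                            + " duration=" + info.getDuration() + " ms");
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        describeLastCollections().forEach(System.out::println);
    }
}
```

With a concurrent collector, the duration printed for the old-generation bean 
is the figure in question: a single number per collection, with no way to 
separate out the individual STW pauses.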

Relatedly, the full compacting GCs that happen as a result of CMS and G1 
concurrent mode failure aren’t reported separately from concurrent cycles. It 
would be useful to differentiate these from “ConcurrentMarkSweep” and “G1 Old 
Generation”. Perhaps add collector types to CMS and G1, viz. “MarkSweepCompact” 
(which already exists and is literally what’s executed by CMS) and a new “G1 
MarkSweepCompact” collector for G1.
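
To see which collector names a given JVM currently exposes, something like 
the following sketch can be used. The class and method names (CollectorNames, 
collectorNames) are illustrative, and the names printed depend on the JVM 
version and the GC selected on the command line. The “G1 MarkSweepCompact” 
name suggested above is a proposal and will not appear.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class CollectorNames {
    // Hypothetical helper: names of the collector beans registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative per bean since JVM start.
            System.out.println(bean.getName()
                    + ": count=" + bean.getCollectionCount()
                    + " time=" + bean.getCollectionTime() + " ms");
        }
    }
}
```

Because the counts are kept per bean, a full compacting GC reported under the 
same bean as the concurrent cycles cannot be told apart from them here.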

If there’s a consensus that something should be done about either of these 
issues, I’d be happy to file RFE(s) and do the work.

Thanks,

Paul
