Hi Ramki, I agree with your analysis and will review & shepherd your suggested fix. > I'll file a CR to cover this fix and send out a CR id shortly.
Thank you ^_^ If the suggested fix does fix the problem, then there's more code that can probably go away as well: (diff against HS20) diff -r f0f676c5a2c6 > src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp > --- > a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp > Tue > Mar 15 19:30:16 2011 -0700 > +++ > b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp > Thu > Jul 28 01:33:33 2011 +0800 > @@ -2022,9 +2022,6 @@ > > _intra_sweep_estimate.padded_average()); > } > > - { > - TraceCMSMemoryManagerStats(); > - } > GenMarkSweep::invoke_at_safepoint(_cmsGen->level(), > ref_processor(), clear_all_soft_refs); > #ifdef ASSERT > @@ -9263,15 +9260,3 @@ > ShouldNotReachHere(); > } > } > - > -// when bailing out of cms in concurrent mode failure > -TraceCMSMemoryManagerStats::TraceCMSMemoryManagerStats(): > TraceMemoryManagerStats() { > - initialize(true /* fullGC */ , > - true /* recordGCBeginTime */, > - true /* recordPreGCUsage */, > - true /* recordPeakUsage */, > - true /* recordPostGCusage */, > - true /* recordAccumulatedGCTime */, > - true /* recordGCEndTime */, > - true /* countCollection */ ); > -} > diff -r f0f676c5a2c6 > src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp > --- > a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp > Tue > Mar 15 19:30:16 2011 -0700 > +++ > b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp > Thu > Jul 28 01:33:33 2011 +0800 > @@ -1896,7 +1896,6 @@ > > public: > TraceCMSMemoryManagerStats(CMSCollector::CollectorState phase); > - TraceCMSMemoryManagerStats(); > }; If you have not already done so, please complete the contributor > agreement (OCA) so that your contribution may be used Joseph and I are seeking to get our employer sign the OCA soon. We'll report back when we've got progress. Thanks, Kris Mok On Thu, Jul 28, 2011 at 1:07 AM, Y. Srinivas Ramakrishna < y.s.ramakris...@oracle.com> wrote: > Hi Kris -- > > I agree with your analysis and will review & shepherd your suggested fix. > I'll file a CR to cover this fix and send out a CR id shortly. > > Thanks for the fix. If you have not already done so, please > complete the contributor agreement (OCA) so that your > contribution may be used; see:- > http://openjdk.java.net/**contribute/<http://openjdk.java.net/contribute/> > > -- ramki > > > On 7/27/2011 9:42 AM, Y. Srinivas Ramakrishna wrote: > >> Cross-posting to serviceability-dev list. Please include both lists >> in your responses to this thread. >> >> -- ramki >> >> On 7/27/2011 9:12 AM, Krystal Mok wrote: >> >>> Hi all, >>> >>> I've been looking at a strange inconsistency of full GC count recorded by >>> jvmstat and JMM counters. I'd like to know which ones of the following >>> behaviors are by design, which ones are bugs, and which ones are just my >>> misunderstanding. I apologize for making a short story long... >>> >>> ==============================**======================= >>> >>> The counters involved: >>> >>> * A jvmstat counter named "sun.gc.collector.1.**invocations" keeps track >>> of >>> the number of pauses occured as a result of a major collection. It is >>> used >>> by utilities such as jstat as the source of "FGC" (full collection >>> count), >>> and the old gen collection count in Visual GC. It's updated by an >>> TraceCollectorStats object. >>> * A JMM counter, GCMemoryManager::_num_**collections, keeps track of the >>> number of collections that have ended. This counter is used as HotSpot's >>> implementation of the JMX GarbageCollectorMXBean.**getCollectionCount(). >>> It's >>> updated by either a TraceMemoryManagerStats object or a >>> TraceCMSMemoryManagerStats object. >>> >>> To show the situation, I've made a screenshot of a VisualVM and a >>> JConsole >>> running side by side, both are monitoring the VisualVM's GC stats: >>> http://dl.iteye.com/upload/**attachment/524811/913cb0e1-** >>> 7add-3ac0-a718-24ca705cad22.**png<http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png> >>> (I'll upload the screenshot to somewhere else if anybody can't see it) >>> The VisualVM instance is running on JDK6u26, with ParNew+CMS. >>> In the screenshot, Visual GC reports that the old gen collection count is >>> 20, while JConsole reports 10. >>> >>> I see that there was this bug: >>> 6580448: CMS: Full GC collection count mismatch between >>> GarbageCollectorMXBean and jvmstat (VisualGC) >>> I don't think the current implementation has a bug in the sense that the >>> two >>> counters don't report the same number. >>> >>> This behavior seems reasonable, but the naming of the value in these >>> tools >>> are confusing: both tools say "collections", but apparently the number in >>> Visual GC means "number of pauses" where as the number in JConsole means >>> "number of collection cycles". It'd be great if the difference could be >>> documented somewhere, if that's the intended behavior. >>> >>> And then the buggy behavior. Code demo posted on gist: >>> https://gist.github.com/**1106263 <https://gist.github.com/1106263> >>> Starting from JDK6u23, when using CMS without >>> ExplicitGCInvokesConcurrent, >>> System.gc() (or Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) >>> would make the JMM GC counter increment by 2 per invocation, while the >>> jvmstat counter is only incremented by 1. I believe the latter is correct >>> and the former needs some fixing. >>> >>> ==============================**======================= >>> >>> My understanding of the behavior shown above: >>> >>> 1. The concurrent GC part: >>> >>> There are 2 pauses in a CMS concurrent GC cycle, one in the initial mark >>> phase, and one in the final remark phase. >>> To trigger a concurrent GC cycle, the CMS thread wakes up periodically to >>> see if it shouldConcurrentCollect(), and trigger a cycle when the >>> predicate >>> returned true, or goes back to sleep if the predicate returned false. The >>> whole concurrent GC cycle doesn't go through >>> GenCollectedHeap::do_**collection(). >>> >>> The jvmstat counter for old gen pauses is updated in >>> CMSCollector::do_CMS_**operation(CMS_op_type op), which exactly covers >>> both >>> pause phases. >>> >>> The JMM counter, however, is updated in the concurrent sweep phase, >>> CMSCollector::sweep(bool asynch), if there was no concurrent mode >>> failure; >>> or it is updated in CMSCollector::do_compaction_**work(bool >>> clear_all_soft_refs) in case of a bailout due to concurrent mode failure >>> (advertised as so in the code comments). So that's an increment by 1 per >>> concurrent GC cycle, which does reflect the "number of collection >>> cycles", >>> fair enough. >>> >>> So far so good. >>> >>> 2. The System.gc() part: >>> >>> Without ExplicitGCInvokesConcurrent set, System.gc() does a >>> stop-the-world >>> full GC, which consists of only one pause, so "number of pauses" would >>> equal >>> "number of collections" in this case. >>> It will go into GenCollectedHeap::do_**collection(); both the jvmstat >>> and the >>> JMM GC counter gets incremented by 1 here, >>> >>> TraceCollectorStats tcs(_gens[i]->counters()); >>> TraceMemoryManagerStats tmms(_gens[i]->kind()); >>> >>> But, drilling down into: >>> _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab); >>> >>> That'll eventually go into: >>> CMSCollector::acquire_control_**and_collect(bool full, bool >>> clear_all_soft_refs) >>> >>> System.gc() is user requested so that'll go further into >>> mark-sweep-compact: >>> CMSCollector::do_compaction_**work(bool clear_all_soft_refs) >>> And here, it increments the JMM GC counter again (remember it was in the >>> concurrent GC path too, to handle bailouts), even though this is still in >>> the same collection. This leads to the "buggy behavior" mentioned >>> earlier. >>> >>> The JMM GC counter wasn't added to CMS until this fix got in: >>> 6581734: CMS Old Gen's collection usage is zero after GC which is >>> incorrect >>> >>> The code added to CMSCollector::do_compaction_**work() works fine in the >>> concurrent GC path, but interacts badly with the original logic in >>> GenCollectedHeap::do_**collection(). >>> >>> ==============================**======================= >>> >>> I thought all concurrent mode failures/interrupts come from >>> GenCollectedHeap::do_**collection(). If that's the case, then it seems >>> unnecessary to update the JMM GC counter in >>> CMSCollector::do_compaction_**work(), simply removing it should fix the >>> problem. >>> >>> With that, I'd purpose the following (XS) change: (diff against HS20) >>> >>> diff -r f0f676c5a2c6 >>> src/share/vm/gc_**implementation/**concurrentMarkSweep/** >>> concurrentMarkSweepGeneration.**cpp >>> --- >>> a/src/share/vm/gc_**implementation/**concurrentMarkSweep/** >>> concurrentMarkSweepGeneration.**cpp >>> Tue Mar 15 19:30:16 2011 -0700 >>> +++ >>> b/src/share/vm/gc_**implementation/**concurrentMarkSweep/** >>> concurrentMarkSweepGeneration.**cpp >>> Thu Jul 28 00:02:41 2011 +0800 >>> @@ -2022,9 +2022,6 @@ >>> >>> _intra_sweep_estimate.padded_**average()); >>> } >>> >>> - { >>> - TraceCMSMemoryManagerStats(); >>> - } >>> GenMarkSweep::invoke_at_**safepoint(_cmsGen->level(), >>> ref_processor(), clear_all_soft_refs); >>> #ifdef ASSERT >>> >>> The same goes for the changes in: >>> 7036199: Adding a notification to the implementation of >>> GarbageCollectorMXBeans >>> >>> ==============================**======================= >>> >>> P.S. Is there an "official" name for the counters that I referred to as >>> "jvmstat counters" above? Is it just jvmstat, or PerfData or HSPERFDATA? >>> >>> >> >