Re: Number of CMS collections and the JMM GC counters

Krystal Mok Wed, 27 Jul 2011 10:42:32 -0700

Hi Ramki,

I agree with your analysis and will review & shepherd your suggested fix.
> I'll file a CR to cover this fix and send out a CR id shortly.


Thank you ^_^

If the suggested fix does fix the problem, then there's more code that can
probably go away as well: (diff against HS20)

diff -r f0f676c5a2c6
> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
> ---
> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>  Tue
> Mar 15 19:30:16 2011 -0700
> +++
> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>  Thu
> Jul 28 01:33:33 2011 +0800
> @@ -2022,9 +2022,6 @@
>
>  _intra_sweep_estimate.padded_average());
>    }
>
> -  {
> -    TraceCMSMemoryManagerStats();
> -  }
>    GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>      ref_processor(), clear_all_soft_refs);
>    #ifdef ASSERT
> @@ -9263,15 +9260,3 @@
>        ShouldNotReachHere();
>    }
>  }
> -
> -// when bailing out of cms in concurrent mode failure
> -TraceCMSMemoryManagerStats::TraceCMSMemoryManagerStats():
> TraceMemoryManagerStats() {
> -  initialize(true /* fullGC */ ,
> -             true /* recordGCBeginTime */,
> -             true /* recordPreGCUsage */,
> -             true /* recordPeakUsage */,
> -             true /* recordPostGCusage */,
> -             true /* recordAccumulatedGCTime */,
> -             true /* recordGCEndTime */,
> -             true /* countCollection */ );
> -}
> diff -r f0f676c5a2c6
> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp
> ---
> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp
>  Tue
> Mar 15 19:30:16 2011 -0700
> +++
> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp
>  Thu
> Jul 28 01:33:33 2011 +0800
> @@ -1896,7 +1896,6 @@
>
>   public:
>    TraceCMSMemoryManagerStats(CMSCollector::CollectorState phase);
> -  TraceCMSMemoryManagerStats();
>  };



If you have not already done so, please complete the contributor
> agreement (OCA) so that your contribution may be used

Joseph and I are seeking to get our employer sign the OCA soon. We'll report
back when we've got progress.

Thanks,
Kris Mok

On Thu, Jul 28, 2011 at 1:07 AM, Y. Srinivas Ramakrishna <
y.s.ramakris...@oracle.com> wrote:

> Hi Kris --
>
> I agree with your analysis and will review & shepherd your suggested fix.
> I'll file a CR to cover this fix and send out a CR id shortly.
>
> Thanks for the fix. If you have not already done so, please
> complete the contributor agreement (OCA) so that your
> contribution may be used; see:-
>   http://openjdk.java.net/**contribute/<http://openjdk.java.net/contribute/>
>
> -- ramki
>
>
> On 7/27/2011 9:42 AM, Y. Srinivas Ramakrishna wrote:
>
>> Cross-posting to serviceability-dev list. Please include both lists
>> in your responses to this thread.
>>
>> -- ramki
>>
>> On 7/27/2011 9:12 AM, Krystal Mok wrote:
>>
>>> Hi all,
>>>
>>> I've been looking at a strange inconsistency of full GC count recorded by
>>> jvmstat and JMM counters. I'd like to know which ones of the following
>>> behaviors are by design, which ones are bugs, and which ones are just my
>>> misunderstanding. I apologize for making a short story long...
>>>
>>> ==============================**=======================
>>>
>>> The counters involved:
>>>
>>> * A jvmstat counter named "sun.gc.collector.1.**invocations" keeps track
>>> of
>>> the number of pauses occured as a result of a major collection. It is
>>> used
>>> by utilities such as jstat as the source of "FGC" (full collection
>>> count),
>>> and the old gen collection count in Visual GC. It's updated by an
>>> TraceCollectorStats object.
>>> * A JMM counter, GCMemoryManager::_num_**collections, keeps track of the
>>> number of collections that have ended. This counter is used as HotSpot's
>>> implementation of the JMX GarbageCollectorMXBean.**getCollectionCount().
>>> It's
>>> updated by either a TraceMemoryManagerStats object or a
>>> TraceCMSMemoryManagerStats object.
>>>
>>> To show the situation, I've made a screenshot of a VisualVM and a
>>> JConsole
>>> running side by side, both are monitoring the VisualVM's GC stats:
>>> http://dl.iteye.com/upload/**attachment/524811/913cb0e1-**
>>> 7add-3ac0-a718-24ca705cad22.**png<http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png>
>>> (I'll upload the screenshot to somewhere else if anybody can't see it)
>>> The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>> In the screenshot, Visual GC reports that the old gen collection count is
>>> 20, while JConsole reports 10.
>>>
>>> I see that there was this bug:
>>> 6580448: CMS: Full GC collection count mismatch between
>>> GarbageCollectorMXBean and jvmstat (VisualGC)
>>> I don't think the current implementation has a bug in the sense that the
>>> two
>>> counters don't report the same number.
>>>
>>> This behavior seems reasonable, but the naming of the value in these
>>> tools
>>> are confusing: both tools say "collections", but apparently the number in
>>> Visual GC means "number of pauses" where as the number in JConsole means
>>> "number of collection cycles". It'd be great if the difference could be
>>> documented somewhere, if that's the intended behavior.
>>>
>>> And then the buggy behavior. Code demo posted on gist:
>>> https://gist.github.com/**1106263 <https://gist.github.com/1106263>
>>> Starting from JDK6u23, when using CMS without
>>> ExplicitGCInvokesConcurrent,
>>> System.gc() (or Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX)
>>> would make the JMM GC counter increment by 2 per invocation, while the
>>> jvmstat counter is only incremented by 1. I believe the latter is correct
>>> and the former needs some fixing.
>>>
>>> ==============================**=======================
>>>
>>> My understanding of the behavior shown above:
>>>
>>> 1. The concurrent GC part:
>>>
>>> There are 2 pauses in a CMS concurrent GC cycle, one in the initial mark
>>> phase, and one in the final remark phase.
>>> To trigger a concurrent GC cycle, the CMS thread wakes up periodically to
>>> see if it shouldConcurrentCollect(), and trigger a cycle when the
>>> predicate
>>> returned true, or goes back to sleep if the predicate returned false. The
>>> whole concurrent GC cycle doesn't go through
>>> GenCollectedHeap::do_**collection().
>>>
>>> The jvmstat counter for old gen pauses is updated in
>>> CMSCollector::do_CMS_**operation(CMS_op_type op), which exactly covers
>>> both
>>> pause phases.
>>>
>>> The JMM counter, however, is updated in the concurrent sweep phase,
>>> CMSCollector::sweep(bool asynch), if there was no concurrent mode
>>> failure;
>>> or it is updated in CMSCollector::do_compaction_**work(bool
>>> clear_all_soft_refs) in case of a bailout due to concurrent mode failure
>>> (advertised as so in the code comments). So that's an increment by 1 per
>>> concurrent GC cycle, which does reflect the "number of collection
>>> cycles",
>>> fair enough.
>>>
>>> So far so good.
>>>
>>> 2. The System.gc() part:
>>>
>>> Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>> stop-the-world
>>> full GC, which consists of only one pause, so "number of pauses" would
>>> equal
>>> "number of collections" in this case.
>>> It will go into GenCollectedHeap::do_**collection(); both the jvmstat
>>> and the
>>> JMM GC counter gets incremented by 1 here,
>>>
>>> TraceCollectorStats tcs(_gens[i]->counters());
>>> TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>
>>> But, drilling down into:
>>> _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>
>>> That'll eventually go into:
>>> CMSCollector::acquire_control_**and_collect(bool full, bool
>>> clear_all_soft_refs)
>>>
>>> System.gc() is user requested so that'll go further into
>>> mark-sweep-compact:
>>> CMSCollector::do_compaction_**work(bool clear_all_soft_refs)
>>> And here, it increments the JMM GC counter again (remember it was in the
>>> concurrent GC path too, to handle bailouts), even though this is still in
>>> the same collection. This leads to the "buggy behavior" mentioned
>>> earlier.
>>>
>>> The JMM GC counter wasn't added to CMS until this fix got in:
>>> 6581734: CMS Old Gen's collection usage is zero after GC which is
>>> incorrect
>>>
>>> The code added to CMSCollector::do_compaction_**work() works fine in the
>>> concurrent GC path, but interacts badly with the original logic in
>>> GenCollectedHeap::do_**collection().
>>>
>>> ==============================**=======================
>>>
>>> I thought all concurrent mode failures/interrupts come from
>>> GenCollectedHeap::do_**collection(). If that's the case, then it seems
>>> unnecessary to update the JMM GC counter in
>>> CMSCollector::do_compaction_**work(), simply removing it should fix the
>>> problem.
>>>
>>> With that, I'd purpose the following (XS) change: (diff against HS20)
>>>
>>> diff -r f0f676c5a2c6
>>> src/share/vm/gc_**implementation/**concurrentMarkSweep/**
>>> concurrentMarkSweepGeneration.**cpp
>>> ---
>>> a/src/share/vm/gc_**implementation/**concurrentMarkSweep/**
>>> concurrentMarkSweepGeneration.**cpp
>>> Tue Mar 15 19:30:16 2011 -0700
>>> +++
>>> b/src/share/vm/gc_**implementation/**concurrentMarkSweep/**
>>> concurrentMarkSweepGeneration.**cpp
>>> Thu Jul 28 00:02:41 2011 +0800
>>> @@ -2022,9 +2022,6 @@
>>>
>>> _intra_sweep_estimate.padded_**average());
>>> }
>>>
>>> - {
>>> - TraceCMSMemoryManagerStats();
>>> - }
>>> GenMarkSweep::invoke_at_**safepoint(_cmsGen->level(),
>>> ref_processor(), clear_all_soft_refs);
>>> #ifdef ASSERT
>>>
>>> The same goes for the changes in:
>>> 7036199: Adding a notification to the implementation of
>>> GarbageCollectorMXBeans
>>>
>>> ==============================**=======================
>>>
>>> P.S. Is there an "official" name for the counters that I referred to as
>>> "jvmstat counters" above? Is it just jvmstat, or PerfData or HSPERFDATA?
>>>
>>>
>>
>

Re: Number of CMS collections and the JMM GC counters

Reply via email to