Hi Kevin --
thanks for jumping on this! More inline below ...
On 07/28/11 09:33, Krystal Mok wrote:
Hi Kevin,
Thank you for taking care of this, and it's good to see the problem is
verified.
I think whether or not the suggested fix is sufficient depends on what
paths can reach CMSCollector::do_compaction_work(). If all paths that
can reach CMSCollector::do_compaction_work() come from
GenCollectedHeap::do_collection(), then the fix should be good to go.
Otherwise it'll need a better workaround.
I believe all concurrent mode failures/interrupts (which includes the
System.gc() case) does come from GenCollectedHeap::do_collection(), but
I'm not exactly sure about this, could anybody please clarify on it?
Yes, i believe this is indeed the case, and my browsing of the
code using cscope seemed to confirm that belief.
More below ...
Regards,
Kris Mok
On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <kevin.wa...@oracle.com
<mailto:kevin.wa...@oracle.com>> wrote:
__
Hi --
6580448 was marked as a duplicate of 6581734, which fixed the fact
that CMS collections were just not counted at all - with CMS, only a
stop the world full gc would be counted in the stats.
But looks like you're right... Here is a quick variation of the
testcase from 6581734 which shows the same thing, and this verifies
the same, and is solved by ExplicitGCInvokesConcurrent. If there is
no other feedback I can test if the removal of the
TraceCMSMemoryManagerStats() call in
CMSCollector::do_compaction_work is all we need...
Kevin, yes, it would be great if you could verify this and push the fix.
I am not sure if the push would need to wait for the signing of OCA
from Kris, but best to check with Those Who Would Know Such Things.
Since the original CR has been closed, i'll open one momentarily and
can make you RE (if that's OK with you). I'll be happy to serve as
reviewer of the change.
As regards the jstat counter reporting two pauses per concurrent
CMS cycle, I am of two minds on what the original intention
was. I'd have originally regarded the double increment as a
bug, but as you state it is really two pauses, even if part of
a single cycle. And it makes sense to count them as two. I
agree that this should be documented and left alone, given
how long we have had this behaviour, and the alternative
(of counting cycles, rather than pauses) may be no better
(or arguably worse). There's actually an open CR for this
which we can redirect into a CR to update the relevant documentation.
-- ramki
Regards
Kevin
/*
* Copyright (c) 2011, Oracle and/or its affiliates. All rights
reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation.
*
* This code is distributed in the hope that it will be useful, but
WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE
file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License
version
* 2 along with this work; if not, write to the Free Software
Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
94065 USA
* or visit www.oracle.com <http://www.oracle.com> if you need
additional information or have any
* questions.
*/
/*
* @test TestFullGCount.java
* @bug
* @summary
* @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
*
*/
import java.util.*;
import java.lang.management.*;
public class TestFullGCCount {
private String poolName = "CMS";
private String collectorName = "ConcurrentMarkSweep";
public static void main(String [] args) {
TestFullGCCount t = null;
if (args.length==2) {
t = new TestFullGCCount(args[0], args[1]);
} else {
System.out.println("Defaulting to monitor CMS pool and
collector.");
t = new TestFullGCCount();
}
t.run();
}
public TestFullGCCount(String pool, String collector) {
poolName = pool;
collectorName = collector;
}
public TestFullGCCount() {
}
public void run() {
int count = 0;
int iterations = 20;
long counts[] = new long[iterations];
boolean diffAlways2 = true; // assume we will fail
for (int i=0; i<iterations; i++) {
System.gc();
counts[i] = checkStats();
if (i>0) {
if (counts[i] - counts[i-1] != 2) {
diffAlways2 = false;
}
}
}
if (diffAlways2) {
throw new RuntimeException("FAILED: difference in count is
always 2.");
}
System.out.println("Passed.");
}
private long checkStats() {
long count = 0;
List<MemoryPoolMXBean> pools =
ManagementFactory.getMemoryPoolMXBeans();
List<GarbageCollectorMXBean> collectors =
ManagementFactory.getGarbageCollectorMXBeans();
for (int i=0; i<collectors.size(); i++) {
GarbageCollectorMXBean collector = collectors.get(i);
String name = collector.getName();
if (name.contains(collectorName)) {
System.out.println(name + ": collection count = "
+ collector.getCollectionCount());
count = collector.getCollectionCount();
}
}
return count;
}
}
On 27/07/11 17:12, Krystal Mok wrote:
Hi all,
I've been looking at a strange inconsistency of full GC count
recorded by jvmstat and JMM counters. I'd like to know which ones
of the following behaviors are by design, which ones are bugs, and
which ones are just my misunderstanding. I apologize for making a
short story long...
=====================================================
The counters involved:
* A jvmstat counter named "sun.gc.collector.1.invocations" keeps
track of the number of pauses occured as a result of a major
collection. It is used by utilities such as jstat as the source of
"FGC" (full collection count), and the old gen collection count in
Visual GC. It's updated by an TraceCollectorStats object.
* A JMM counter, GCMemoryManager::_num_collections, keeps track of
the number of collections that have ended. This counter is used as
HotSpot's implementation of the JMX
GarbageCollectorMXBean.getCollectionCount(). It's updated by
either a TraceMemoryManagerStats object or a
TraceCMSMemoryManagerStats object.
To show the situation, I've made a screenshot of a VisualVM and a
JConsole running side by side, both are monitoring the VisualVM's
GC stats:
http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
(I'll upload the screenshot to somewhere else if anybody can't see it)
The VisualVM instance is running on JDK6u26, with ParNew+CMS.
In the screenshot, Visual GC reports that the old gen collection
count is 20, while JConsole reports 10.
I see that there was this bug:
6580448: CMS: Full GC collection count mismatch between
GarbageCollectorMXBean and jvmstat (VisualGC)
I don't think the current implementation has a bug in the sense
that the two counters don't report the same number.
This behavior seems reasonable, but the naming of the value in
these tools are confusing: both tools say "collections", but
apparently the number in Visual GC means "number of pauses" where
as the number in JConsole means "number of collection cycles".
It'd be great if the difference could be documented somewhere, if
that's the intended behavior.
And then the buggy behavior. Code demo posted on gist:
https://gist.github.com/1106263
Starting from JDK6u23, when using CMS without
ExplicitGCInvokesConcurrent, System.gc() (or
Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
make the JMM GC counter increment by 2 per invocation, while the
jvmstat counter is only incremented by 1. I believe the latter is
correct and the former needs some fixing.
=====================================================
My understanding of the behavior shown above:
1. The concurrent GC part:
There are 2 pauses in a CMS concurrent GC cycle, one in the
initial mark phase, and one in the final remark phase.
To trigger a concurrent GC cycle, the CMS thread wakes up
periodically to see if it shouldConcurrentCollect(), and trigger a
cycle when the predicate returned true, or goes back to sleep if
the predicate returned false. The whole concurrent GC cycle
doesn't go through GenCollectedHeap::do_collection().
The jvmstat counter for old gen pauses is updated in
CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
covers both pause phases.
The JMM counter, however, is updated in the concurrent sweep
phase, CMSCollector::sweep(bool asynch), if there was no
concurrent mode failure; or it is updated in
CMSCollector::do_compaction_work(bool clear_all_soft_refs) in case
of a bailout due to concurrent mode failure (advertised as so in
the code comments). So that's an increment by 1 per concurrent GC
cycle, which does reflect the "number of collection cycles", fair
enough.
So far so good.
2. The System.gc() part:
Without ExplicitGCInvokesConcurrent set, System.gc() does a
stop-the-world full GC, which consists of only one pause, so
"number of pauses" would equal "number of collections" in this case.
It will go into GenCollectedHeap::do_collection(); both the
jvmstat and the JMM GC counter gets incremented by 1 here,
TraceCollectorStats tcs(_gens[i]->counters());
TraceMemoryManagerStats tmms(_gens[i]->kind());
But, drilling down into:
_gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
That'll eventually go into:
CMSCollector::acquire_control_and_collect(bool full, bool
clear_all_soft_refs)
System.gc() is user requested so that'll go further into
mark-sweep-compact:
CMSCollector::do_compaction_work(bool clear_all_soft_refs)
And here, it increments the JMM GC counter again (remember it was
in the concurrent GC path too, to handle bailouts), even though
this is still in the same collection. This leads to the "buggy
behavior" mentioned earlier.
The JMM GC counter wasn't added to CMS until this fix got in:
6581734: CMS Old Gen's collection usage is zero after GC which is
incorrect
The code added to CMSCollector::do_compaction_work() works fine in
the concurrent GC path, but interacts badly with the original
logic in GenCollectedHeap::do_collection().
=====================================================
I thought all concurrent mode failures/interrupts come from
GenCollectedHeap::do_collection(). If that's the case, then it
seems unnecessary to update the JMM GC counter in
CMSCollector::do_compaction_work(), simply removing it should fix
the problem.
With that, I'd purpose the following (XS) change: (diff against HS20)
diff -r f0f676c5a2c6
src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
---
a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
Tue Mar 15 19:30:16 2011 -0700
+++
b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
Thu Jul 28 00:02:41 2011 +0800
@@ -2022,9 +2022,6 @@
_intra_sweep_estimate.padded_average());
}
- {
- TraceCMSMemoryManagerStats();
- }
GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
ref_processor(), clear_all_soft_refs);
#ifdef ASSERT
The same goes for the changes in:
7036199: Adding a notification to the implementation of
GarbageCollectorMXBeans
=====================================================
P.S. Is there an "official" name for the counters that I referred
to as "jvmstat counters" above? Is it just jvmstat, or PerfData or
HSPERFDATA?