Hi Erik - aha yes, so not that rare then for the right kind of program!
I'll talk to you about when/how to do this and the new event.
Thanks
Kevin
On 08/03/13 09:50, Erik Helin wrote:
Kevin,
On 03/07/2013 05:52 PM, Kevin Walls wrote:
Hi Erik -
Yes, now you mention it I can see the route to printing the old warning
or logging the event twice...
I don't think it's reported as a problem, or maybe it's very rare and
nobody has spotted it.
It is possible to reproduce the problem with a Java program that
simply allocates until OOM:
import java.util.ArrayList;
public class Reproducer {
public static ArrayList<byte[]> garbage = new ArrayList<byte[]>();
public static void main(String[] args) {
while (true) {
garbage.add(new byte[1024]);
}
}
}
Compile the above Java program and then run:
java -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
-XX:-UseCMSCompactAtFullCollection -Xmx100m Reproducer
Depending on how fast machine you have, the Java program will run for
about 2 seconds and the GC logs will show (near the start of the output):
[GC (Allocation Failure) [ParNew: 30720K->30720K(30720K), 0.0001940
secs][CMS[CMS-concurrent-mark: 0.035/0.036 secs] [Times: user=0.03
sys=0.00, real=0.04 secs]
(concurrent mode failure) (concurrent mode failure)[YG occupancy:
30720 K (30720 K)][checkpointRootsFinalWork[Rescan (parallel) ,
0.0314620 secs][refProcessingWork[weak refs processing, 0.0000320
secs][class unloading, 0.0006770 secs][scrub symbol table, 0.0009850
secs][scrub string table, 0.0000690 secs], 0.0021430 secs], 0.0339750
secs]: 50655K->50654K(68288K), 0.0972030 secs] 81375K->81374K(99008K),
[Metaspace: 2662K->2662K(4486K/110592K)], 0.0980400 secs] [Times:
user=0.11 sys=0.00, real=0.10 secs]
Please notice the two "(concurrent mode failure)" above which are
printed for the reason I explained in my previous email.
On 03/07/2013 05:52 PM, Kevin Walls wrote:
But assuming it's not a "user-requested" collection, to get that false
"should_compact" in acquire_control_and_collect, we need to call
decide_foreground_collection_type(), and when it calls
incremental_collection_will_fail(), that returns false.
This is assuming that the flag UseCMSCompactAtFullCollection is true.
On 03/07/2013 05:52 PM, Kevin Walls wrote:
Possibly that is why we don't see the event reported twice in
practice***: if we've got to this point, and state>Idling, we are
usually here because that inc. collection would fail/is failing...
This is most likely the reason, since UseCMSCompactAtFullCollection is
true by default and if users aren't changing it, then your reasoning
is correct.
I suggest that we file a separate bug for CMS printing out
"(concurrent mode failure)" twice, fix that and then we base your
trace code on that.
What do you think?
Thanks,
Erik
Thanks
Kevin
*** if anybody really does hit this, or think it's likely, we can look
again...
On 06/03/13 18:19, Erik Helin wrote:
Hi Kevin,
I think that there _might_ be a bug in CMS which was present even
before you added the event tracing.
If you look in aquire_control_and_collect, you will see that
"should_compact" can be set to false by
decide_foreground_collection_type. If this is the case, then we will
end up in do_mark_sweep_work.
The problem is that you have already reported, and CMS has already
printed, that a concurrent mode failure has occurred in
acquire_control_and_collect. Then, when you enter do_mark_sweep_work,
you will once again report, and CMS will again print, that concurrent
mode failure has happened.
I am not 100% sure that I am right, by I believe that this can happen.
What do you think?
Thanks,
Erik
On 03/01/2013 06:34 PM, Kevin Walls wrote:
Hi,
I'd like some reviews on this CMS Concurrent Mode Failure event:
http://cr.openjdk.java.net/~kevinw/8008917/hotspot/
The event doesn't actually carry any new information, but it is a
warning we need to capture.
This is against hsx24, I'll prepare the same, or reviewed, changes
against very latest hotspot also.
Thanks
Kevin