Re: RFR: 8008917 CMS: Concurrent mode failure tracing event

Kevin Walls Fri, 08 Mar 2013 02:09:24 -0800


Hi Erik - aha yes, so not that rare then for the right kind of program!


I'll talk to you about when/how to do this and the new event.

Thanks
Kevin


On 08/03/13 09:50, Erik Helin wrote:

Kevin,

On 03/07/2013 05:52 PM, Kevin Walls wrote:
Hi Erik -

Yes, now you mention it I can see the route to printing the old warning
or logging the event twice...

I don't think it's reported as a problem, or maybe it's very rare and
nobody has spotted it.
It is possible to reproduce the problem with a Java program thatsimply allocates until OOM:
  import java.util.ArrayList;

  public class Reproducer {
      public static ArrayList<byte[]> garbage = new ArrayList<byte[]>();
      public static void main(String[] args) {
          while (true) {
              garbage.add(new byte[1024]);
          }
      }
  }

Compile the above Java program and then run:
java -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails-XX:-UseCMSCompactAtFullCollection -Xmx100m Reproducer
Depending on how fast machine you have, the Java program will run forabout 2 seconds and the GC logs will show (near the start of the output):
[GC (Allocation Failure) [ParNew: 30720K->30720K(30720K), 0.0001940secs][CMS[CMS-concurrent-mark: 0.035/0.036 secs] [Times: user=0.03sys=0.00, real=0.04 secs](concurrent mode failure) (concurrent mode failure)[YG occupancy:30720 K (30720 K)][checkpointRootsFinalWork[Rescan (parallel) ,0.0314620 secs][refProcessingWork[weak refs processing, 0.0000320secs][class unloading, 0.0006770 secs][scrub symbol table, 0.0009850secs][scrub string table, 0.0000690 secs], 0.0021430 secs], 0.0339750secs]: 50655K->50654K(68288K), 0.0972030 secs] 81375K->81374K(99008K),[Metaspace: 2662K->2662K(4486K/110592K)], 0.0980400 secs] [Times:user=0.11 sys=0.00, real=0.10 secs]
Please notice the two "(concurrent mode failure)" above which areprinted for the reason I explained in my previous email.
On 03/07/2013 05:52 PM, Kevin Walls wrote:
But assuming it's not a "user-requested" collection, to get that false
"should_compact" in acquire_control_and_collect, we need to call
decide_foreground_collection_type(), and when it calls
incremental_collection_will_fail(), that returns false.
This is assuming that the flag UseCMSCompactAtFullCollection is true.

On 03/07/2013 05:52 PM, Kevin Walls wrote:
Possibly that is why we don't see the event reported twice in
practice***: if we've got to this point, and state>Idling, we are
usually here because that inc. collection would fail/is failing...
This is most likely the reason, since UseCMSCompactAtFullCollection istrue by default and if users aren't changing it, then your reasoningis correct.
I suggest that we file a separate bug for CMS printing out"(concurrent mode failure)" twice, fix that and then we base yourtrace code on that.
What do you think?

Thanks,
Erik
Thanks
Kevin

*** if anybody really does hit this, or think it's likely, we can look
again...



On 06/03/13 18:19, Erik Helin wrote:
Hi Kevin,

I think that there _might_ be a bug in CMS which was present even
before you added the event tracing.

If you look in aquire_control_and_collect, you will see that
"should_compact" can be set to false by
decide_foreground_collection_type. If this is the case, then we will
end up in do_mark_sweep_work.

The problem is that you have already reported, and CMS has already
printed, that a concurrent mode failure has occurred in
acquire_control_and_collect. Then, when you enter do_mark_sweep_work,
you will once again report, and CMS will again print, that concurrent
mode failure has happened.

I am not 100% sure that I am right, by I believe that this can happen.

What do you think?

Thanks,
Erik

On 03/01/2013 06:34 PM, Kevin Walls wrote:
Hi,

I'd like some reviews on this CMS Concurrent Mode Failure event:

http://cr.openjdk.java.net/~kevinw/8008917/hotspot/

The event doesn't actually carry any new information, but it is a
warning we need to capture.

This is against hsx24, I'll prepare the same, or reviewed, changes
against very latest hotspot also.

Thanks
Kevin

Re: RFR: 8008917 CMS: Concurrent mode failure tracing event

Reply via email to