Hi Erik - aha yes, so not that rare then for the right kind of program!

I'll talk to you about when/how to do this and the new event.

Thanks
Kevin


On 08/03/13 09:50, Erik Helin wrote:
Kevin,

On 03/07/2013 05:52 PM, Kevin Walls wrote:
Hi Erik -

Yes, now you mention it I can see the route to printing the old warning
or logging the event twice...

I don't think it's reported as a problem, or maybe it's very rare and
nobody has spotted it.

It is possible to reproduce the problem with a Java program that simply allocates until OOM:

  import java.util.ArrayList;

  public class Reproducer {
      public static ArrayList<byte[]> garbage = new ArrayList<byte[]>();
      public static void main(String[] args) {
          while (true) {
              garbage.add(new byte[1024]);
          }
      }
  }

Compile the above Java program and then run:

java -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:-UseCMSCompactAtFullCollection -Xmx100m Reproducer

Depending on how fast machine you have, the Java program will run for about 2 seconds and the GC logs will show (near the start of the output):

[GC (Allocation Failure) [ParNew: 30720K->30720K(30720K), 0.0001940 secs][CMS[CMS-concurrent-mark: 0.035/0.036 secs] [Times: user=0.03 sys=0.00, real=0.04 secs] (concurrent mode failure) (concurrent mode failure)[YG occupancy: 30720 K (30720 K)][checkpointRootsFinalWork[Rescan (parallel) , 0.0314620 secs][refProcessingWork[weak refs processing, 0.0000320 secs][class unloading, 0.0006770 secs][scrub symbol table, 0.0009850 secs][scrub string table, 0.0000690 secs], 0.0021430 secs], 0.0339750 secs]: 50655K->50654K(68288K), 0.0972030 secs] 81375K->81374K(99008K), [Metaspace: 2662K->2662K(4486K/110592K)], 0.0980400 secs] [Times: user=0.11 sys=0.00, real=0.10 secs]

Please notice the two "(concurrent mode failure)" above which are printed for the reason I explained in my previous email.

On 03/07/2013 05:52 PM, Kevin Walls wrote:
But assuming it's not a "user-requested" collection, to get that false
"should_compact" in acquire_control_and_collect, we need to call
decide_foreground_collection_type(), and when it calls
incremental_collection_will_fail(), that returns false.

This is assuming that the flag UseCMSCompactAtFullCollection is true.

On 03/07/2013 05:52 PM, Kevin Walls wrote:
Possibly that is why we don't see the event reported twice in
practice***: if we've got to this point, and state>Idling, we are
usually here because that inc. collection would fail/is failing...

This is most likely the reason, since UseCMSCompactAtFullCollection is true by default and if users aren't changing it, then your reasoning is correct.

I suggest that we file a separate bug for CMS printing out "(concurrent mode failure)" twice, fix that and then we base your trace code on that.

What do you think?

Thanks,
Erik

Thanks
Kevin

*** if anybody really does hit this, or think it's likely, we can look
again...



On 06/03/13 18:19, Erik Helin wrote:
Hi Kevin,

I think that there _might_ be a bug in CMS which was present even
before you added the event tracing.

If you look in aquire_control_and_collect, you will see that
"should_compact" can be set to false by
decide_foreground_collection_type. If this is the case, then we will
end up in do_mark_sweep_work.

The problem is that you have already reported, and CMS has already
printed, that a concurrent mode failure has occurred in
acquire_control_and_collect. Then, when you enter do_mark_sweep_work,
you will once again report, and CMS will again print, that concurrent
mode failure has happened.

I am not 100% sure that I am right, by I believe that this can happen.

What do you think?

Thanks,
Erik

On 03/01/2013 06:34 PM, Kevin Walls wrote:
Hi,

I'd like some reviews on this CMS Concurrent Mode Failure event:

http://cr.openjdk.java.net/~kevinw/8008917/hotspot/

The event doesn't actually carry any new information, but it is a
warning we need to capture.

This is against hsx24, I'll prepare the same, or reviewed, changes
against very latest hotspot also.

Thanks
Kevin





Reply via email to