Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast Linux machines with Total safepoint time 0 ms

David Holmes Tue, 30 Jul 2019 14:37:02 -0700

On 30/07/2019 10:39 pm, Baesken, Matthias wrote:

Hi David,   "put that whole code (the while loop) in a helper method."   was 
JC's idea,  and I like the idea .

Regardless I think the way you are using NUM_THREAD_DUMPS is reallyconfusing. As an all-caps static you'd expect it to be a constant.


Thanks,
David

Let's see what others think .


Overall tests like this are not very useful, yet very fragile.


I am also  fine with putting the test on the exclude list.

Best regards, Matthias

-----Original Message-----
From: David Holmes <[email protected]>
Sent: Dienstag, 30. Juli 2019 14:12
To: Baesken, Matthias <[email protected]>; Jean Christophe
Beyler <[email protected]>
Cc: [email protected]; serviceability-dev <serviceability-
[email protected]>
Subject: Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast
Linux machines with Total safepoint time 0 ms

Hi Matthias,

On 30/07/2019 9:25 pm, Baesken, Matthias wrote:

Hello  JC / David,   here is a second webrev  :

http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.1/

It moves   the  thread dump execution into a  method
executeThreadDumps(long)     , and also adds  while loops   (but with a
limitation  for the number of thread dumps, really don’t
want to cause timeouts etc.).    I removed a check for
MAX_VALUE_FOR_PASS   because we cannot go over Long.MAX_VALUE .


I don't think executeThreadDumps is worth factoring out like out.

The handling of NUM_THREAD_DUMPS is a bit confusing. I'd rather it
remains a constant 100, and then you set a simple loop iteration count
limit. Further with the proposed code when you get here:

   85         NUM_THREAD_DUMPS = NUM_THREAD_DUMPS * 2;

you don't even know what value you may be starting with.

But I was thinking of simply:

long value = 0;
do {
      Thread.getAllStackTraces();
      value = mbean.getTotalSafepointTime();
} while (value == 0);

We'd only hit a timeout if something is completely broken - which is fine.

Overall tests like this are not very useful, yet very fragile.

Thanks,
David

Hope you like this version  better.

Best regards, Matthias

*From:*Jean Christophe Beyler <[email protected]>
*Sent:* Dienstag, 30. Juli 2019 05:39
*To:* David Holmes <[email protected]>
*Cc:* Baesken, Matthias <[email protected]>;
[email protected]; serviceability-dev
<[email protected]>
*Subject:* Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails
on fast Linux machines with Total safepoint time 0 ms

Hi Matthias,

I wonder if you should not do what David is suggesting and then put that
whole code (the while loop) in a helper method. Below you have a
calculation again using value2 (which I wonder what the added value of
it is though) but anyway, that value2 could also be 0 at some point, no?

So would it not be best to just refactor the getAllStackTraces and
calculate safepoint time in a helper method for both value / value2
variables?

Thanks,

Jc

On Mon, Jul 29, 2019 at 7:50 PM David Holmes <[email protected]
<mailto:[email protected]>> wrote:

     Hi Matthias,

     On 29/07/2019 8:20 pm, Baesken, Matthias wrote:
      > Hello , please review this small test fix .
      >
      > The test

test/jdk/sun/management/HotspotRuntimeMBean/GetTotalSafepointTime.
java

     fails sometimes on fast Linux machines with this error message :
      >
      > java.lang.RuntimeException: Total safepoint time illegal value: 0
     ms (MIN = 1; MAX = 9223372036854775807)
      >
      > looks like the total safepoint time is too low currently on these
     machines, it is < 1 ms.
      >
      > There might be several ways to handle this :
      >
      >    *   Change the test  in a way that it might generate nigher
     safepoint times
      >    *   Allow  safepoint time  == 0 ms
      >    *   Offer an additional interface that gives  safepoint times
     with finer granularity ( currently the HS has safepoint time values
     in ns , see  jdk/src/hotspot/share/runtime/safepoint.cpp
       SafepointTracing::end
      >
      > But it is converted on ms in this code
      >
      > 114jlong RuntimeService::safepoint_time_ms() {
      > 115  return UsePerfData ?
      > 116
     Management::ticks_to_ms(_safepoint_time_ticks->get_value()) : -1;
      > 117}
      >
      > 064jlong Management::ticks_to_ms(jlong ticks) {
      > 2065  assert(os::elapsed_frequency() > 0, "Must be non-zero");
      > 2066  return (jlong)(((double)ticks /
     (double)os::elapsed_frequency())
      > 2067                 * (double)1000.0);
      > 2068}
      >
      >
      >
      > Currently I go for  the first attempt (and try to generate
     higher safepoint times in my patch) .

     Yes that's probably best. Coarse-grained timing on very fast machines
     was bound to eventually lead to problems.

     But perhaps a more future-proof approach is to just add a do-while loop
     around the stack dumps and only exit when we have a non-zero

safepoint

     time?

     Thanks,
     David
     -----

      > Bug/webrev :
      >
      > https://bugs.openjdk.java.net/browse/JDK-8228658
      >
      > http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.0/
      >
      >
      > Thanks, Matthias
      >


--

Thanks,

Jc

Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast Linux machines with Total safepoint time 0 ms

Reply via email to