On 31/07/2019 9:08 am, Jean Christophe Beyler wrote:
FWIW, I would have done something like what David was suggesting, just
slightly tweaked:
public static long executeThreadDumps() {
long value;
long initial_value = mbean.getTotalSafepointTime();
do {
Thread.getAllStackTraces();
value = mbean.getTotalSafepointTime();
} while (value == initial_value);
return value;
}
This ensures that the value is a new value as opposed to the current
value and if something goes wrong, as David said, it will timeout; which
is ok.
Works for me.
But I come back to not really understanding why we are doing this at
this point of relaxing (just get a new value of safepoint time).
Because, if we accept timeouts now as a failure here, then really the
whole test becomes:
executeThreadDumps();
executeThreadDumps();
Since the first call will return when value > 0 and the second call will
return when value2 > value (I still wonder why we want to ensure it
works twice...).
The test is trying to sanity check that we are actually recording the
time used by safepoints. So first check is that we can get a non-zero
value; second check is we get a greater non-zero value. It's just a
sanity test to try and catch if something gets unexpectedly broken in
the time tracking code.
So both failures and even testing for it is kind of redundant, once you
have a do/while until a change?
Yes - the problem with the tests that try to check internal VM behaviour
is that we have no specified way to do something, in this case execute
safepoints, that relates to internal VM behaviour, so we have to do
something we know will currently work even if not specified to do so -
e.g. dumping all thread stacks uses a global safepoint. The second
problem is that the timer granularity is so coarse that we then have to
guess how many times we need to do that something before seeing a
change. To make the test robust we can keep doing stuff until we see a
change and so the only way that will fail is if the overall timeout of
the test kicks in. Or we can try and second guess how long it should
take by introducing our own internal timeout - either directly or by
limiting the number of loops in this case. That has its own problems and
in general we have tried to reduce internal test timeouts (by removing
them) and let overall timeouts take charge.
No ideal solution. And this has already consumed way too much of
everyone's time.
Cheers,
David
Thanks,
Jc
On Tue, Jul 30, 2019 at 2:35 PM David Holmes <david.hol...@oracle.com
<mailto:david.hol...@oracle.com>> wrote:
On 30/07/2019 10:39 pm, Baesken, Matthias wrote:
> Hi David, "put that whole code (the while loop) in a helper
method." was JC's idea, and I like the idea .
Regardless I think the way you are using NUM_THREAD_DUMPS is really
confusing. As an all-caps static you'd expect it to be a constant.
Thanks,
David
> Let's see what others think .
>
>>
>> Overall tests like this are not very useful, yet very fragile.
>>
>
> I am also fine with putting the test on the exclude list.
>
> Best regards, Matthias
>
>
>> -----Original Message-----
>> From: David Holmes <david.hol...@oracle.com
<mailto:david.hol...@oracle.com>>
>> Sent: Dienstag, 30. Juli 2019 14:12
>> To: Baesken, Matthias <matthias.baes...@sap.com
<mailto:matthias.baes...@sap.com>>; Jean Christophe
>> Beyler <jcbey...@google.com <mailto:jcbey...@google.com>>
>> Cc: hotspot-...@openjdk.java.net
<mailto:hotspot-...@openjdk.java.net>; serviceability-dev
<serviceability-
>> d...@openjdk.java.net <mailto:d...@openjdk.java.net>>
>> Subject: Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java
fails on fast
>> Linux machines with Total safepoint time 0 ms
>>
>> Hi Matthias,
>>
>> On 30/07/2019 9:25 pm, Baesken, Matthias wrote:
>>> Hello JC / David, here is a second webrev :
>>>
>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.1/
>>>
>>> It moves the thread dump execution into a method
>>> executeThreadDumps(long) , and also adds while loops
(but with a
>>> limitation for the number of thread dumps, really don’t
>>> want to cause timeouts etc.). I removed a check for
>>> MAX_VALUE_FOR_PASS because we cannot go over Long.MAX_VALUE .
>>
>> I don't think executeThreadDumps is worth factoring out like out.
>>
>> The handling of NUM_THREAD_DUMPS is a bit confusing. I'd rather it
>> remains a constant 100, and then you set a simple loop iteration
count
>> limit. Further with the proposed code when you get here:
>>
>> 85 NUM_THREAD_DUMPS = NUM_THREAD_DUMPS * 2;
>>
>> you don't even know what value you may be starting with.
>>
>> But I was thinking of simply:
>>
>> long value = 0;
>> do {
>> Thread.getAllStackTraces();
>> value = mbean.getTotalSafepointTime();
>> } while (value == 0);
>>
>> We'd only hit a timeout if something is completely broken -
which is fine.
>>
>> Overall tests like this are not very useful, yet very fragile.
>>
>> Thanks,
>> David
>>
>>> Hope you like this version better.
>>>
>>> Best regards, Matthias
>>>
>>> *From:*Jean Christophe Beyler <jcbey...@google.com
<mailto:jcbey...@google.com>>
>>> *Sent:* Dienstag, 30. Juli 2019 05:39
>>> *To:* David Holmes <david.hol...@oracle.com
<mailto:david.hol...@oracle.com>>
>>> *Cc:* Baesken, Matthias <matthias.baes...@sap.com
<mailto:matthias.baes...@sap.com>>;
>>> hotspot-...@openjdk.java.net
<mailto:hotspot-...@openjdk.java.net>; serviceability-dev
>>> <serviceability-dev@openjdk.java.net
<mailto:serviceability-dev@openjdk.java.net>>
>>> *Subject:* Re: RFR: [XS] 8228658: test
GetTotalSafepointTime.java fails
>>> on fast Linux machines with Total safepoint time 0 ms
>>>
>>> Hi Matthias,
>>>
>>> I wonder if you should not do what David is suggesting and then
put that
>>> whole code (the while loop) in a helper method. Below you have a
>>> calculation again using value2 (which I wonder what the added
value of
>>> it is though) but anyway, that value2 could also be 0 at some
point, no?
>>>
>>> So would it not be best to just refactor the getAllStackTraces and
>>> calculate safepoint time in a helper method for both value / value2
>>> variables?
>>>
>>> Thanks,
>>>
>>> Jc
>>>
>>> On Mon, Jul 29, 2019 at 7:50 PM David Holmes
<david.hol...@oracle.com <mailto:david.hol...@oracle.com>
>>> <mailto:david.hol...@oracle.com
<mailto:david.hol...@oracle.com>>> wrote:
>>>
>>> Hi Matthias,
>>>
>>> On 29/07/2019 8:20 pm, Baesken, Matthias wrote:
>>> > Hello , please review this small test fix .
>>> >
>>> > The test
>>>
>> test/jdk/sun/management/HotspotRuntimeMBean/GetTotalSafepointTime.
>> java
>>> fails sometimes on fast Linux machines with this error
message :
>>> >
>>> > java.lang.RuntimeException: Total safepoint time
illegal value: 0
>>> ms (MIN = 1; MAX = 9223372036854775807)
>>> >
>>> > looks like the total safepoint time is too low
currently on these
>>> machines, it is < 1 ms.
>>> >
>>> > There might be several ways to handle this :
>>> >
>>> > * Change the test in a way that it might generate
nigher
>>> safepoint times
>>> > * Allow safepoint time == 0 ms
>>> > * Offer an additional interface that gives
safepoint times
>>> with finer granularity ( currently the HS has safepoint
time values
>>> in ns , see jdk/src/hotspot/share/runtime/safepoint.cpp
>>> SafepointTracing::end
>>> >
>>> > But it is converted on ms in this code
>>> >
>>> > 114jlong RuntimeService::safepoint_time_ms() {
>>> > 115 return UsePerfData ?
>>> > 116
>>>
Management::ticks_to_ms(_safepoint_time_ticks->get_value()) : -1;
>>> > 117}
>>> >
>>> > 064jlong Management::ticks_to_ms(jlong ticks) {
>>> > 2065 assert(os::elapsed_frequency() > 0, "Must be
non-zero");
>>> > 2066 return (jlong)(((double)ticks /
>>> (double)os::elapsed_frequency())
>>> > 2067 * (double)1000.0);
>>> > 2068}
>>> >
>>> >
>>> >
>>> > Currently I go for the first attempt (and try to generate
>>> higher safepoint times in my patch) .
>>>
>>> Yes that's probably best. Coarse-grained timing on very
fast machines
>>> was bound to eventually lead to problems.
>>>
>>> But perhaps a more future-proof approach is to just add a
do-while loop
>>> around the stack dumps and only exit when we have a non-zero
>> safepoint
>>> time?
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>> > Bug/webrev :
>>> >
>>> > https://bugs.openjdk.java.net/browse/JDK-8228658
>>> >
>>> > http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.0/
>>> >
>>> >
>>> > Thanks, Matthias
>>> >
>>>
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> Jc
>>>
--
Thanks,
Jc