Hi upload works again, now with webrev : http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.2/
Best regards, Matthias > -----Original Message----- > From: Baesken, Matthias > Sent: Mittwoch, 31. Juli 2019 14:05 > To: 'David Holmes' <[email protected]>; Jean Christophe Beyler > <[email protected]> > Cc: [email protected]; serviceability-dev <serviceability- > [email protected]> > Subject: RE: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast > Linux machines with Total safepoint time 0 ms > > Hello, here is a version following the latest proposal of JC . > > Unfortunately attached as patch, sorry for that - the uploads / pushes > currently do not work from here . > > Best regards, Matthias > > > > -----Original Message----- > > From: David Holmes <[email protected]> > > Sent: Mittwoch, 31. Juli 2019 05:04 > > To: Jean Christophe Beyler <[email protected]> > > Cc: Baesken, Matthias <[email protected]>; hotspot- > > [email protected]; serviceability-dev <serviceability- > > [email protected]> > > Subject: Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on > fast > > Linux machines with Total safepoint time 0 ms > > > > On 31/07/2019 9:08 am, Jean Christophe Beyler wrote: > > > FWIW, I would have done something like what David was suggesting, just > > > slightly tweaked: > > > > > > public static long executeThreadDumps() { > > > long value; > > > long initial_value = mbean.getTotalSafepointTime(); > > > do { > > > Thread.getAllStackTraces(); > > > value = mbean.getTotalSafepointTime(); > > > } while (value == initial_value); > > > return value; > > > } > > > > > > This ensures that the value is a new value as opposed to the current > > > value and if something goes wrong, as David said, it will timeout; which > > > is ok. > > > > Works for me. > > > > > But I come back to not really understanding why we are doing this at > > > this point of relaxing (just get a new value of safepoint time). > > > Because, if we accept timeouts now as a failure here, then really the > > > whole test becomes: > > > > > > executeThreadDumps(); > > > executeThreadDumps(); > > > > > > Since the first call will return when value > 0 and the second call will > > > return when value2 > value (I still wonder why we want to ensure it > > > works twice...). > > > > The test is trying to sanity check that we are actually recording the > > time used by safepoints. So first check is that we can get a non-zero > > value; second check is we get a greater non-zero value. It's just a > > sanity test to try and catch if something gets unexpectedly broken in > > the time tracking code. > > > > > So both failures and even testing for it is kind of redundant, once you > > > have a do/while until a change? > > > > Yes - the problem with the tests that try to check internal VM behaviour > > is that we have no specified way to do something, in this case execute > > safepoints, that relates to internal VM behaviour, so we have to do > > something we know will currently work even if not specified to do so - > > e.g. dumping all thread stacks uses a global safepoint. The second > > problem is that the timer granularity is so coarse that we then have to > > guess how many times we need to do that something before seeing a > > change. To make the test robust we can keep doing stuff until we see a > > change and so the only way that will fail is if the overall timeout of > > the test kicks in. Or we can try and second guess how long it should > > take by introducing our own internal timeout - either directly or by > > limiting the number of loops in this case. That has its own problems and > > in general we have tried to reduce internal test timeouts (by removing > > them) and let overall timeouts take charge. > > > > No ideal solution. And this has already consumed way too much of > > everyone's time. > > > > Cheers, > > David > > > > > Thanks, > > > Jc > > > > > > > > > On Tue, Jul 30, 2019 at 2:35 PM David Holmes <[email protected] > > > <mailto:[email protected]>> wrote: > > > > > > On 30/07/2019 10:39 pm, Baesken, Matthias wrote: > > > > Hi David, "put that whole code (the while loop) in a helper > > > method." was JC's idea, and I like the idea . > > > > > > Regardless I think the way you are using NUM_THREAD_DUMPS is > really > > > confusing. As an all-caps static you'd expect it to be a constant. > > > > > > Thanks, > > > David > > > > > > > Let's see what others think . > > > > > > > >> > > > >> Overall tests like this are not very useful, yet very fragile. > > > >> > > > > > > > > I am also fine with putting the test on the exclude list. > > > > > > > > Best regards, Matthias > > > > > > > > > > > >> -----Original Message----- > > > >> From: David Holmes <[email protected] > > > <mailto:[email protected]>> > > > >> Sent: Dienstag, 30. Juli 2019 14:12 > > > >> To: Baesken, Matthias <[email protected] > > > <mailto:[email protected]>>; Jean Christophe > > > >> Beyler <[email protected] <mailto:[email protected]>> > > > >> Cc: [email protected] > > > <mailto:[email protected]>; serviceability-dev > > > <serviceability- > > > >> [email protected] <mailto:[email protected]>> > > > >> Subject: Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java > > > fails on fast > > > >> Linux machines with Total safepoint time 0 ms > > > >> > > > >> Hi Matthias, > > > >> > > > >> On 30/07/2019 9:25 pm, Baesken, Matthias wrote: > > > >>> Hello JC / David, here is a second webrev : > > > >>> > > > >>> http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.1/ > > > >>> > > > >>> It moves the thread dump execution into a method > > > >>> executeThreadDumps(long) , and also adds while loops > > > (but with a > > > >>> limitation for the number of thread dumps, really don’t > > > >>> want to cause timeouts etc.). I removed a check for > > > >>> MAX_VALUE_FOR_PASS because we cannot go over > > Long.MAX_VALUE . > > > >> > > > >> I don't think executeThreadDumps is worth factoring out like out. > > > >> > > > >> The handling of NUM_THREAD_DUMPS is a bit confusing. I'd rather > it > > > >> remains a constant 100, and then you set a simple loop iteration > > > count > > > >> limit. Further with the proposed code when you get here: > > > >> > > > >> 85 NUM_THREAD_DUMPS = NUM_THREAD_DUMPS * 2; > > > >> > > > >> you don't even know what value you may be starting with. > > > >> > > > >> But I was thinking of simply: > > > >> > > > >> long value = 0; > > > >> do { > > > >> Thread.getAllStackTraces(); > > > >> value = mbean.getTotalSafepointTime(); > > > >> } while (value == 0); > > > >> > > > >> We'd only hit a timeout if something is completely broken - > > > which is fine. > > > >> > > > >> Overall tests like this are not very useful, yet very fragile. > > > >> > > > >> Thanks, > > > >> David > > > >> > > > >>> Hope you like this version better. > > > >>> > > > >>> Best regards, Matthias > > > >>> > > > >>> *From:*Jean Christophe Beyler <[email protected] > > > <mailto:[email protected]>> > > > >>> *Sent:* Dienstag, 30. Juli 2019 05:39 > > > >>> *To:* David Holmes <[email protected] > > > <mailto:[email protected]>> > > > >>> *Cc:* Baesken, Matthias <[email protected] > > > <mailto:[email protected]>>; > > > >>> [email protected] > > > <mailto:[email protected]>; serviceability-dev > > > >>> <[email protected] > > > <mailto:[email protected]>> > > > >>> *Subject:* Re: RFR: [XS] 8228658: test > > > GetTotalSafepointTime.java fails > > > >>> on fast Linux machines with Total safepoint time 0 ms > > > >>> > > > >>> Hi Matthias, > > > >>> > > > >>> I wonder if you should not do what David is suggesting and then > > > put that > > > >>> whole code (the while loop) in a helper method. Below you have a > > > >>> calculation again using value2 (which I wonder what the added > > > value of > > > >>> it is though) but anyway, that value2 could also be 0 at some > > > point, no? > > > >>> > > > >>> So would it not be best to just refactor the getAllStackTraces > > > and > > > >>> calculate safepoint time in a helper method for both value / > > > value2 > > > >>> variables? > > > >>> > > > >>> Thanks, > > > >>> > > > >>> Jc > > > >>> > > > >>> On Mon, Jul 29, 2019 at 7:50 PM David Holmes > > > <[email protected] <mailto:[email protected]> > > > >>> <mailto:[email protected] > > > <mailto:[email protected]>>> wrote: > > > >>> > > > >>> Hi Matthias, > > > >>> > > > >>> On 29/07/2019 8:20 pm, Baesken, Matthias wrote: > > > >>> > Hello , please review this small test fix . > > > >>> > > > > >>> > The test > > > >>> > > > >> > > > test/jdk/sun/management/HotspotRuntimeMBean/GetTotalSafepointTime. > > > >> java > > > >>> fails sometimes on fast Linux machines with this error > > > message : > > > >>> > > > > >>> > java.lang.RuntimeException: Total safepoint time > > > illegal value: 0 > > > >>> ms (MIN = 1; MAX = 9223372036854775807) > > > >>> > > > > >>> > looks like the total safepoint time is too low > > > currently on these > > > >>> machines, it is < 1 ms. > > > >>> > > > > >>> > There might be several ways to handle this : > > > >>> > > > > >>> > * Change the test in a way that it might generate > > > nigher > > > >>> safepoint times > > > >>> > * Allow safepoint time == 0 ms > > > >>> > * Offer an additional interface that gives > > > safepoint times > > > >>> with finer granularity ( currently the HS has safepoint > > > time values > > > >>> in ns , see jdk/src/hotspot/share/runtime/safepoint.cpp > > > >>> SafepointTracing::end > > > >>> > > > > >>> > But it is converted on ms in this code > > > >>> > > > > >>> > 114jlong RuntimeService::safepoint_time_ms() { > > > >>> > 115 return UsePerfData ? > > > >>> > 116 > > > >>> > > > Management::ticks_to_ms(_safepoint_time_ticks->get_value()) : -1; > > > >>> > 117} > > > >>> > > > > >>> > 064jlong Management::ticks_to_ms(jlong ticks) { > > > >>> > 2065 assert(os::elapsed_frequency() > 0, "Must be > > > non-zero"); > > > >>> > 2066 return (jlong)(((double)ticks / > > > >>> (double)os::elapsed_frequency()) > > > >>> > 2067 * (double)1000.0); > > > >>> > 2068} > > > >>> > > > > >>> > > > > >>> > > > > >>> > Currently I go for the first attempt (and try to > > > generate > > > >>> higher safepoint times in my patch) . > > > >>> > > > >>> Yes that's probably best. Coarse-grained timing on very > > > fast machines > > > >>> was bound to eventually lead to problems. > > > >>> > > > >>> But perhaps a more future-proof approach is to just add a > > > do-while loop > > > >>> around the stack dumps and only exit when we have a non-zero > > > >> safepoint > > > >>> time? > > > >>> > > > >>> Thanks, > > > >>> David > > > >>> ----- > > > >>> > > > >>> > Bug/webrev : > > > >>> > > > > >>> > https://bugs.openjdk.java.net/browse/JDK-8228658 > > > >>> > > > > >>> > http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.0/ > > > >>> > > > > >>> > > > > >>> > Thanks, Matthias > > > >>> > > > > >>> > > > >>> > > > >>> -- > > > >>> > > > >>> Thanks, > > > >>> > > > >>> Jc > > > >>> > > > > > > > > > > > > -- > > > > > > Thanks, > > > Jc
