Hi Chris and Serguei,
Please review the new version of the fix that includes the changes Chris
suggested.
Webrev: http://cr.openjdk.java.net/~dtitov/8207367/webrev.04
Bug: https://bugs.openjdk.java.net/browse/JDK-8207367
Thanks!
--Daniil
On 2/27/19, 5:10 PM, "Daniil Titov" <[email protected]> wrote:
Hi Chris,
>> Is there a reason not to adjust it by argHandler.getWaitTime()?
I agree, for consistency it makes sense to adjust it by
argHandler.getWaitTime().
Thanks.
Daniil
On 2/27/19, 4:54 PM, "Chris Plummer" <[email protected]> wrote:
On 2/27/19 4:09 PM, Daniil Titov wrote:
> Hi Chris,
>
>> It look like the exclfilter001.java loop will always end up looping
for eventWaitTime seconds:
> It is true for revision 3 of the patch
http://cr.openjdk.java.net/~dtitov/8207367/webrev.03/test/hotspot/jtreg/vmTestbase/nsk/jdi/ClassUnloadRequest/addClassExclusionFilter/exclfilter001.java.sdiff.html
>
> But in revision 1
http://cr.openjdk.java.net/~dtitov/8207367/webrev.01/test/hotspot/jtreg/vmTestbase/nsk/jdi/ClassUnloadRequest/addClassExclusionFilter/exclfilter001.java.sdiff.html
the loop could finish earlier if timeout happens on lines 160 or 183 (since
eventSet is null).
> When talking about 4 seconds I was referring to lines 160, 163 and
183. Now the timeout passed in remove(waitTime) method is 4 seconds in Mach5. But
the total waiting time is limited by 200 seconds. If no events are delivered for 4
seconds timespan then the method returns earlier.
> It seems as 4 seconds is quite sufficient, but we could increase it
if there are some concerns here.
>
> 153 long waitTime = Utils.adjustTimeout(1000);
> 160 eventSet =
debugee.VM().eventQueue().remove(waitTime);
> 185 eventSet =
debugee.VM().eventQueue().remove(waitTime);
Is there a reason not to adjust it by argHandler.getWaitTime()?
Chris
>
>
> Thanks!
>
> --Daniil
>
>
> On 2/27/19, 3:08 PM, "Chris Plummer" <[email protected]>
wrote:
>
>
>
> On 2/27/19 2:33 PM, Daniil Titov wrote:
> > Hi Chris,
> >
> > The change in while loop in exclfilter001.java between the
first and the last revisions was to ensure that all events are drained before the
method returns and the next check starts. It used to wait for 5 minutes and now it
keeps receiving events in portions waiting for 1 seconds at every iteration.
> It look like the exclfilter001.java loop will always end up
looping for
> eventWaitTime seconds:
>
> 92 eventWaitTime =
> Utils.adjustTimeout(argHandler.getWaitTime() * 10000);
>
> 125 receiveEvents(eventWaitTime, patterns[i]);
>
> I think you said that argHandler.getWaitTime() is 5 and
> Utils.adjustTimeout() is another 4x for mach5 jobs, so this
would mean a
> total of 200 seconds, and this is how much time the loop would
always
> take since it loops for eventWaitTime seconds.
> > You mentioned in one of previous emails that due to network
issues it might be the case that for some iteration we fail to receive an event
within this smaller timeout. But later, if I understood you right, you suggested that
4 seconds ( Utils.adjustTimeout(1000) adjusts timeout for Mach5 builds to 4 seconds)
should be sufficient.
> Actually I was going more by the argHandler.getWaitTime()
adjustment,
> which I thought was 5. Mach5 will multiply that by 4. I think
1s is too
> short but 5s is probably ok, and 20s for mach5 jobs should
prevent any
> noise from rare network issues, except for serious ones that we
can't
> expect to recover from.
> > I agree that it's better to make the test return earlier
and I will revert changes in exclfilter001.java and filter001.java to the first
revision.
> Ok.
> >
> > For consistency I will change line 153 to take into account a
wait time specified in argHandler.getWaitTime()
> > 153 long waitTime = Utils.adjustTimeout(1000);
> >
> Ok.
> > I am sorry I missed you comments abut Utils.java. You are
right we don’t need a new method and Utils.waitForCondition() should be sufficient.
> > There is no need for passing logger into since if the thread
is interrupted the error is thrown and the test fails. Will send a new webrev as
soon as testing completes.
> thanks,
>
> Chris
> > Thanks!
> > Daniil
> >
> >
> > On 2/27/19, 12:08 PM, "Chris Plummer"
<[email protected]> wrote:
> >
> > Hi Daniil,
> >
> > In exclfilter001.java, since the first revision you
changed the while
> > loop to be "while (true)". I'm not sure of the
reasoning. It used to
> > exit the first time remove() didn't return an EventSet.
Now it retries
> > until the total waittime is exceeded.
> >
> > When waiting for an event, sometimes you use:
> >
> > 91 eventWaitTime =
> > Utils.adjustTimeout(argHandler.getWaitTime() * 10000);
> >
> > and sometimes:
> >
> > 153 long waitTime = Utils.adjustTimeout(1000);
> >
> > Why the discrepancy. In one case you are willing to wait
50 seconds for
> > an event and in the other only 5.
> >
> > I think you missed my initial comment on Utils.java. See
below:
> >
> > > In Utils.java, I think wait() should be moved right
after
> > > waitForCondition() and maybe given a more descriptive
name. It seems
> > > to basically the same as waitForCondition(), except you added a
"log"
> > > parameter and slightly changed the behavior. Are these
behavior
> > > differences necessary? Could you share code with the
existing
> > > waitForCondition()?
> >
> > thanks,
> >
> > Chris
> >
> > On 2/27/19 9:03 AM, Daniil Titov wrote:
> > > Hi Serguei and Chris,
> > >
> > > Thank you for reviewing this change. Please review a
new version of the fix that addresses these findings.
> > >
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8207367
> > > Webrev:
http://cr.openjdk.java.net/~dtitov/8207367/webrev.03
> > >
> > > Best regards,
> > > Daniil
> > >
> > > From: <[email protected]>
> > > Organization: Oracle Corporation
> > > Date: Tuesday, February 26, 2019 at 6:50 PM
> > > To: Daniil Titov <[email protected]>, Chris Plummer
<[email protected]>, OpenJDK Serviceability <[email protected]>
> > > Subject: Re: RFR 8207367: 10 vmTestbase/nsk/jdi tests
timed out when running with jtreg
> > >
> > > Hi Daniil,
> > >
> > > It looks good to me.
> > > I have some minor comments though.
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/ClassUnloadRequest/addClassFilter/filter001.java.frames.html
> > >
> > > 163 if(eventSet != null) {
> > > Space is missed after if
> > >
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/EventQueue/remove/remove004.java.frames.html
> > > 70 * In first one, second thread waits for any incoming
event from the <BR>
> > > 71 * debugger which is sleeping for some time; hence,
<BR>
> > > 72 * no events are expected to be received at the
debugger end. <BR>
> > > <BR> is not aligned at 71
> > >
> > >
> > > While you are at this code, could you, please,
> > > also fix unneeded spaces at the lines? :
> > > 516 } catch ( InterruptedException e1) {
> > > ...
> > > 526 } catch ( Exception e ) {
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/EventQueue/remove/remove004a.java.frames.html
> > > 111
Utils.adjustTimeout(argHandler.getWaitTime() * 10000),
> > > 112 100,
> > > 113
remove004a::log1);
> > > Line 112 is not properly aligned
> > >
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/EventQueue/remove_l/remove_l004.java.frames.html
> > > 65 * In the first one the first assertion is checked up
on as follows: <BR>
> > > 66 * the debugger sleeps for some time;
<BR>
> > > 67 * hence, no event is expected in the debugger within
WAITTIME, and <BR>
> > > <BR> at 66 is not aligned
> > > Thank you for catching and fixing the typo at 67!
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose002.java.frames.html
> > >
> > > 295 },
Utils.adjustTimeout(waitTime * 60000), 1000, dispose002::log3);
> > > A separate line is needed for next wait() argument
> > >
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose002a.java.frames.html
> > > 128 while(true) {
> > > ...
> > > 130
if(instruction.equals("check_done")){
> > > 131 if
(test_thread.isAlive()) {
> > > 132 exitCode =
FAILED;
> > > 133 }
> > >
> > > Space is missed after while.
> > > Some logErr("...") for failed case would be useful.
> > >
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003.java.frames.html
> > > 260 },
Utils.adjustTimeout(waitTime * 60000), 1000, dispose003::log3);
> > > New line is needed for next wait() argument
> > >
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003a.java.frames.html
> > > 129 while(true) {
> > > 130 instruction =
pipe.readln();
> > > 131
if(instruction.equals("check_done")) {
> > > 132 if
(test_thread.isAlive()) {
> > > 133 exitCode =
FAILED;
> > > 134 }
> > > Space is missed after while and if.
> > > Some logErr("...") for failed case would be useful.
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose004.java.frames.html
> > > 284 },
Utils.adjustTimeout(waitTime * 60000), 1000, dispose004::log3);
> > > New line is needed for next wait() argument
> > >
> > >
> > >
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose004a.java.frames.html
> > > 130 while(true) {
> > > 131 instruction =
pipe.readln();
> > > 132
if(instruction.equals("check_done")) {
> > > 133 if
(test_thread.isAlive()) {
> > > 134 exitCode =
FAILED;
> > > 135 }
> > > Space is missed after while and if.
> > > Some logErr("...") for failed case would be useful.
> > >
> > >
> > > Thanks,
> > > Serguei
> > >
> > > On 2/26/19 6:06 PM, Daniil Titov wrote:
> > > Hi Chris,
> > >
> > > Please review a new version of the webrev that
slightly changes method receiveEvents(long,pattern) in the following tests:
> > >
-vmTestbase/nsk/jdi/ClassUnloadRequest/addClassExclusionFilter/exclfilter001.java
> > >
-vmTestbase/nsk/jdi/ClassUnloadRequest/addClassFilter/filter001.java)
> > >
> > >
> > > The new changes ensure that
receiveEvents(long,pattern) method keeps receiving events in a while loop even if
eventSet returned by debugee.VM().eventQueue().remove(waitTime) is a null due to
timeout.
> > >
> > >
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8207367
> > > Webrev:
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02
> > >
> > > Thanks!
> > > -Daniil
> > >
> > > On 2/26/19, 11:41 AM, "Chris Plummer"
mailto:[email protected] wrote:
> > >
> > > Ok. I think you mentioned below that default wait
time will be 5
> > > seconds. That seems sufficient to avoid a timeout
if there are some
> > > minor network issues and some packets are lost.
> > >
> > > Changes look good.
> > >
> > > thanks,
> > >
> > > Chris
> > >
> > > On 2/26/19 10:01 AM, Daniil Titov wrote:
> > > > Hi Chris,
> > > >
> > > > Yes , it is correct. For example in this
particular test the timeout is expected (line 283 expects that breakpoint() returns
returnCode3 that is set on line 460 when eventSet on line 456 is null due to a timeout in
eventQueue.remove()) and shortening it makes the whole test fit in jtreg time limits.
> > > >
> > > > 281 log2(" checking up
that the thread2 is not at breakpoint1 because of suspension");
> > > > 282 expresult =
breakpoint();
> > > > 283 if (expresult !=
returnCode3) {
> > > > 284 log3("ERROR: no
timeout for waiting for BreakpointEvent when the thread2 is suspended");
> > > > 285 expresult =
returnCode1;
> > > > 286 break label1;
> > > > 287 } else
> > > >
> > > >
> > > > 445 private int breakpoint () {
> > > > 446
> > > > 447 int returnCode =
returnCode0;
> > > > 448
> > > > 449 log2(" waiting for
BreakpointEvent");
> > > > 450
> > > > 451 labelBP:
> > > > 452 for (;;) {
> > > > 453
> > > > 454 log2(" new:
eventSet = eventQueue.remove();");
> > > > 455 try {
> > > > 456 eventSet =
eventQueue.remove (Utils.adjustTimeout(waitTime*1000));
> > > > 457 if (eventSet ==
null) {
> > > > 458 log2("::::::
timeout when waiting for a BreakpintEvent");
> > > > 459 // log3("ERROR:
timeout for waiting for a BreakpintEvent");
> > > > 460 returnCode
= returnCode3;
> > > > 461 break
labelBP;
> > > > 462 }
> > > >
> > > >> And I just noticed the space right after
"remove". Can you remove it?
> > > > Sure. Will do.
> > > >
> > > > Thanks!
> > > > --Daniil
> > > >
> > > > On 2/25/19, 7:26 PM, "Chris Plummer"
mailto:[email protected] wrote:
> > > >
> > > > Hi Daniil,
> > > >
> > > > For suspend001, are you saying the
following is expected to timeout
> > > > sometimes, so you need a shorter waittime
to avoid making the whole test
> > > > time out?
> > > >
> > > > 456 eventSet =
eventQueue.remove
> > > > (Utils.adjustTimeout(waitTime*1000));
> > > >
> > > > And I just noticed the space right after
"remove". Can you remove it?
> > > >
> > > > thanks,
> > > >
> > > > Chris
> > > >
> > > > On 2/25/19 6:57 PM, Daniil Titov wrote:
> > > > > Hi Chris,
> > > > >
> > > > > The timeout issue mentioned in the bug
is about jtreg aborting the tests since they are running longer than the maximum allowed time.
That happens since these tests use extreme long internal delays, e.g. a sleep for 5 minutes
or a wait for 5 minutes for a case when no events ( and a notify()) are expected. Reducing
these internal delays makes the test passing within the default jtreg timeout ( 2 minutes).
> > > > >
> > > > > Best regards,
> > > > > Daniil
> > > > >
> > > > > On 2/25/19, 6:15 PM, "Chris Plummer"
mailto:[email protected] wrote:
> > > > >
> > > > > Ok. So how is the timeout issue
mentioned in the bug addressed when
> > > > > there is now a shorter wait time?
> > > > >
> > > > > Chris
> > > > >
> > > > > On 2/25/19 5:04 PM, Daniil Titov
wrote:
> > > > > > Hi Chris,
> > > > > >
> > > > > > Forgot to answer to your another
question:
> > > > > > > > For these 3 tests the
event wait timeout was reduced and adjusted for test.timeout.factor:
> > > > > > > >
-vmTestbase/nsk/jdi/Event/_itself_/event001.java
> > > > > > > >
-vmTestbase/nsk/jdi/VirtualMachine/suspend/suspend001/TestDescription.java
> > > > > > > >
-vmTestbase/nsk/jdi/ThreadReference/suspend/suspend001.java
> > > > > > > So overall is this a
shorter or longer waittime now?
> > > > > >
> > > > > > Overall this is a shorter waitime
now. Instead of 300 seconds it is now 20 seconds for Mach5 jobs (they are run with
test.timeout.factor set to 4.0) and 5 seconds for regular jtreg runs.
> > > > > >
> > > > > > Best regards,
> > > > > > Daniil
> > > > > >
> > > > > >
> > > > > > On 2/25/19, 4:38 PM, "Chris
Plummer" mailto:[email protected] wrote:
> > > > > >
> > > > > > Hi Daniil,
> > > > > >
> > > > > > Yes, my point was that the
max time you wait for a single event is much
> > > > > > smaller now. I can see a
possibility that with a little bit of network
> > > > > > instability a packet gets
lost and resend does not happen fast enough.
> > > > > >
> > > > > > thanks,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On 2/25/19 4:32 PM, Daniil
Titov wrote:
> > > > > > > Hi Chris,
> > > > > > >
> > > > > > > The code still waits for
the whole total wait time. There is a while loop at lines 163-186 that keeps receiving new events
(line 183) till elapsed time is less than the waittime (line 178) or a timeout happens (so eventSet
is null).
> > > > > > >
> > > > > > > 159 begin =
System.currentTimeMillis();
> > > > > > > 160
eventSet = debugee.VM().eventQueue().remove(waitTime);
> > > > > > > 161
delta = System.currentTimeMillis() - begin;
> > > > > > > 162
totalWaitTime -= delta;
> > > > > > > 163
while (eventSet != null) {
> > > > > > > 164
EventIterator eventIterator = eventSet.eventIterator();
> > > > > > >
> > > > > > > 178
if (totalWaitTime <= 0 || exit) {
> > > > > > > 179
break;
> > > > > > > 180
}
> > > > > > > 181
debugee.resume();
> > > > > > > 182
begin = System.currentTimeMillis();
> > > > > > > 183
eventSet = debugee.VM().eventQueue().remove(waitTime);
> > > > > > > 184
delta = System.currentTimeMillis() - begin;
> > > > > > > 185
totalWaitTime -= delta;
> > > > > > > 186 }
> > > > > > >
> > > > > > >
> > > > > > > However, as I see now in
case if a timeout happens on line 160 (eventSet is null) the loop is not executed at all. I
haven't observed it in test runs but I think it makes sense to adjust this test to take this
potential case into account. I will send an updated version of the patch soon.
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Daniil
> > > > > > >
> > > > > > > On 2/25/19, 12:21 PM, "Chris
Plummer" mailto:[email protected] wrote:
> > > > > > >
> > > > > > > Hi Daniil,
> > > > > > >
> > > > > > > On 2/23/19 1:02 PM,
Daniil Titov wrote:
> > > > > > > > Please review the
change that fixes timeout issues for the following 10 tests when running with jtreg and default timeout
factor (1.0).
> > > > > > > In Utils.java, I
think wait() should be moved right after
> > > > > > > waitForCondition()
and maybe given a more descriptive name. It seems to
> > > > > > > basically the same as
waitForCondition(), except you added a "log"
> > > > > > > parameter and
slightly changed the behavior. Are these behavior
> > > > > > > differences
necessary? Could you share code with the existing
> > > > > > > waitForCondition()?
> > > > > > > >
> > > > > > > > For the following 2
tests the event wait timeout was reduced and adjusted for test.timeout.factor. Method
receiveEvents(long,pattern) was fixed to ensure that it gracefully exits after the specified wait
period elapsed:
> > > > > > > >
-vmTestbase/nsk/jdi/ClassUnloadRequest/addClassExclusionFilter/exclfilter001.java
> > > > > > > >
-vmTestbase/nsk/jdi/ClassUnloadRequest/addClassFilter/filter001.java
> > > > > > > 183
eventSet = debugee.VM().eventQueue().remove(waitTime);
> > > > > > >
> > > > > > > This code used to
wait for the total remaining waittime. Now it waits a
> > > > > > > fixed amount based on:
> > > > > > >
> > > > > > > 153 long
waitTime = Utils.adjustTimeout(1000);
> > > > > > >
> > > > > > > How did you come up
with this wait amount, and is it long enough to deal
> > > > > > > with occasional
hiccups?
> > > > > > > > For these 3 tests
the event wait timeout was reduced and adjusted for test.timeout.factor:
> > > > > > > >
-vmTestbase/nsk/jdi/Event/_itself_/event001.java
> > > > > > > >
-vmTestbase/nsk/jdi/VirtualMachine/suspend/suspend001/TestDescription.java
> > > > > > > >
-vmTestbase/nsk/jdi/ThreadReference/suspend/suspend001.java
> > > > > > > So overall is this a
shorter or longer waittime now?
> > > > > > > >
> > > > > > > > For next 2 tests
the event wait timeout and the thread sleep time were reduced and adjusted for test.timeout.factor.
Additional synchronization between the debugger and the debuggee was added to ensure the debugee
process continues as soon as the test finishes the timeout related checks and advances to the next
steps:
> > > > > > > > -
vmTestbase/nsk/jdi/EventQueue/remove_l/remove_l004/TestDescription.java
> > > > > > > > -
vmTestbase/nsk/jdi/EventQueue/remove/remove004/TestDescription.java
> > > > > > > Ok.
> > > > > > > >
> > > > > > > > Instead of just
sleeping for 5 minutes while waiting for the debuggee test thread to complete the tests now check
whether the debuggee thread is alive in the loop. The total waiting timeout was adjusted for
test.timeout.factor:
> > > > > > > >
-vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose004/TestDescription.java
> > > > > > > >
-vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003/TestDescription.java
> > > > > > > >
-vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose002/TestDescription.java
> > > > > > > Ok.
> > > > > > >
> > > > > > > thanks,
> > > > > > >
> > > > > > > Chris
> > > > > > > >
> > > > > > > > Testing.
> > > > > > > > The following VM
options were used in Mach5 jobs to verify these changes:
> > > > > > > > 1. No VM args
> > > > > > > > 2.
-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler
-Djvmci.Compiler=grail
> > > > > > > > 3. -Xcomp
> > > > > > > >
> > > > > > > > Also tier1, tier2
and tier3 Mach5 jobs succeeded.
> > > > > > > >
> > > > > > > > To verify that
tests succeed with test.timeout.factor set to 1.0 the following patch was used before running Mach5
jobs.
> > > > > > > >
> > > > > > > > ---
a/make/RunTests.gmk Thu Feb 21 15:17:42 2019 -0800
> > > > > > > > +++
b/make/RunTests.gmk Thu Feb 21 15:42:15 2019 -0800
> > > > > > > > @@ -826,6 +826,7 @@
> > > > > > > > else
> > > > > > > >
JTREG_TIMEOUT_FACTOR ?= 4
> > > > > > > > endif
> > > > > > > > +
JTREG_TIMEOUT_FACTOR = 1
> > > > > > > > JTREG_VERBOSE
?= fail,error,summary
> > > > > > > > JTREG_RETAIN ?=
fail,error
> > > > > > > >
> > > > > > > > Bug:
https://bugs.openjdk.java.net/browse/JDK-8207367
> > > > > > > > Webrev:
http://cr.openjdk.java.net/~dtitov/8207367/webrev.01
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > --Daniil
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
>
>
>
>
>