On 5/8/16 03:58, serguei.spit...@oracle.com wrote:
Severin,
The JPRT job 2016-05-08-100525.sspitsyn.8153711 failed with a
NullPointerException.
The log link is (you can also find the most relevant fragment of the
log in the bug report):
http://scaaa637.us.oracle.com//archive/2016/05/2016-05-08-100525.sspitsyn.8153711//logs/solaris_x64_5.11-fastdebug-c2-jdk_svc_sanity.log
Sorry for providing a link that is not accessible to you.
But please find a fragment of the failure log in my bug report comment.
Thanks,
Serguei
Please find the email notification attached.
Thanks,
Serguei
On 5/8/16 00:59, serguei.spit...@oracle.com wrote:
Hi Severin,
I filed two new bugs to cover the discovered issues:
8156498: more places in the invoke.c that need protection with the invokerLock
8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java
I will try to push your two fixes today or tomorrow.
I know you've just got committer status, but I'm not sure you are
comfortable pushing the fixes yourself at this time.
Thanks,
Serguei
On 5/3/16 03:21, serguei.spit...@oracle.com wrote:
Hi Severin,
Please find my comments below.
On 5/2/16 01:44, Severin Gehwolf wrote:
On Fri, 2016-04-29 at 12:33 -0700, serguei.spit...@oracle.com wrote:
On 4/29/16 01:56, Severin Gehwolf wrote:
Hi Serguei,
On Fri, 2016-04-29 at 01:34 -0700, serguei.spit...@oracle.com wrote:
Hi Severin,
The fix looks good in general.
I'm testing both fixes together at the moment.
That is, JDK-8154529 and JDK-8153711? Yes, that's what I've done too.
A couple of questions...
It seems there are more places where an invokerLock critical
section is missing.
Right.
The following functions:
- invoker_enableInvokeRequests
This should be fixed with the patch for JDK-8154529
- invokeConstructor
- invokeStatic
- invokeNonvirtual
- invokeVirtual
- saveGlobalRef
Correct. The intent would be to fix the callsites of saveGlobalRef.
If we fix invoke*, then saveGlobalRef should not be an issue. I didn't
want to include this in this fix since it's pretty hairy, and I would
like to fix it in incremental steps.
The first function is easy to fix.
The last 5 functions are called from invoker_doInvoke(), which
we already had a problem with.
I'm puzzled by the question: how do we synchronize and avoid
deadlocks at the same time?
I'm not sure yet. Perhaps locking only while saveGlobalRef is being
called in invoke* functions will help.
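A minimal sketch of that idea, assuming plain C with pthreads rather than the real JDWP back-end (which uses JVMTI raw monitors and JNI global references): the lock is held only while a reference is recorded, and the possibly blocking invoke runs with no lock held. InvokeRequest, invoker_lock, save_arg_ref, invoke_target and do_invoke are all hypothetical names, not functions from invoke.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: a "reference" is just a pointer, and invoker_lock
 * plays the role of invokerLock. This is not the real invoke.c code. */
#define MAX_REFS 8

typedef struct {
    void  *refs[MAX_REFS];  /* references saved for the pending invoke */
    size_t count;           /* number of slots in use */
} InvokeRequest;

static pthread_mutex_t invoker_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record one reference; the lock covers only the shared-state update.
 * Returns 1 on success, 0 if the request has no free slot. */
static int save_arg_ref(InvokeRequest *req, void *ref) {
    int ok = 0;
    pthread_mutex_lock(&invoker_lock);
    if (req->count < MAX_REFS) {
        req->refs[req->count++] = ref;
        ok = 1;
    }
    pthread_mutex_unlock(&invoker_lock);
    return ok;
}

/* Placeholder for the actual method invocation, which in a debuggee may
 * block for a long time (e.g. Java code waiting on another thread). */
static void *invoke_target(void *first_arg) {
    return first_arg;
}

/* Save all arguments under the lock, then invoke with no lock held, so a
 * blocking invoke cannot stop other threads from taking invoker_lock. */
static void *do_invoke(InvokeRequest *req, void **args, size_t nargs) {
    assert(req != NULL);
    for (size_t i = 0; i < nargs; i++) {
        if (!save_arg_ref(req, args[i])) {
            return NULL;
        }
    }
    return invoke_target(nargs > 0 ? args[0] : NULL);
}
```

Keeping the critical section this small is the point of the suggestion: elsewhere in this thread, the locking done in invoker_doInvoke() is reported to cause deadlocks.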
I'm inclined to let your fix go (if the testing is OK) and file
one more bug on the remaining sync issues.
Please keep me in the loop about your test results.
Both the JTreg com/sun/jdi tests and the co-located nsk.jdi tests
all passed.
I also ran the 4 previously failed tests in big loops of 1000 runs:
com/sun/jdi/InvokeTest.java
com/sun/jdi/InvokeHangTest.java
com/sun/jdi/InterfaceMethodsTest.java
com/sun/jdi/OomDebugTest.java (new test introduced in the webrev)
I suggest running InvokeTest, InvokeHangTest and InterfaceMethodsTest
in a loop. Those never failed for me in such a scenario.
The OomDebugTest.java failed with a timeout (most likely, a
deadlock).
Please find the OomDebugTest.jtr file attached.
Correct. This is what I was seeing. See the last comment of the bug:
https://bugs.openjdk.java.net/browse/JDK-8153711
I need to check whether it is the same as what I'm seeing.
It has the jstack output of the hung OomDebugTestTarg JVM. I'm not
convinced this is the same failure we were seeing in JDK-4858370, since
the stack suggests it's doing a GC upon a newInstance of a primitive
array. It also does not seem like the same issue as the deadlocks
exposed by locking during an invoke, because those were reproducible
fairly consistently with InvokeHangTest.
Agreed.
It looks to me like a new issue, probably one that was there before
but is only exposed by the new test. The new test stress-tests the GC
with a debugger attached. Of course, this is going to be hard to prove,
since the test would just run out of memory prior to the patch.
Thanks.
Your analysis seems to be reasonable.
I tend to agree with it but need more time to convince myself.
Thoughts?
Let me double check the above.
If the analysis is correct, then I'd suggest filing a new bug and
pushing your fix as is.
Thanks,
Serguei
Cheers,
Severin
Thanks,
Serguei
Thanks for your help!
Cheers,
Severin
Thanks,
Serguei
On 4/28/16 02:00, Severin Gehwolf wrote:
On Tue, 2016-04-19 at 19:32 -0700, serguei.spit...@oracle.com wrote:
Hi Severin,
I'm postponing the push for this fix.
There are two nsk.jdi test failures (they look like deadlocks):
nsk/jdi/ObjectReference/invokeMethod/invokemethod012 FAIL(TIMEOUT)
nsk/jdi/Scenarios/invokeMethod/popframes001 FAIL(TIMEOUT)
and one jtreg com/sun/jdi failure (it looks like a deadlock too):
com/sun/jdi/InvokeHangTest.java
I'll double-check whether these failures are regressions caused by
your fix.
I confirm that the failures above are new regressions introduced by
the fix. The tests fail consistently with the fix and do not fail
without it.
OK, this was caused by the locking done in invoker_doInvoke().
Note that holding either the invoker lock or the event handler lock
causes this.
Here is a new webrev with those hunks removed. It's sufficient to grab
the locks again in invoke_completeInvokeRequest() when clearing the
global references in order to avoid the failures we saw when the fix
for JDK-4858370 was in place.
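The completion-side locking can be sketched the same way, again as a hypothetical pthreads model rather than the real invoke_completeInvokeRequest(): the saved references are released and cleared while the lock is held, so any other code that takes the same lock sees either the full set of references or an empty request. InvokeRequest, invoker_lock, release_ref, complete_invoke_request and count_live_refs are illustrative names only; release_ref stands in for deleting a JNI global reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a pending invoke request; not the real JDWP type. */
#define MAX_REFS 8

typedef struct {
    void  *refs[MAX_REFS];  /* references saved for the invoke */
    size_t count;           /* number of slots in use */
} InvokeRequest;

static pthread_mutex_t invoker_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for deleting a global reference; counts releases so the effect
 * is observable. Only ever called with invoker_lock held. */
static size_t released = 0;
static void release_ref(void *ref) {
    (void)ref;
    released++;
}

/* Completion step: release and clear every saved reference while holding
 * the lock, so no other lock holder can observe a half-cleared request. */
static void complete_invoke_request(InvokeRequest *req) {
    pthread_mutex_lock(&invoker_lock);
    for (size_t i = 0; i < req->count; i++) {
        release_ref(req->refs[i]);
        req->refs[i] = NULL;
    }
    req->count = 0;
    pthread_mutex_unlock(&invoker_lock);
}

/* Another lock holder (e.g. code walking pending requests) counts the live
 * references; under the lock it never sees a cleared slot below count. */
static size_t count_live_refs(InvokeRequest *req) {
    size_t live = 0;
    pthread_mutex_lock(&invoker_lock);
    for (size_t i = 0; i < req->count; i++) {
        assert(req->refs[i] != NULL);
        live++;
    }
    pthread_mutex_unlock(&invoker_lock);
    return live;
}
```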
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/
Testing done:
- com/sun/jdi/InvokeTest.java, com/sun/jdi/InvokeHangTest.java and
  com/sun/jdi/InterfaceMethodsTest.java do not fail in 1500 runs.
- regular com/sun/jdi test set: no regressions
Note, though, that there are some rare cases where OomDebugTest times
out, which seems to be caused by the GC. See the bug for details. Since
this problem is rare, it's still worthwhile including the test. If it
turns out to be a problem in practice, we could consider disabling the
test.
Thoughts?
Cheers,
Severin