On 5/8/16 03:58, serguei.spit...@oracle.com wrote:
Severin,

The JPRT job failed: 2016-05-08-100525.sspitsyn.8153711 with a NullPointerException. The log link is (you can also find the most relevant fragment of the log in the bug report):
http://scaaa637.us.oracle.com//archive/2016/05/2016-05-08-100525.sspitsyn.8153711//logs/solaris_x64_5.11-fastdebug-c2-jdk_svc_sanity.log

Sorry for providing the link that is not accessible for you.
But, please, find a fragment of the failure log in my bug report comment.


Thanks,
Serguei


Please, find the email notification in the attachment.

Thanks,
Serguei


On 5/8/16 00:59, serguei.spit...@oracle.com wrote:
Hi Severin,

I filed two new bugs to cover the discovered issues:
8156498: more places in the invoke.c that need protection with the invokerLock
8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java


Will try to push your two fixes today or tomorrow.
I know, you've just got a committer status.
But I'm not sure you are comfortable to push the fixes yourself at this time.

Thanks,
Serguei



On 5/3/16 03:21, serguei.spit...@oracle.com wrote:
Hi Severin,

Please, find my comments below.


On 5/2/16 01:44, Severin Gehwolf wrote:
On Fri, 2016-04-29 at 12:33 -0700, serguei.spit...@oracle.com wrote:
On 4/29/16 01:56, Severin Gehwolf wrote:
Hi Serguei,

On Fri, 2016-04-29 at 01:34 -0700, serguei.spit...@oracle.com wrote:
Hi Severin,

The fix looks good in general.
I'm testing both fixes together at the moment.
That is JDK-8154529 and JDK-8153711? Yes, that's what I've done too.

A couple of questions...

It seems, there are more places where an invokerLock critical section is missed.
Right.

The following functions:
   - invoker_enableInvokeRequests
This should be fixed with the patch for JDK-8154529

   - invokeConstructor
   - invokeStatic
   - invokeNonvirtual
   - invokeVirtual
   - saveGlobalRef
Correct. The intent would be to fix the callsites of saveGlobalRef. If we fix invoke* then saveGlobalRef should not be an issue. I didn't want to include this in this fix since it's pretty hairy and would like to
fix this in incremental steps.

The first function is easy to fix.
The last 5 functions are called from invoker_doInvoke(), which we already had a problem with. I'm puzzled by the question: how do we synchronize and avoid deadlocks at the same time?
I'm not sure yet. Perhaps locking only while saveGlobalRef is being
called in invoke* functions will help.
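
The idea floated above (hold the lock only around the saveGlobalRef call, not for the whole invoke) could look roughly like the sketch below. This is only an illustration under stated assumptions: every name ending in _sketch is hypothetical, a plain pthread mutex stands in for the invokerLock, and a plain pointer stands in for a JNI global reference. It is not the actual invoke.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an invoke request; NOT the real invoke.c type. */
typedef struct {
    const void *saved_ref;  /* stands in for a saved JNI global reference */
    int         result;
} InvokeRequestSketch;

/* Plain mutex standing in for the invokerLock (assumption). */
static pthread_mutex_t invoker_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if the lock is free right now, 0 if somebody holds it. */
static int lock_is_free(void) {
    if (pthread_mutex_trylock(&invoker_lock_sketch) != 0) {
        return 0;
    }
    pthread_mutex_unlock(&invoker_lock_sketch);
    return 1;
}

/* Simulated target method; records whether it ran without the lock held. */
static int target_method_sketch(int arg, int *ran_unlocked) {
    *ran_unlocked = lock_is_free();
    return arg * 2;
}

/* Simulated saveGlobalRef: just stores the pointer in the request. */
static void save_ref_sketch(InvokeRequestSketch *req, const void *ref) {
    req->saved_ref = ref;
}

int invoke_static_sketch(InvokeRequestSketch *req, int arg, int *ran_unlocked) {
    static const char token = 'r';  /* dummy object to "reference" */
    assert(req != NULL && ran_unlocked != NULL);

    /* Potentially long-running call: no lock held here, so other threads
     * that need the lock cannot be blocked by the invoked method. */
    int value = target_method_sketch(arg, ran_unlocked);

    /* Short critical section: only the request update is protected. */
    pthread_mutex_lock(&invoker_lock_sketch);
    req->result = value;
    save_ref_sketch(req, &token);
    pthread_mutex_unlock(&invoker_lock_sketch);
    return value;
}
```

Whether a critical section this narrow is enough for the real invoke* functions is exactly the open question in this thread; the sketch only shows the shape of the change.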

I'm inclined to let your fix go (if the testing is Ok) and file one more bug on the remaining sync issues.
Please keep me in the loop about your test results.
Both the JTreg com/sun/jdi and the co-located nsk.jdi tests all passed.

I also ran the 4 previously failed tests in big loops of 1000 runs:
     com/sun/jdi/InvokeTest.java
     com/sun/jdi/InvokeHangTest.java
     com/sun/jdi/InterfaceMethodsTest.java
     com/sun/jdi/OomDebugTest.java (new test introduced in the webrev)
I suggest to run InvokeTest, InvokeHangTest and InterfaceMethodsTest in
a loop. Those never failed for me in such a scenario.

The OomDebugTest.java failed with a timeout (most likely, a deadlock).
Please, find the OomDebugTest.jtr file in attachments.
Correct. This is what I was seeing. See the last comment of the bug:
https://bugs.openjdk.java.net/browse/JDK-8153711

Need to check if it is the same as I'm seeing.


It has the jstack output of the hung OomDebugTestTarg JVM. I'm not
convinced this is the same failure we were seeing in JDK-4858370 since
the stack suggests it's doing a GC upon a newInstance of a primitive
array. It also does not seem like the same issue as the deadlocks
exposed by locking during an invoke, because it was reproducible fairly
consistently with InvokeHangTest.

Agreed.



It looks to me like a new issue. Probably one which was there before,
but is only exposed by the new test. The new test stress-tests the GC
with a debugger attached. Of course, this is going to be hard to prove
since it would just run out of memory prior to the patch.

Thanks.
Your analysis seems to be reasonable.
I tend to agree with it but need more time to convince myself.


  Thoughts?

Let me double check the above.
If the analysis is correct then I'd suggest filing a new bug and pushing your fix as is.


Thanks,
Serguei



Cheers,
Severin



Thanks,
Serguei



Thanks for your help!

Cheers,
Severin

Thanks,
Serguei


On 4/28/16 02:00, Severin Gehwolf wrote:
On Tue, 2016-04-19 at 19:32 -0700, serguei.spit...@oracle.com wrote:
Hi Severin,

I'm postponing the push of this fix.

There are two nsk.jdi test failures (they look like deadlocks):
   nsk/jdi/ObjectReference/invokeMethod/invokemethod012 FAIL(TIMEOUT)
   nsk/jdi/Scenarios/invokeMethod/popframes001 FAIL(TIMEOUT)

and one jtreg com/sun/jdi failure (it looks like a deadlock too):
   com/sun/jdi/InvokeHangTest.java


I'll double check if these failures are regressions caused by your fix
or not.
I confirm, the failures above are new regressions introduced by the fix. The tests fail consistently with the fix and do not fail without the fix.
OK, this was caused by the locking done in invoker_doInvoke(). Note that holding either of the two locks (the invoker lock or the event handler lock) causes this.

Here is a new webrev with those hunks removed. It's sufficient to grab the locks again in invoker_completeInvokeRequest() when clearing the global references in order to not get those failures we've seen when the fix for JDK-4858370 was in place.

http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/
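
For readers without access to the webrev, the approach described above (no locks held while the method is being invoked, both locks taken again in the completion step while the saved references are cleared) can be sketched as follows. This is a rough illustration, not the patch itself: all names ending in _sketch are hypothetical, two pthread mutexes stand in for the event handler lock and the invoker lock, the acquisition order shown is an assumption, and plain pointers stand in for JNI global references.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_REFS_SKETCH 4

/* Hypothetical stand-in for a pending invoke request; not the real
 * invoke.c structure. */
typedef struct {
    const void *refs[MAX_REFS_SKETCH]; /* stand-ins for JNI global refs */
    int         pending;               /* 1 while the invoke is in flight */
    int         result;
} PendingInvokeSketch;

/* Stand-ins for the event handler lock and the invoker lock (assumption). */
static pthread_mutex_t handler_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t request_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Runs the (simulated) method call with no lock held, so the invoked
 * code cannot deadlock against threads waiting for either lock. */
void do_invoke_sketch(PendingInvokeSketch *req, int arg) {
    req->result = arg + 1;
}

/* Clears the saved references under both locks, always acquired in the
 * same order; returns how many references were cleared. */
int complete_invoke_sketch(PendingInvokeSketch *req) {
    int cleared = 0;
    pthread_mutex_lock(&handler_lock_sketch);
    pthread_mutex_lock(&request_lock_sketch);
    for (int i = 0; i < MAX_REFS_SKETCH; i++) {
        if (req->refs[i] != NULL) {
            req->refs[i] = NULL;  /* real code would delete the global ref */
            cleared++;
        }
    }
    req->pending = 0;
    /* Release in reverse order of acquisition. */
    pthread_mutex_unlock(&request_lock_sketch);
    pthread_mutex_unlock(&handler_lock_sketch);
    return cleared;
}
```

Calling complete_invoke_sketch() a second time on the same request clears nothing and returns 0, which is the property that matters when references must not be released twice.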

Testing done:
  - com/sun/jdi/InvokeTest.java, com/sun/jdi/InvokeHangTest.java and
    com/sun/jdi/InterfaceMethodsTest.java do not fail in 1500 runs.
  - regular com/sun/jdi test set: no regressions

Note that there are some rare cases where OomDebugTest times out, which seems to be caused by the GC, though. See the bug for details. Since this problem is rare, it's still worthwhile having that test included. If it turns out to be a problem in practice we could consider disabling the test.

Thoughts?

Cheers,
Severin



