As we started this discussion I've added the open mailing lists back. :)

On 10/30/14 12:42 PM, Daniel D. Daugherty wrote:
On 10/30/14 1:06 PM, serguei.spit...@oracle.com wrote:
Staffan and Dan,

Do you have anything to say?

It feels like we're suppressing a symptom rather than dealing with
the underlying cause. However, I haven't been close to this bug
for years so my memories are rusty here.

I agree.
It feels like not all aspects of the shutdown sequence were designed with equal care in JDI and the jdwp agent. There can be shutdown races between the debugger + JDI and the debuggee + jdwp agent. I do not see patterns in the code that recognize the shutdown and bail out gracefully.


One question:

line 1051: if (debugInit_isInitComplete() && error == JVMTI_ERROR_WRONG_PHASE) {
    The debugInit_isInitComplete() check means that we only do
    this suppression in the live phase or later, right? Perhaps
    we should do this only when we are post live phase...

Agreed.
It is exactly the case.
In the normal case, debugInit_isInitComplete() returns true after the VM_INIT event has been received. An agent option can postpone the agent initialization until an Exception event is received. In all cases, the initialization happens in the live phase or later (not sure whether it can happen in the dead phase). Encountering the JVMTI WRONG_PHASE error means the VM has entered the VM_DEAD phase.
This must be a signal to start an agent shutdown.
At this point, I'm not ready to redesign this in the agent.
This fix is only a workaround for nightly stabilization.


    Maybe a flag set at the beginning of the VMDeath event handler
    would be better.


There is already a global flag: gdata->vmDead
I've already tried to use it, but it did not work for me.
Let me check it more.
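
For illustration only, the flag-based variant suggested above might look roughly like the sketch below. It uses stand-in names rather than the real agent code: sketch_vm_dead plays the role of gdata->vmDead, sketch_on_vm_death() stands in for the start of the VMDeath event handler, and the error constants are local placeholders for the jvmti.h values. It says nothing about why the existing flag did not work in practice.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Local placeholders for the JVMTI error codes (assumption: the real
 * agent would use jvmtiError and the constants from jvmti.h). */
enum {
    SKETCH_ERROR_NONE = 0,
    SKETCH_ERROR_WRONG_PHASE = 112
};

/* Hypothetical flag, standing in for gdata->vmDead. */
static atomic_bool sketch_vm_dead = false;

/* Hypothetical VMDeath handler: set the flag before doing anything else. */
static void sketch_on_vm_death(void)
{
    atomic_store(&sketch_vm_dead, true);
    /* ... the rest of the shutdown handling would follow here ... */
}

/* Suppress WRONG_PHASE only after VMDeath has been seen;
 * every other error, and WRONG_PHASE before VMDeath, passes through. */
static int sketch_filter_error(int error)
{
    if (error == SKETCH_ERROR_WRONG_PHASE && atomic_load(&sketch_vm_dead)) {
        return SKETCH_ERROR_NONE;
    }
    return error;
}
```

With this shape, a WRONG_PHASE error seen before the VMDeath handler has run is still reported, which is the stricter behavior being asked about.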




Is it Ok to push this?

I'm OK with it, but I'm just one voice...

Thanks!
You raised good points.



Dan, should I count on you as a reviewer?

Yes, I've reviewed the changes at this point.

Ok.



I will also need to backport this to 8u40.

You might want to let this bake for a couple of weeks first...

Sure.


Thanks,
Serguei


Dan



Thanks!
Serguei

On 10/30/14 4:16 AM, Dmitry Samersoff wrote:
Serguei,

Looks good to me!

-Dmitry

On 2014-10-30 04:05, serguei.spit...@oracle.com wrote:
The updated webrev:
http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/


The changes are:
   - added a comment recommended by Staffan
   - removed the ignore_wrong_phase() call from function classSignature()

The classSignature() function is called in 16 places.
Most of them do not tolerate a NULL in place of the returned signature
and will crash.
I'm not comfortable fixing all the occurrences now and suggest returning
to this issue after gaining experience with more failure cases that are
still expected.
The failure involving classSignature() was observed only once in the
nightly and should be extremely rare to reproduce.
I'll file a placeholder bug if necessary.
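
As a rough illustration of what "tolerating NULL" would mean for a caller, here is a minimal sketch. The helper name sketch_is_array_signature() is hypothetical and is not one of the 16 call sites; it only shows the kind of guard each caller would need if classSignature() could hand back NULL after a suppressed error.

```c
#include <stddef.h>

/* Hypothetical caller of a signature lookup that may yield NULL.
 * Returns 1 if the signature names an array class, 0 otherwise.
 * A NULL signature is treated as "not an array" instead of being
 * dereferenced, which is the guard most existing callers lack. */
static int sketch_is_array_signature(const char *signature)
{
    if (signature == NULL) {
        return 0;
    }
    return signature[0] == '[';
}
```

Each real call site would need its own decision about what a missing signature should mean, which is why fixing all of them at once is not attractive.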

Thanks,
Serguei

On 10/28/14 6:11 PM, serguei.spit...@oracle.com wrote:
Please, review the fix for:
   https://bugs.openjdk.java.net/browse/JDK-6988950


Open webrev:
http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/



Summary:

    The failing scenario:
      The debugger and the debuggee are well aware that a VM shutdown
      has been started in the target process.
      The debugger at this point is not expected to send any commands
      to the JDWP agent.
      However, the JDI layer (debugger side) and the jdwp agent
      (debuggee side) are not in sync with the consumer layers.

      One reason is that the test debugger does not invoke the JDI
      method VirtualMachine.dispose().
      Another reason is that the debugger and the debuggee processes
      are not easy to sync in general.

      As a result, the following steps are possible:
        - The test debugger sends a 'quit' command to the test debuggee
        - The debuggee is exiting normally
        - The jdwp backend reports (over the jdwp protocol) an
          anonymous class unload event
        - The JDI InternalEventHandler thread handles the
          ClassUnloadEvent event
        - The InternalEventHandler wants to uncache the matching
          reference type.
          If there is more than one class with the same host class
          signature, it can't distinguish them, and so it deletes all
          references and retrieves them again (see tracing below):
            MY_TRACE: JDI: VirtualMachineImpl.retrieveClassesBySignature: sig=Ljava/lang/invoke/LambdaForm$DMH;
        - The jdwp backend debugLoop_run() gets the command from JDI
          and calls the functions classesForSignature() and
          classStatus() recursively.
        - The classStatus() makes a call to the JVMTI GetClassStatus()
          and gets the JVMTI_ERROR_WRONG_PHASE
        - As a result, the jdwp backend reports the JVMTI error to the
          JDI, and so the test fails

      For details, see the analysis in the bug report closed as a dup
      of the bug 6988950:
         https://bugs.openjdk.java.net/browse/JDK-8024865

      Some similar cases can be found in the two bug reports (6988950
      and 8024865) describing this issue.

      The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error,
      as it is normal at VM shutdown.
      The original jdwp backend implementation had a similar approach
      for the raw monitor functions.
      They use ignore_vm_death() to work around the
      JVMTI_ERROR_WRONG_PHASE errors.
      For reference, please see the file: src/share/back/util.c
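
      A minimal sketch of the suppression described above, for
      readers who do not want to open the webrev. It is not the
      actual change: sketch_error and its constants are local
      placeholders for jvmtiError from jvmti.h, and
      sketch_init_complete is a hypothetical stand-in for the result
      of debugInit_isInitComplete().

```c
#include <stdbool.h>

/* Local placeholders for the JVMTI error codes (assumption: the real
 * agent uses jvmtiError and the constants from jvmti.h). */
typedef enum {
    SKETCH_ERROR_NONE = 0,
    SKETCH_ERROR_INVALID_CLASS = 21,
    SKETCH_ERROR_WRONG_PHASE = 112
} sketch_error;

/* Hypothetical stand-in for what debugInit_isInitComplete() reports. */
static bool sketch_init_complete = false;

/* Once agent initialization has completed, treat WRONG_PHASE as a sign
 * of VM shutdown and report "no error"; all other errors pass through
 * unchanged, as does WRONG_PHASE seen before initialization. */
static sketch_error sketch_ignore_wrong_phase(sketch_error error)
{
    if (sketch_init_complete && error == SKETCH_ERROR_WRONG_PHASE) {
        return SKETCH_ERROR_NONE;
    }
    return error;
}
```

      A caller would pass every JVMTI return code through such a
      filter before deciding whether to report it to the JDI side.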


Testing:
Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests


Thanks,
Serguei




