Re: 4-nd round RFR (XS) 6988950: JDWP exit error JVMTI_ERROR_WRONG_PHASE(112)

[email protected] Fri, 07 Nov 2014 11:41:58 -0800

On 11/7/14 2:13 AM, [email protected] wrote:

On 11/7/14 1:59 AM, David Holmes wrote:

On 7/11/2014 7:48 PM, [email protected] wrote:

Hi David,



On 11/6/14 9:18 PM, David Holmes wrote:

Hi Serguei,

I think I get the gist of this approach but I'm not an expert on the
JVM TI or JDWP event model. My main concern would be how the delay to
the completion of cbVMDeath() might impact things - specifically if it
might be a lengthy delay?

1. At the beginning, the VirtualMachine commands check whether
     gdata->vmDead is true, and in that case they just return, either
     with the JDWP_ERROR(VM_DEAD) error or quietly.
     Normally, the cbVMDeath event callback needs to wait for just one
     command.

Please, see the VirtualMachine.c and the following comment in
debugLoop_run():

             } else if (gdata->vmDead &&
              ((cmd->cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) {
                 /* Protect the VM from calls while dead.
                  * VirtualMachine cmdSet quietly ignores some cmds
                  * after VM death, so, it sends it's own errors.
                  */
                 outStream_setError(&out, JDWP_ERROR(VM_DEAD));
             } else {


2. We do not have many choices.
     Without synchronizing on command completion we will continue to get
     WRONG_PHASE errors intermittently.
     Another choice is to use the already reviewed ignore_wrong_phase
     workaround.
     Note that the workaround does not work for all of the commands.
     I understand that we need to make sure nothing is broken if we
     choose this approach. :)

3. What delay would you consider lengthy: 1 sec, 10 sec, 1 min.?

Anything that causes something unexpected to happen :) I'm just
looking at the code and thinking what might go wrong. Really all we
can do is try this and see.


1 min sleep looks too big as it causes timeout failures of some tests.
Launched the nsk.jdi and jtreg com/sun/jdi with 10 sec sleep.
Will see the results tomorrow.

The nsk.jdi and the jtreg com/sun/jdi tests passed with the 10 sec
sleep.


Thanks,
Serguei


Thanks!
Serguei


Thanks,
David

     For instance, I can add a 10 sec sleep to provoke a command
execution delay and see what breaks.

With a 1 min sleep I see some timeouts in the jtreg com/sun/jdi tests,
though, which is probably OK.

Thanks,
Serguei


Thanks,
David

On 7/11/2014 8:27 AM, [email protected] wrote:

Hi reviewers,

Please review a modified fix:

http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.4/

The 3rd round fix is not right, as it caused deadlocks in several tests
(in nsk.jdi.testlist and jtreg com/sun/jdi).

Here is a deadlock example:

-----------------  lwp# 2 / thread# 2 --------------------
  ffffffff7e8dc6a4 lwp_cond_wait (100138748, 100138730, 0, 0)
  ffffffff7dcad148 void os::PlatformEvent::park() (100138700, d4788,
d4400, 0, ffffffff7e357440, 100138730) + 100
  ffffffff7dc3151c int Monitor::IWait(Thread*,long) (ffffffff7e3c5b98,
100137000, 0, 1004405d0, 6e750, 0) + a4
  ffffffff7dc324d0 bool Monitor::wait(bool,long,bool) (1004405d0,
100137000, 0, 0, 1, 20000000) + 358
  ffffffff7de6c530 int JavaThread::java_suspend_self() (1004405d0,
100137000, 1, deab, 60000000, 100137000) + c8
  ffffffff7da5f478 int JvmtiRawMonitor::raw_wait(long,bool,Thread*)
(10034bdc0, ffffffffffffffff, ffffffff7e3e6bd0, 100137000, 1, 2) + 258
  ffffffff7da2284c jvmtiError
JvmtiEnv::RawMonitorWait(JvmtiRawMonitor*,long) (92800, 10034bdc0,
ffffffffffffffff, 4, 9aeb0, 100137000) + 8c
  ffffffff7aa2f47c debugMonitorWait (ffffffff7ab3ba10, c28, c00,
ffffffff7ab3ad18, ffffffff7ab3ad18, 0) + 3c
  ffffffff7aa1c804 enqueueCommand (10034bb90, 102c00, ffffffffffefd118,
ffffffff7ab3ad18, 102c00, ffffffff7ab3bd60) + 14c
  ffffffff7aa1e23c eventHelper_reportEvents (d8, 100135d70, 2, 1, 1, 2)
+ 10c
  ffffffff7aa181f8 reportEvents (1001371f8, 0, 0, 14, 100135d70, 0) +
138
  ffffffff7aa187b8 event_callback (1001371f8, ffffffff7b0ffa88,
ffffffff7aa23150, ffffffff7aa376a0, ffffffff7ab3ad18, 100441ad0) + 360
  ffffffff7aa1b870 cbVMDeath (800, 1001371f8, ffffffff7aa37c48,
ffffffff7ab3ad18, 1018, 1000) + 1d8
  ffffffff7da3635c void JvmtiExport::post_vm_death() (1ffc, 100137000,
ffffffff7e3e8b30, ffffffff7e357440, 1, 10010cf30) + 534
  ffffffff7d7bb104 void before_exit(JavaThread*) (100137000,
ffffffff7e392350, ffffffff7e3fb938, 6ed99, ffffffff7e357440,
ffffffff7e3e6b70) + 30c
  ffffffff7de72128 bool Threads::destroy_vm() (100137000, 100110a40,
ffffffff7e3f22f4, ffffffff7e3e6ab0, ffffffff7e357440, 30000000) + 100
  ffffffff7d8d0664 jni_DestroyJavaVM (100137000, 1ffc, ffffffff7e3e8b30,
ffffffff7e357440, 0, 10013700) + 1bc
  ffffffff7ee08680 JavaMain (ffffffff7e3da790, 0, ffffffff7e3da790,
10035de68, 0, ffffffff7e4143b0) + 860
  ffffffff7e8d8558 _lwp_start (0, 0, 0, 0, 0, 0)

-----------------  lwp# 12 / thread# 12 --------------------
  ffffffff7e8dc6a4 lwp_cond_wait (100349948, 100349930, 0, 0)
  ffffffff7dcad148 void os::PlatformEvent::park() (100349900, d4788,
d4400, 0, ffffffff7e357440, 100349930) + 100
  ffffffff7da5f010 int JvmtiRawMonitor::raw_enter(Thread*) (10034a070,
100348800, a, ffffffff7e3de340, 1, ffffffff7e115ff4) + 258
  ffffffff7da22450 jvmtiError
JvmtiEnv::RawMonitorEnter(JvmtiRawMonitor*) (ffffffff7ea05a00,
10034a070, 1c7, 100348800, ffffffff7e357440, 4) + a0
  ffffffff7aa2f288 debugMonitorEnter (10034a070, c18, c00,
ffffffff7ab3ad28, ffffffff7ab3b940, 0) + 38
  ffffffff7aa14134 debugLoop_run (ffffffff7ab3b940, 1000,
ffffffff7ab3ad28, ffffffff7aa360d0, ffffffff5b2ff718, c18) + 11c
  ffffffff7aa2a4f8 connectionInitiated (ffffffff5b504010, 1358, 1000,
ffffffff7ab3ad28, 1, ffffffff7ab3c080) + e0
  ffffffff7aa2a7d4 attachThread (ffffffffffefee48, 101000,
ffffffff5b504010, ffffffff7ab3ad28, 0, 10000000) + 54
  ffffffff7da56b18 void JvmtiAgentThread::call_start_function()
(100348800, ffffffff7e3e8b38, 916f0, ffffffff7e357440, 10034880, 1) +
128
  ffffffff7de6a678 void JavaThread::thread_main_inner() (100348800, 3d8,
1003497f8, 100349420, ffffffff5b2ff9f8, 0) + 90
  ffffffff7de6a5b4 void JavaThread::run() (100348800, 100349442, c,
fffffffea5f3e048, 3d8, 1003497f8) + 3ac
  ffffffff7dc9f2e4 java_start (ca800, 100348800, ca904,
ffffffff7e16ff31, ffffffff7e357440, 4797) + 2e4
  ffffffff7e8d8558 _lwp_start (0, 0, 0, 0, 0, 0)

-----------------  lwp# 13 / thread# 13 --------------------
  ffffffff7e8dc6a4 lwp_cond_wait (10034d348, 10034d330, 0, 0)
  ffffffff7dcad148 void os::PlatformEvent::park() (10034d300, d4788,
d4400, 0, ffffffff7e357440, 10034d330) + 100
  ffffffff7da5eac8 int JvmtiRawMonitor::SimpleWait(Thread*,long)
(10034bed0, 10034c000, ffffffffffffffff, 241000, 0, 10034c000) + 100
  ffffffff7da5f300 int JvmtiRawMonitor::raw_wait(long,bool,Thread*)
(10034bed0, ffffffffffffffff, 1, 10034c000, ffffffff7e357440, 10034c000)
+ e0
  ffffffff7da2284c jvmtiError
JvmtiEnv::RawMonitorWait(JvmtiRawMonitor*,long) (92800, 10034bed0,
ffffffffffffffff, 4, 9aeb0, 10034c000) + 8c
  ffffffff7aa2f47c debugMonitorWait (ffffffff7ab3ba10, c28, c00,
ffffffff7ab3ad18, ffffffff7ab3b940, 0) + 3c
  ffffffff7aa1d838 doBlockCommandLoop (800, 1038, ffffffff7ab3ad18,
1000, ffffffff7ab3ad18, ffffffff7ab3bd60) + 48
  ffffffff7aa1da3c commandLoop (c28, 10034c1f8, c00, ffffffff7ab3ad18,
0, 10000000) + ac
  ffffffff7da56b18 void JvmtiAgentThread::call_start_function()
(10034c000, ffffffff7e3e8b38, 916f0, ffffffff7e357440, 10034c00, 1) +
128
  ffffffff7de6a678 void JavaThread::thread_main_inner() (10034c000, 3d8,
10034cfe8, 10034cc10, ffffffff5b0ffbf8, 0) + 90
  ffffffff7de6a5b4 void JavaThread::run() (10034c000, 10034cc28, d,
fffffffea5f3e290, 3d8, 10034cfe8) + 3ac
  ffffffff7dc9f2e4 java_start (ca800, 10034c000, ca904,
ffffffff7e16ff31, ffffffff7e357440, 181a) + 2e4
  ffffffff7e8d8558 _lwp_start (0, 0, 0, 0, 0, 0)


The details:
   - Thread #2:   The cbVMDeath() event handler is waiting on the
commandCompleteLock in the enqueueCommand().
      The call chain is:
         cbVMDeath() -> event_callback() -> reportEvents() ->
eventHelper_reportEvents() -> enqueueCommand().
      The enqueueCommand() depends on the commandLoop() that has to call
      completeCommand(command) for the command being enqueued.
      This has not been set yet:  gdata->vmDead = JNI_TRUE

   - Thread #12: The debugLoop_run() is blocked on entering the
      vmDeathLock.

   - Thread #13: The commandLoop() is waiting on the blockCommandLoopLock
      in the doBlockCommandLoop().
      It is because blockCommandLoop == JNI_TRUE which is set in the
needBlockCommandLoop()
      if the following condition is true:
          (cmd->commandKind == COMMAND_REPORT_EVENT_COMPOSITE &&
           cmd->u.reportEventComposite.suspendPolicy ==
JDWP_SUSPEND_POLICY(ALL))


It seems that the debugLoop_run() blocking on the vmDeathLock causes the
commandLoop() to wait indefinitely.
The cbVMDeath() cannot proceed because the commandLoop() does not make
progress.

The vmDeathLock critical section in the cbVMDeath() event callback seems
to be overkill (unnecessary).
A less intrusive synchronization is required here, which is to wait until
the current command is completed
before returning to the JvmtiExport::post_vm_death().

The new approach (see new webrev) is to extend the resumeLock
synchronization pattern
to the whole VirtualMachine command set, not only the resume command.
The resumeLock name is replaced with the vmDeathLock to reflect the new
semantics.

In general, we could consider doing the same for the rest of the JDWP
command sets.
But it is better to be careful and see how this change goes first.
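
The idea of waiting for the current command before returning from the
VM_DEATH callback can be sketched as follows. This is only an
illustration, not the code in the webrev: it uses a POSIX mutex in place
of a JVMTI raw monitor, and every name in it (sketch_vm_death_lock,
sketch_run_vm_command, sketch_vm_death_sync and the error constants) is
hypothetical.

```c
#include <pthread.h>

/* Hypothetical result codes, not the real JDWP error values. */
enum { SKETCH_OK = 0, SKETCH_VM_DEAD = 1 };

/* Hypothetical stand-in for the agent's vmDeathLock raw monitor. */
static pthread_mutex_t sketch_vm_death_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_vm_dead = 0;

typedef int (*sketch_handler)(void);

/* A trivial command handler used for illustration only. */
static int sketch_ok_handler(void) {
    return SKETCH_OK;
}

/* Command side: hold the lock for the whole duration of one command.
 * A command that starts after VM death is rejected instead of run. */
static int sketch_run_vm_command(sketch_handler handler) {
    int result;
    pthread_mutex_lock(&sketch_vm_death_lock);
    result = sketch_vm_dead ? SKETCH_VM_DEAD : handler();
    pthread_mutex_unlock(&sketch_vm_death_lock);
    return result;
}

/* VM_DEATH side: taking the lock blocks only while a command is in
 * flight, so the wait is bounded by that single command. */
static void sketch_vm_death_sync(void) {
    pthread_mutex_lock(&sketch_vm_death_lock);
    sketch_vm_dead = 1;
    pthread_mutex_unlock(&sketch_vm_death_lock);
}
```

In this sketch the death side never holds the lock while waiting for
anything else, which is the property the 3rd round fix lacked.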


Thanks,
Serguei


On 11/5/14 2:27 AM, [email protected] wrote:

Hi David,

Thank you for the concerns!
Testing showed several tests failing with deadlocks.
The scenarios are similar to those you describe.

I am trying to understand the details.

Thanks,
Serguei

On 11/4/14 4:09 PM, David Holmes wrote:

Hi Serguei,

On 3/11/2014 5:07 PM, [email protected] wrote:

On 11/2/14 8:58 PM, David Holmes wrote:

On 1/11/2014 8:13 PM, Dmitry Samersoff wrote:
Serguei,
Thank you for the good finding. This approach looks much better to me.
The fix looks good.

Is it necessary to release the vmDeathLock at
eventHandler.c:1244 before the call to
EXIT_ERROR(error,"Can't clear event callbacks on vm death"); ?

I agree this looks necessary, or at least cleaner (if things are
failing we really don't know what is happening).


Agreed (replied to Dmitry).


More generally I'm concerned about whether any of the code paths taken
while holding the new lock can result in deadlock - in particular with
regard to the resumeLock?


The cbVMDeath() function never holds both the vmDeathLock and the
resumeLock at the same time,
so there is no chance of a deadlock that involves both these locks.

Two more locks used in the cbVMDeath() are the callbackBlock and
callbackLock.
These two locks look completely unrelated to the debugLoop_run().

The debugLoop_run() function also uses the cmdQueueLock.
The debugLoop_run() never holds both the vmDeathLock and the
cmdQueueLock at the same time.

So I do not see any potential for introducing a new deadlock with the
vmDeathLock.

However, it is still easy to overlook something here.
Please, let me know if you see any danger.

I was mainly concerned about what might happen in the call chain for
threadControl_resumeAll() (it certainly sounds like it might need to
use a resumeLock :) ). I see direct use of the threadLock and
indirectly the eventHandler lock; but there are further call paths I
did not explore. Wish there was an easy way to determine the
transitive closure of all locks used from a given call.

Thanks,
David

Thanks,
Serguei


David

-Dmitry



On 2014-11-01 00:07, [email protected] wrote:


This is the 3rd round of review for:
https://bugs.openjdk.java.net/browse/JDK-6988950

New webrev:

http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3/






Summary

   For the failing scenario, please refer to the 1st round RFR below.

   I've found what is missing in the jdwp agent shutdown and decided to
   switch from a workaround to a real fix.

   The agent VM_DEATH callback sets the gdata field:
   gdata->vmDead = 1.
   The agent debugLoop_run() has a guard against the VM shutdown:


            } else if (gdata->vmDead &&
             ((cmd->cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) {
                /* Protect the VM from calls while dead.
                 * VirtualMachine cmdSet quietly ignores some cmds
                 * after VM death, so, it sends it's own errors.
                 */
                outStream_setError(&out, JDWP_ERROR(VM_DEAD));


   However, the guard above does not help much if the VM_DEATH event
   happens in the middle of a command execution.
   There is a lack of synchronization here.
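
   The window that the guard cannot cover can be shown with a small
   single-threaded simulation. It is only a sketch under stated
   assumptions: the names (sketch_dispatch, sketch_jvmti_call and so on)
   are hypothetical, the VM_DEATH event is simulated by a plain function
   call instead of a second thread, and only the value 112 for the wrong
   phase error is taken from the bug title.

```c
#include <stdbool.h>

/* Hypothetical result codes; 112 matches JVMTI_ERROR_WRONG_PHASE. */
enum { SKETCH_OK = 0, SKETCH_VM_DEAD = 1, SKETCH_WRONG_PHASE = 112 };

/* Stand-in for gdata->vmDead. */
static bool sketch_vm_dead = false;

/* Simulated JVMTI call: fails once the VM has left the live phase. */
static int sketch_jvmti_call(void) {
    return sketch_vm_dead ? SKETCH_WRONG_PHASE : SKETCH_OK;
}

/* Simulates VM_DEATH arriving while a command is being executed. */
static void sketch_vm_death_arrives(void) {
    sketch_vm_dead = true;
}

/* Dispatch with the guard checked only once, before the handler runs. */
static int sketch_dispatch(bool death_during_command) {
    if (sketch_vm_dead) {
        /* The guard catches commands that start after VM death. */
        return SKETCH_VM_DEAD;
    }
    if (death_during_command) {
        /* The guard has already passed; nothing stops the event here. */
        sketch_vm_death_arrives();
    }
    return sketch_jvmti_call();
}
```

   A command that starts before the event and finishes after it passes
   the guard and still sees the wrong phase error.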

   The fix introduces a new lock (vmDeathLock) which does not allow the
   commands and the VM_DEATH event callback to execute concurrently.
   It should work well for any function that is used in the
   implementation of the JDWP_COMMAND_SET(VirtualMachine).


Testing:

Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests


Thanks,
Serguei


On 10/29/14 6:05 PM, [email protected] wrote:

The updated webrev:

http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/






The changes are:
   - added a comment recommended by Staffan
   - removed the ignore_wrong_phase() call from function
classSignature()

The classSignature() function is called in 16 places.
Most of them do not tolerate a NULL in place of the returned signature
and will crash.
I'm not comfortable fixing all the occurrences now and suggest
returning to this issue after gaining experience with more of the
failure cases that are still expected.

The failure involving classSignature() was observed only once in the
nightly and should be extremely rare to reproduce.
I'll file a placeholder bug if necessary.

Thanks,
Serguei

On 10/28/14 6:11 PM, [email protected] wrote:

Please, review the fix for:
https://bugs.openjdk.java.net/browse/JDK-6988950


Open webrev:

http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/







Summary:

    The failing scenario:
      The debugger and the debuggee are well aware that a VM shutdown
      has been started in the target process.
      The debugger at this point is not expected to send any commands
      to the JDWP agent.
      However, the JDI layer (debugger side) and the jdwp agent
      (debuggee side) are not in sync with the consumer layers.

      One reason is that the test debugger does not invoke the JDI
      method VirtualMachine.dispose().
      Another reason is that the debugger and the debuggee processes
      are hard to sync in general.

      As a result the following steps are possible:
        - The test debugger sends a 'quit' command to the test debuggee
        - The debuggee is normally exiting
        - The jdwp backend reports (over the jdwp protocol) an
          anonymous class unload event
        - The JDI InternalEventHandler thread handles the
          ClassUnloadEvent event
        - The InternalEventHandler wants to uncache the matching
          reference type.
          If there is more than one class with the same host class
          signature, it can't distinguish them,
          and so deletes all references and re-retrieves them
          (see tracing below):
            MY_TRACE: JDI: VirtualMachineImpl.retrieveClassesBySignature:
            sig=Ljava/lang/invoke/LambdaForm$DMH;
        - The jdwp backend debugLoop_run() gets the command from JDI
          and calls the functions
          classesForSignature() and classStatus() recursively.
        - The classStatus() makes a call to the JVMTI GetClassStatus()
          and gets the JVMTI_ERROR_WRONG_PHASE
        - As a result the jdwp backend reports the JVMTI error to the
          JDI, and so the test fails

      For details, see the analysis in the bug report closed as a dup
      of the bug 6988950:
      https://bugs.openjdk.java.net/browse/JDK-8024865

      Some similar cases can be found in the two bug reports (6988950
      and 8024865) describing this issue.

      The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error,
      as it is normal at the VM shutdown.
      The original jdwp backend implementation had a similar approach
      for the raw monitor functions.
      They use the ignore_vm_death() to work around the
      JVMTI_ERROR_WRONG_PHASE errors.
      For reference, please see the file: src/share/back/util.c
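
      The error filtering described above can be sketched roughly as
      below. This is an assumption about the general shape of such a
      helper, not a copy of the code in util.c or in the webrev: the
      function name sketch_ignore_wrong_phase and the local constants
      are hypothetical, and only the value 112 is taken from the bug
      title.

```c
/* Local stand-ins for the JVMTI error codes used in this sketch. */
enum {
    SKETCH_JVMTI_ERROR_NONE = 0,
    SKETCH_JVMTI_ERROR_WRONG_PHASE = 112
};

/* Treat a wrong phase error as success, because it is expected once
 * the VM shutdown has started. Every other error passes through. */
static int sketch_ignore_wrong_phase(int error) {
    if (error == SKETCH_JVMTI_ERROR_WRONG_PHASE) {
        return SKETCH_JVMTI_ERROR_NONE;
    }
    return error;
}
```

      A caller would wrap the result of a JVMTI call in this helper
      before deciding whether to report an error to the debugger.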



Testing:

Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests


Thanks,
Serguei
