Serguei, Thank you for good finding. This approach looks much better for me.
The fix looks good. Is it necessary to release vmDeathLock locks at eventHandler.c:1244 before call EXIT_ERROR(error,"Can't clear event callbacks on vm death"); ? -Dmitry On 2014-11-01 00:07, serguei.spit...@oracle.com wrote: > > It is 3-rd round of review for: > https://bugs.openjdk.java.net/browse/JDK-6988950 > > New webrev: > > http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.3/ > > > Summary > > For failing scenario, please, refer to the 1-st round RFR below. > > I've found what is missed in the jdwp agent shutdown and decided to > switch from a workaround to a real fix. > > The agent VM_DEATH callback sets the gdata field: gdata->vmDead = 1. > The agent debugLoop_run() has a guard against the VM shutdown: > > 165 } else if (gdata->vmDead && > 166 ((cmd->cmdSet) != JDWP_COMMAND_SET(VirtualMachine))) { > 167 /* Protect the VM from calls while dead. > 168 * VirtualMachine cmdSet quietly ignores some cmds > 169 * after VM death, so, it sends it's own errors. > 170 */ > 171 outStream_setError(&out, JDWP_ERROR(VM_DEAD)); > > > However, the guard above does not help much if the VM_DEATH event > happens in the middle of a command execution. > There is a lack of synchronization here. > > The fix introduces new lock (vmDeathLock) which does not allow to > execute the commands > and the VM_DEATH event callback concurrently. > It should work well for any function that is used in implementation of > the JDWP_COMMAND_SET(VirtualMachine) . > > > Testing: > Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests > > > Thanks, > Serguei > > > On 10/29/14 6:05 PM, serguei.spit...@oracle.com wrote: >> The updated webrev: >> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.2/ >> >> >> The changes are: >> - added a comment recommended by Staffan >> - removed the ignore_wrong_phase() call from function classSignature() >> >> The classSignature() function is called in 16 places. >> Most of them do not tolerate the NULL in place of returned signature >> and will crash. >> I'm not comfortable to fix all the occurrences now and suggest to >> return to this >> issue after gaining experience with more failure cases that are still >> expected. >> The failure with the classSignature() involved was observed only once >> in the nightly >> and should be extremely rare reproducible. >> I'll file a placeholder bug if necessary. >> >> Thanks, >> Serguei >> >> On 10/28/14 6:11 PM, serguei.spit...@oracle.com wrote: >>> Please, review the fix for: >>> https://bugs.openjdk.java.net/browse/JDK-6988950 >>> >>> >>> Open webrev: >>> http://cr.openjdk.java.net/~sspitsyn/webrevs/2014/jdk/6988950-JDWP-wrong-phase.1/ >>> >>> >>> >>> Summary: >>> >>> The failing scenario: >>> The debugger and the debuggee are well aware a VM shutdown has >>> been started in the target process. >>> The debugger at this point is not expected to send any commands >>> to the JDWP agent. >>> However, the JDI layer (debugger side) and the jdwp agent >>> (debuggee side) >>> are not in sync with the consumer layers. >>> >>> One reason is because the test debugger does not invoke the JDI >>> method VirtualMachine.dispose(). >>> Another reason is that the Debugger and the debuggee processes >>> are uneasy to sync in general. >>> >>> As a result the following steps are possible: >>> - The test debugger sends a 'quit' command to the test debuggee >>> - The debuggee is normally exiting >>> - The jdwp backend reports (over the jdwp protocol) an >>> anonymous class unload event >>> - The JDI InternalEventHandler thread handles the >>> ClassUnloadEvent event >>> - The InternalEventHandler wants to uncache the matching >>> reference type. >>> If there is more than one class with the same host class >>> signature, it can't distinguish them, >>> and so, deletes all references and re-retrieves them again >>> (see tracing below): >>> MY_TRACE: JDI: >>> VirtualMachineImpl.retrieveClassesBySignature: >>> sig=Ljava/lang/invoke/LambdaForm$DMH; >>> - The jdwp backend debugLoop_run() gets the command from JDI >>> and calls the functions >>> classesForSignature() and classStatus() recursively. >>> - The classStatus() makes a call to the JVMTI GetClassStatus() >>> and gets the JVMTI_ERROR_WRONG_PHASE >>> - As a result the jdwp backend reports the JVMTI error to the >>> JDI, and so, the test fails >>> >>> For details, see the analysis in bug report closed as a dup of >>> the bug 6988950: >>> https://bugs.openjdk.java.net/browse/JDK-8024865 >>> >>> Some similar cases can be found in the two bug reports (6988950 >>> and 8024865) describing this issue. >>> >>> The fix is to skip reporting the JVMTI_ERROR_WRONG_PHASE error >>> as it is normal at the VM shutdown. >>> The original jdwp backend implementation had a similar approach >>> for the raw monitor functions. >>> Threy use the ignore_vm_death() to workaround the >>> JVMTI_ERROR_WRONG_PHASE errors. >>> For reference, please, see the file: src/share/back/util.c >>> >>> >>> Testing: >>> Run nsk.jdi.testlist, nsk.jdwp.testlist and JTREG com/sun/jdi tests >>> >>> >>> Thanks, >>> Serguei >>> >> > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources.