I've updated the webrev - no additional synchronisation is required. Please, review: http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.04
-JB- On 10/31/2012 01:59 PM, Jaroslav Bachorik wrote: > On 10/30/2012 05:10 PM, Eamonn McManus wrote: >> This area has historically caused a lot of problems and I am not >> surprised to see that there are more. While I don't know what the best >> way to fix the issue at hand is, I don't think this proposed change is >> it. The reason is that the checkConnection and gotIOException methods >> do blocking operations, and it is generally not a good idea to do >> blocking operations in a synchronized block. Is there a way to avoid >> the race condition without that? > > The important part is calling the gotIOException() method even from the > heart-beat checker. I've tried to return the synchronization block back > to the original state and the test passes with the check period of 10ms > which pushes the probability of data races rather high. > > It seems that the worst that can happen would be one additional > checkConnection() call - in case when the state gets set to TERMINATED > by another thread right after it has been checked in the synchronized > block the loop condition might evaluate to true if the state value has > not been flushed yet. > > I could change the "state" variable to be volatile but I am not sure > whether it's worth the hassle. > > -JB- > >> >> Éamonn >> >> >> 2012/10/29 Jaroslav Bachorik <[email protected]>: >>> I am looking for a sponsor and reviewers. >>> >>> The webrev is available at >>> http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.03 >>> >>> As explained in the issue the failure is caused by the RMI connection >>> heart-beat thread racing against the thread executing the MBean >>> operation and encountering the IOException. The heart beat thread sets >>> the the admin state to "terminated" but does not send the failure >>> notifications. On the other side the operation thread determines the >>> state to be already terminated and skips the notifications as well. >>> >>> The fix adds the call to handle an ioexception, including sending the >>> failure notifications, to the hear-beat connection failure handler. Also >>> it widens the synchronized block since the whole code block checking for >>> the connection failure and recovering must be run atomically, >>> >>> >>> Thanks, >>> >>> -JB- >
