The fix is OK for me.
Shanliang
Jaroslav Bachorik wrote:
On 12/18/2012 08:21 PM, shanliang wrote:
It is good to call
gotIOException((IOException)e);
but no need to call in next
restart((IOException)e);
the method "gotIOException" should call "restart" if necessary,
otherwise "restart" may be called 2 times and generated 2 times
"Reconnection notification".
Yes, it seems that the restart() will be called exactly twice or not at
all. I will remove the restart() call at the line 199.
Updated webrev to
http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.05
-JB-
Shanliang
Jaroslav Bachorik wrote:
I've updated the webrev - no additional synchronisation is required.
Please, review:
http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.04
-JB-
On 10/31/2012 01:59 PM, Jaroslav Bachorik wrote:
On 10/30/2012 05:10 PM, Eamonn McManus wrote:
This area has historically caused a lot of problems and I am not
surprised to see that there are more. While I don't know what the best
way to fix the issue at hand is, I don't think this proposed change is
it. The reason is that the checkConnection and gotIOException methods
do blocking operations, and it is generally not a good idea to do
blocking operations in a synchronized block. Is there a way to avoid
the race condition without that?
The important part is calling the gotIOException() method even from the
heart-beat checker. I've tried to return the synchronization block back
to the original state and the test passes with the check period of 10ms
which pushes the probability of data races rather high.
It seems that the worst that can happen would be one additional
checkConnection() call - in case when the state gets set to TERMINATED
by another thread right after it has been checked in the synchronized
block the loop condition might evaluate to true if the state value has
not been flushed yet.
I could change the "state" variable to be volatile but I am not sure
whether it's worth the hassle.
-JB-
Éamonn
2012/10/29 Jaroslav Bachorik <[email protected]>:
I am looking for a sponsor and reviewers.
The webrev is available at
http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.03
As explained in the issue the failure is caused by the RMI connection
heart-beat thread racing against the thread executing the MBean
operation and encountering the IOException. The heart beat thread sets
the the admin state to "terminated" but does not send the failure
notifications. On the other side the operation thread determines the
state to be already terminated and skips the notifications as well.
The fix adds the call to handle an ioexception, including sending the
failure notifications, to the hear-beat connection failure handler.
Also
it widens the synchronized block since the whole code block
checking for
the connection failure and recovering must be run atomically,
Thanks,
-JB-