On 08/29/2014 11:25 AM, Daniel Fuchs wrote:
Hi Jaroslav,
I am not sure I understand how this solves the problem.
The old code first checked the connection, and if that failed,
sent the FAILED notification, closed the connector, and rethrew
the exception.
This problem seems to have something to do with the way RMI works - the
customer had problems with one set of ties/stubs while the other set of
ties/stubs worked just fine. Seems like in cases of transient network
failures the connection check was not reliable.
The new code directly throws the exception without
checking the connection, and therefore without closing
the connection and sending the FAILED notification.
It only does so for the cases where the connection itself is not the
culprit - error while executing the method on the server, marshalling
problems etc.
So is the fix a change of behavior by which the RMIConnector
will - in some cases - not try to autoclose the connection but
instead simply wait for the caller to explicitly call close()?
Not really - the change is to rely on RMI to report whether the
connection is still usable. The old code didn't autoclose the
connection when "connection.getDefaultDomain(null)" didn't throw
an IOException either.
I'd be interested to hear what Shanliang has to say...
Yep. The code does a lot of things at once and without any spec for
handling failures and recovery we can only rely on the tests.
-JB-
best regards,
-- daniel
On 8/28/14 5:57 PM, Jaroslav Bachorik wrote:
I have taken over this issue from Poonam since she will be unavailable
for the next month or so.
Could I have reviews for this change:
Bug: https://bugs.openjdk.java.net/browse/JDK-8049303
Webrev: http://cr.openjdk.java.net/~jbachorik/8049303/webrev.00
Problem and fix:
By default the JMX client side notification fetch timeout
(jmx.remote.x.notification.fetch.timeout) is 1 minute and the default
server connection timeout (jmx.remote.x.server.connection.timeout) is 2
minutes.
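
For reference, a minimal sketch of how these two properties could be
set through the JMX environment map. The class and method names
(JmxTimeoutEnv, clientEnv, serverEnv) are hypothetical and not part of
the webrev; only the two property names come from the text above, and
the values are assumed to be in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class JmxTimeoutEnv {
    // Property names as quoted in the description above.
    static final String FETCH_TIMEOUT =
            "jmx.remote.x.notification.fetch.timeout";
    static final String SERVER_TIMEOUT =
            "jmx.remote.x.server.connection.timeout";

    // Environment for the client side; value assumed to be milliseconds.
    static Map<String, Object> clientEnv(long fetchTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(FETCH_TIMEOUT, fetchTimeoutMillis);
        return env;
    }

    // Environment for the server side; value assumed to be milliseconds.
    static Map<String, Object> serverEnv(long connectionTimeoutMillis) {
        Map<String, Object> env = new HashMap<>();
        env.put(SERVER_TIMEOUT, connectionTimeoutMillis);
        return env;
    }

    public static void main(String[] args) {
        // The defaults described above: 1 minute and 2 minutes.
        System.out.println(clientEnv(60_000L));
        System.out.println(serverEnv(120_000L));
    }
}
```

The client map would be passed to JMXConnectorFactory.connect(url, env)
and the server map to
JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs).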
If the client side connector thread makes a notification fetch request
to the server, but a transient network problem prevents the server
response from reaching the client, the client side connector will wait
for a response until the timeout period (1 minute) has expired before
throwing an IOException.
The client side RMIConnector implementation handles the IOException by
re-checking the connection status to understand whether or not it is
broken. If the connection is not available at that moment, the connector
fails by re-throwing the initial IOException. The problem is that this
re-check of the connection passes because the server side of the
connection doesn't time out until 2 minutes have passed (by default), so
the NotifFetcher thread dies without posting a failed notification, and
the client application does not get a chance to recover.
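
To illustrate what "a chance to recover" means on the application side,
here is a minimal sketch of a listener that reacts to the FAILED
connection notification. The class name FailedConnectionWatcher is
hypothetical; it only records the failure, and a real client would
presumably reconnect or alert at that point.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;

// Hypothetical client-side listener, for illustration only.
final class FailedConnectionWatcher implements NotificationListener {
    private final AtomicBoolean failed = new AtomicBoolean(false);

    @Override
    public void handleNotification(Notification n, Object handback) {
        // Only the FAILED connection notification marks the session as
        // lost; OPENED, CLOSED and NOTIFS_LOST are ignored here.
        if (n instanceof JMXConnectionNotification
                && JMXConnectionNotification.FAILED.equals(n.getType())) {
            failed.set(true);
        }
    }

    boolean hasFailed() {
        return failed.get();
    }
}
```

Such a listener would be registered with
connector.addConnectionNotificationListener(watcher, null, null). If the
FAILED notification is never posted, as in the scenario above, the
listener is never called and hasFailed() stays false.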
The fix is to forward exceptions that are not connection-related to the
caller on the JMX client side instead of checking the connection status.
Connection-related exceptions will cause the session to be closed, just
as an unsuccessful connection check would have done.
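
The idea can be sketched roughly as below. This is not the code from
the webrev: the class RmiFailureClassifier, the Action enum and the
choice of which java.rmi exception types count as connection-related
are all assumptions made for illustration.

```java
import java.io.IOException;
import java.rmi.ConnectException;
import java.rmi.ConnectIOException;
import java.rmi.NoSuchObjectException;
import java.rmi.UnknownHostException;

// Hypothetical sketch; the actual patch may classify differently.
final class RmiFailureClassifier {
    enum Action { CLOSE_AND_NOTIFY_FAILED, FORWARD_TO_CALLER }

    // Assumption: these RMI exception types indicate that the
    // connection itself is unusable.
    static boolean isConnectionRelated(IOException e) {
        return e instanceof ConnectException
                || e instanceof ConnectIOException
                || e instanceof NoSuchObjectException
                || e instanceof UnknownHostException;
    }

    // Decide from the exception type alone, with no extra round trip
    // to the server to re-check the connection.
    static Action decide(IOException e) {
        return isConnectionRelated(e)
                ? Action.CLOSE_AND_NOTIFY_FAILED
                : Action.FORWARD_TO_CALLER;
    }
}
```

With this sketch, a java.rmi.ServerException or MarshalException is
forwarded to the caller, while a java.rmi.ConnectException closes the
session and posts the FAILED notification.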
Testing:
All the jdk_jmx and jdk_management regression tests passed.
All the related JCK tests passed.
The fix applies cleanly to 8u and 7u repos.
Thanks,
-JB-