On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote: > On 01/10/2013 12:53 PM, shanliang wrote: >> Instead to wait GC, you can also to wait the >> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive >> it, then your listener must be removed too. Of course this solution is > > The problem is that the *NotificationForwarder implementations swallow > this kind of notification and just perform the cleanup. No other > listener will ever receive this notification. > > The "unregisterMBean" operation's semantics is not clearly defined. > Intuitively, when unregistering an MBean all the associated listeners > should be gone before the method returns. But this is not the case - > currently the listeners are sanitized some time after the > "unregisterMBean" operation started, eventually. There is no easy way to > notify the API user that the listeners were removed. I am afraid that in > order to resolve these problems new APIs would need to be introduced and > the whole mechanism of delivering notification should be revisited (as > it was planned for JMX 2.0, anyway). > > As for fixing the test - checking the weak references works fine as well > as increasing the timeout. They both can fail when the system is > extremely busy but the GC based solution will be in general faster than > the one with increased timeout.
Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01 > > -JB- > >> implementation dependent, but the test is already implementation dependent. >> >> Shanliang >> >> >> Jaroslav Bachorik wrote: >>> On 01/10/2013 10:05 AM, shanliang wrote: >>> >>>> Jaroslav Bachorik wrote: >>>> >>>>> On 01/09/2013 03:25 PM, shanliang wrote: >>>>> >>>>> >>>>>> Jaroslav Bachorik wrote: >>>>>> >>>>>>> On 01/09/2013 02:44 PM, shanliang wrote: >>>>>>> >>>>>>> >>>>>>>> Let's forget the JMX implementation at first. If an MBean is >>>>>>>> unregistered, a user at client side calls >>>>>>>> "removeNotificationListener" >>>>>>>> on the MBean, what should happen? if the user calls >>>>>>>> "isRegistered" on >>>>>>>> the MBean, what should happen? >>>>>>>> >>>>>>>> I have done 2 tests, I used only one thread: >>>>>>>> >>>>>>>> 1) >>>>>>>> ...... >>>>>>>> localServer.unregisterMBean(myMBean); >>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean)); >>>>>>>> >>>>>>>> I got isRegistered = false; >>>>>>>> >>>>>>>> 2) >>>>>>>> ...... >>>>>>>> localServer.unregisterMBean(myMBean); >>>>>>>> System.out.println("isRegistered = >>>>>>>> "+remoteClientServer.sRegistered(myMBean)); >>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener); >>>>>>>> >>>>>>>> I did not get an exception. >>>>>>>> >>>>>>>> The 1) told that the client could know the MBean was unregistered, >>>>>>>> then >>>>>>>> the client should throw an exception for the call of >>>>>>>> "removeNotificationListener" in 2). >>>>>>>> >>>>>>> Yes, but then it would not test the listener leakage as it was >>>>>>> supposed >>>>>>> to test but rather the fact that the client throws the appropriate >>>>>>> exception. The fact that the mbean was unregistered does not >>>>>>> necessarily >>>>>>> mean that the listeners were released. The main problem remains - the >>>>>>> listeners are being cleaned-up asynchronously and the clean-up >>>>>>> process >>>>>>> might race against the other uses of the JMX API. >>>>>>> >>>>>> client.removeNotificationListener is not a right way here to test >>>>>> listener leak, we could use some other ways, for example we keep the >>>>>> listener in a weak reference, then after the mbean is removed, the >>>>>> weak >>>>>> reference should be empty after some time. Another way is like >>>>>> DeadListenerTest does to check whether clean has done at server side: >>>>>> use reflection to get the "listenerMap" at server side and make >>>>>> sure it >>>>>> is empty, but this need to add a private method to the class >>>>>> ClientNotifForwarder. >>>>>> >>>>> There will still be problems with timing. You need either to wait for >>>>> the GC to kick in to clean up the weak ref. And the listenerMap will >>>>> not >>>>> be purged of the unregistered MBean listeners until the notification is >>>>> generated, processed on the ClientNotificationForwarder and >>>>> forwarded to >>>>> the server. So there goes the timing issue again. >>>>> >>>>> The problem is that the "unregisterMBean" operation does not guarantee >>>>> that the listeners have been unregistered at the time it returns. So, >>>>> one way or the other we will need to wait an arbitrary amount of time >>>>> before checking for the memory leak. >>>>> >>>> Yes we need to wait, but you can use a cycle like: >>>> long maxWaitingTime = 3000; >>>> long startTime = System.currentTimeMillis(); >>>> while ( weakReference.get != null >>>> && System.currentTimeMillis() < startTime + >>>> maxWaitingTime) { >>>> System.gc(); >>>> Thread.sleep(100); >>>> System.gc(); >>>> } >>>> >>>> if (weakReference.get != null) { >>>> // failed >>>> } >>>> >>> >>> Still you need an arbitrary timeout which might be reached under extreme >>> conditions making this test to fail intermittently. But I'd say that's >>> the nature of tests for memory leak fixes, due to the unpredictable >>> nature of the GC runs. Unless you take a heap dump and do a reachability >>> analysis you can not be sure whether a reference is dangling somwehwere >>> or it just hasn't been collected yet :/ >>> >>> -JB- >>> >>> >>>> Shanliang >>>> >>>>> -JB- >>>>> >>>>> >>>>> >>>>>> I think we have 3 things to do here: >>>>>> 1) modify the test to not use removeNotificationListener for testing >>>>>> listener leak >>>>>> 2) create a new bug about a client does not throw an exception >>>>>> after an >>>>>> mbean is unregistered >>>>>> 3) create a bug about a client does not throw a same exception as at >>>>>> server side. >>>>>> >>>>>> I will do 2) and 3), if you like you can continue 1), it might need to >>>>>> do fix also in the JMX implementation. >>>>>> >>>>>> Shanliang >>>>>> >>>>>>> >>>>>>> >>>>>>>> The test "DeadListenerTest" got passed in some machines because >>>>>>>> of the >>>>>>>> timeout for waiting a notification. I think its failure just tells >>>>>>>> a new >>>>>>>> bug. >>>>>>>> >>>>>>>> To set a longer timeout just hides the real bug, and the test might >>>>>>>> fail >>>>>>>> again one day if running condition is changed and you might need >>>>>>>> longer >>>>>>>> timeout again. >>>>>>>> >>>>>>> Yes, I agree with you that extending the timeout just lessens the >>>>>>> likelihood of the race condition and does not prevent it. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Shanliang >>>>>>>> >>>>>>>> Jaroslav Bachorik wrote: >>>>>>>> >>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>>> >>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I still have no idea why the test failed, but I do not see why a >>>>>>>>>>>> longer >>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and >>>>>>>>>>>> tested >>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real >>>>>>>>>>>> problem. >>>>>>>>>>>> >>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and >>>>>>>>>>> -Xcomp >>>>>>>>>>> switch) and verify that the fix makes the test pass. >>>>>>>>>>> >>>>>>>>>>> The ClientNotifForwarder scans the notifications for >>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and >>>>>>>>>>> removes the >>>>>>>>>>> appropriate notification listeners in a separate thread. Thus, >>>>>>>>>>> calling >>>>>>>>>>> "removeNotificationListener" on the main thread is prone to >>>>>>>>>>> racing. >>>>>>>>>>> >>>>>>>>>> It is true that ClientNotifForwarder scans the notifications for >>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes >>>>>>>>>> the >>>>>>>>>> appropriate notification listeners in a separate thread. This is >>>>>>>>>> for a >>>>>>>>>> client connection to do clean if a user never calls >>>>>>>>>> removeNotificationListener. >>>>>>>>>> >>>>>>>>>> But calling directly removeNotificationListener from a client >>>>>>>>>> should >>>>>>>>>> still get exception if the clean has not been done. As I said, if >>>>>>>>>> the >>>>>>>>>> client checked and found the listener was still there, then the >>>>>>>>>> client >>>>>>>>>> sent a request to its server to remove the listener at server >>>>>>>>>> side, >>>>>>>>>> the >>>>>>>>>> server should find that the MBean in question was not registered, >>>>>>>>>> so the >>>>>>>>>> server should throw an exception. The bug might be here. >>>>>>>>>> >>>>>>>>> This won't work. The server side listeners are removed upon >>>>>>>>> receiving >>>>>>>>> the "unregistered" notification which is delivered from the >>>>>>>>> ClientNotificationForwarder and it may have not run yet (since it >>>>>>>>> runs >>>>>>>>> in a separate executor thread). The result is that the attempt to >>>>>>>>> remove >>>>>>>>> the notification listener on the server will succeed as well >>>>>>>>> failing >>>>>>>>> the >>>>>>>>> test subsequently. >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Shanliang >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> The timeout you made longer was used to wait a notification >>>>>>>>>>>> which >>>>>>>>>>>> should >>>>>>>>>>>> never arrive. >>>>>>>>>>>> >>>>>>>>>>> Well, it can be used to allow more time to process the >>>>>>>>>>> "unregister" >>>>>>>>>>> notification, too. >>>>>>>>>>> >>>>>>>>>>> When I think more of this I am more inclined to fix the race >>>>>>>>>>> condition. >>>>>>>>>>> An updated webrev will follow. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> To remove a listener from a client side, we did: >>>>>>>>>>>> 1) at client side, check whether it was added in the client side >>>>>>>>>>>> 2) at server side, check whether the MBean in question was >>>>>>>>>>>> registered in >>>>>>>>>>>> the MBeanServer (!!!) >>>>>>>>>>>> 3) at server side, check whether the listener was added. >>>>>>>>>>>> >>>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification. >>>>>>>>>>>> Anyway, >>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to >>>>>>>>>>>> unregister an >>>>>>>>>>>> mbean, then any following call (without any time break) to >>>>>>>>>>>> use the >>>>>>>>>>>> mbean >>>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" >>>>>>>>>>>> etc. >>>>>>>>>>>> >>>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing >>>>>>>>>>>> MBeam, >>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get >>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should >>>>>>>>>>>> create a >>>>>>>>>>>> bug, because both implemented the same interface >>>>>>>>>>>> MBeanServerConnection. >>>>>>>>>>>> >>>>>>>>>>> Yes, it is rather inconsistent. >>>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Shanliang >>>>>>>>>>>> >>>>>>>>>>>> Jaroslav Bachorik wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Looking for review and a sponsor. >>>>>>>>>>>>> >>>>>>>>>>>>> Webrev at >>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00 >>>>>>>>>>>>> >>>>>>>>>>>>> In this issue the timing is the problem. >>>>>>>>>>>>> MBeanServer.unregisterMBean() >>>>>>>>>>>>> fires the "unregister" notification which is sent to the server >>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister" >>>>>>>>>>>>> notification >>>>>>>>>>>>> has not been yet processed at the time of invoking >>>>>>>>>>>>> removeNotificationListener() and the notification listeners >>>>>>>>>>>>> hasn't >>>>>>>>>>>>> been >>>>>>>>>>>>> cleaned up leading to the test failure. >>>>>>>>>>>>> >>>>>>>>>>>>> There is no synchronization between the client and the >>>>>>>>>>>>> server and >>>>>>>>>>>>> such >>>>>>>>>>>>> race condition can occur occasionally. Normally, the >>>>>>>>>>>>> execution is >>>>>>>>>>>>> fast >>>>>>>>>>>>> enough to behave like the "unregister" notification is >>>>>>>>>>>>> processed >>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems >>>>>>>>>>>>> that >>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains >>>>>>>>>>>>> the CPU >>>>>>>>>>>>> enough to make this problem appear. >>>>>>>>>>>>> >>>>>>>>>>>>> There is no proper fix for this - the only thing that work is >>>>>>>>>>>>> waiting a >>>>>>>>>>>>> bit longer in the main thread to give the notification >>>>>>>>>>>>> processing >>>>>>>>>>>>> thread >>>>>>>>>>>>> some time to clean up the listeners. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> >>>>>>>>>>>>> -JB- >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> >> >> >
