Eamonn,

Thanks for the review! I didn't know about ReferenceQueues so that's
a very interesting idea. I guess I need to get out of the VM codebase
a little more often... :-)

Jumping ahead in the e-mail thread, Jeremy Manson from Google has
offered the use of their fix for the problem. It looks like it uses
ReferenceQueues so I'll be taking a closer look at that today also.

More below...

On 6/11/2010 10:36 AM, Eamonn McManus wrote:
I think an alternative approach to the one here would be to use a global ReferenceQueue and a subclass of WeakReference that has a pointer back to the Logger or LogNode that contains this WeakReference. Then, in the cases where you currently check for stale entries, you could simply poll the ReferenceQueue and visit any Logger or LogNode that it is telling you has a cleared WeakReference.

ReferenceQueue is a very cool class.


There are a couple of subtle points, though. First, you'll need to be careful about synchronization before clearing WeakReferences, since the Logger you're being told about isn't necessarily related to the current one. Logger.kids is protected by a global lock treeLock but LogManager.loggers is protected by the LogManager's lock, so when logManager1.addLogger detects that a logger belonging to logManager2 has a stale WeakReference, it needs to drop the logManager1 lock and acquire the logManager2 one before scanning the reference list. (This means that addLogger has to change from being a synchronized method to have a big synchronized(this) block with a non-synchronized tail-end.)

So you're proposing using ReferenceQueues for all of the stuff
that uses WeakReferences in the Logging API and not just for the
two leaks that I've found. Consistency is a good idea so I'll
keep that in mind also.


Second, since apparently the user can introduce loops in the Logger hierarchy, the reference from the WeakReference subclass back to the parent Logger will itself need to be a WeakReference.

This subtle point is a little too subtle for my brain right now, but
I'll try to keep it in mind also. Since I've got an VM/NSK test that
blows up when the loops aren't accounted for, I'll have a reminder.


Nevertheless, I think this approach is likely to be less complicated than the proposed one, notably because there's no need to protect against infinite recursion; and the performance concerns should be less because you only look for cleared WeakReferences when you know they are there, and you know where to look.

I don't think I'll be able to convince anyone that a fix using
ReferenceQueues is less complicated. The fix I wrote is a pretty
simple, big hammer approach. ReferenceQueues, by your own proposal,
introduce subtleties that are less than obvious.

However, your proposal to use ReferenceQueues is a good idea and I'll
investigate it today. It helps that Jeremy Manson's fix also uses
ReferenceQueues.

Dan



Éamonn

Alan Bateman wrote:
Daniel D. Daugherty wrote:
:

Here is the URL for the webrev:

   http://cr.openjdk.java.net/~dcubed/6942989-webrev/0/

I went through the changes too.

I agree with the performance concern with expunging stale entries each time a LogManager is created. Would it make sense to do this once every N times rather than every time? Or maybe keep a timestamp so that you limit the number of times your attempt to expunge in some time period? (say a maximum of once a minute).

Does LogManager.deleteStaleWeakRefs need to be synchronized with any other operations?

I wonder if Logger.kidsCleanupMarker is the best name for this [ kids and markers usually involve clean-up but that's not what is meant here :-) ]. Maybe rename to something that includes the word "sentinel"?

Is is necessary to use jmap in the regression test? Would it be simpler to have a pure java test that runs with a small heap size? If it leaks then it fails with OOME.

-Alan.




Reply via email to