Eamonn,
Thanks for the review! I didn't know about ReferenceQueues so that's
a very interesting idea. I guess I need to get out of the VM codebase
a little more often... :-)
Jumping ahead in the e-mail thread, Jeremy Manson from Google has
offered the use of their fix for the problem. It looks like it uses
ReferenceQueues so I'll be taking a closer look at that today also.
More below...
On 6/11/2010 10:36 AM, Eamonn McManus wrote:
I think an alternative approach to the one here would be to use a
global ReferenceQueue and a subclass of WeakReference that has a
pointer back to the Logger or LogNode that contains this
WeakReference. Then, in the cases where you currently check for stale
entries, you could simply poll the ReferenceQueue and visit any Logger
or LogNode that it is telling you has a cleared WeakReference.
ReferenceQueue is a very cool class.
There are a couple of subtle points, though. First, you'll need to be
careful about synchronization before clearing WeakReferences, since
the Logger you're being told about isn't necessarily related to the
current one. Logger.kids is protected by a global lock treeLock but
LogManager.loggers is protected by the LogManager's lock, so when
logManager1.addLogger detects that a logger belonging to logManager2
has a stale WeakReference, it needs to drop the logManager1 lock and
acquire the logManager2 one before scanning the reference list. (This
means that addLogger has to change from being a synchronized method to
have a big synchronized(this) block with a non-synchronized tail-end.)
So you're proposing using ReferenceQueues for all of the stuff
that uses WeakReferences in the Logging API and not just for the
two leaks that I've found. Consistency is a good idea so I'll
keep that in mind also.
Second, since apparently the user can introduce loops in the Logger
hierarchy, the reference from the WeakReference subclass back to the
parent Logger will itself need to be a WeakReference.
This subtle point is a little too subtle for my brain right now, but
I'll try to keep it in mind also. Since I've got an VM/NSK test that
blows up when the loops aren't accounted for, I'll have a reminder.
Nevertheless, I think this approach is likely to be less complicated
than the proposed one, notably because there's no need to protect
against infinite recursion; and the performance concerns should be
less because you only look for cleared WeakReferences when you know
they are there, and you know where to look.
I don't think I'll be able to convince anyone that a fix using
ReferenceQueues is less complicated. The fix I wrote is a pretty
simple, big hammer approach. ReferenceQueues, by your own proposal,
introduce subtleties that are less than obvious.
However, your proposal to use ReferenceQueues is a good idea and I'll
investigate it today. It helps that Jeremy Manson's fix also uses
ReferenceQueues.
Dan
Éamonn
Alan Bateman wrote:
Daniel D. Daugherty wrote:
:
Here is the URL for the webrev:
http://cr.openjdk.java.net/~dcubed/6942989-webrev/0/
I went through the changes too.
I agree with the performance concern with expunging stale entries
each time a LogManager is created. Would it make sense to do this
once every N times rather than every time? Or maybe keep a timestamp
so that you limit the number of times your attempt to expunge in some
time period? (say a maximum of once a minute).
Does LogManager.deleteStaleWeakRefs need to be synchronized with any
other operations?
I wonder if Logger.kidsCleanupMarker is the best name for this [ kids
and markers usually involve clean-up but that's not what is meant
here :-) ]. Maybe rename to something that includes the word
"sentinel"?
Is is necessary to use jmap in the regression test? Would it be
simpler to have a pure java test that runs with a small heap size? If
it leaks then it fails with OOME.
-Alan.