RE: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Xueming Shen wrote: I'm not a GC guy, so I might be missing something here, but if close() is being explicitly invoked by some thread, means someone has a strong reference to it, I don't think the finalize() can kick in until that close() returns This is not correct. You can re-publish the reference from another finalizer method and thereby allow another thread to access the object concurrently with the finalizer. Here's a possible sequence of events: 1) GC runs and determines that A and B are finalizable 2) Finalizer thread run A.finalize() 3) A.finalize publishes reference to B in static variable 4a) Another thread reads the static variable and calls B.close() 4b) Finalizer thread runs B.finalize() Regards, Jeroen
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Neil, You now have concurrency issues again. isClosed would need to be volatile if the methods that set/check it aren't synchronized. Multiple threads could call close() at the same time, and threads can call available etc after close has finished its main work. David Neil Richards said the following on 04/08/11 22:36: On Thu, 2011-04-07 at 16:02 -0700, Xueming Shen wrote: It appears it might not be necessary to do the finalize() in ZipFileInflaterInputStream. The ZipFileInflaterInputStream itself does not directly hold any native resource by itself that needs to be released at the end of its life circle, if not closed explicitly. The native resource/memory that need to be taken care of are held by its fields inf and zfin, which should be finalized by the corresponding finalize() of their own classes (again, if not closed explicitly), when their outer ZFIIS object is unreachable. The Inflater class has its own finalize() implemented already to invoke its cleanup method end(), so the only thing need to be addressed is to add the finalize() into ZipFileInputStream class to call its close(), strictly speaking this issue is not the regression caused by #6735255, we have this leak before #6735255. Also, would you like to consider to use WeakHeapMapInputStream, Void instead of handling all the weak reference impl by yourself, the bonus would be that the stalled entries might be cleaned up more frequently. Hi Sherman, Thanks for your continuing analysis of this change. I concur with your assessment above, and agree that making the suggested modifications to the changeset results in the code being simpler and clearer. Please find below an updated changeset incorporating these suggestions (and rebased off jdk7-b136), Let me know if you need anything else to progress this fix forward, Thanks, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On Mon, 2011-04-18 at 13:33 +0100, Neil Richards wrote: I think it's worthwhile trying to clear 'streams' entries from a finalize method in ZFIIS, to help ensure they are cleared in a timely manner. In fact, it's probably clearer (more closely following the pattern elsewhere in the file) for ZFIIS.finalize() to call ZFIIS.close(). See changeset below with this additional tweak, Please let me know what you think on this, Thanks, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU # HG changeset patch # User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com # Date 1300289208 0 # Branch zip-heap # Node ID 9e0a3d49e4978264054cbacc9d3023715f1776d3 # Parent aa13e7702cd9d8aca9aa38f1227f966990866944 7031076: Retained ZipFile InputStreams increase heap demand Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd Contributed-by: neil.richa...@ngmr.net diff -r aa13e7702cd9 -r 9e0a3d49e497 src/share/classes/java/util/zip/ZipFile.java --- a/src/share/classes/java/util/zip/ZipFile.java Tue Mar 29 20:19:55 2011 -0700 +++ b/src/share/classes/java/util/zip/ZipFile.java Wed Mar 16 15:26:48 2011 + @@ -31,11 +31,13 @@ import java.io.EOFException; import java.io.File; import java.nio.charset.Charset; -import java.util.Vector; +import java.util.ArrayDeque; +import java.util.Deque; import java.util.Enumeration; -import java.util.Set; -import java.util.HashSet; +import java.util.HashMap; +import java.util.Map; import java.util.NoSuchElementException; +import java.util.WeakHashMap; import java.security.AccessController; import sun.security.action.GetPropertyAction; import static java.util.zip.ZipConstants64.*; @@ -54,7 +56,7 @@ private long jzfile; // address of jzfile data private String name; // zip file name private int total;// total number of entries -private boolean closeRequested; +private volatile boolean closeRequested = false; private static final int STORED = ZipEntry.STORED; private static final int DEFLATED = ZipEntry.DEFLATED; @@ -314,8 +316,9 @@ // freeEntry releases the C jzentry struct. private static native void freeEntry(long jzfile, long jzentry); -// the outstanding inputstreams that need to be closed. -private SetInputStream streams = new HashSet(); +// the outstanding inputstreams that need to be closed, +// mapped to the inflater objects they use. +private final MapInputStream, Inflater streams = new WeakHashMap(); /** * Returns an input stream for reading the contents of the specified @@ -351,51 +354,21 @@ switch (getEntryMethod(jzentry)) { case STORED: -streams.add(in); +synchronized (streams) { +streams.put(in, null); +} return in; case DEFLATED: -final ZipFileInputStream zfin = in; // MORE: Compute good size for inflater stream: long size = getEntrySize(jzentry) + 2; // Inflater likes a bit of slack if (size 65536) size = 8192; if (size = 0) size = 4096; -InputStream is = new InflaterInputStream(zfin, getInflater(), (int)size) { -private boolean isClosed = false; - -public void close() throws IOException { -if (!isClosed) { -super.close(); -releaseInflater(inf); -isClosed = true; -} -} -// Override fill() method to provide an extra dummy byte -// at the end of the input stream. This is required when -// using the nowrap Inflater option. -protected void fill() throws IOException { -if (eof) { -throw new EOFException( -Unexpected end of ZLIB input stream); -} -len = this.in.read(buf, 0, buf.length); -if (len == -1) { -buf[0] = 0; -len = 1; -eof = true; -} -inf.setInput(buf, 0, len); -} -private boolean eof; - -public int available() throws IOException { -if (isClosed) -return 0; -long avail = zfin.size() - inf.getBytesWritten(); -return avail (long) Integer.MAX_VALUE ? -Integer.MAX_VALUE : (int) avail; -} -
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On Fri, 2011-04-08 at 20:53 -0700, Xueming Shen wrote: Regression test /test/java/util/zip/ZipFile/FinalizeInflater.java failed with this fix. The possible solution I can think of now is to do nothing but simply return the inflater back to the cache, but do an additional sanity check when check out private Inflater getInflater() { synchronized (inflaters) { int size = inflaters.size(); while ((size = inflaters.size()) 0) { Inflater inf = inflaters.remove(size - 1); if (!inf.ended()) return inf; } return new Inflater(true); } } http://cr.openjdk.java.net/~sherman/7031076/webrev/ What do you think? have a better idea? Whilst checking on check out might hope to reduce the timing window, I fear that doing so will not entirely eliminate it. (The timing window is essentially between the point the inflater object is placed on the finalization queue for finalization and the point at which that finalization has actually occurred. In that window, if any implementation or user code can cause it to be added to the cache, it could also be retrieved from the cache prior to the point the Finalization thread gets round to calling Inflater.finalize()). Instead, in order to safely cache and reuse inflater objects in ZipFile, I believe it to be necessary for the ZipFile to prevent them from becoming eligible for finalization/GC (until a decision is made not to put them back in the inflater cache due to their having been explicitly ended). It can do this by holding references to the inflater objects that are in use by its input streams, as well as those in the cache (ie. those currently unused). (NB: Previously, the Inflater objects happened to be preserved in this way, due to the ZipFile keeping alive the InputStream objects, which have references to the inflater objects. Thus the challenge is to preserve the Inflater objects - so that they can be cached and reused - without hitting the heap by prolonging the life of the InputStream objects). ZipFile must also only return an inflater to the inflater cache at the point where its end() method can no longer be explicitly called (directly or indirectly) by the user - such as i done in the InflaterFinalization testscase. In other words, it's addition to the cache should only be considered once the input stream that was using it has been completely GC'd. Please see below another formulation of the suggested change which incorporates these features. It adapts the 'streams' map to hold references to the in-use inflater objects as values, and reverts to explicitly dealing with the cleanup of stale entries in the map so as to use that point to drive releaseInflater() with the stored value. (I have chosen to hold these values 'softly', so that they may be released by GC if they are not reused - by being given to the cache - in a timely manner. If the input stream is not explicitly closed, an inflater object may not be reset() until the call to releaseInflater() from clearStaleStreams() - ie. it may continue to hold onto its old buffer until then. As clearStaleStreams() is only driven when a new InputStream is requested, this may be retained for some time. By holding the inflater 'softly', the system can decide to release the inflater's resources by GC'ing it, if heap demand is high, or the system has not asked for new InputStreams from this ZipFile for a while). With releaseInflater() being driven from the cleanup of stale 'streams' entries, it no longer needs to be called from ZFIIS.close(), so, instead, only Inflater.reset() is called from there (providing the inflater object has not explicitly been ended) so that it releases the buffer it has been holding. This allows the synchronization to be simplified, as there's no longer a reason to synchronize on the inflater cache independently of that on the ZipFile itself. (I've also changed the inflater cache to use a Deque, which allows its code to be simplified and should be (marginally) more efficient, according to the API doc). Also, based on the previous observation that object's finalize() methods should concentrate on freeing off their own associated system resources, I've modified ZipFile's close() and finalize() methods such that they both call closeFile(), which deals with closing its system resources). I believe, with these changes, the handling of the caching reuse of ZipFile's Inflater objects is handled in a safe manner, whilst still allowing its InputStreams to be GC'd in a timely manner. Please get back to me with any comments, questions or suggestions on this, Thanks, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU # HG changeset patch # User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 18 April 2011 18:32, Xueming Shen xueming.s...@oracle.com wrote: Hi Neil, All tests passed. I'm starting to push your last patch. I generated the webrev at http://cr.openjdk.java.net/~sherman/7031076/webrev/ It should be exactly the same as your last patch. Thanks, Sherman Hi Sherman, Thanks for all your advice and help in reaching (and committing) a resolution for this issue - it really is much appreciated, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On Thu, 2011-04-14 at 14:48 -0700, Xueming Shen wrote: Opinion? anything I overlooked/missed? Hi Sherman, Thanks once more for all your help and advice on this - I'm in favour of almost all of what you suggest. :-) I think it's worthwhile trying to clear 'streams' entries from a finalize method in ZFIIS, to help ensure they are cleared in a timely manner. I don't think the Inflater objects need to be held softly in 'inflaterCache', as they will have been reset() at the point they are put into that cache (in releaseInflater()). (I was holding them softly due to a worry over any delay in clearing their 'buf' reference, which is done when reset() is called. Now that the 'streams' entries are being cleared from close() and finalize() - ie. up front - I think this worry is adequately addressed.) I'm worried about changing the object 'streams' refers to in ZipFile.close(), as threads are using that object to synchronize against. I don't believe it is currently giving the guarantee against ConcurrentModificationException being seen from the iterator that I believe you intend it to be, as other threads, calling (for example) ZFIIS.close() at the same time will not be guaranteed to see the update to the value of 'streams'. Instead, I've modified this code so that a real copy of 'streams' is produced, by calling 'new HashMap(streams)'. It is this copy that is iterated through to close() the InputStreams and end() the Inflaters. Once the copy is obtained, 'streams' can be safely clear()'d. Because it is not clear that a Collections.synchronizedMap will give consistent results when fed into the constructor of HashMap, I've used explicit synchronization on 'streams' instead, to ensure 'streams' isn't modified during the construction of 'copy'. As each of the calls to InputStream.close() will synchronize on 'streams' to call its remove() method, I've held that monitor whilst those calls are made. Other minor tweaks are changing 'isClosed' fields to 'closeRequested' (to better describe their use now), changing 'ZipFile.closeRequested' to be volatile (so it can be checked before synchronization, and so that it conforms to the pattern now established elsewhere in the file), and reducing the synchronization block in releaseInflater() (only the call to 'inflateCache.add()' need be protected by synchronization on 'inflaterCache'). Please find below the resultant changeset, Let me know what you think, Thanks, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU # HG changeset patch # User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com # Date 1300289208 0 # Branch zip-heap # Node ID c1c7b2dfa0091938bebc0727224690afc892a0b7 # Parent aa13e7702cd9d8aca9aa38f1227f966990866944 7031076: Retained ZipFile InputStreams increase heap demand Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd Contributed-by: neil.richa...@ngmr.net diff -r aa13e7702cd9 -r c1c7b2dfa009 src/share/classes/java/util/zip/ZipFile.java --- a/src/share/classes/java/util/zip/ZipFile.java Tue Mar 29 20:19:55 2011 -0700 +++ b/src/share/classes/java/util/zip/ZipFile.java Wed Mar 16 15:26:48 2011 + @@ -31,11 +31,13 @@ import java.io.EOFException; import java.io.File; import java.nio.charset.Charset; -import java.util.Vector; +import java.util.ArrayDeque; +import java.util.Deque; import java.util.Enumeration; -import java.util.Set; -import java.util.HashSet; +import java.util.HashMap; +import java.util.Map; import java.util.NoSuchElementException; +import java.util.WeakHashMap; import java.security.AccessController; import sun.security.action.GetPropertyAction; import static java.util.zip.ZipConstants64.*; @@ -54,7 +56,7 @@ private long jzfile; // address of jzfile data private String name; // zip file name private int total;// total number of entries -private boolean closeRequested; +private volatile boolean closeRequested = false; private static final int STORED = ZipEntry.STORED; private static final int DEFLATED = ZipEntry.DEFLATED; @@ -314,8 +316,9 @@ // freeEntry releases the C jzentry struct. private static native void freeEntry(long jzfile, long jzentry); -// the outstanding inputstreams that need to be closed. -private SetInputStream streams = new HashSet(); +// the outstanding inputstreams that need to be closed, +// mapped to the inflater objects they use. +private final MapInputStream, Inflater streams = new WeakHashMap(); /** * Returns an input stream for reading the contents of the specified @@ -351,51 +354,21 @@ switch (getEntryMethod(jzentry)) { case STORED: -streams.add(in); +synchronized (streams) { +streams.put(in, null); +}
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 4/18/2011 5:33 AM, Neil Richards wrote: On Thu, 2011-04-14 at 14:48 -0700, Xueming Shen wrote: Opinion? anything I overlooked/missed? Hi Sherman, Thanks once more for all your help and advice on this - I'm in favour of almost all of what you suggest. :-) I think it's worthwhile trying to clear 'streams' entries from a finalize method in ZFIIS, to help ensure they are cleared in a timely manner. I don't think the Inflater objects need to be held softly in 'inflaterCache', as they will have been reset() at the point they are put into that cache (in releaseInflater()). (I was holding them softly due to a worry over any delay in clearing their 'buf' reference, which is done when reset() is called. Now that the 'streams' entries are being cleared from close() and finalize() - ie. up front - I think this worry is adequately addressed.) I'm worried about changing the object 'streams' refers to in ZipFile.close(), as threads are using that object to synchronize against. I don't believe it is currently giving the guarantee against ConcurrentModificationException being seen from the iterator that I believe you intend it to be, as other threads, calling (for example) ZFIIS.close() at the same time will not be guaranteed to see the update to the value of 'streams'. Instead, I've modified this code so that a real copy of 'streams' is produced, by calling 'new HashMap(streams)'. It is this copy that is iterated through to close() the InputStreams and end() the Inflaters. Once the copy is obtained, 'streams' can be safely clear()'d. Because it is not clear that a Collections.synchronizedMap will give consistent results when fed into the constructor of HashMap, I've used explicit synchronization on 'streams' instead, to ensure 'streams' isn't modified during the construction of 'copy'. As each of the calls to InputStream.close() will synchronize on 'streams' to call its remove() method, I've held that monitor whilst those calls are made. (resent, got some mail issue) Hi Neil, If you are now explicitly synchronize on streams everywhere, I don't think we even need a copy at close(). To add close() in ZFIIS.finalize() appears to be safe now (?, maybe I should think it again), but given we are now using WeakHashMap, those weak reference will get expunged every time that map gets accessed, I doubt that really help lots (have to wait for the GC kicks in anyway, if not closed explicitly) I'm running tests now, with the copy mentioned above. -Sherman Other minor tweaks are changing 'isClosed' fields to 'closeRequested' (to better describe their use now), changing 'ZipFile.closeRequested' to be volatile (so it can be checked before synchronization, and so that it conforms to the pattern now established elsewhere in the file), and reducing the synchronization block in releaseInflater() (only the call to 'inflateCache.add()' need be protected by synchronization on 'inflaterCache'). Please find below the resultant changeset, Let me know what you think, Thanks, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Hi Neil, All tests passed. I'm starting to push your last patch. I generated the webrev at http://cr.openjdk.java.net/~sherman/7031076/webrev/ It should be exactly the same as your last patch. Thanks, Sherman On 4/18/2011 9:49 AM, Xueming Shen wrote: On 4/18/2011 5:33 AM, Neil Richards wrote: On Thu, 2011-04-14 at 14:48 -0700, Xueming Shen wrote: Opinion? anything I overlooked/missed? Hi Sherman, Thanks once more for all your help and advice on this - I'm in favour of almost all of what you suggest. :-) I think it's worthwhile trying to clear 'streams' entries from a finalize method in ZFIIS, to help ensure they are cleared in a timely manner. I don't think the Inflater objects need to be held softly in 'inflaterCache', as they will have been reset() at the point they are put into that cache (in releaseInflater()). (I was holding them softly due to a worry over any delay in clearing their 'buf' reference, which is done when reset() is called. Now that the 'streams' entries are being cleared from close() and finalize() - ie. up front - I think this worry is adequately addressed.) I'm worried about changing the object 'streams' refers to in ZipFile.close(), as threads are using that object to synchronize against. I don't believe it is currently giving the guarantee against ConcurrentModificationException being seen from the iterator that I believe you intend it to be, as other threads, calling (for example) ZFIIS.close() at the same time will not be guaranteed to see the update to the value of 'streams'. Instead, I've modified this code so that a real copy of 'streams' is produced, by calling 'new HashMap(streams)'. It is this copy that is iterated through to close() the InputStreams and end() the Inflaters. Once the copy is obtained, 'streams' can be safely clear()'d. Because it is not clear that a Collections.synchronizedMap will give consistent results when fed into the constructor of HashMap, I've used explicit synchronization on 'streams' instead, to ensure 'streams' isn't modified during the construction of 'copy'. As each of the calls to InputStream.close() will synchronize on 'streams' to call its remove() method, I've held that monitor whilst those calls are made. (resent, got some mail issue) Hi Neil, If you are now explicitly synchronize on streams everywhere, I don't think we even need a copy at close(). To add close() in ZFIIS.finalize() appears to be safe now (?, maybe I should think it again), but given we are now using WeakHashMap, those weak reference will get expunged every time that map gets accessed, I doubt that really help lots (have to wait for the GC kicks in anyway, if not closed explicitly) I'm running tests now, with the copy mentioned above. -Sherman Other minor tweaks are changing 'isClosed' fields to 'closeRequested' (to better describe their use now), changing 'ZipFile.closeRequested' to be volatile (so it can be checked before synchronization, and so that it conforms to the pattern now established elsewhere in the file), and reducing the synchronization block in releaseInflater() (only the call to 'inflateCache.add()' need be protected by synchronization on 'inflaterCache'). Please find below the resultant changeset, Let me know what you think, Thanks, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 18 April 2011 17:49, Xueming Shen xueming.s...@oracle.com wrote: If you are now explicitly synchronize on streams everywhere, I don't think we even need a copy at close(). I thought that the copy is needed so that the Map being iterated over (in close()) is not able to be modified by the same thread via the calls to InputStream.close() that it makes (which will try to call streams.remove()), and hence avoid the risk of getting ConcurrentModificationExceptions being thrown. Are you sure that it is safe to avoid doing this copy ? To add close() in ZFIIS.finalize() appears to be safe now (?, maybe I should think it again), but given we are now using WeakHashMap, those weak reference will get expunged every time that map gets accessed, I doubt that really help lots (have to wait for the GC kicks in anyway, if not closed explicitly) As I recall, WeakHashMap only looks to expunge stale entries when one of its methods is called. Therefore, causing it to be accessed (indirectly) from the finalize() methods (by calling close()) helps to ensure that stale entries (with the retained inflater's associated system resources) are not retained for longer than needed. This is true even if the particular access is unlikely to retrieve a live entry. Hope this provides some clarification to the suggested changeset, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Steve, Neil, As we discussed in previous emails that Neil's last patch (to put the inflater into the streams as the value of the map) does solve the FinalizeInflater failure issue (provides a orderly final cleanup between the stream and the inflater. However to leave releaseInfalter() to be solely driven by GC might delay the inflater to be cached/re-used in a timely manner, even in scenario that those streams get closed explicitly by the invoker. Ideally it is desirable that the inflater can be re-used/put back into the cache when ZFIIS.close() gets invoked explicitly. It appears it might be too hard, if not impossible, to arrange the releaseInflater() to be invoked from both clearStaleStreams()/staleStreamQueue and ZFIIS.close(), given the non-guarantee, by the VM/GC, of the timing/order among the clear up of weak reference objects, enqueue them and the invocation of finalize(). So I would suggest a relatively simple solution as showed in webrev at http://cr.openjdk.java.net/~sherman/7031076/webrev/src/share/classes/java/util/zip/ZipFile.java.sdiff.html (1) simply use the WeakHashMap to keep the InputStream, Inflater pair, we don't try to release/put the used inflater back to cache, if the ZFIIS stream does not get closed explicitly (its underlying ZFIS stream and the inflater will then get finalized by themself). (2) release the inflater back to cache in ZFIIS.close(), only if the inflater is sill alive in the streams, which guarantees that the inflater is not being finalized and safe to put back to cache (so long as we have the weak reference - inflater pair in streams, it does not matter if the reference to ZFIIS object is cleared or not). In most cases, I would expect when the ZFIIS.close() is invoked, both ZFIIS object and its inflater are still reachable, so the inflater gets re-used as expected. In weird scenario like what we have in FinalizeInflater, the ZFIIS object and its outer wrapper object are being finalized when ZFIIS.close() gets invoked (from wrapper.finalize()), there is possibility that the inflater might not be put back to cache, if the weak reverence-inflater pair has been expunged out of the streams already (because the weak reference to ZFIIS might have been cleared, so its entry in streams can be expunged any time, by any thread. At this moment, the inflater object might be declared as un-reachable and no longer safe to put back into cache), but I think we can live with this missing. (3) I realized that we actually are not removing the closed ZFIIS objects (only the ZFIS objects are removed when closed) from streams even though those input streams are closed gracefully by the invoker (as preferred, suggested, desired), which means all those ZFIIS streams will stay in streams (strong referenced in current implementation), no matter whether or not they get closed explicitly, until the ZipFile object itself gets closed/finalized. This might contribute to the problem Neil observed at first place (the ZipFile InputStreams accumulat in heap), actually I do have another incident report #7033523 which I have closed as the dup of #7031076. Opinion? anything I overlooked/missed? -Sherman On 04/13/2011 08:25 AM, Steve Poole wrote: On 13/04/11 16:19, Steve Poole wrote: On 12/04/11 20:33, Xueming Shen wrote: Hi Neil, Hi Sherman , Neil is out on vacation so I will do my best to stand in for him. (1) I believe it would be better to keep the synchronization lock for get/releaseInfalter() local instead of using the global ZipFile.this, which I agree is simple. But it also means each/every time when you release the used inflater back to cache, ZipFile.this has to be held and any possible/potential read operation on ZipFile from other thead/ inputstream has to wait (get seems fine, the current impl holds the ZipFile.this anyway before reach the getInflater()). Ok - I agree - seems sensible. (Though I reserve the right to contradict myself later) (2) The resource Infalter mainly holds is the native memory block it alloc-ed at native for zlib, which is not in the Java heap, so I doubt making it softly really helps for GC. Sure, cleanup of those objects themself is a plus, but I'm not sure if it is really worth using SoftReference in this case (it appears you are now invoking clearStaleStreams() from the finalize()). I'd like to keep this in - not all implementations of zlib are equal in where they allocate memory. (3) The releaseInfalter() is now totally driven from clearStaleStreams()/staleStreamQueue, which is under full control of GC. If GC does not kick in for hours/days, infalters can never be put back into its cache after use, even the applications correctly/promptly close those streams after use. And when the GC kicks in, we might see bunch
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 13/04/11 16:19, Steve Poole wrote: On 12/04/11 20:33, Xueming Shen wrote: Hi Neil, Hi Sherman , Neil is out on vacation so I will do my best to stand in for him. (1) I believe it would be better to keep the synchronization lock for get/releaseInfalter() local instead of using the global ZipFile.this, which I agree is simple. But it also means each/every time when you release the used inflater back to cache, ZipFile.this has to be held and any possible/potential read operation on ZipFile from other thead/ inputstream has to wait (get seems fine, the current impl holds the ZipFile.this anyway before reach the getInflater()). Ok - I agree - seems sensible. (Though I reserve the right to contradict myself later) (2) The resource Infalter mainly holds is the native memory block it alloc-ed at native for zlib, which is not in the Java heap, so I doubt making it softly really helps for GC. Sure, cleanup of those objects themself is a plus, but I'm not sure if it is really worth using SoftReference in this case (it appears you are now invoking clearStaleStreams() from the finalize()). I'd like to keep this in - not all implementations of zlib are equal in where they allocate memory. (3) The releaseInfalter() is now totally driven from clearStaleStreams()/staleStreamQueue, which is under full control of GC. If GC does not kick in for hours/days, infalters can never be put back into its cache after use, even the applications correctly/promptly close those streams after use. And when the GC kicks in, we might see bunch (hundreds, thousands) of inflaters get pushed into cache. The approach does solve the timing issue that got us here, but it also now has this native memory cache mechanism totally under the control of/driven by the GC, the java heap management mechanism, which might not be a preferred scenario. Just wonder if we can have a better choice, say with this GC-driven as the backup cleanup and meanwhile still have the ZFIIS.close() to do something to safely put the used inflater back to cache promptly. To put the infalter as the value of the streams map appears to be a good idea/start, now the infalter will not be targeted for finalized until the entry gets cleaned from the map, in which might in fact provide us a sort of orderly (between the stream and its inflater) clearup that the GC/finalizer can't guarantee. We still have couple days before we hit the deadline (to get this one in), so it might worth taking some time on this direction? Dang! - pressed send when I meant save. I understand your comments - on the face of it I agree with what you're suggesting - let me think it through some more though. What is this deadline you are talking about? -Sherman On 04/11/2011 05:15 AM, Neil Richards wrote: On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote: With releaseInflater() being driven from the cleanup of stale 'streams' entries, it no longer needs to be called from ZFIIS.close(), so, instead, only Inflater.reset() is called from there (providing the inflater object has not explicitly been ended) so that it releases the buffer it has been holding. Actually, I have a slight change of heart on this aspect. Because close() may be driven from a finalize() method in user code (as is done in the InflaterFinalizer test case), I believe it is possible (albeit unlikely) for an inflater object to have been reclaimed from 'streams' (by a call to clearStaleStreams()), put into the inflater cache, retrieved from there (by another thread creating an input stream) and having started to be used by that other stream at the point at which the code in close() is run. (This is because weak references will be cleared, and *may* be queued on the ReferenceQueue, prior to finalization). Because of this, it's not actually entirely safe now to call inf.reset() from ZFIIS.close(). Instead, I have added additional calls to clearStaleStreams() from the finalize() methods of both InputStream implementations in the latest (hopefully in the meaning of last) changeset included below. As always, please get back to me with any comments, questions or suggestions on this, Thanks, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 04/13/2011 08:25 AM, Steve Poole wrote: On 13/04/11 16:19, Steve Poole wrote: On 12/04/11 20:33, Xueming Shen wrote: Hi Neil, Hi Sherman , Neil is out on vacation so I will do my best to stand in for him. (1) I believe it would be better to keep the synchronization lock for get/releaseInfalter() local instead of using the global ZipFile.this, which I agree is simple. But it also means each/every time when you release the used inflater back to cache, ZipFile.this has to be held and any possible/potential read operation on ZipFile from other thead/ inputstream has to wait (get seems fine, the current impl holds the ZipFile.this anyway before reach the getInflater()). Ok - I agree - seems sensible. (Though I reserve the right to contradict myself later) (2) The resource Infalter mainly holds is the native memory block it alloc-ed at native for zlib, which is not in the Java heap, so I doubt making it softly really helps for GC. Sure, cleanup of those objects themself is a plus, but I'm not sure if it is really worth using SoftReference in this case (it appears you are now invoking clearStaleStreams() from the finalize()). I'd like to keep this in - not all implementations of zlib are equal in where they allocate memory. Hi Steve, It might be better/appropriate to use the SoftReference in the inflaterCache instead of the streams, if we want to use the SoftReference for inflater caching purpose. /* * Gets an inflater from the list of available inflaters or allocates * a new one. */ private synchronized Inflater getInflater() { SoftReferenceInflater sref; Inflater inf; while (null != (sref = inflaterCache.poll())) { if ((inf = sref.get()) != null !inf.ended()) { return inf; } } return new Inflater(true); } /* * Releases the specified inflater to the list of available inflaters. */ private synchronized void releaseInflater(Inflater inf) { if ((null != inf) (false == inf.ended())) { inf.reset(); inflaterCache.add(new SoftReference(inf)); } } // List of available Inflater objects for decompression private DequeSoftReferenceInflater inflaterCache = new ArrayDeque(); -Sherman (3) The releaseInfalter() is now totally driven from clearStaleStreams()/staleStreamQueue, which is under full control of GC. If GC does not kick in for hours/days, infalters can never be put back into its cache after use, even the applications correctly/promptly close those streams after use. And when the GC kicks in, we might see bunch (hundreds, thousands) of inflaters get pushed into cache. The approach does solve the timing issue that got us here, but it also now has this native memory cache mechanism totally under the control of/driven by the GC, the java heap management mechanism, which might not be a preferred scenario. Just wonder if we can have a better choice, say with this GC-driven as the backup cleanup and meanwhile still have the ZFIIS.close() to do something to safely put the used inflater back to cache promptly. To put the infalter as the value of the streams map appears to be a good idea/start, now the infalter will not be targeted for finalized until the entry gets cleaned from the map, in which might in fact provide us a sort of orderly (between the stream and its inflater) clearup that the GC/finalizer can't guarantee. We still have couple days before we hit the deadline (to get this one in), so it might worth taking some time on this direction? Dang! - pressed send when I meant save. I understand your comments - on the face of it I agree with what you're suggesting - let me think it through some more though. What is this deadline you are talking about? -Sherman On 04/11/2011 05:15 AM, Neil Richards wrote: On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote: With releaseInflater() being driven from the cleanup of stale 'streams' entries, it no longer needs to be called from ZFIIS.close(), so, instead, only Inflater.reset() is called from there (providing the inflater object has not explicitly been ended) so that it releases the buffer it has been holding. Actually, I have a slight change of heart on this aspect. Because close() may be driven from a finalize() method in user code (as is done in the InflaterFinalizer test case), I believe it is possible (albeit unlikely) for an inflater object to have been reclaimed from 'streams' (by a call to clearStaleStreams()), put into the inflater cache, retrieved from there (by another thread creating an input stream) and having started to be used by that other stream at the point at which the code in close() is run. (This is because weak references will be cleared, and *may* be queued on the
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Hi Neil, (1) I believe it would be better to keep the synchronization lock for get/releaseInfalter() local instead of using the global ZipFile.this, which I agree is simple. But it also means each/every time when you release the used inflater back to cache, ZipFile.this has to be held and any possible/potential read operation on ZipFile from other thead/ inputstream has to wait (get seems fine, the current impl holds the ZipFile.this anyway before reach the getInflater()). (2) The resource Infalter mainly holds is the native memory block it alloc-ed at native for zlib, which is not in the Java heap, so I doubt making it softly really helps for GC. Sure, cleanup of those objects themself is a plus, but I'm not sure if it is really worth using SoftReference in this case (it appears you are now invoking clearStaleStreams() from the finalize()). (3) The releaseInfalter() is now totally driven from clearStaleStreams()/staleStreamQueue, which is under full control of GC. If GC does not kick in for hours/days, infalters can never be put back into its cache after use, even the applications correctly/promptly close those streams after use. And when the GC kicks in, we might see bunch (hundreds, thousands) of inflaters get pushed into cache. The approach does solve the timing issue that got us here, but it also now has this native memory cache mechanism totally under the control of/driven by the GC, the java heap management mechanism, which might not be a preferred scenario. Just wonder if we can have a better choice, say with this GC-driven as the backup cleanup and meanwhile still have the ZFIIS.close() to do something to safely put the used inflater back to cache promptly. To put the infalter as the value of the streams map appears to be a good idea/start, now the infalter will not be targeted for finalized until the entry gets cleaned from the map, in which might in fact provide us a sort of orderly (between the stream and its inflater) clearup that the GC/finalizer can't guarantee. We still have couple days before we hit the deadline (to get this one in), so it might worth taking some time on this direction? -Sherman On 04/11/2011 05:15 AM, Neil Richards wrote: On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote: With releaseInflater() being driven from the cleanup of stale 'streams' entries, it no longer needs to be called from ZFIIS.close(), so, instead, only Inflater.reset() is called from there (providing the inflater object has not explicitly been ended) so that it releases the buffer it has been holding. Actually, I have a slight change of heart on this aspect. Because close() may be driven from a finalize() method in user code (as is done in the InflaterFinalizer test case), I believe it is possible (albeit unlikely) for an inflater object to have been reclaimed from 'streams' (by a call to clearStaleStreams()), put into the inflater cache, retrieved from there (by another thread creating an input stream) and having started to be used by that other stream at the point at which the code in close() is run. (This is because weak references will be cleared, and *may* be queued on the ReferenceQueue, prior to finalization). Because of this, it's not actually entirely safe now to call inf.reset() from ZFIIS.close(). Instead, I have added additional calls to clearStaleStreams() from the finalize() methods of both InputStream implementations in the latest (hopefully in the meaning of last) changeset included below. As always, please get back to me with any comments, questions or suggestions on this, Thanks, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote: With releaseInflater() being driven from the cleanup of stale 'streams' entries, it no longer needs to be called from ZFIIS.close(), so, instead, only Inflater.reset() is called from there (providing the inflater object has not explicitly been ended) so that it releases the buffer it has been holding. Actually, I have a slight change of heart on this aspect. Because close() may be driven from a finalize() method in user code (as is done in the InflaterFinalizer test case), I believe it is possible (albeit unlikely) for an inflater object to have been reclaimed from 'streams' (by a call to clearStaleStreams()), put into the inflater cache, retrieved from there (by another thread creating an input stream) and having started to be used by that other stream at the point at which the code in close() is run. (This is because weak references will be cleared, and *may* be queued on the ReferenceQueue, prior to finalization). Because of this, it's not actually entirely safe now to call inf.reset() from ZFIIS.close(). Instead, I have added additional calls to clearStaleStreams() from the finalize() methods of both InputStream implementations in the latest (hopefully in the meaning of last) changeset included below. As always, please get back to me with any comments, questions or suggestions on this, Thanks, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU # HG changeset patch # User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com # Date 1300289208 0 # Branch zip-heap # Node ID b66c974bcfa957281e69a63cd4019a12c13cacae # Parent aa13e7702cd9d8aca9aa38f1227f966990866944 7031076: Retained ZipFile InputStreams increase heap demand Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd Contributed-by: neil.richa...@ngmr.net diff -r aa13e7702cd9 -r b66c974bcfa9 src/share/classes/java/util/zip/ZipFile.java --- a/src/share/classes/java/util/zip/ZipFile.java Tue Mar 29 20:19:55 2011 -0700 +++ b/src/share/classes/java/util/zip/ZipFile.java Wed Mar 16 15:26:48 2011 + @@ -30,11 +30,16 @@ import java.io.IOException; import java.io.EOFException; import java.io.File; +import java.lang.ref.ReferenceQueue; +import java.lang.ref.SoftReference; +import java.lang.ref.WeakReference; import java.nio.charset.Charset; -import java.util.Vector; +import java.util.ArrayDeque; +import java.util.Deque; import java.util.Enumeration; -import java.util.Set; -import java.util.HashSet; +import java.util.HashMap; +import java.util.Iterator; +import java.util.Map; import java.util.NoSuchElementException; import java.security.AccessController; import sun.security.action.GetPropertyAction; @@ -314,8 +319,21 @@ // freeEntry releases the C jzentry struct. private static native void freeEntry(long jzfile, long jzentry); -// the outstanding inputstreams that need to be closed. -private SetInputStream streams = new HashSet(); +// the outstanding inputstreams that need to be closed, +// mapped to the inflater objects they use. +private final MapWeakReferenceInputStream, SoftReferenceInflater +streams = new HashMap(); +private final ReferenceQueue? super InputStream staleStreamQueue = +new ReferenceQueue(); +private static final SoftReferenceInflater NULL_INFLATER_REF = +new SoftReference(null); + +private synchronized void clearStaleStreams() { +Object staleStream; +while (null != (staleStream = staleStreamQueue.poll())) { +releaseInflater(streams.remove(staleStream).get()); +} +} /** * Returns an input stream for reading the contents of the specified @@ -339,6 +357,7 @@ ZipFileInputStream in = null; synchronized (this) { ensureOpen(); +clearStaleStreams(); if (!zc.isUTF8() (entry.flag EFS) != 0) { jzentry = getEntry(jzfile, zc.getBytesUTF8(entry.name), false); } else { @@ -351,51 +370,21 @@ switch (getEntryMethod(jzentry)) { case STORED: -streams.add(in); +streams.put( +new WeakReferenceInputStream(in, staleStreamQueue), +NULL_INFLATER_REF); return in; case DEFLATED: -final ZipFileInputStream zfin = in; // MORE: Compute good size for inflater stream: long size = getEntrySize(jzentry) + 2; // Inflater likes a bit of slack if (size 65536) size = 8192; if (size = 0) size = 4096; -InputStream is = new InflaterInputStream(zfin, getInflater(), (int)size) { -private boolean isClosed = false; - -
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Hi David, You are correct that the ZipFileInflaterInputStream code is not inherently thread-safe (without user synchronization), but then, the contract of InputStream does not require it to be (as Sherman previously observed on this thread). The previous suggested fix caused ZFIIS to interact with the finalize thread, thus it had to care about being thread-safe at least to the extent of safe-guarding that interaction. As Sherman observed, and I agreed, finalization on that object is not really needed, so the inherent cross-thread interaction that would be introduced by the finalize method can be avoided in that case, and so the need to cater for inherent thread-safety can likewise be avoided. With the latest suggested change, if the ZFIIS is driven by multiple threads, it will be done at the user's (developer's) behest, so the user can (should) know to implement suitable synchronization to ensure correct behaviour. Therefore, I believe the same lack of synchronization in ZFIIS as there was in the previous anonymous inner class to be entirely justified in this case, given that it no longer has a finalize() method. Please let me know if you still disagree with this assessment, Thanks, Neil (Sent from my phone, so probably missing my usual signature)
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Neil Richards said the following on 04/09/11 10:04: You are correct that the ZipFileInflaterInputStream code is not inherently thread-safe (without user synchronization), but then, the contract of InputStream does not require it to be (as Sherman previously observed on this thread). The previous suggested fix caused ZFIIS to interact with the finalize thread, thus it had to care about being thread-safe at least to the extent of safe-guarding that interaction. As Sherman observed, and I agreed, finalization on that object is not really needed, so the inherent cross-thread interaction that would be introduced by the finalize method can be avoided in that case, and so the need to cater for inherent thread-safety can likewise be avoided. Ok. David -- With the latest suggested change, if the ZFIIS is driven by multiple threads, it will be done at the user's (developer's) behest, so the user can (should) know to implement suitable synchronization to ensure correct behaviour. Therefore, I believe the same lack of synchronization in ZFIIS as there was in the previous anonymous inner class to be entirely justified in this case, given that it no longer has a finalize() method. Please let me know if you still disagree with this assessment, Thanks, Neil (Sent from my phone, so probably missing my usual signature)
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Neil, Regression test /test/java/util/zip/ZipFile/FinalizeInflater.java failed with this fix. The possible solution I can think of now is to do nothing but simply return the inflater back to the cache, but do an additional sanity check when check out private Inflater getInflater() { synchronized (inflaters) { int size = inflaters.size(); while ((size = inflaters.size()) 0) { Inflater inf = inflaters.remove(size - 1); if (!inf.ended()) return inf; } return new Inflater(true); } } http://cr.openjdk.java.net/~sherman/7031076/webrev/ What do you think? have a better idea? Sherman On 04-08-2011 5:36 AM, Neil Richards wrote: On Thu, 2011-04-07 at 16:02 -0700, Xueming Shen wrote: It appears it might not be necessary to do the finalize() in ZipFileInflaterInputStream. The ZipFileInflaterInputStream itself does not directly hold any native resource by itself that needs to be released at the end of its life circle, if not closed explicitly. The native resource/memory that need to be taken care of are held by its fields inf and zfin, which should be finalized by the corresponding finalize() of their own classes (again, if not closed explicitly), when their outer ZFIIS object is unreachable. The Inflater class has its own finalize() implemented already to invoke its cleanup method end(), so the only thing need to be addressed is to add the finalize() into ZipFileInputStream class to call its close(), strictly speaking this issue is not the regression caused by #6735255, we have this leak before #6735255. Also, would you like to consider to use WeakHeapMapInputStream, Void instead of handling all the weak reference impl by yourself, the bonus would be that the stalled entries might be cleaned up more frequently. Hi Sherman, Thanks for your continuing analysis of this change. I concur with your assessment above, and agree that making the suggested modifications to the changeset results in the code being simpler and clearer. Please find below an updated changeset incorporating these suggestions (and rebased off jdk7-b136), Let me know if you need anything else to progress this fix forward, Thanks, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On Mon, 2011-04-04 at 09:04 +1000, David Holmes wrote: 1. If a call to close() occurs around the same time as finalization occurs then the finalizer thread will set inFinalizer to true, at which point the thread executing close() can see it is true and so may skip the releaseInflater(inf) call. 2. As isClosed is not volatile, and available() is not synchronized, a thread calling available() on a closed stream may not see that it has been closed and so will likely encounter an IOException rather than getting a zero return. Even if #1 is not a practical problem I'd be inclined to make the finalizer synchronized as well. By doing that you're effectively ensuring that premature finalization of the stream won't happen. I tend to agree, especially as it also makes the intention of the code clearer. For #2 it is a tricky call. If you don't actually expect the stream object to be used by multiple threads then using a synchronized block to read isClosed will be cheap in VMs that go to great effort to reduce the cost of unnecessary synchronization (eg biased-locking in hotspot). Otherwise making isClosed volatile is likely the better option. The check at the start of available() guards the logic beyond (which uses a value from the inflater object, which would not be valid if the stream has been closed()). Because of this, I think it would be clearer to synchronize the available() method too. As you say, it is extremely likely that, in practice, this synchronization will never be contended, and so won't impact performance significantly (in modern VMs). Please find below an updated changeset with these modifications, Thanks for your advice in this, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU # HG changeset patch # User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com # Date 1300289208 0 # Branch zip-heap # Node ID db52003c128c8706f45e0b2bf9f98d5e905d2090 # Parent 554adcfb615e63e62af530b1c10fcf7813a75b26 7031076: Retained ZipFile InputStreams increase heap demand Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd Contributed-by: neil.richa...@ngmr.net diff -r 554adcfb615e -r db52003c128c src/share/classes/java/util/zip/ZipFile.java --- a/src/share/classes/java/util/zip/ZipFile.java Wed Mar 16 15:01:07 2011 -0700 +++ b/src/share/classes/java/util/zip/ZipFile.java Wed Mar 16 15:26:48 2011 + @@ -30,11 +30,14 @@ import java.io.IOException; import java.io.EOFException; import java.io.File; +import java.lang.ref.ReferenceQueue; +import java.lang.ref.WeakReference; import java.nio.charset.Charset; import java.util.Vector; import java.util.Enumeration; import java.util.Set; import java.util.HashSet; +import java.util.Iterator; import java.util.NoSuchElementException; import java.security.AccessController; import sun.security.action.GetPropertyAction; @@ -315,7 +318,16 @@ private static native void freeEntry(long jzfile, long jzentry); // the outstanding inputstreams that need to be closed. -private SetInputStream streams = new HashSet(); +private final SetWeakReferenceInputStream streams = new HashSet(); +private final ReferenceQueue? super InputStream staleStreamQueue = +new ReferenceQueue(); + +private void clearStaleStreams() { +Object staleStream; +while (null != (staleStream = staleStreamQueue.poll())) { +streams.remove(staleStream); +} +} /** * Returns an input stream for reading the contents of the specified @@ -339,6 +351,7 @@ ZipFileInputStream in = null; synchronized (this) { ensureOpen(); +clearStaleStreams(); if (!zc.isUTF8() (entry.flag EFS) != 0) { jzentry = getEntry(jzfile, zc.getBytesUTF8(entry.name), false); } else { @@ -351,51 +364,19 @@ switch (getEntryMethod(jzentry)) { case STORED: -streams.add(in); +streams.add( +new WeakReferenceInputStream(in, staleStreamQueue)); return in; case DEFLATED: -final ZipFileInputStream zfin = in; // MORE: Compute good size for inflater stream: long size = getEntrySize(jzentry) + 2; // Inflater likes a bit of slack if (size 65536) size = 8192; if (size = 0) size = 4096; -InputStream is = new InflaterInputStream(zfin, getInflater(), (int)size) { -private boolean isClosed = false; - -public void close() throws IOException { -if (!isClosed) { -super.close(); -releaseInflater(inf); -
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Hi Neil, Neil Richards said the following on 04/07/11 23:30: On Mon, 2011-04-04 at 09:04 +1000, David Holmes wrote: 1. If a call to close() occurs around the same time as finalization occurs then the finalizer thread will set inFinalizer to true, at which point the thread executing close() can see it is true and so may skip the releaseInflater(inf) call. 2. As isClosed is not volatile, and available() is not synchronized, a thread calling available() on a closed stream may not see that it has been closed and so will likely encounter an IOException rather than getting a zero return. Even if #1 is not a practical problem I'd be inclined to make the finalizer synchronized as well. By doing that you're effectively ensuring that premature finalization of the stream won't happen. I tend to agree, especially as it also makes the intention of the code clearer. For #2 it is a tricky call. If you don't actually expect the stream object to be used by multiple threads then using a synchronized block to read isClosed will be cheap in VMs that go to great effort to reduce the cost of unnecessary synchronization (eg biased-locking in hotspot). Otherwise making isClosed volatile is likely the better option. The check at the start of available() guards the logic beyond (which uses a value from the inflater object, which would not be valid if the stream has been closed()). Because of this, I think it would be clearer to synchronize the available() method too. As you say, it is extremely likely that, in practice, this synchronization will never be contended, and so won't impact performance significantly (in modern VMs). Please find below an updated changeset with these modifications, Thanks for your advice in this, No problem. From a synchronization perspective this all seems fine now. Cheers, David
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Neil, It appears it might not be necessary to do the finalize() in ZipFileInflaterInputStream. The ZipFileInflaterInputStream itself does not directly hold any native resource by itself that needs to be released at the end of its life circle, if not closed explicitly. The native resource/ memory that need to be taken care of are held by its fields inf and zfin, which should be finalized by the corresponding finalize() of their own classes (again, if not closed explicitly), when their outer ZFIIS object is unreachable. The Inflater class has its own finalize() implemented already to invoke its cleanup method end(), so the only thing need to be addressed is to add the finalize() into ZipFileInputStream class to call its close(), strictly speaking this issue is not the regression caused by #6735255, we have this leak before #6735255. Also, would you like to consider to use WeakHeapMapInputStream, Void instead of handling all the weak reference impl by yourself, the bonus would be that the stalled entries might be cleaned up more frequently. Sherman On 04/07/2011 06:30 AM, Neil Richards wrote: On Mon, 2011-04-04 at 09:04 +1000, David Holmes wrote: 1. If a call to close() occurs around the same time as finalization occurs then the finalizer thread will set inFinalizer to true, at which point the thread executing close() can see it is true and so may skip the releaseInflater(inf) call. 2. As isClosed is not volatile, and available() is not synchronized, a thread calling available() on a closed stream may not see that it has been closed and so will likely encounter an IOException rather than getting a zero return. Even if #1 is not a practical problem I'd be inclined to make the finalizer synchronized as well. By doing that you're effectively ensuring that premature finalization of the stream won't happen. I tend to agree, especially as it also makes the intention of the code clearer. For #2 it is a tricky call. If you don't actually expect the stream object to be used by multiple threads then using a synchronized block to read isClosed will be cheap in VMs that go to great effort to reduce the cost of unnecessary synchronization (eg biased-locking in hotspot). Otherwise making isClosed volatile is likely the better option. The check at the start of available() guards the logic beyond (which uses a value from the inflater object, which would not be valid if the stream has been closed()). Because of this, I think it would be clearer to synchronize the available() method too. As you say, it is extremely likely that, in practice, this synchronization will never be contended, and so won't impact performance significantly (in modern VMs). Please find below an updated changeset with these modifications, Thanks for your advice in this, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Hi Neil, Neil Richards said the following on 04/02/11 11:28: David, I can't claim to have reached true enlightenment wrt the section of the JLS that you point to - I suspect I'll need to lie down in a darkened room with a dampened flannel across the brow and ponder the nature of things a while to get there :-) In the meantime, can you (or other knowledgeable GC folk) advise whether it has any implications which would cause the current suggested fix to be unsuitable? (To recap, the current suggestion has the ZipFileInflaterInputStream's close() method synchronized on itself, and called from its finalize() method. The close() method has both the read and write of the 'isClosed' field). Any reassurance (or otherwise) gratefully accepted on this matter, I don't fully understand the object relationships here so I'll just make a couple of observations on the synchronization in this code: public synchronized void close() throws IOException { if (!isClosed) { super.close(); if (false == inFinalizer) releaseInflater(inf); isClosed = true; } } public int available() throws IOException { if (isClosed) return 0; long avail = zfin.size() - inf.getBytesWritten(); ... } protected void finalize() throws IOException { inFinalizer = true; close(); } 1. If a call to close() occurs around the same time as finalization occurs then the finalizer thread will set inFinalizer to true, at which point the thread executing close() can see it is true and so may skip the releaseInflater(inf) call. 2. As isClosed is not volatile, and available() is not synchronized, a thread calling available() on a closed stream may not see that it has been closed and so will likely encounter an IOException rather than getting a zero return. Even if #1 is not a practical problem I'd be inclined to make the finalizer synchronized as well. By doing that you're effectively ensuring that premature finalization of the stream won't happen. For #2 it is a tricky call. If you don't actually expect the stream object to be used by multiple threads then using a synchronized block to read isClosed will be cheap in VMs that go to great effort to reduce the cost of unnecessary synchronization (eg biased-locking in hotspot). Otherwise making isClosed volatile is likely the better option. HTH David
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 4/1/2011 11:56 PM, Jeroen Frijters wrote: Xueming Shen wrote: I'm not a GC guy, so I might be missing something here, but if close() is being explicitly invoked by some thread, means someone has a strong reference to it, I don't think the finalize() can kick in until that close() returns This is not correct. You can re-publish the reference from another finalizer method and thereby allow another thread to access the object concurrently with the finalizer. Here's a possible sequence of events: 1) GC runs and determines that A and B are finalizable 2) Finalizer thread run A.finalize() 3) A.finalize publishes reference to B in static variable Jeroen, are you talking about the object resurrection from finalize()? How do you re-publish/get the reference to B inside A.finalize()? I think you can do that inside B's finalize() to assign this to a global static variable. Regard, -Sherman 4a) Another thread reads the static variable and calls B.close() 4b) Finalizer thread runs B.finalize() Regards, Jeroen
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 04-02-2011 12:47 AM, Xueming Shen wrote: On 4/1/2011 11:56 PM, Jeroen Frijters wrote: Xueming Shen wrote: I'm not a GC guy, so I might be missing something here, but if close() is being explicitly invoked by some thread, means someone has a strong reference to it, I don't think the finalize() can kick in until that close() returns This is not correct. You can re-publish the reference from another finalizer method and thereby allow another thread to access the object concurrently with the finalizer. Here's a possible sequence of events: 1) GC runs and determines that A and B are finalizable 2) Finalizer thread run A.finalize() 3) A.finalize publishes reference to B in static variable Jeroen, are you talking about the object resurrection from finalize()? How do you re-publish/get the reference to B inside A.finalize()? I think you can do that inside B's finalize() to assign this to a global static variable. Jeroen, Guess I replied too quick. I think I got what you meant. A has a reference to B, GC finalizes A and B the same time, then A republish reference to B and let someone else to invoke b.close().. I would have to admit it's a possible use scenario:-) -Sherman Regard, -Sherman 4a) Another thread reads the static variable and calls B.close() 4b) Finalizer thread runs B.finalize() Regards, Jeroen
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Xueming Shen said the following on 04/02/11 17:00: On 4/1/2011 4:17 PM, David Holmes wrote: Xueming Shen said the following on 04/02/11 05:07: ... explicit invocation is cleared. The InputStream is eligible for finalization only after it is weakly reachable, means no more stronger reachable exists, right? Actually no. One of the more obscure corner cases with finalization is that you can actually finalize an object that is still being used. The JLS actually spells this out - see section 12.6.1 and in particular the Discussion within that section. The scenario that Neil and I were discussing is something like this, There is class A class A { void close() { ... } protect void finalize() { ... close(); } } when we are in the middle of A's close() (invoked by someone, not the finalizer), do we need to worry about that A's finalize() being invoked (and then the close()) by the finalizer concurrently. Does you an object that still being used include the scenario like above, which means an object became finalizer-reachable, when still in the middle of the execution (by some alive, non-finalizer-thread) of one of its instance method body? The JLS 12.6.1, if I read it correctly, is for scenario that a reachable object which is strongly referenced by a stack reference can/may become finalizer-reachable sooner than it might be expected, for example, the compiler optimization can null out such reference in the middle of the method body, so that object becomes finalizer-reachable before the execution reach the return point of the method, or ... The execution discussed is not the execution inside the target object's method body. Am I reading it correctly? Otherwise, it becomes a little weird, image, you are in the middle of the execution of an instance method, suddenly, the instance itself is being finalized, all the native resource get released... Yes 12.6.1 refers to the case where the object is only strongly-reachable from a local stack reference. This includes the little weird scenario you describe. A thread can be executing a.close() where 'a' is a local stack reference, and 'a' can be finalized while that is heppening. I'm not saying hotspot will actually do this, but it is permissible within the spec. This situation has been discussed quite a bit in the past on the Java Memory Model mailing list. David -Sherman
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Thanks David! And Neil, it seems my assumption is wrong and we do need the synchronization for the close(). Your latest patch looks fine for me. Thanks, -Sherman On 04-02-2011 6:15 PM, David Holmes wrote: Xueming Shen said the following on 04/02/11 17:00: On 4/1/2011 4:17 PM, David Holmes wrote: Xueming Shen said the following on 04/02/11 05:07: ... explicit invocation is cleared. The InputStream is eligible for finalization only after it is weakly reachable, means no more stronger reachable exists, right? Actually no. One of the more obscure corner cases with finalization is that you can actually finalize an object that is still being used. The JLS actually spells this out - see section 12.6.1 and in particular the Discussion within that section. The scenario that Neil and I were discussing is something like this, There is class A class A { void close() { ... } protect void finalize() { ... close(); } } when we are in the middle of A's close() (invoked by someone, not the finalizer), do we need to worry about that A's finalize() being invoked (and then the close()) by the finalizer concurrently. Does you an object that still being used include the scenario like above, which means an object became finalizer-reachable, when still in the middle of the execution (by some alive, non-finalizer-thread) of one of its instance method body? The JLS 12.6.1, if I read it correctly, is for scenario that a reachable object which is strongly referenced by a stack reference can/may become finalizer-reachable sooner than it might be expected, for example, the compiler optimization can null out such reference in the middle of the method body, so that object becomes finalizer-reachable before the execution reach the return point of the method, or ... The execution discussed is not the execution inside the target object's method body. Am I reading it correctly? Otherwise, it becomes a little weird, image, you are in the middle of the execution of an instance method, suddenly, the instance itself is being finalized, all the native resource get released... Yes 12.6.1 refers to the case where the object is only strongly-reachable from a local stack reference. This includes the little weird scenario you describe. A thread can be executing a.close() where 'a' is a local stack reference, and 'a' can be finalized while that is heppening. I'm not saying hotspot will actually do this, but it is permissible within the spec. This situation has been discussed quite a bit in the past on the Java Memory Model mailing list. David -Sherman
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On Wed, 2011-03-30 at 13:31 -0700, Xueming Shen wrote: Isn't it true that when the finalize()-close() gets invoked, there should be no strong reference anywhere else that you can use to invoke close() in other thread? It's true that once finalize() has been called, there can't be another explicit call to close(). However, if close() is explicitly called first, it will be called again when finalize() calls it, so one still wants the update to 'isClosed' to be seen by the finalizer thread (in this case). ZipFileInputStream class has to sync on ZipFile.this, the read/close() methods are accessing the underlying/shared bits of the ZipFile. For ZipFileInflaterInputStream.close() case, even we want to make sure the isClose is synced, I would prefer to see a local lock, maybe simply make close() synchronized is better? After assessing it again, I tend to agree - synchronizing on the ZFIIS (instead of the ZipFile) now looks to me to be safe because it does not do anything which synchronizes on the ZipFile object. (Note that if it _did_ cause such synchronization, it would risk deadlock with the code in ZipFile's close() method, which is synchronized on itself when it calls close() on each remaining ZFIIS. It's the difference in the order of acquiring the monitors that would cause the potential for deadlock). As it is, both close() methods synchronize on the 'inflaters' object within their respective synchronized blocks, but that does not risk deadlock. Please find below an updated changeset below, containing the suggested change to synchronization. Thanks once again for your suggestions, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU # HG changeset patch # User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com # Date 1300289208 0 # Branch zip-heap # Node ID 4aa4844cd3b9ae747ad736469134a8828ebeb652 # Parent 554adcfb615e63e62af530b1c10fcf7813a75b26 7031076: Retained ZipFile InputStreams increase heap demand Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd Contributed-by: neil.richa...@ngmr.net diff --git a/src/share/classes/java/util/zip/ZipFile.java b/src/share/classes/java/util/zip/ZipFile.java --- a/src/share/classes/java/util/zip/ZipFile.java +++ b/src/share/classes/java/util/zip/ZipFile.java @@ -30,11 +30,14 @@ import java.io.IOException; import java.io.EOFException; import java.io.File; +import java.lang.ref.ReferenceQueue; +import java.lang.ref.WeakReference; import java.nio.charset.Charset; import java.util.Vector; import java.util.Enumeration; import java.util.Set; import java.util.HashSet; +import java.util.Iterator; import java.util.NoSuchElementException; import java.security.AccessController; import sun.security.action.GetPropertyAction; @@ -315,7 +318,16 @@ private static native void freeEntry(long jzfile, long jzentry); // the outstanding inputstreams that need to be closed. -private SetInputStream streams = new HashSet(); +private final SetWeakReferenceInputStream streams = new HashSet(); +private final ReferenceQueue? super InputStream staleStreamQueue = +new ReferenceQueue(); + +private void clearStaleStreams() { +Object staleStream; +while (null != (staleStream = staleStreamQueue.poll())) { +streams.remove(staleStream); +} +} /** * Returns an input stream for reading the contents of the specified @@ -339,6 +351,7 @@ ZipFileInputStream in = null; synchronized (this) { ensureOpen(); +clearStaleStreams(); if (!zc.isUTF8() (entry.flag EFS) != 0) { jzentry = getEntry(jzfile, zc.getBytesUTF8(entry.name), false); } else { @@ -351,51 +364,19 @@ switch (getEntryMethod(jzentry)) { case STORED: -streams.add(in); +streams.add( +new WeakReferenceInputStream(in, staleStreamQueue)); return in; case DEFLATED: -final ZipFileInputStream zfin = in; // MORE: Compute good size for inflater stream: long size = getEntrySize(jzentry) + 2; // Inflater likes a bit of slack if (size 65536) size = 8192; if (size = 0) size = 4096; -InputStream is = new InflaterInputStream(zfin, getInflater(), (int)size) { -private boolean isClosed = false; - -public void close() throws IOException { -if (!isClosed) { -super.close(); -releaseInflater(inf); -isClosed = true; -} -} -// Override fill()
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 04/01/2011 09:42 AM, Neil Richards wrote: On Wed, 2011-03-30 at 13:31 -0700, Xueming Shen wrote: Isn't it true that when the finalize()-close() gets invoked, there should be no strong reference anywhere else that you can use to invoke close() in other thread? It's true that once finalize() has been called, there can't be another explicit call to close(). However, if close() is explicitly called first, it will be called again when finalize() calls it, so one still wants the update to 'isClosed' to be seen by the finalizer thread (in this case). I'm not a GC guy, so I might be missing something here, but if close() is being explicitly invoked by some thread, means someone has a strong reference to it, I don't think the finalize() can kick in until that close() returns and the strong reference used to make that explicit invocation is cleared. The InputStream is eligible for finalization only after it is weakly reachable, means no more stronger reachable exists, right? -sherman ZipFileInputStream class has to sync on ZipFile.this, the read/close() methods are accessing the underlying/shared bits of the ZipFile. For ZipFileInflaterInputStream.close() case, even we want to make sure the isClose is synced, I would prefer to see a local lock, maybe simply make close() synchronized is better? After assessing it again, I tend to agree - synchronizing on the ZFIIS (instead of the ZipFile) now looks to me to be safe because it does not do anything which synchronizes on the ZipFile object. (Note that if it _did_ cause such synchronization, it would risk deadlock with the code in ZipFile's close() method, which is synchronized on itself when it calls close() on each remaining ZFIIS. It's the difference in the order of acquiring the monitors that would cause the potential for deadlock). As it is, both close() methods synchronize on the 'inflaters' object within their respective synchronized blocks, but that does not risk deadlock. Please find below an updated changeset below, containing the suggested change to synchronization. Thanks once again for your suggestions, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Xueming Shen said the following on 04/02/11 05:07: On 04/01/2011 09:42 AM, Neil Richards wrote: On Wed, 2011-03-30 at 13:31 -0700, Xueming Shen wrote: Isn't it true that when the finalize()-close() gets invoked, there should be no strong reference anywhere else that you can use to invoke close() in other thread? It's true that once finalize() has been called, there can't be another explicit call to close(). However, if close() is explicitly called first, it will be called again when finalize() calls it, so one still wants the update to 'isClosed' to be seen by the finalizer thread (in this case). I'm not a GC guy, so I might be missing something here, but if close() is being explicitly invoked by some thread, means someone has a strong reference to it, I don't think the finalize() can kick in until that close() returns and the strong reference used to make that explicit invocation is cleared. The InputStream is eligible for finalization only after it is weakly reachable, means no more stronger reachable exists, right? Actually no. One of the more obscure corner cases with finalization is that you can actually finalize an object that is still being used. The JLS actually spells this out - see section 12.6.1 and in particular the Discussion within that section. David -sherman ZipFileInputStream class has to sync on ZipFile.this, the read/close() methods are accessing the underlying/shared bits of the ZipFile. For ZipFileInflaterInputStream.close() case, even we want to make sure the isClose is synced, I would prefer to see a local lock, maybe simply make close() synchronized is better? After assessing it again, I tend to agree - synchronizing on the ZFIIS (instead of the ZipFile) now looks to me to be safe because it does not do anything which synchronizes on the ZipFile object. (Note that if it _did_ cause such synchronization, it would risk deadlock with the code in ZipFile's close() method, which is synchronized on itself when it calls close() on each remaining ZFIIS. It's the difference in the order of acquiring the monitors that would cause the potential for deadlock). As it is, both close() methods synchronize on the 'inflaters' object within their respective synchronized blocks, but that does not risk deadlock. Please find below an updated changeset below, containing the suggested change to synchronization. Thanks once again for your suggestions, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Hi Sherman, Thanks for your review and comments on this. On Tue, 2011-03-29 at 12:05 -0700, Xueming Shen wrote: Hi Neil, It appears to be a regression in scenario you described (user application never close the input stream after use and the ZipFile instance being retained during the lifetime of the process). The proposed approach seems to solve this particular problem. Here are my comments regarding the patch. (1) ZipFileInflaterInputStream.close() now synchronizes on Zipfile.this, is it really necessary? Synchronization is used so that an update to 'isClosed' on one thread is seen by another. Adding finalization (ie. a call from finalize() to close()) effectively introduces a new thread into the mix, namely the finalization thread. Whilst you're correct in saying that, in general, InputStream gives no thread-safe guarantees, in this particular case I believe it is valid to make sure updates to this field are seen across threads, to prevent the logic of close() being run twice. I chose to synchronize on ZipFile.this merely for consistency - ie. because that's what ZipFileInputStream (the other InputStream impl in ZipFile) does. I suppose I felt the risks of creating some kind of obscure deadlock scenario was minimised by copying the existing pattern. I can be easily swayed if you think there's a better object to synchronize on, that doesn't run an increased risk of deadlock. (2) Is there any particular reason that we don't want to reuse the Inflater(), if the close comes from the finalize() in your testing evn? One thing we noticed before is that the moment the InputStream gets finalized, the Inflater embedded might have been finalized already, this is the reason why we have a inf.ended() check in releaseInflater(). The ZipFileInflaterInputStream object's finalize() method is called by the finalization thread, after it has been added to the finalization queue by GC. If *it* has been added to the finalization queue (because it's eligible for GC), then so will the Inflater object it is referencing. As finalization gives no guarantees as to the order in which objects are finalized, regardless of whether finalize() has been called on the Inflater object, it is nevertheless the case that its method *will* be called at some point, so the Inflater object *will* be ended. If it has been put back into the inflater cache (because ZFIIS's finalize happened to be called first), then this ended inflater will lie in wait for another ZFIIS to use it, which would result in a particularly unexpected exception being thrown. In general, I believe it to be quite frowned on to resurrect objects which are going through finalization (including those queued up for finalization) and place them back into the live set of object, as weird behaviour often ensues if one does. (Especially as, iirc, finalize() is only called once per object, regardless of whether it is dug back up in this fashion). (3) The attached test case does f.close() in its finally block, I guess it should be f.delete()? When I wrote the testcase, I relied upon File.deleteOnExit() to take care of cleaning up the temporary file(s) created. However, I recall now that I don't think it guarantees deletion on abnormal termination. I agree that calling f.delete() instead of f.close() would be a good improvement to the testcase. Please find below an updated changeset, with this change made and the bug number filled in. Hope this helps to clarify things, Thanks again, Neil -- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU # HG changeset patch # User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com # Date 1300289208 0 # Branch zip-heap # Node ID 8ab5aa0669a176d95502d9cf68027ae9679ccab2 # Parent 554adcfb615e63e62af530b1c10fcf7813a75b26 7031076: Retained ZipFile InputStreams increase heap demand Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd Contributed-by: neil.richa...@ngmr.net diff --git a/src/share/classes/java/util/zip/ZipFile.java b/src/share/classes/java/util/zip/ZipFile.java --- a/src/share/classes/java/util/zip/ZipFile.java +++ b/src/share/classes/java/util/zip/ZipFile.java @@ -30,11 +30,14 @@ import java.io.IOException; import java.io.EOFException; import java.io.File; +import java.lang.ref.ReferenceQueue; +import java.lang.ref.WeakReference; import java.nio.charset.Charset; import java.util.Vector; import java.util.Enumeration; import java.util.Set; import java.util.HashSet; +import java.util.Iterator; import java.util.NoSuchElementException; import java.security.AccessController; import sun.security.action.GetPropertyAction; @@ -315,7 +318,16 @@ private static native void freeEntry(long jzfile, long jzentry); // the outstanding inputstreams that need to
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
On 03/30/2011 12:53 PM, Neil Richards wrote: Hi Sherman, Thanks for your review and comments on this. On Tue, 2011-03-29 at 12:05 -0700, Xueming Shen wrote: Hi Neil, It appears to be a regression in scenario you described (user application never close the input stream after use and the ZipFile instance being retained during the lifetime of the process). The proposed approach seems to solve this particular problem. Here are my comments regarding the patch. (1) ZipFileInflaterInputStream.close() now synchronizes on Zipfile.this, is it really necessary? Synchronization is used so that an update to 'isClosed' on one thread is seen by another. Adding finalization (ie. a call from finalize() to close()) effectively introduces a new thread into the mix, namely the finalization thread. Whilst you're correct in saying that, in general, InputStream gives no thread-safe guarantees, in this particular case I believe it is valid to make sure updates to this field are seen across threads, to prevent the logic of close() being run twice. Isn't it true that when the finalize()-close() gets invoked, there should be no strong reference anywhere else that you can use to invoke close() in other thread? ZipFileInputStream class has to sync on ZipFile.this, the read/close() methods are accessing the underlying/shared bits of the ZipFile. For ZipFileInflaterInputStream.close() case, even we want to make sure the isClose is synced, I would prefer to see a local lock, maybe simply make close() synchronized is better? -Sherman I chose to synchronize on ZipFile.this merely for consistency - ie. because that's what ZipFileInputStream (the other InputStream impl in ZipFile) does. I suppose I felt the risks of creating some kind of obscure deadlock scenario was minimised by copying the existing pattern. I can be easily swayed if you think there's a better object to synchronize on, that doesn't run an increased risk of deadlock. (2) Is there any particular reason that we don't want to reuse the Inflater(), if the close comes from the finalize() in your testing evn? One thing we noticed before is that the moment the InputStream gets finalized, the Inflater embedded might have been finalized already, this is the reason why we have a inf.ended() check in releaseInflater(). The ZipFileInflaterInputStream object's finalize() method is called by the finalization thread, after it has been added to the finalization queue by GC. If *it* has been added to the finalization queue (because it's eligible for GC), then so will the Inflater object it is referencing. As finalization gives no guarantees as to the order in which objects are finalized, regardless of whether finalize() has been called on the Inflater object, it is nevertheless the case that its method *will* be called at some point, so the Inflater object *will* be ended. If it has been put back into the inflater cache (because ZFIIS's finalize happened to be called first), then this ended inflater will lie in wait for another ZFIIS to use it, which would result in a particularly unexpected exception being thrown. In general, I believe it to be quite frowned on to resurrect objects which are going through finalization (including those queued up for finalization) and place them back into the live set of object, as weird behaviour often ensues if one does. (Especially as, iirc, finalize() is only called once per object, regardless of whether it is dug back up in this fashion). (3) The attached test case does f.close() in its finally block, I guess it should be f.delete()? When I wrote the testcase, I relied upon File.deleteOnExit() to take care of cleaning up the temporary file(s) created. However, I recall now that I don't think it guarantees deletion on abnormal termination. I agree that calling f.delete() instead of f.close() would be a good improvement to the testcase. Please find below an updated changeset, with this change made and the bug number filled in. Hope this helps to clarify things, Thanks again, Neil
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Hi Neil, It appears to be a regression in scenario you described (user application never close the input stream after use and the ZipFile instance being retained during the lifetime of the process). The proposed approach seems to solve this particular problem. Here are my comments regarding the patch. (1) ZipFileInflaterInputStream.close() now synchronizes on Zipfile.this, is it really necessary? given the rest of the implementation doesn't guarantee the returned InputStream object is thread-safe (the implementation does make sure the access to the underlying ZipFile is thread-safe, though). The new finalize() now also invokes close(), which means the ZipFile object now gets locked twice (assume the good manner application does invoke the close() after use) for each InputStream after use. (2) Is there any particular reason that we don't want to reuse the Inflater(), if the close comes from the finalize() in your testing evn? One thing we noticed before is that the moment the InputStream gets finalized, the Inflater embedded might have been finalized already, this is the reason why we have a inf.ended() check in releaseInflater(). The disadvantage/limitation of existing Inflater caching mechanism is that it does not have a cap to limit the maximum number of inflater it should cache. Are you worried too many inflaters get cached, if they only get collected during GC/finalize()? This actually indicates that the cache mechanism does not work at all in your scenario, even with the proposed fix. An alternative solution (better?) is to make ZFIIS to check the bytes read during its read() method, and pro-actively close the stream when it reaches the end of the stream, same as what ZipFileInputStream does, which I believe should solve the problem in most use scenario (except, the application does not bother reading the InputSteeam to its end, which is unlikely). (3) The attached test case does f.close() in its finally block, I guess it should be f.delete()? -Sherman On 03/23/2011 02:46 PM, Neil Richards wrote: Hi, I've noticed that the fix introduced to address bug 6735255 [1] [2] had an unfortunate side-effect. That fix stores references to the InputStream objects returned from ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that the close() method may be called upon those objects when ZipFile.close() is called. It does this to conform to the existing API description, and to avoid a native memory leak which would occur if the InputStream is GC'd without its close() being called. However, by holding these InputStreams in a set within their ZipFile object, their lifecycle is now prolonged until the ZipFile object either has its close() method called, or is finalized (prior to GC). I've found scenarios (in user code) were ZipFile objects are held onto for a long time (eg. the entire lifetime of the process), but where InputStream objects from that ZipFile (for individual entries in the zip file) are obtained, used, then abandoned on a large number of occasions. (An example here might be a user-defined, long-lasting class loader, which might retain a ZipFile instance for each jar file in its search path, but which will create transitory input streams to read out particular files from those (large) jar files). I see that the InputStream objects returned from ZipFile.getInputStream(ZipEntry) tend to hold references to a couple of sizeable byte arrays, one 512 bytes in size, the other 1002 bytes. So by prolonging the life span of these objects, the fix for 6735255 can inadvertently cause a significant increase in the demand/usage on the heap (in practice, running into many Mb's). (This is not so if the user code is good-mannered enough to explicitly always call close() on the InputStreams in question. However, experience shows that user code cannot be relied upon to behave so benignly). I believe this introduced problem can be addressed by: 1. Holding references to the InputStreams in the 'streams' Set within ZipFile via WeakReferences, so they may be GC'd as soon as they are no longer externally (strongly) referenced. 2. Adding finalization logic to the InputStream implementation classes, which ensures their close() method is called prior to GC. 3. Prevent Inflater objects from being returned to the pool (in the ZipFile object) if close() has been called from finalize(). To that end, please find below an 'hg export' of a proposed change which implements these aspects, together with a testcase to demonstrate the problem/fix. Any comments / suggestions on this gratefully received, Thanks, Neil [1] http://bugs.sun.com/view_bug.do?bug_id=6735255 [2] http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/49478a651a28
Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand
Neil Richards wrote: Hi, I've noticed that the fix introduced to address bug 6735255 [1] [2] had an unfortunate side-effect. That fix stores references to the InputStream objects returned from ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that the close() method may be called upon those objects when ZipFile.close() is called. It does this to conform to the existing API description, and to avoid a native memory leak which would occur if the InputStream is GC'd without its close() being called. However, by holding these InputStreams in a set within their ZipFile object, their lifecycle is now prolonged until the ZipFile object either has its close() method called, or is finalized (prior to GC). I've created 7031076 to track this and generated a webrev from the change-set that you attached: http://cr.openjdk.java.net/~alanb/7031076/webrev/ Sherman mentioned off-list that he plans to look at this. -Alan.