RE: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-26 Thread Jeroen Frijters
Xueming Shen wrote:
 I'm not a GC guy, so I might be missing something here, but if close()
 is being explicitly invoked by some thread, means someone has a strong
 reference to it, I don't think the finalize() can kick in until that
 close() returns

This is not correct. You can re-publish the reference from another finalizer 
method and thereby allow another thread to access the object concurrently with 
the finalizer.

Here's a possible sequence of events:
1) GC runs and determines that A and B are finalizable
2) Finalizer thread run A.finalize()
3) A.finalize publishes reference to B in static variable
4a) Another thread reads the static variable and calls B.close()
4b) Finalizer thread runs B.finalize()

Regards,
Jeroen



Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-26 Thread David Holmes

Neil,

You now have concurrency issues again. isClosed would need to be 
volatile if the methods that set/check it aren't synchronized. Multiple 
threads could call close() at the same time, and threads can call 
available etc after close has finished its main work.


David

Neil Richards said the following on 04/08/11 22:36:

On Thu, 2011-04-07 at 16:02 -0700, Xueming Shen wrote:
It appears it might not be necessary to do the finalize() in 
ZipFileInflaterInputStream. The ZipFileInflaterInputStream itself does

not directly hold any native resource by itself that needs to be
released at the end of its life circle, if not closed explicitly. The
native resource/memory that need to be taken care of are held by its
fields inf and zfin, which should be finalized by the
corresponding finalize() of their own classes (again, if not closed
explicitly), when their outer ZFIIS object is unreachable. The Inflater 
class has its own finalize() implemented already to invoke its cleanup

method end(), so the only thing need to be addressed is to add the
finalize() into ZipFileInputStream class to call its close(), strictly
speaking this issue is not the regression caused by #6735255, we have
this leak before #6735255.

Also, would you like to consider to use WeakHeapMapInputStream, Void 
instead of handling all the weak reference impl by yourself, the bonus

would be that the stalled entries might be cleaned up more frequently.



Hi Sherman,
Thanks for your continuing analysis of this change.

I concur with your assessment above, and agree that making the suggested
modifications to the changeset results in the code being simpler and
clearer.

Please find below an updated changeset incorporating these suggestions
(and rebased off jdk7-b136),

Let me know if you need anything else to progress this fix forward,
Thanks,
Neil



Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-26 Thread Neil Richards
On Mon, 2011-04-18 at 13:33 +0100, Neil Richards wrote:
 I think it's worthwhile trying to clear 'streams' entries from a
 finalize method in ZFIIS, to help ensure they are cleared in a timely
 manner.

In fact, it's probably clearer (more closely following the pattern
elsewhere in the file) for ZFIIS.finalize() to call ZFIIS.close().

See changeset below with this additional tweak,
Please let me know what you think on this,
Thanks,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

# HG changeset patch
# User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com
# Date 1300289208 0
# Branch zip-heap
# Node ID 9e0a3d49e4978264054cbacc9d3023715f1776d3
# Parent  aa13e7702cd9d8aca9aa38f1227f966990866944
7031076: Retained ZipFile InputStreams increase heap demand
Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd
Contributed-by: neil.richa...@ngmr.net

diff -r aa13e7702cd9 -r 9e0a3d49e497 
src/share/classes/java/util/zip/ZipFile.java
--- a/src/share/classes/java/util/zip/ZipFile.java  Tue Mar 29 20:19:55 
2011 -0700
+++ b/src/share/classes/java/util/zip/ZipFile.java  Wed Mar 16 15:26:48 
2011 +
@@ -31,11 +31,13 @@
 import java.io.EOFException;
 import java.io.File;
 import java.nio.charset.Charset;
-import java.util.Vector;
+import java.util.ArrayDeque;
+import java.util.Deque;
 import java.util.Enumeration;
-import java.util.Set;
-import java.util.HashSet;
+import java.util.HashMap;
+import java.util.Map;
 import java.util.NoSuchElementException;
+import java.util.WeakHashMap;
 import java.security.AccessController;
 import sun.security.action.GetPropertyAction;
 import static java.util.zip.ZipConstants64.*;
@@ -54,7 +56,7 @@
 private long jzfile;  // address of jzfile data
 private String name;  // zip file name
 private int total;// total number of entries
-private boolean closeRequested;
+private volatile boolean closeRequested = false;
 
 private static final int STORED = ZipEntry.STORED;
 private static final int DEFLATED = ZipEntry.DEFLATED;
@@ -314,8 +316,9 @@
 // freeEntry releases the C jzentry struct.
 private static native void freeEntry(long jzfile, long jzentry);
 
-// the outstanding inputstreams that need to be closed.
-private SetInputStream streams = new HashSet();
+// the outstanding inputstreams that need to be closed, 
+// mapped to the inflater objects they use.
+private final MapInputStream, Inflater streams = new WeakHashMap();
 
 /**
  * Returns an input stream for reading the contents of the specified
@@ -351,51 +354,21 @@
 
 switch (getEntryMethod(jzentry)) {
 case STORED:
-streams.add(in);
+synchronized (streams) {
+streams.put(in, null);
+}
 return in;
 case DEFLATED:
-final ZipFileInputStream zfin = in;
 // MORE: Compute good size for inflater stream:
 long size = getEntrySize(jzentry) + 2; // Inflater likes a bit 
of slack
 if (size  65536) size = 8192;
 if (size = 0) size = 4096;
-InputStream is = new InflaterInputStream(zfin, getInflater(), 
(int)size) {
-private boolean isClosed = false;
-
-public void close() throws IOException {
-if (!isClosed) {
-super.close();
-releaseInflater(inf);
-isClosed = true;
-}
-}
-// Override fill() method to provide an extra dummy byte
-// at the end of the input stream. This is required when
-// using the nowrap Inflater option.
-protected void fill() throws IOException {
-if (eof) {
-throw new EOFException(
-Unexpected end of ZLIB input stream);
-}
-len = this.in.read(buf, 0, buf.length);
-if (len == -1) {
-buf[0] = 0;
-len = 1;
-eof = true;
-}
-inf.setInput(buf, 0, len);
-}
-private boolean eof;
-
-public int available() throws IOException {
-if (isClosed)
-return 0;
-long avail = zfin.size() - inf.getBytesWritten();
-return avail  (long) Integer.MAX_VALUE ?
-Integer.MAX_VALUE : (int) avail;
-}
-  

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-26 Thread Neil Richards
On Fri, 2011-04-08 at 20:53 -0700, Xueming Shen wrote:
 Regression test /test/java/util/zip/ZipFile/FinalizeInflater.java failed 
 with this fix.
 
 The possible solution I can think of now is to do nothing but simply 
 return the inflater back to the cache, but do an additional sanity
 check when check out
 
 private Inflater getInflater() {
 
  synchronized (inflaters) {
  int size = inflaters.size();
  while ((size = inflaters.size())  0) {
  Inflater inf = inflaters.remove(size - 1);
  if (!inf.ended())
  return inf;
  }
  return new Inflater(true);
  }
 }
 
 
 http://cr.openjdk.java.net/~sherman/7031076/webrev/
 
 What do you think? have a better idea?

Whilst checking on check out might hope to reduce the timing window, I
fear that doing so will not entirely eliminate it.

(The timing window is essentially between the point the inflater object
is placed on the finalization queue for finalization and the point at
which that finalization has actually occurred. In that window, if any
implementation or user code can cause it to be added to the cache, it
could also be retrieved from the cache prior to the point the
Finalization thread gets round to calling Inflater.finalize()).

Instead, in order to safely cache and reuse inflater objects in ZipFile,
I believe it to be necessary for the ZipFile to prevent them from
becoming eligible for finalization/GC (until a decision is made not to
put them back in the inflater cache due to their having been explicitly
ended).

It can do this by holding references to the inflater objects that are in
use by its input streams, as well as those in the cache (ie. those
currently unused).

(NB: Previously, the Inflater objects happened to be preserved in this
way, due to the ZipFile keeping alive the InputStream objects, which
have references to the inflater objects. Thus the challenge is to
preserve the Inflater objects - so that they can be cached and reused -
without hitting the heap by prolonging the life of the InputStream
objects).

ZipFile must also only return an inflater to the inflater cache at the
point where its end() method can no longer be explicitly called
(directly or indirectly) by the user - such as i done in the
InflaterFinalization testscase.
In other words, it's addition to the cache should only be considered
once the input stream that was using it has been completely GC'd.


Please see below another formulation of the suggested change which
incorporates these features.

It adapts the 'streams' map to hold references to the in-use inflater
objects as values, and reverts to explicitly dealing with the cleanup of
stale entries in the map so as to use that point to drive
releaseInflater() with the stored value.

(I have chosen to hold these values 'softly', so that they may be
released by GC if they are not reused - by being given to the cache - in
a timely manner. If the input stream is not explicitly closed, an
inflater object may not be reset() until the call to releaseInflater()
from clearStaleStreams() - ie. it may continue to hold onto its old
buffer until then. As clearStaleStreams() is only driven when a new
InputStream is requested, this may be retained for some time. By holding
the inflater 'softly', the system can decide to release the inflater's
resources by GC'ing it, if heap demand is high, or the system has not
asked for new InputStreams from this ZipFile for a while).

With releaseInflater() being driven from the cleanup of stale 'streams'
entries, it no longer needs to be called from ZFIIS.close(), so,
instead, only Inflater.reset() is called from there (providing the
inflater object has not explicitly been ended) so that it releases the
buffer it has been holding.

This allows the synchronization to be simplified, as there's no longer a
reason to synchronize on the inflater cache independently of that on the
ZipFile itself.

(I've also changed the inflater cache to use a Deque, which allows its
code to be simplified and should be (marginally) more efficient,
according to the API doc).

Also, based on the previous observation that object's finalize() methods
should concentrate on freeing off their own associated system resources,
I've modified ZipFile's close() and finalize() methods such that they
both call closeFile(), which deals with closing its system resources).


I believe, with these changes, the handling of the caching  reuse of
ZipFile's Inflater objects is handled in a safe manner, whilst still
allowing its InputStreams to be GC'd in a timely manner.

Please get back to me with any comments, questions or suggestions on
this,
Thanks,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


# HG changeset patch
# User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-26 Thread Neil Richards
On 18 April 2011 18:32, Xueming Shen xueming.s...@oracle.com wrote:
  Hi Neil,

 All tests passed.

 I'm starting to push your last patch. I generated the webrev at

 http://cr.openjdk.java.net/~sherman/7031076/webrev/

 It should be exactly the same as your last patch.

 Thanks,
 Sherman

Hi Sherman,
Thanks for all your advice and help in reaching (and committing) a
resolution for this issue - it really is much appreciated,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-18 Thread Neil Richards
On Thu, 2011-04-14 at 14:48 -0700, Xueming Shen wrote:
 Opinion? anything I overlooked/missed?

Hi Sherman,
Thanks once more for all your help and advice on this - I'm in favour of
almost all of what you suggest. :-)

I think it's worthwhile trying to clear 'streams' entries from a
finalize method in ZFIIS, to help ensure they are cleared in a timely
manner.

I don't think the Inflater objects need to be held softly in
'inflaterCache', as they will have been reset() at the point they are
put into that cache (in releaseInflater()).

(I was holding them softly due to a worry over any delay in clearing
their 'buf' reference, which is done when reset() is called. Now that
the 'streams' entries are being cleared from close() and finalize() -
ie. up front - I think this worry is adequately addressed.)


I'm worried about changing the object 'streams' refers to in
ZipFile.close(), as threads are using that object to synchronize
against.
I don't believe it is currently giving the guarantee against
ConcurrentModificationException being seen from the iterator that I
believe you intend it to be, as other threads, calling (for example)
ZFIIS.close() at the same time will not be guaranteed to see the update
to the value of 'streams'.

Instead, I've modified this code so that a real copy of 'streams' is
produced, by calling 'new HashMap(streams)'. It is this copy that is
iterated through to close() the InputStreams and end() the Inflaters.
Once the copy is obtained, 'streams' can be safely clear()'d.

Because it is not clear that a Collections.synchronizedMap will give
consistent results when fed into the constructor of HashMap, I've used
explicit synchronization on 'streams' instead, to ensure 'streams' isn't
modified during the construction of 'copy'.
As each of the calls to InputStream.close() will synchronize on
'streams' to call its remove() method, I've held that monitor whilst
those calls are made.


Other minor tweaks are changing 'isClosed' fields to
'closeRequested' (to better describe their use now), changing
'ZipFile.closeRequested' to be volatile (so it can be checked before
synchronization, and so that it conforms to the pattern now established
elsewhere in the file), and reducing the synchronization block in
releaseInflater() (only the call to 'inflateCache.add()' need be
protected by synchronization on 'inflaterCache').

Please find below the resultant changeset,
Let me know what you think,
Thanks,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


# HG changeset patch
# User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com
# Date 1300289208 0
# Branch zip-heap
# Node ID c1c7b2dfa0091938bebc0727224690afc892a0b7
# Parent  aa13e7702cd9d8aca9aa38f1227f966990866944
7031076: Retained ZipFile InputStreams increase heap demand
Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd
Contributed-by: neil.richa...@ngmr.net

diff -r aa13e7702cd9 -r c1c7b2dfa009 
src/share/classes/java/util/zip/ZipFile.java
--- a/src/share/classes/java/util/zip/ZipFile.java  Tue Mar 29 20:19:55 
2011 -0700
+++ b/src/share/classes/java/util/zip/ZipFile.java  Wed Mar 16 15:26:48 
2011 +
@@ -31,11 +31,13 @@
 import java.io.EOFException;
 import java.io.File;
 import java.nio.charset.Charset;
-import java.util.Vector;
+import java.util.ArrayDeque;
+import java.util.Deque;
 import java.util.Enumeration;
-import java.util.Set;
-import java.util.HashSet;
+import java.util.HashMap;
+import java.util.Map;
 import java.util.NoSuchElementException;
+import java.util.WeakHashMap;
 import java.security.AccessController;
 import sun.security.action.GetPropertyAction;
 import static java.util.zip.ZipConstants64.*;
@@ -54,7 +56,7 @@
 private long jzfile;  // address of jzfile data
 private String name;  // zip file name
 private int total;// total number of entries
-private boolean closeRequested;
+private volatile boolean closeRequested = false;
 
 private static final int STORED = ZipEntry.STORED;
 private static final int DEFLATED = ZipEntry.DEFLATED;
@@ -314,8 +316,9 @@
 // freeEntry releases the C jzentry struct.
 private static native void freeEntry(long jzfile, long jzentry);
 
-// the outstanding inputstreams that need to be closed.
-private SetInputStream streams = new HashSet();
+// the outstanding inputstreams that need to be closed, 
+// mapped to the inflater objects they use.
+private final MapInputStream, Inflater streams = new WeakHashMap();
 
 /**
  * Returns an input stream for reading the contents of the specified
@@ -351,51 +354,21 @@
 
 switch (getEntryMethod(jzentry)) {
 case STORED:
-streams.add(in);
+synchronized (streams) {
+streams.put(in, null);
+}
  

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-18 Thread Xueming Shen

 On 4/18/2011 5:33 AM, Neil Richards wrote:

On Thu, 2011-04-14 at 14:48 -0700, Xueming Shen wrote:

Opinion? anything I overlooked/missed?

Hi Sherman,
Thanks once more for all your help and advice on this - I'm in favour of
almost all of what you suggest. :-)

I think it's worthwhile trying to clear 'streams' entries from a
finalize method in ZFIIS, to help ensure they are cleared in a timely
manner.

I don't think the Inflater objects need to be held softly in
'inflaterCache', as they will have been reset() at the point they are
put into that cache (in releaseInflater()).

(I was holding them softly due to a worry over any delay in clearing
their 'buf' reference, which is done when reset() is called. Now that
the 'streams' entries are being cleared from close() and finalize() -
ie. up front - I think this worry is adequately addressed.)


I'm worried about changing the object 'streams' refers to in
ZipFile.close(), as threads are using that object to synchronize
against.
I don't believe it is currently giving the guarantee against
ConcurrentModificationException being seen from the iterator that I
believe you intend it to be, as other threads, calling (for example)
ZFIIS.close() at the same time will not be guaranteed to see the update
to the value of 'streams'.

Instead, I've modified this code so that a real copy of 'streams' is
produced, by calling 'new HashMap(streams)'. It is this copy that is
iterated through to close() the InputStreams and end() the Inflaters.
Once the copy is obtained, 'streams' can be safely clear()'d.

Because it is not clear that a Collections.synchronizedMap will give
consistent results when fed into the constructor of HashMap, I've used
explicit synchronization on 'streams' instead, to ensure 'streams' isn't
modified during the construction of 'copy'.
As each of the calls to InputStream.close() will synchronize on
'streams' to call its remove() method, I've held that monitor whilst
those calls are made.


(resent, got some mail issue)

Hi Neil,

If you are now explicitly synchronize on streams everywhere, I don't 
think we even need a copy

at close().

To add close() in ZFIIS.finalize() appears to be safe now (?, maybe I 
should think it again), but
given we are now using WeakHashMap, those weak reference will get 
expunged every time that
map gets accessed, I doubt that really help lots (have to wait for the 
GC kicks in anyway, if not

closed explicitly)

I'm running tests now, with the copy mentioned above.

-Sherman




Other minor tweaks are changing 'isClosed' fields to
'closeRequested' (to better describe their use now), changing
'ZipFile.closeRequested' to be volatile (so it can be checked before
synchronization, and so that it conforms to the pattern now established
elsewhere in the file), and reducing the synchronization block in
releaseInflater() (only the call to 'inflateCache.add()' need be
protected by synchronization on 'inflaterCache').

Please find below the resultant changeset,
Let me know what you think,
Thanks,
Neil





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-18 Thread Xueming Shen

 Hi Neil,

All tests passed.

I'm starting to push your last patch. I generated the webrev at

http://cr.openjdk.java.net/~sherman/7031076/webrev/

It should be exactly the same as your last patch.

Thanks,
Sherman

On 4/18/2011 9:49 AM, Xueming Shen wrote:

 On 4/18/2011 5:33 AM, Neil Richards wrote:

On Thu, 2011-04-14 at 14:48 -0700, Xueming Shen wrote:

Opinion? anything I overlooked/missed?

Hi Sherman,
Thanks once more for all your help and advice on this - I'm in favour of
almost all of what you suggest. :-)

I think it's worthwhile trying to clear 'streams' entries from a
finalize method in ZFIIS, to help ensure they are cleared in a timely
manner.

I don't think the Inflater objects need to be held softly in
'inflaterCache', as they will have been reset() at the point they are
put into that cache (in releaseInflater()).

(I was holding them softly due to a worry over any delay in clearing
their 'buf' reference, which is done when reset() is called. Now that
the 'streams' entries are being cleared from close() and finalize() -
ie. up front - I think this worry is adequately addressed.)


I'm worried about changing the object 'streams' refers to in
ZipFile.close(), as threads are using that object to synchronize
against.
I don't believe it is currently giving the guarantee against
ConcurrentModificationException being seen from the iterator that I
believe you intend it to be, as other threads, calling (for example)
ZFIIS.close() at the same time will not be guaranteed to see the update
to the value of 'streams'.

Instead, I've modified this code so that a real copy of 'streams' is
produced, by calling 'new HashMap(streams)'. It is this copy that is
iterated through to close() the InputStreams and end() the Inflaters.
Once the copy is obtained, 'streams' can be safely clear()'d.

Because it is not clear that a Collections.synchronizedMap will give
consistent results when fed into the constructor of HashMap, I've used
explicit synchronization on 'streams' instead, to ensure 'streams' isn't
modified during the construction of 'copy'.
As each of the calls to InputStream.close() will synchronize on
'streams' to call its remove() method, I've held that monitor whilst
those calls are made.


(resent, got some mail issue)

Hi Neil,

If you are now explicitly synchronize on streams everywhere, I don't 
think we even need a copy

at close().

To add close() in ZFIIS.finalize() appears to be safe now (?, maybe I 
should think it again), but
given we are now using WeakHashMap, those weak reference will get 
expunged every time that
map gets accessed, I doubt that really help lots (have to wait for the 
GC kicks in anyway, if not

closed explicitly)

I'm running tests now, with the copy mentioned above.

-Sherman




Other minor tweaks are changing 'isClosed' fields to
'closeRequested' (to better describe their use now), changing
'ZipFile.closeRequested' to be volatile (so it can be checked before
synchronization, and so that it conforms to the pattern now established
elsewhere in the file), and reducing the synchronization block in
releaseInflater() (only the call to 'inflateCache.add()' need be
protected by synchronization on 'inflaterCache').

Please find below the resultant changeset,
Let me know what you think,
Thanks,
Neil







Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-18 Thread Neil Richards
On 18 April 2011 17:49, Xueming Shen xueming.s...@oracle.com wrote:
 If you are now explicitly synchronize on streams everywhere, I don't think
 we even need a copy
 at close().

I thought that the copy is needed so that the Map being iterated over
(in close()) is not able to be modified by the same thread via the
calls to InputStream.close() that it makes (which will try to call
streams.remove()), and hence avoid the risk of getting
ConcurrentModificationExceptions being thrown.

Are you sure that it is safe to avoid doing this copy ?

 To add close() in ZFIIS.finalize() appears to be safe now (?, maybe I
 should think it again), but
 given we are now using WeakHashMap, those weak reference will get expunged
 every time that
 map gets accessed, I doubt that really help lots (have to wait for the GC
 kicks in anyway, if not
 closed explicitly)

As I recall, WeakHashMap only looks to expunge stale entries when one
of its methods is called.
Therefore, causing it to be accessed (indirectly) from the finalize()
methods (by calling close()) helps to ensure that stale entries (with
the retained inflater's associated system resources) are not retained
for longer than needed.

This is true even if the particular access is unlikely to retrieve a
live entry.

Hope this provides some clarification to the suggested changeset,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-14 Thread Xueming Shen

Steve, Neil,

As we discussed in previous emails that Neil's last patch (to put the 
inflater into the streams as
the value of the map) does solve the FinalizeInflater failure issue 
(provides a orderly final cleanup
between the stream and the inflater. However to leave 
releaseInfalter() to be solely driven
by GC might delay the inflater to be cached/re-used in a timely manner, 
even in scenario that
those streams get closed explicitly by the invoker. Ideally it is 
desirable that the inflater can be

re-used/put back into the cache when ZFIIS.close() gets invoked explicitly.

It appears it might be too hard, if not impossible, to arrange the 
releaseInflater() to be invoked from
both clearStaleStreams()/staleStreamQueue and ZFIIS.close(), given the 
non-guarantee, by the VM/GC,
of the timing/order among the clear up of weak reference objects, 
enqueue them and the invocation of
finalize(). So I would suggest a relatively simple solution as showed in 
webrev at


http://cr.openjdk.java.net/~sherman/7031076/webrev/src/share/classes/java/util/zip/ZipFile.java.sdiff.html

(1) simply use the WeakHashMap to keep the InputStream, Inflater pair, 
we don't try
 to release/put the used inflater back to cache, if the ZFIIS 
stream does not get closed explicitly
 (its underlying ZFIS stream and the inflater will then get 
finalized by themself).


(2) release the inflater back to cache in ZFIIS.close(), only if the 
inflater is sill alive in the
 streams, which guarantees that the inflater is not being 
finalized and safe to put back
 to cache (so long as we have the weak reference - inflater pair 
in streams, it does not
 matter if the reference to ZFIIS object is cleared or not). In 
most cases, I would expect
 when the ZFIIS.close() is invoked, both ZFIIS object and its 
inflater are still reachable, so
 the inflater gets re-used as expected. In weird scenario like 
what we have in
 FinalizeInflater, the ZFIIS object and its outer wrapper object 
are being finalized when
 ZFIIS.close() gets invoked (from wrapper.finalize()), there is 
possibility that the inflater might
 not be put back to cache, if the weak reverence-inflater pair has 
been expunged out of the
 streams already (because the weak reference to ZFIIS might have 
been cleared, so its
 entry in streams can be expunged any time, by any thread. At 
this moment, the inflater
 object might be declared as un-reachable and no longer safe to 
put back into cache), but

 I think we can live with this missing.

(3) I realized that we actually are not removing the closed ZFIIS 
objects (only the ZFIS objects
 are removed when closed) from streams even though those input 
streams are closed
 gracefully by the invoker (as preferred, suggested, desired), 
which means all those ZFIIS
 streams will stay in streams (strong referenced in current 
implementation), no matter
 whether or not they get closed explicitly, until  the ZipFile 
object itself gets closed/finalized.
 This might contribute to the problem Neil observed at first place 
(the ZipFile InputStreams
 accumulat in heap), actually I do have another incident report 
#7033523 which I have

 closed as the dup of #7031076.

Opinion? anything I overlooked/missed?

-Sherman



On 04/13/2011 08:25 AM, Steve Poole wrote:

On 13/04/11 16:19, Steve Poole wrote:

On 12/04/11 20:33, Xueming Shen wrote:

Hi Neil,



Hi Sherman ,  Neil is out on vacation so  I will do my best to stand 
in for him.


(1) I believe it would be better to keep the synchronization lock 
for get/releaseInfalter()
 local instead of using the global ZipFile.this, which I 
agree is simple. But it also
 means each/every time when you release the used inflater back 
to cache, ZipFile.this
 has to be held and any possible/potential read operation on 
ZipFile from other thead/
 inputstream has to wait (get seems fine, the current impl 
holds the ZipFile.this

 anyway before reach the getInflater()).
Ok - I agree - seems sensible.  (Though I reserve the right to 
contradict myself later)


(2) The resource Infalter mainly holds is the native memory block 
it alloc-ed at native
 for zlib, which is not in the Java heap, so I doubt making it 
softly really helps for GC.
 Sure, cleanup of those objects themself is a plus, but I'm 
not sure if it is really worth
 using SoftReference in this case (it appears you are now 
invoking clearStaleStreams()

 from the finalize()).
I'd like to keep this in -  not all implementations of zlib are equal 
in where they allocate memory.


(3) The releaseInfalter() is now totally driven from 
clearStaleStreams()/staleStreamQueue,
 which is under full control of GC. If GC does not kick in for 
hours/days, infalters can never
 be put back into its cache after use, even the applications 
correctly/promptly close those
 streams after use. And when the GC kicks in, we might see 
bunch 

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-13 Thread Steve Poole

On 13/04/11 16:19, Steve Poole wrote:

On 12/04/11 20:33, Xueming Shen wrote:

Hi Neil,



Hi Sherman ,  Neil is out on vacation so  I will do my best to stand 
in for him.


(1) I believe it would be better to keep the synchronization lock for 
get/releaseInfalter()
 local instead of using the global ZipFile.this, which I 
agree is simple. But it also
 means each/every time when you release the used inflater back to 
cache, ZipFile.this
 has to be held and any possible/potential read operation on 
ZipFile from other thead/
 inputstream has to wait (get seems fine, the current impl 
holds the ZipFile.this

 anyway before reach the getInflater()).
Ok - I agree - seems sensible.  (Though I reserve the right to 
contradict myself later)


(2) The resource Infalter mainly holds is the native memory block 
it alloc-ed at native
 for zlib, which is not in the Java heap, so I doubt making it 
softly really helps for GC.
 Sure, cleanup of those objects themself is a plus, but I'm not 
sure if it is really worth
 using SoftReference in this case (it appears you are now 
invoking clearStaleStreams()

 from the finalize()).
I'd like to keep this in -  not all implementations of zlib are equal 
in where they allocate memory.


(3) The releaseInfalter() is now totally driven from 
clearStaleStreams()/staleStreamQueue,
 which is under full control of GC. If GC does not kick in for 
hours/days, infalters can never
 be put back into its cache after use, even the applications 
correctly/promptly close those
 streams after use. And when the GC kicks in, we might see 
bunch (hundreds, thousands)
 of inflaters get pushed into cache. The approach does solve the 
timing issue that got
 us here, but it also now has this native memory cache 
mechanism totally under the
 control of/driven by the GC, the java heap management mechanism, 
which might not be
 a preferred scenario. Just wonder if we can have a better 
choice, say with this GC-driven
 as the backup cleanup and meanwhile still have the ZFIIS.close() 
to do something to safely
 put the used inflater back to cache promptly. To put the 
infalter as the value of the streams
 map appears to be a good idea/start, now the infalter will not 
be targeted for finalized
 until the entry gets cleaned from the map, in which might in 
fact provide us a sort of
 orderly (between the stream and its inflater) clearup that 
the GC/finalizer can't
 guarantee. We still have couple days before we hit the 
deadline (to get this one in), so it

 might worth taking some time on this direction?



Dang! - pressed send when I meant save.   I understand your comments - 
on the face of it I agree with what you're suggesting - let me think it 
through some more though.

What is this deadline  you are talking about?


-Sherman





On 04/11/2011 05:15 AM, Neil Richards wrote:

On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote:
With releaseInflater() being driven from the cleanup of stale 
'streams'

entries, it no longer needs to be called from ZFIIS.close(), so,
instead, only Inflater.reset() is called from there (providing the
inflater object has not explicitly been ended) so that it releases the
buffer it has been holding.

Actually, I have a slight change of heart on this aspect.

Because close() may be driven from a finalize() method in user code (as
is done in the InflaterFinalizer test case), I believe it is possible
(albeit unlikely) for an inflater object to have been reclaimed from
'streams' (by a call to clearStaleStreams()), put into the inflater
cache, retrieved from there (by another thread creating an input 
stream)
and having started to be used by that other stream at the point at 
which

the code in close() is run.

(This is because weak references will be cleared, and *may* be 
queued on

the ReferenceQueue, prior to finalization).

Because of this, it's not actually entirely safe now to call 
inf.reset()

from ZFIIS.close().

Instead, I have added additional calls to clearStaleStreams() from the
finalize() methods of both InputStream implementations in the latest
(hopefully in the meaning of last) changeset included below.

As always, please get back to me with any comments, questions or
suggestions on this,
Thanks,
Neil









Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-13 Thread Xueming Shen

On 04/13/2011 08:25 AM, Steve Poole wrote:

On 13/04/11 16:19, Steve Poole wrote:

On 12/04/11 20:33, Xueming Shen wrote:

Hi Neil,



Hi Sherman ,  Neil is out on vacation so  I will do my best to stand 
in for him.


(1) I believe it would be better to keep the synchronization lock 
for get/releaseInfalter()
 local instead of using the global ZipFile.this, which I 
agree is simple. But it also
 means each/every time when you release the used inflater back 
to cache, ZipFile.this
 has to be held and any possible/potential read operation on 
ZipFile from other thead/
 inputstream has to wait (get seems fine, the current impl 
holds the ZipFile.this

 anyway before reach the getInflater()).
Ok - I agree - seems sensible.  (Though I reserve the right to 
contradict myself later)


(2) The resource Infalter mainly holds is the native memory block 
it alloc-ed at native
 for zlib, which is not in the Java heap, so I doubt making it 
softly really helps for GC.
 Sure, cleanup of those objects themself is a plus, but I'm 
not sure if it is really worth
 using SoftReference in this case (it appears you are now 
invoking clearStaleStreams()

 from the finalize()).
I'd like to keep this in -  not all implementations of zlib are equal 
in where they allocate memory.


Hi Steve,

It might be better/appropriate to use the SoftReference in the 
inflaterCache instead of the

streams, if we want to use the SoftReference for inflater caching purpose.

/*
 * Gets an inflater from the list of available inflaters or allocates
 * a new one.
 */
private synchronized Inflater getInflater() {
SoftReferenceInflater sref;
Inflater inf;
while (null != (sref = inflaterCache.poll())) {
if ((inf = sref.get()) != null  !inf.ended()) {
return inf;
}
}
return new Inflater(true);
}

/*
 * Releases the specified inflater to the list of available inflaters.
 */
private synchronized void releaseInflater(Inflater inf) {
if ((null != inf)  (false == inf.ended())) {
inf.reset();
inflaterCache.add(new SoftReference(inf));
}
}

// List of available Inflater objects for decompression
private DequeSoftReferenceInflater inflaterCache = new 
ArrayDeque();


-Sherman



(3) The releaseInfalter() is now totally driven from 
clearStaleStreams()/staleStreamQueue,
 which is under full control of GC. If GC does not kick in for 
hours/days, infalters can never
 be put back into its cache after use, even the applications 
correctly/promptly close those
 streams after use. And when the GC kicks in, we might see 
bunch (hundreds, thousands)
 of inflaters get pushed into cache. The approach does solve the 
timing issue that got
 us here, but it also now has this native memory cache 
mechanism totally under the
 control of/driven by the GC, the java heap management 
mechanism, which might not be
 a preferred scenario. Just wonder if we can have a better 
choice, say with this GC-driven
 as the backup cleanup and meanwhile still have the 
ZFIIS.close() to do something to safely
 put the used inflater back to cache promptly. To put the 
infalter as the value of the streams
 map appears to be a good idea/start, now the infalter will 
not be targeted for finalized
 until the entry gets cleaned from the map, in which might in 
fact provide us a sort of
 orderly (between the stream and its inflater) clearup 
that the GC/finalizer can't
 guarantee. We still have couple days before we hit the 
deadline (to get this one in), so it

 might worth taking some time on this direction?



Dang! - pressed send when I meant save.   I understand your comments - 
on the face of it I agree with what you're suggesting - let me think 
it through some more though.

What is this deadline  you are talking about?


-Sherman





On 04/11/2011 05:15 AM, Neil Richards wrote:

On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote:
With releaseInflater() being driven from the cleanup of stale 
'streams'

entries, it no longer needs to be called from ZFIIS.close(), so,
instead, only Inflater.reset() is called from there (providing the
inflater object has not explicitly been ended) so that it releases 
the

buffer it has been holding.

Actually, I have a slight change of heart on this aspect.

Because close() may be driven from a finalize() method in user code 
(as

is done in the InflaterFinalizer test case), I believe it is possible
(albeit unlikely) for an inflater object to have been reclaimed from
'streams' (by a call to clearStaleStreams()), put into the inflater
cache, retrieved from there (by another thread creating an input 
stream)
and having started to be used by that other stream at the point at 
which

the code in close() is run.

(This is because weak references will be cleared, and *may* be 
queued on

the 

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-12 Thread Xueming Shen

Hi Neil,

(1) I believe it would be better to keep the synchronization lock for 
get/releaseInfalter()
 local instead of using the global ZipFile.this, which I agree 
is simple. But it also
 means each/every time when you release the used inflater back to 
cache, ZipFile.this
 has to be held and any possible/potential read operation on 
ZipFile from other thead/
 inputstream has to wait (get seems fine, the current impl holds 
the ZipFile.this

 anyway before reach the getInflater()).

(2) The resource Infalter mainly holds is the native memory block it 
alloc-ed at native
 for zlib, which is not in the Java heap, so I doubt making it 
softly really helps for GC.
 Sure, cleanup of those objects themself is a plus, but I'm not 
sure if it is really worth
 using SoftReference in this case (it appears you are now invoking 
clearStaleStreams()

 from the finalize()).

(3) The releaseInfalter() is now totally driven from 
clearStaleStreams()/staleStreamQueue,
 which is under full control of GC. If GC does not kick in for 
hours/days, infalters can never
 be put back into its cache after use, even the applications 
correctly/promptly close those
 streams after use. And when the GC kicks in, we might see bunch 
(hundreds, thousands)
 of inflaters get pushed into cache. The approach does solve the 
timing issue that got
 us here, but it also now has this native memory cache mechanism 
totally under the
 control of/driven by the GC, the java heap management mechanism, 
which might not be
 a preferred scenario. Just wonder if we can have a better choice, 
say with this GC-driven
 as the backup cleanup and meanwhile still have the ZFIIS.close() 
to do something to safely
 put the used inflater back to cache promptly. To put the infalter 
as the value of the streams
 map appears to be a good idea/start, now the infalter will not 
be targeted for finalized
 until the entry gets cleaned from the map, in which might in fact 
provide us a sort of
 orderly (between the stream and its inflater) clearup that 
the GC/finalizer can't
 guarantee. We still have couple days before we hit the deadline 
(to get this one in), so it

 might worth taking some time on this direction?

-Sherman





On 04/11/2011 05:15 AM, Neil Richards wrote:

On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote:

With releaseInflater() being driven from the cleanup of stale 'streams'
entries, it no longer needs to be called from ZFIIS.close(), so,
instead, only Inflater.reset() is called from there (providing the
inflater object has not explicitly been ended) so that it releases the
buffer it has been holding.

Actually, I have a slight change of heart on this aspect.

Because close() may be driven from a finalize() method in user code (as
is done in the InflaterFinalizer test case), I believe it is possible
(albeit unlikely) for an inflater object to have been reclaimed from
'streams' (by a call to clearStaleStreams()), put into the inflater
cache, retrieved from there (by another thread creating an input stream)
and having started to be used by that other stream at the point at which
the code in close() is run.

(This is because weak references will be cleared, and *may* be queued on
the ReferenceQueue, prior to finalization).

Because of this, it's not actually entirely safe now to call inf.reset()
from ZFIIS.close().

Instead, I have added additional calls to clearStaleStreams() from the
finalize() methods of both InputStream implementations in the latest
(hopefully in the meaning of last) changeset included below.

As always, please get back to me with any comments, questions or
suggestions on this,
Thanks,
Neil





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-11 Thread Neil Richards
On Sun, 2011-04-10 at 18:28 +0100, Neil Richards wrote:
 With releaseInflater() being driven from the cleanup of stale 'streams'
 entries, it no longer needs to be called from ZFIIS.close(), so,
 instead, only Inflater.reset() is called from there (providing the
 inflater object has not explicitly been ended) so that it releases the
 buffer it has been holding.

Actually, I have a slight change of heart on this aspect.

Because close() may be driven from a finalize() method in user code (as
is done in the InflaterFinalizer test case), I believe it is possible
(albeit unlikely) for an inflater object to have been reclaimed from
'streams' (by a call to clearStaleStreams()), put into the inflater
cache, retrieved from there (by another thread creating an input stream)
and having started to be used by that other stream at the point at which
the code in close() is run.

(This is because weak references will be cleared, and *may* be queued on
the ReferenceQueue, prior to finalization).

Because of this, it's not actually entirely safe now to call inf.reset()
from ZFIIS.close().

Instead, I have added additional calls to clearStaleStreams() from the
finalize() methods of both InputStream implementations in the latest
(hopefully in the meaning of last) changeset included below.

As always, please get back to me with any comments, questions or
suggestions on this,
Thanks,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


# HG changeset patch
# User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com
# Date 1300289208 0
# Branch zip-heap
# Node ID b66c974bcfa957281e69a63cd4019a12c13cacae
# Parent  aa13e7702cd9d8aca9aa38f1227f966990866944
7031076: Retained ZipFile InputStreams increase heap demand
Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd
Contributed-by: neil.richa...@ngmr.net

diff -r aa13e7702cd9 -r b66c974bcfa9 
src/share/classes/java/util/zip/ZipFile.java
--- a/src/share/classes/java/util/zip/ZipFile.java  Tue Mar 29 20:19:55 
2011 -0700
+++ b/src/share/classes/java/util/zip/ZipFile.java  Wed Mar 16 15:26:48 
2011 +
@@ -30,11 +30,16 @@
 import java.io.IOException;
 import java.io.EOFException;
 import java.io.File;
+import java.lang.ref.ReferenceQueue;
+import java.lang.ref.SoftReference;
+import java.lang.ref.WeakReference;
 import java.nio.charset.Charset;
-import java.util.Vector;
+import java.util.ArrayDeque;
+import java.util.Deque;
 import java.util.Enumeration;
-import java.util.Set;
-import java.util.HashSet;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
 import java.util.NoSuchElementException;
 import java.security.AccessController;
 import sun.security.action.GetPropertyAction;
@@ -314,8 +319,21 @@
 // freeEntry releases the C jzentry struct.
 private static native void freeEntry(long jzfile, long jzentry);
 
-// the outstanding inputstreams that need to be closed.
-private SetInputStream streams = new HashSet();
+// the outstanding inputstreams that need to be closed, 
+// mapped to the inflater objects they use.
+private final MapWeakReferenceInputStream, SoftReferenceInflater
+streams = new HashMap();
+private final ReferenceQueue? super InputStream staleStreamQueue =
+new ReferenceQueue();
+private static final SoftReferenceInflater NULL_INFLATER_REF =
+new SoftReference(null);
+
+private synchronized void clearStaleStreams() {
+Object staleStream;
+while (null != (staleStream = staleStreamQueue.poll())) {
+releaseInflater(streams.remove(staleStream).get());
+}
+}
 
 /**
  * Returns an input stream for reading the contents of the specified
@@ -339,6 +357,7 @@
 ZipFileInputStream in = null;
 synchronized (this) {
 ensureOpen();
+clearStaleStreams();
 if (!zc.isUTF8()  (entry.flag  EFS) != 0) {
 jzentry = getEntry(jzfile, zc.getBytesUTF8(entry.name), false);
 } else {
@@ -351,51 +370,21 @@
 
 switch (getEntryMethod(jzentry)) {
 case STORED:
-streams.add(in);
+streams.put(
+new WeakReferenceInputStream(in, staleStreamQueue),
+NULL_INFLATER_REF);
 return in;
 case DEFLATED:
-final ZipFileInputStream zfin = in;
 // MORE: Compute good size for inflater stream:
 long size = getEntrySize(jzentry) + 2; // Inflater likes a bit 
of slack
 if (size  65536) size = 8192;
 if (size = 0) size = 4096;
-InputStream is = new InflaterInputStream(zfin, getInflater(), 
(int)size) {
-private boolean isClosed = false;
-
-

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-08 Thread Neil Richards
Hi David,

You are correct that the ZipFileInflaterInputStream code is not inherently
thread-safe (without user synchronization), but then, the contract of
InputStream does not require it to be (as Sherman previously observed on
this thread).

The previous suggested fix caused ZFIIS to interact with the finalize
thread, thus it had to care about being thread-safe at least to the extent
of safe-guarding that interaction.

As Sherman observed, and I agreed, finalization on that object is not really
needed, so the inherent cross-thread interaction that would be introduced by
the finalize method can be avoided in that case, and so the need to cater
for inherent thread-safety can likewise be avoided.

With the latest suggested change, if the ZFIIS is driven by multiple
threads, it will be done at the user's (developer's) behest, so the user can
(should) know to implement suitable synchronization to ensure correct
behaviour.

Therefore, I believe the same lack of synchronization in ZFIIS as there was
in the previous anonymous inner class to be entirely justified in this case,
given that it no longer has a finalize() method.

Please let me know if you still disagree with this assessment,
Thanks,
Neil

(Sent from my phone, so probably missing my usual signature)


Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-08 Thread David Holmes

Neil Richards said the following on 04/09/11 10:04:
You are correct that the ZipFileInflaterInputStream code is not 
inherently thread-safe (without user synchronization), but then, the 
contract of InputStream does not require it to be (as Sherman previously 
observed on this thread).


The previous suggested fix caused ZFIIS to interact with the finalize 
thread, thus it had to care about being thread-safe at least to the 
extent of safe-guarding that interaction.


As Sherman observed, and I agreed, finalization on that object is not 
really needed, so the inherent cross-thread interaction that would be 
introduced by the finalize method can be avoided in that case, and so 
the need to cater for inherent thread-safety can likewise be avoided.


Ok.

David
--


With the latest suggested change, if the ZFIIS is driven by multiple 
threads, it will be done at the user's (developer's) behest, so the user 
can (should) know to implement suitable synchronization to ensure 
correct behaviour.


Therefore, I believe the same lack of synchronization in ZFIIS as there 
was in the previous anonymous inner class to be entirely justified in 
this case, given that it no longer has a finalize() method.


Please let me know if you still disagree with this assessment,
Thanks,
Neil

(Sent from my phone, so probably missing my usual signature)



Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-08 Thread Xueming Shen

Neil,

Regression test /test/java/util/zip/ZipFile/FinalizeInflater.java failed 
with this fix.


The possible solution I can think of now is to do nothing but simply 
return the inflater back

to the cache, but do an additional sanity check when check out

private Inflater getInflater() {

synchronized (inflaters) {
int size = inflaters.size();
while ((size = inflaters.size())  0) {
Inflater inf = inflaters.remove(size - 1);
if (!inf.ended())
return inf;
}
return new Inflater(true);
}
}


http://cr.openjdk.java.net/~sherman/7031076/webrev/

What do you think? have a better idea?

Sherman


On 04-08-2011 5:36 AM, Neil Richards wrote:

On Thu, 2011-04-07 at 16:02 -0700, Xueming Shen wrote:

It appears it might not be necessary to do the finalize() in
ZipFileInflaterInputStream. The ZipFileInflaterInputStream itself does
not directly hold any native resource by itself that needs to be
released at the end of its life circle, if not closed explicitly. The
native resource/memory that need to be taken care of are held by its
fields inf and zfin, which should be finalized by the
corresponding finalize() of their own classes (again, if not closed
explicitly), when their outer ZFIIS object is unreachable. The Inflater
class has its own finalize() implemented already to invoke its cleanup
method end(), so the only thing need to be addressed is to add the
finalize() into ZipFileInputStream class to call its close(), strictly
speaking this issue is not the regression caused by #6735255, we have
this leak before #6735255.

Also, would you like to consider to use WeakHeapMapInputStream, Void
instead of handling all the weak reference impl by yourself, the bonus
would be that the stalled entries might be cleaned up more frequently.


Hi Sherman,
Thanks for your continuing analysis of this change.

I concur with your assessment above, and agree that making the suggested
modifications to the changeset results in the code being simpler and
clearer.

Please find below an updated changeset incorporating these suggestions
(and rebased off jdk7-b136),

Let me know if you need anything else to progress this fix forward,
Thanks,
Neil





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-07 Thread Neil Richards
On Mon, 2011-04-04 at 09:04 +1000, David Holmes wrote:
 1. If a call to close() occurs around the same time as finalization 
 occurs then the finalizer thread will set inFinalizer to true, at which 
 point the thread executing close() can see it is true and so may skip 
 the releaseInflater(inf) call.
 
 2. As isClosed is not volatile, and available() is not synchronized, a 
 thread calling available() on a closed stream may not see that it has 
 been closed and so will likely encounter an IOException rather than 
 getting a zero return.
 
 
 Even if #1 is not a practical problem I'd be inclined to make the 
 finalizer synchronized as well. By doing that you're effectively 
 ensuring that premature finalization of the stream won't happen.

I tend to agree, especially as it also makes the intention of the code
clearer.

 
 For #2 it is a tricky call. If you don't actually expect the stream 
 object to be used by multiple threads then using a synchronized block to 
 read isClosed will be cheap in VMs that go to great effort to reduce the 
 cost of unnecessary synchronization (eg biased-locking in hotspot). 
 Otherwise making isClosed volatile is likely the better option.

The check at the start of available() guards the logic beyond (which
uses a value from the inflater object, which would not be valid if the
stream has been closed()).

Because of this, I think it would be clearer to synchronize the
available() method too.

As you say, it is extremely likely that, in practice, this
synchronization will never be contended, and so won't impact performance
significantly (in modern VMs).

Please find below an updated changeset with these modifications,

Thanks for your advice in this,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


# HG changeset patch
# User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com
# Date 1300289208 0
# Branch zip-heap
# Node ID db52003c128c8706f45e0b2bf9f98d5e905d2090
# Parent  554adcfb615e63e62af530b1c10fcf7813a75b26
7031076: Retained ZipFile InputStreams increase heap demand
Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd
Contributed-by: neil.richa...@ngmr.net

diff -r 554adcfb615e -r db52003c128c 
src/share/classes/java/util/zip/ZipFile.java
--- a/src/share/classes/java/util/zip/ZipFile.java  Wed Mar 16 15:01:07 
2011 -0700
+++ b/src/share/classes/java/util/zip/ZipFile.java  Wed Mar 16 15:26:48 
2011 +
@@ -30,11 +30,14 @@
 import java.io.IOException;
 import java.io.EOFException;
 import java.io.File;
+import java.lang.ref.ReferenceQueue;
+import java.lang.ref.WeakReference;
 import java.nio.charset.Charset;
 import java.util.Vector;
 import java.util.Enumeration;
 import java.util.Set;
 import java.util.HashSet;
+import java.util.Iterator;
 import java.util.NoSuchElementException;
 import java.security.AccessController;
 import sun.security.action.GetPropertyAction;
@@ -315,7 +318,16 @@
 private static native void freeEntry(long jzfile, long jzentry);
 
 // the outstanding inputstreams that need to be closed.
-private SetInputStream streams = new HashSet();
+private final SetWeakReferenceInputStream streams = new HashSet();
+private final ReferenceQueue? super InputStream staleStreamQueue =
+new ReferenceQueue();
+
+private void clearStaleStreams() {
+Object staleStream;
+while (null != (staleStream = staleStreamQueue.poll())) {
+streams.remove(staleStream);
+}
+}
 
 /**
  * Returns an input stream for reading the contents of the specified
@@ -339,6 +351,7 @@
 ZipFileInputStream in = null;
 synchronized (this) {
 ensureOpen();
+clearStaleStreams();
 if (!zc.isUTF8()  (entry.flag  EFS) != 0) {
 jzentry = getEntry(jzfile, zc.getBytesUTF8(entry.name), false);
 } else {
@@ -351,51 +364,19 @@
 
 switch (getEntryMethod(jzentry)) {
 case STORED:
-streams.add(in);
+streams.add(
+new WeakReferenceInputStream(in, staleStreamQueue));
 return in;
 case DEFLATED:
-final ZipFileInputStream zfin = in;
 // MORE: Compute good size for inflater stream:
 long size = getEntrySize(jzentry) + 2; // Inflater likes a bit 
of slack
 if (size  65536) size = 8192;
 if (size = 0) size = 4096;
-InputStream is = new InflaterInputStream(zfin, getInflater(), 
(int)size) {
-private boolean isClosed = false;
-
-public void close() throws IOException {
-if (!isClosed) {
-super.close();
-releaseInflater(inf);
-   

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-07 Thread David Holmes

Hi Neil,

Neil Richards said the following on 04/07/11 23:30:

On Mon, 2011-04-04 at 09:04 +1000, David Holmes wrote:
1. If a call to close() occurs around the same time as finalization 
occurs then the finalizer thread will set inFinalizer to true, at which 
point the thread executing close() can see it is true and so may skip 
the releaseInflater(inf) call.


2. As isClosed is not volatile, and available() is not synchronized, a 
thread calling available() on a closed stream may not see that it has 
been closed and so will likely encounter an IOException rather than 
getting a zero return.



Even if #1 is not a practical problem I'd be inclined to make the 
finalizer synchronized as well. By doing that you're effectively 
ensuring that premature finalization of the stream won't happen.


I tend to agree, especially as it also makes the intention of the code
clearer.

For #2 it is a tricky call. If you don't actually expect the stream 
object to be used by multiple threads then using a synchronized block to 
read isClosed will be cheap in VMs that go to great effort to reduce the 
cost of unnecessary synchronization (eg biased-locking in hotspot). 
Otherwise making isClosed volatile is likely the better option.


The check at the start of available() guards the logic beyond (which
uses a value from the inflater object, which would not be valid if the
stream has been closed()).

Because of this, I think it would be clearer to synchronize the
available() method too.

As you say, it is extremely likely that, in practice, this
synchronization will never be contended, and so won't impact performance
significantly (in modern VMs).

Please find below an updated changeset with these modifications,

Thanks for your advice in this,


No problem. From a synchronization perspective this all seems fine now.

Cheers,
David


Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-07 Thread Xueming Shen

Neil,

It appears it might not be necessary to do the finalize() in 
ZipFileInflaterInputStream. The
ZipFileInflaterInputStream itself does not directly hold any native 
resource by itself that
needs to be released at the end of its life circle, if not closed 
explicitly. The native resource/
memory that need to be taken care of are held by its fields inf and 
zfin, which should
be finalized by the corresponding finalize() of their own classes 
(again, if not closed
explicitly), when their outer ZFIIS object is unreachable. The Inflater 
class has its own finalize()
implemented already to invoke its cleanup method end(), so the only 
thing need to be addressed
is to add the finalize() into ZipFileInputStream class to call its 
close(), strictly speaking this issue
is not the regression caused by #6735255, we have this leak before 
#6735255.


Also, would you like to consider to use WeakHeapMapInputStream, Void 
instead of
handling all the weak reference impl by yourself, the bonus would be 
that the stalled entries

might be cleaned up more frequently.

Sherman


On 04/07/2011 06:30 AM, Neil Richards wrote:

On Mon, 2011-04-04 at 09:04 +1000, David Holmes wrote:

1. If a call to close() occurs around the same time as finalization
occurs then the finalizer thread will set inFinalizer to true, at which
point the thread executing close() can see it is true and so may skip
the releaseInflater(inf) call.

2. As isClosed is not volatile, and available() is not synchronized, a
thread calling available() on a closed stream may not see that it has
been closed and so will likely encounter an IOException rather than
getting a zero return.


Even if #1 is not a practical problem I'd be inclined to make the
finalizer synchronized as well. By doing that you're effectively
ensuring that premature finalization of the stream won't happen.

I tend to agree, especially as it also makes the intention of the code
clearer.


For #2 it is a tricky call. If you don't actually expect the stream
object to be used by multiple threads then using a synchronized block to
read isClosed will be cheap in VMs that go to great effort to reduce the
cost of unnecessary synchronization (eg biased-locking in hotspot).
Otherwise making isClosed volatile is likely the better option.

The check at the start of available() guards the logic beyond (which
uses a value from the inflater object, which would not be valid if the
stream has been closed()).

Because of this, I think it would be clearer to synchronize the
available() method too.

As you say, it is extremely likely that, in practice, this
synchronization will never be contended, and so won't impact performance
significantly (in modern VMs).

Please find below an updated changeset with these modifications,

Thanks for your advice in this,
Neil





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-03 Thread David Holmes

Hi Neil,

Neil Richards said the following on 04/02/11 11:28:

David, I can't claim to have reached true enlightenment wrt the section
of the JLS that you point to - I suspect I'll need to lie down in a
darkened room with a dampened flannel across the brow and ponder the
nature of things a while to get there :-)

In the meantime, can you (or other knowledgeable GC folk) advise whether
it has any implications which would cause the current suggested fix to
be unsuitable?
(To recap, the current suggestion has the ZipFileInflaterInputStream's
close() method synchronized on itself, and called from its finalize()
method. The close() method has both the read and write of the 'isClosed'
field).

Any reassurance (or otherwise) gratefully accepted on this matter,


I don't fully understand the object relationships here so I'll just make 
a couple of observations on the synchronization in this code:


public synchronized void close() throws IOException {
if (!isClosed) {
super.close();
if (false == inFinalizer)
releaseInflater(inf);
isClosed = true;
}
}

public int available() throws IOException {
if (isClosed)
return 0;
long avail = zfin.size() - inf.getBytesWritten();
...
}

protected void finalize() throws IOException {
inFinalizer = true;
close();
}

1. If a call to close() occurs around the same time as finalization 
occurs then the finalizer thread will set inFinalizer to true, at which 
point the thread executing close() can see it is true and so may skip 
the releaseInflater(inf) call.


2. As isClosed is not volatile, and available() is not synchronized, a 
thread calling available() on a closed stream may not see that it has 
been closed and so will likely encounter an IOException rather than 
getting a zero return.



Even if #1 is not a practical problem I'd be inclined to make the 
finalizer synchronized as well. By doing that you're effectively 
ensuring that premature finalization of the stream won't happen.


For #2 it is a tricky call. If you don't actually expect the stream 
object to be used by multiple threads then using a synchronized block to 
read isClosed will be cheap in VMs that go to great effort to reduce the 
cost of unnecessary synchronization (eg biased-locking in hotspot). 
Otherwise making isClosed volatile is likely the better option.


HTH

David


Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-02 Thread Xueming Shen

 On 4/1/2011 11:56 PM, Jeroen Frijters wrote:

Xueming Shen wrote:

I'm not a GC guy, so I might be missing something here, but if close()
is being explicitly invoked by some thread, means someone has a strong
reference to it, I don't think the finalize() can kick in until that
close() returns

This is not correct. You can re-publish the reference from another finalizer 
method and thereby allow another thread to access the object concurrently with 
the finalizer.

Here's a possible sequence of events:
1) GC runs and determines that A and B are finalizable
2) Finalizer thread run A.finalize()
3) A.finalize publishes reference to B in static variable


Jeroen, are you talking about the object resurrection from finalize()?

How do you re-publish/get the reference to B inside A.finalize()? I 
think you can do that inside

B's finalize() to assign this to a global static variable.

Regard,
-Sherman


4a) Another thread reads the static variable and calls B.close()
4b) Finalizer thread runs B.finalize()

Regards,
Jeroen






Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-02 Thread Xueming Shen

On 04-02-2011 12:47 AM, Xueming Shen wrote:

 On 4/1/2011 11:56 PM, Jeroen Frijters wrote:

Xueming Shen wrote:

I'm not a GC guy, so I might be missing something here, but if close()
is being explicitly invoked by some thread, means someone has a strong
reference to it, I don't think the finalize() can kick in until that
close() returns
This is not correct. You can re-publish the reference from another 
finalizer method and thereby allow another thread to access the 
object concurrently with the finalizer.


Here's a possible sequence of events:
1) GC runs and determines that A and B are finalizable
2) Finalizer thread run A.finalize()
3) A.finalize publishes reference to B in static variable


Jeroen, are you talking about the object resurrection from finalize()?

How do you re-publish/get the reference to B inside A.finalize()? I 
think you can do that inside

B's finalize() to assign this to a global static variable.


Jeroen,

Guess I replied too quick. I think I got what you meant. A has a 
reference to B,  GC finalizes A
and B the same time, then A republish reference to B and let someone 
else to invoke b.close()..

I would have to admit it's a possible use scenario:-)

-Sherman



Regard,
-Sherman


4a) Another thread reads the static variable and calls B.close()
4b) Finalizer thread runs B.finalize()

Regards,
Jeroen








Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-02 Thread David Holmes

Xueming Shen said the following on 04/02/11 17:00:

 On 4/1/2011 4:17 PM, David Holmes wrote:

Xueming Shen said the following on 04/02/11 05:07:

...
explicit invocation is cleared. The InputStream is eligible for 
finalization only after it is

weakly reachable, means no more stronger reachable exists, right?


Actually no. One of the more obscure corner cases with finalization is 
that you can actually finalize an object that is still being used. The 
JLS actually spells this out - see section 12.6.1 and in particular 
the Discussion within that section.


The scenario that Neil and I were discussing is something like this,

There is class A

class A {
void close() {
...
}

protect void finalize() {
...
close();
}

}

when we are in the middle of A's close() (invoked by someone, not the 
finalizer), do we need to worry about that
A's finalize() being invoked (and then the close())  by the finalizer 
concurrently.


Does you an object that still being used include the scenario like 
above, which means an object became
finalizer-reachable,  when still in the middle of the execution (by some 
alive, non-finalizer-thread) of one of its

instance method body?

The JLS 12.6.1, if I read it correctly, is for scenario that a reachable 
object which is strongly referenced by a
stack reference can/may become finalizer-reachable sooner than it might 
be expected, for example, the
compiler optimization can null out such reference in the middle of the 
method body, so that object becomes
finalizer-reachable before the execution reach the return point of the 
method, or ... The execution discussed
is not the execution inside the target object's method body. Am I 
reading it correctly? Otherwise, it becomes a
little weird, image, you are in the middle of the execution of an 
instance method, suddenly, the instance itself

is being finalized, all the native resource get released...


Yes 12.6.1 refers to the case where the object is only 
strongly-reachable from a local stack reference. This includes the 
little weird scenario you describe. A thread can be executing 
a.close() where 'a' is a local stack reference, and 'a' can be finalized 
while that is heppening.


I'm not saying hotspot will actually do this, but it is permissible 
within the spec. This situation has been discussed quite a bit in the 
past on the Java Memory Model mailing list.


David


-Sherman






Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-02 Thread Xueming Shen

Thanks David!

And Neil,  it seems my assumption is wrong and we do need the 
synchronization for the close().

Your latest patch looks fine for me.

Thanks,
-Sherman

On 04-02-2011 6:15 PM, David Holmes wrote:

Xueming Shen said the following on 04/02/11 17:00:

 On 4/1/2011 4:17 PM, David Holmes wrote:

Xueming Shen said the following on 04/02/11 05:07:

...
explicit invocation is cleared. The InputStream is eligible for 
finalization only after it is

weakly reachable, means no more stronger reachable exists, right?


Actually no. One of the more obscure corner cases with finalization 
is that you can actually finalize an object that is still being 
used. The JLS actually spells this out - see section 12.6.1 and in 
particular the Discussion within that section.


The scenario that Neil and I were discussing is something like this,

There is class A

class A {
void close() {
...
}

protect void finalize() {
...
close();
}

}

when we are in the middle of A's close() (invoked by someone, not the 
finalizer), do we need to worry about that
A's finalize() being invoked (and then the close())  by the finalizer 
concurrently.


Does you an object that still being used include the scenario like 
above, which means an object became
finalizer-reachable,  when still in the middle of the execution (by 
some alive, non-finalizer-thread) of one of its

instance method body?

The JLS 12.6.1, if I read it correctly, is for scenario that a 
reachable object which is strongly referenced by a
stack reference can/may become finalizer-reachable sooner than it 
might be expected, for example, the
compiler optimization can null out such reference in the middle of 
the method body, so that object becomes
finalizer-reachable before the execution reach the return point of 
the method, or ... The execution discussed
is not the execution inside the target object's method body. Am I 
reading it correctly? Otherwise, it becomes a
little weird, image, you are in the middle of the execution of an 
instance method, suddenly, the instance itself

is being finalized, all the native resource get released...


Yes 12.6.1 refers to the case where the object is only 
strongly-reachable from a local stack reference. This includes the 
little weird scenario you describe. A thread can be executing 
a.close() where 'a' is a local stack reference, and 'a' can be 
finalized while that is heppening.


I'm not saying hotspot will actually do this, but it is permissible 
within the spec. This situation has been discussed quite a bit in the 
past on the Java Memory Model mailing list.


David


-Sherman








Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-01 Thread Neil Richards
On Wed, 2011-03-30 at 13:31 -0700, Xueming Shen wrote:
 Isn't it true that when the finalize()-close() gets invoked, there 
 should be no strong reference anywhere else that you can use to invoke
 close() in other thread?

It's true that once finalize() has been called, there can't be another
explicit call to close().
However, if close() is explicitly called first, it will be called again
when finalize() calls it, so one still wants the update to 'isClosed' to
be seen by the finalizer thread (in this case).

 ZipFileInputStream class has to sync on ZipFile.this, the read/close() 
 methods are accessing the underlying/shared bits of the ZipFile. For 
 ZipFileInflaterInputStream.close() case, even we want to make sure the
 isClose is synced,  I would prefer to see a local lock, maybe simply
 make close() synchronized is better?

After assessing it again, I tend to agree - synchronizing on the ZFIIS
(instead of the ZipFile) now looks to me to be safe because it does not
do anything which synchronizes on the ZipFile object.

(Note that if it _did_ cause such synchronization, it would risk
deadlock with the code in ZipFile's close() method, which is
synchronized on itself when it calls close() on each remaining ZFIIS.
It's the difference in the order of acquiring the monitors that would
cause the potential for deadlock).

As it is, both close() methods synchronize on the 'inflaters' object
within their respective synchronized blocks, but that does not risk
deadlock.

Please find below an updated changeset below, containing the suggested
change to synchronization.

Thanks once again for your suggestions,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


# HG changeset patch
# User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com
# Date 1300289208 0
# Branch zip-heap
# Node ID 4aa4844cd3b9ae747ad736469134a8828ebeb652
# Parent  554adcfb615e63e62af530b1c10fcf7813a75b26
7031076: Retained ZipFile InputStreams increase heap demand
Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd
Contributed-by: neil.richa...@ngmr.net

diff --git a/src/share/classes/java/util/zip/ZipFile.java 
b/src/share/classes/java/util/zip/ZipFile.java
--- a/src/share/classes/java/util/zip/ZipFile.java
+++ b/src/share/classes/java/util/zip/ZipFile.java
@@ -30,11 +30,14 @@
 import java.io.IOException;
 import java.io.EOFException;
 import java.io.File;
+import java.lang.ref.ReferenceQueue;
+import java.lang.ref.WeakReference;
 import java.nio.charset.Charset;
 import java.util.Vector;
 import java.util.Enumeration;
 import java.util.Set;
 import java.util.HashSet;
+import java.util.Iterator;
 import java.util.NoSuchElementException;
 import java.security.AccessController;
 import sun.security.action.GetPropertyAction;
@@ -315,7 +318,16 @@
 private static native void freeEntry(long jzfile, long jzentry);
 
 // the outstanding inputstreams that need to be closed.
-private SetInputStream streams = new HashSet();
+private final SetWeakReferenceInputStream streams = new HashSet();
+private final ReferenceQueue? super InputStream staleStreamQueue =
+new ReferenceQueue();
+
+private void clearStaleStreams() {
+Object staleStream;
+while (null != (staleStream = staleStreamQueue.poll())) {
+streams.remove(staleStream);
+}
+}
 
 /**
  * Returns an input stream for reading the contents of the specified
@@ -339,6 +351,7 @@
 ZipFileInputStream in = null;
 synchronized (this) {
 ensureOpen();
+clearStaleStreams();
 if (!zc.isUTF8()  (entry.flag  EFS) != 0) {
 jzentry = getEntry(jzfile, zc.getBytesUTF8(entry.name), false);
 } else {
@@ -351,51 +364,19 @@
 
 switch (getEntryMethod(jzentry)) {
 case STORED:
-streams.add(in);
+streams.add(
+new WeakReferenceInputStream(in, staleStreamQueue));
 return in;
 case DEFLATED:
-final ZipFileInputStream zfin = in;
 // MORE: Compute good size for inflater stream:
 long size = getEntrySize(jzentry) + 2; // Inflater likes a bit 
of slack
 if (size  65536) size = 8192;
 if (size = 0) size = 4096;
-InputStream is = new InflaterInputStream(zfin, getInflater(), 
(int)size) {
-private boolean isClosed = false;
-
-public void close() throws IOException {
-if (!isClosed) {
-super.close();
-releaseInflater(inf);
-isClosed = true;
-}
-}
-// Override fill() 

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-01 Thread Xueming Shen

On 04/01/2011 09:42 AM, Neil Richards wrote:

On Wed, 2011-03-30 at 13:31 -0700, Xueming Shen wrote:

Isn't it true that when the finalize()-close() gets invoked, there
should be no strong reference anywhere else that you can use to invoke
close() in other thread?

It's true that once finalize() has been called, there can't be another
explicit call to close().
However, if close() is explicitly called first, it will be called again
when finalize() calls it, so one still wants the update to 'isClosed' to
be seen by the finalizer thread (in this case).


I'm not a GC guy, so I might be missing something here, but if close() 
is being explicitly
invoked by some thread, means someone has a strong reference to it, I 
don't think the
finalize() can kick in until that close() returns and the strong 
reference used to make that
explicit invocation is cleared. The InputStream is eligible for 
finalization only after it is

weakly reachable, means no more stronger reachable exists, right?

-sherman


ZipFileInputStream class has to sync on ZipFile.this, the read/close()
methods are accessing the underlying/shared bits of the ZipFile. For
ZipFileInflaterInputStream.close() case, even we want to make sure the
isClose is synced,  I would prefer to see a local lock, maybe simply
make close() synchronized is better?

After assessing it again, I tend to agree - synchronizing on the ZFIIS
(instead of the ZipFile) now looks to me to be safe because it does not
do anything which synchronizes on the ZipFile object.

(Note that if it _did_ cause such synchronization, it would risk
deadlock with the code in ZipFile's close() method, which is
synchronized on itself when it calls close() on each remaining ZFIIS.
It's the difference in the order of acquiring the monitors that would
cause the potential for deadlock).

As it is, both close() methods synchronize on the 'inflaters' object
within their respective synchronized blocks, but that does not risk
deadlock.

Please find below an updated changeset below, containing the suggested
change to synchronization.

Thanks once again for your suggestions,
Neil





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-04-01 Thread David Holmes

Xueming Shen said the following on 04/02/11 05:07:

On 04/01/2011 09:42 AM, Neil Richards wrote:

On Wed, 2011-03-30 at 13:31 -0700, Xueming Shen wrote:

Isn't it true that when the finalize()-close() gets invoked, there
should be no strong reference anywhere else that you can use to invoke
close() in other thread?

It's true that once finalize() has been called, there can't be another
explicit call to close().
However, if close() is explicitly called first, it will be called again
when finalize() calls it, so one still wants the update to 'isClosed' to
be seen by the finalizer thread (in this case).


I'm not a GC guy, so I might be missing something here, but if close() 
is being explicitly
invoked by some thread, means someone has a strong reference to it, I 
don't think the
finalize() can kick in until that close() returns and the strong 
reference used to make that
explicit invocation is cleared. The InputStream is eligible for 
finalization only after it is

weakly reachable, means no more stronger reachable exists, right?


Actually no. One of the more obscure corner cases with finalization is 
that you can actually finalize an object that is still being used. The 
JLS actually spells this out - see section 12.6.1 and in particular the 
Discussion within that section.


David


-sherman


ZipFileInputStream class has to sync on ZipFile.this, the read/close()
methods are accessing the underlying/shared bits of the ZipFile. For
ZipFileInflaterInputStream.close() case, even we want to make sure the
isClose is synced,  I would prefer to see a local lock, maybe simply
make close() synchronized is better?

After assessing it again, I tend to agree - synchronizing on the ZFIIS
(instead of the ZipFile) now looks to me to be safe because it does not
do anything which synchronizes on the ZipFile object.

(Note that if it _did_ cause such synchronization, it would risk
deadlock with the code in ZipFile's close() method, which is
synchronized on itself when it calls close() on each remaining ZFIIS.
It's the difference in the order of acquiring the monitors that would
cause the potential for deadlock).

As it is, both close() methods synchronize on the 'inflaters' object
within their respective synchronized blocks, but that does not risk
deadlock.

Please find below an updated changeset below, containing the suggested
change to synchronization.

Thanks once again for your suggestions,
Neil





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-03-30 Thread Neil Richards
Hi Sherman,
Thanks for your review and comments on this.

On Tue, 2011-03-29 at 12:05 -0700, Xueming Shen wrote:
 Hi Neil,
 
 It appears to be a regression in scenario you described (user 
 application never close the input stream after use and the ZipFile
 instance being retained during the lifetime of the process). The
 proposed approach seems to solve this particular problem.
 
 Here are my comments regarding the patch.
 
 (1) ZipFileInflaterInputStream.close() now synchronizes on 
   Zipfile.this, is it really necessary?

Synchronization is used so that an update to 'isClosed' on one thread is
seen by another.

Adding finalization (ie. a call from finalize() to close()) effectively
introduces a new thread into the mix, namely the finalization thread.
Whilst you're correct in saying that, in general, InputStream gives no
thread-safe guarantees, in this particular case I believe it is valid to
make sure updates to this field are seen across threads, to prevent the
logic of close() being run twice.

I chose to synchronize on ZipFile.this merely for consistency - ie.
because that's what ZipFileInputStream (the other InputStream impl in
ZipFile) does. 

I suppose I felt the risks of creating some kind of obscure deadlock
scenario was minimised by copying the existing pattern. 

I can be easily swayed if you think there's a better object to
synchronize on, that doesn't run an increased risk of deadlock.

 (2) Is there any particular reason that we don't want to reuse the 
 Inflater(), if the close comes from the finalize() in your testing 
 evn? One thing we noticed before is that the moment the
 InputStream gets finalized, the Inflater embedded might have been
 finalized already, this is the reason why we have a inf.ended() 
 check in releaseInflater().

The ZipFileInflaterInputStream object's finalize() method is called by
the finalization thread, after it has been added to the finalization
queue by GC. 

If *it* has been added to the finalization queue (because it's eligible
for GC), then so will the Inflater object it is referencing.

As finalization gives no guarantees as to the order in which objects are
finalized, regardless of whether finalize() has been called on the
Inflater object, it is nevertheless the case that its method *will* be
called at some point, so the Inflater object *will* be ended.

If it has been put back into the inflater cache (because ZFIIS's
finalize happened to be called first), then this ended inflater will lie
in wait for another ZFIIS to use it, which would result in a
particularly unexpected exception being thrown.

In general, I believe it to be quite frowned on to resurrect objects
which are going through finalization (including those queued up for
finalization) and place them back into the live set of object, as weird
behaviour often ensues if one does.

(Especially as, iirc, finalize() is only called once per object,
regardless of whether it is dug back up in this fashion). 

 (3) The attached test case does f.close() in its finally block, I 
 guess it should be f.delete()?

When I wrote the testcase, I relied upon File.deleteOnExit() to take
care of cleaning up the temporary file(s) created.

However, I recall now that I don't think it guarantees deletion on
abnormal termination.

I agree that calling f.delete() instead of f.close() would be a good
improvement to the testcase.

Please find below an updated changeset, with this change made and the
bug number filled in.

Hope this helps to clarify things,
Thanks again,
Neil

-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


# HG changeset patch
# User Neil Richards neil.richa...@ngmr.net, neil_richa...@uk.ibm.com
# Date 1300289208 0
# Branch zip-heap
# Node ID 8ab5aa0669a176d95502d9cf68027ae9679ccab2
# Parent  554adcfb615e63e62af530b1c10fcf7813a75b26
7031076: Retained ZipFile InputStreams increase heap demand
Summary: Allow unreferenced ZipFile InputStreams to be finalized, GC'd
Contributed-by: neil.richa...@ngmr.net

diff --git a/src/share/classes/java/util/zip/ZipFile.java 
b/src/share/classes/java/util/zip/ZipFile.java
--- a/src/share/classes/java/util/zip/ZipFile.java
+++ b/src/share/classes/java/util/zip/ZipFile.java
@@ -30,11 +30,14 @@
 import java.io.IOException;
 import java.io.EOFException;
 import java.io.File;
+import java.lang.ref.ReferenceQueue;
+import java.lang.ref.WeakReference;
 import java.nio.charset.Charset;
 import java.util.Vector;
 import java.util.Enumeration;
 import java.util.Set;
 import java.util.HashSet;
+import java.util.Iterator;
 import java.util.NoSuchElementException;
 import java.security.AccessController;
 import sun.security.action.GetPropertyAction;
@@ -315,7 +318,16 @@
 private static native void freeEntry(long jzfile, long jzentry);
 
 // the outstanding inputstreams that need to 

Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-03-30 Thread Xueming Shen

On 03/30/2011 12:53 PM, Neil Richards wrote:

Hi Sherman,
Thanks for your review and comments on this.

On Tue, 2011-03-29 at 12:05 -0700, Xueming Shen wrote:

Hi Neil,

It appears to be a regression in scenario you described (user
application never close the input stream after use and the ZipFile
instance being retained during the lifetime of the process). The
proposed approach seems to solve this particular problem.

Here are my comments regarding the patch.

(1) ZipFileInflaterInputStream.close() now synchronizes on
   Zipfile.this, is it really necessary?

Synchronization is used so that an update to 'isClosed' on one thread is
seen by another.

Adding finalization (ie. a call from finalize() to close()) effectively
introduces a new thread into the mix, namely the finalization thread.
Whilst you're correct in saying that, in general, InputStream gives no
thread-safe guarantees, in this particular case I believe it is valid to
make sure updates to this field are seen across threads, to prevent the
logic of close() being run twice.


Isn't it true that when the finalize()-close() gets invoked, there 
should be no strong

reference anywhere else that you can use to invoke close() in other thread?
ZipFileInputStream class has to sync on ZipFile.this, the read/close() 
methods are
accessing the underlying/shared bits of the ZipFile. For 
ZipFileInflaterInputStream.close()
case, even we want to make sure the isClose is synced,  I would prefer 
to see a local

lock, maybe simply make close() synchronized is better?

-Sherman


I chose to synchronize on ZipFile.this merely for consistency - ie.
because that's what ZipFileInputStream (the other InputStream impl in
ZipFile) does.

I suppose I felt the risks of creating some kind of obscure deadlock
scenario was minimised by copying the existing pattern.

I can be easily swayed if you think there's a better object to
synchronize on, that doesn't run an increased risk of deadlock.


(2) Is there any particular reason that we don't want to reuse the
 Inflater(), if the close comes from the finalize() in your testing
 evn? One thing we noticed before is that the moment the
 InputStream gets finalized, the Inflater embedded might have been
 finalized already, this is the reason why we have a inf.ended()
 check in releaseInflater().

The ZipFileInflaterInputStream object's finalize() method is called by
the finalization thread, after it has been added to the finalization
queue by GC.

If *it* has been added to the finalization queue (because it's eligible
for GC), then so will the Inflater object it is referencing.

As finalization gives no guarantees as to the order in which objects are
finalized, regardless of whether finalize() has been called on the
Inflater object, it is nevertheless the case that its method *will* be
called at some point, so the Inflater object *will* be ended.

If it has been put back into the inflater cache (because ZFIIS's
finalize happened to be called first), then this ended inflater will lie
in wait for another ZFIIS to use it, which would result in a
particularly unexpected exception being thrown.

In general, I believe it to be quite frowned on to resurrect objects
which are going through finalization (including those queued up for
finalization) and place them back into the live set of object, as weird
behaviour often ensues if one does.

(Especially as, iirc, finalize() is only called once per object,
regardless of whether it is dug back up in this fashion).


(3) The attached test case does f.close() in its finally block, I
guess it should be f.delete()?

When I wrote the testcase, I relied upon File.deleteOnExit() to take
care of cleaning up the temporary file(s) created.

However, I recall now that I don't think it guarantees deletion on
abnormal termination.

I agree that calling f.delete() instead of f.close() would be a good
improvement to the testcase.

Please find below an updated changeset, with this change made and the
bug number filled in.

Hope this helps to clarify things,
Thanks again,
Neil





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-03-29 Thread Xueming Shen

Hi Neil,

It appears to be a regression in scenario you described (user 
application never close the
input stream after use and the ZipFile instance being retained during 
the lifetime of the

process). The proposed approach seems to solve this particular problem.

Here are my comments regarding the patch.

(1) ZipFileInflaterInputStream.close() now synchronizes on Zipfile.this, 
is it really necessary?
 given the rest of the implementation doesn't guarantee the 
returned InputStream object
 is thread-safe (the implementation does make sure the access to 
the underlying
 ZipFile is thread-safe, though). The new finalize() now also 
invokes close(), which means
 the ZipFile object now gets locked twice (assume the good manner 
application does

 invoke the close() after use) for each InputStream after use.

(2) Is there any particular reason that we don't want to reuse the 
Inflater(), if the close
 comes from the finalize() in your testing evn? One thing we 
noticed before is that the
 moment  the InputStream gets finalized, the Inflater embedded 
might have been
 finalized already, this is the reason why we have a inf.ended() 
check in releaseInflater().


 The disadvantage/limitation of existing Inflater caching mechanism 
is that it does not
 have a cap to limit the maximum number of inflater it should 
cache. Are you worried
 too many inflaters get cached, if they only get collected during 
GC/finalize()? This actually
 indicates that the cache mechanism does not work at all in your 
scenario, even with the
 proposed fix. An alternative solution (better?) is to make ZFIIS 
to check the bytes read
 during its  read() method, and pro-actively close the stream when 
it reaches the end of
 the stream,  same as what ZipFileInputStream does, which I believe 
should solve the
 problem in most use scenario (except, the application does not 
bother reading the

 InputSteeam to its end, which is unlikely).

(3) The attached test case does f.close() in its finally block, I 
guess it should be f.delete()?


-Sherman



On 03/23/2011 02:46 PM, Neil Richards wrote:

Hi,
I've noticed that the fix introduced to address bug 6735255 [1] [2] had
an unfortunate side-effect.

That fix stores references to the InputStream objects returned from
ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that
the close() method may be called upon those objects when ZipFile.close()
is called.

It does this to conform to the existing API description, and to avoid a
native memory leak which would occur if the InputStream is GC'd without
its close() being called.

However, by holding these InputStreams in a set within their ZipFile
object, their lifecycle is now prolonged until the ZipFile object either
has its close() method called, or is finalized (prior to GC).

I've found scenarios (in user code) were ZipFile objects are held onto
for a long time (eg. the entire lifetime of the process), but where
InputStream objects from that ZipFile (for individual entries in the zip
file) are obtained, used, then abandoned on a large number of occasions.

(An example here might be a user-defined, long-lasting class loader,
which might retain a ZipFile instance for each jar file in its search
path, but which will create transitory input streams to read out
particular files from those (large) jar files).

I see that the InputStream objects returned from
ZipFile.getInputStream(ZipEntry) tend to hold references to a couple of
sizeable byte arrays, one 512 bytes in size, the other 1002 bytes.

So by prolonging the life span of these objects, the fix for 6735255 can
inadvertently cause a significant increase in the demand/usage on the
heap (in practice, running into many Mb's).

(This is not so if the user code is good-mannered enough to explicitly
always call close() on the InputStreams in question.
However, experience shows that user code cannot be relied upon to behave
so benignly).

I believe this introduced problem can be addressed by:
  1. Holding references to the InputStreams in the 'streams' Set
 within ZipFile via WeakReferences, so they may be GC'd as soon
 as they are no longer externally (strongly) referenced.
  2. Adding finalization logic to the InputStream implementation
 classes, which ensures their close() method is called prior to
 GC.
  3. Prevent Inflater objects from being returned to the pool (in the
 ZipFile object) if close() has been called from finalize().

To that end, please find below an 'hg export' of a proposed change which
implements these aspects, together with a testcase to demonstrate the
problem/fix.

Any comments / suggestions on this gratefully received,
Thanks,
Neil

[1] http://bugs.sun.com/view_bug.do?bug_id=6735255
[2] http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/49478a651a28





Re: Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

2011-03-25 Thread Alan Bateman

Neil Richards wrote:

Hi,
I've noticed that the fix introduced to address bug 6735255 [1] [2] had
an unfortunate side-effect.

That fix stores references to the InputStream objects returned from
ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that
the close() method may be called upon those objects when ZipFile.close()
is called.

It does this to conform to the existing API description, and to avoid a
native memory leak which would occur if the InputStream is GC'd without
its close() being called.

However, by holding these InputStreams in a set within their ZipFile
object, their lifecycle is now prolonged until the ZipFile object either
has its close() method called, or is finalized (prior to GC).

  
I've created 7031076 to track this and generated a webrev from the 
change-set that you attached:

 http://cr.openjdk.java.net/~alanb/7031076/webrev/

Sherman mentioned off-list that he plans to look at this.

-Alan.