Thank you!  I opened https://issues.apache.org/jira/browse/IO-802

On 2023/06/13 02:14:51 Phil Steitz wrote:
> On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz <phil.ste...@gmail.com> wrote:
> 
> >
> >
> > On Mon, Jun 12, 2023 at 2:14 PM Tim Allison <talli...@apache.org> wrote:
> >
> >> All,
> >>
> >>   Since the refactoring to avoid threadlocal in skipFully, we are now
> >> getting errors in one of our unit tests on Tika [0].  We're still on
> >> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0.
> >>
> >>    I was able to reproduce the problem outside of Tika [1].  To reproduce
> >> the problem, the stream has to be wrapped in java's InflaterInputStream,
> >> and it has to be run multithreaded.  The problem is easily reproducible
> >> with only two threads, but there is variation on which iteration triggers
> >> the problem even with the same seed (what you'd expect from a
> >> multi-threading bug).
> >>
> >>   I don't think this is a bug in commons-io, per se, but it is mildly
> >> frightening from a data reliability perspective that the combination of
> >> IOUtils.skipFully on top of Java's InflaterInputStream can lead to
> >> different data.
> >>
> >>   Is this a bug in Java, a bug in commons-io or something else?  What
> >> should we do?  Thank you!
> >>
> >
> > Interesting.  Taking a quick look at the source for both IOUtils and
> > InflaterInputStream, it looks to me like the InflaterInputStream may be
> > effectively assuming that it has exclusive access to the buffer that it is
> > reading into.   I looked at a couple of jdk sources and they both look like
> > this:
> >
> > public int read(byte[] b, int off, int len) throws IOException {
> >         ensureOpen();
> >         if (b == null) {
> >             throw new NullPointerException();
> >         } else if (off < 0 || len < 0 || len > b.length - off) {
> >             throw new IndexOutOfBoundsException();
> >         } else if (len == 0) {
> >             return 0;
> >         }
> >         try {
> >             int n;
> >             while ((n = inf.inflate(b, off, len)) == 0) {
> >                 if (inf.finished() || inf.needsDictionary()) {
> >                     reachEOF = true;
> >                     return -1;
> >                 }
> >                 if (inf.needsInput()) {
> >                     fill();
> >                 }
> >             }
> >             return n;
> >         } catch (DataFormatException e) {
> >             String s = e.getMessage();
> >             throw new ZipException(s != null ? s : "Invalid ZLIB data
> > format");
> >         }
> >     }
> >
> > I can see how zero-ing the byte array it is reading to could cause
> > problems here.  I don't see any doc anywhere saying that either this Reader
> > or the interface in general has this constraint, so maybe it is a jdk bug,
> > but it is what it is and I think to be safe it would be better to add back
> > the ThreadLocal on the write only bufffer in IOUtills or change it to not
> > zero the array.  I don't see why the zeroing is necessary, but I don't know
> > much about [io].
> >
> 
> Sorry, just eliminating the zero-ing probably wont work because concurrent
> writes could make buffer segments not inflatable IIUC what is going on.  So
> I would just restore the protection.
> 
> >
> > Phil
> >
> >
> >
> >>        Best,
> >>
> >>            Tim
> >>
> >>
> >>
> >>
> >> [0] https://issues.apache.org/jira/browse/TIKA-4065
> >> [1]
> >>
> >> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java
> >>
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org

Reply via email to