Thank you! I opened https://issues.apache.org/jira/browse/IO-802
On 2023/06/13 02:14:51 Phil Steitz wrote: > On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz <phil.ste...@gmail.com> wrote: > > > > > > > On Mon, Jun 12, 2023 at 2:14 PM Tim Allison <talli...@apache.org> wrote: > > > >> All, > >> > >> Since the refactoring to avoid threadlocal in skipFully, we are now > >> getting errors in one of our unit tests on Tika [0]. We're still on > >> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0. > >> > >> I was able to reproduce the problem outside of Tika [1]. To reproduce > >> the problem, the stream has to be wrapped in java's InflaterInputStream, > >> and it has to be run multithreaded. The problem is easily reproducible > >> with only two threads, but there is variation on which iteration triggers > >> the problem even with the same seed (what you'd expect from a > >> multi-threading bug). > >> > >> I don't think this is a bug in commons-io, per se, but it is mildly > >> frightening from a data reliability perspective that the combination of > >> IOUtils.skipFully on top of Java's InflaterInputStream can lead to > >> different data. > >> > >> Is this a bug in Java, a bug in commons-io or something else? What > >> should we do? Thank you! > >> > > > > Interesting. Taking a quick look at the source for both IOUtils and > > InflaterInputStream, it looks to me like the InflaterInputStream may be > > effectively assuming that it has exclusive access to the buffer that it is > > reading into. I looked at a couple of jdk sources and they both look like > > this: > > > > public int read(byte[] b, int off, int len) throws IOException { > > ensureOpen(); > > if (b == null) { > > throw new NullPointerException(); > > } else if (off < 0 || len < 0 || len > b.length - off) { > > throw new IndexOutOfBoundsException(); > > } else if (len == 0) { > > return 0; > > } > > try { > > int n; > > while ((n = inf.inflate(b, off, len)) == 0) { > > if (inf.finished() || inf.needsDictionary()) { > > reachEOF = true; > > return -1; > > } > > if (inf.needsInput()) { > > fill(); > > } > > } > > return n; > > } catch (DataFormatException e) { > > String s = e.getMessage(); > > throw new ZipException(s != null ? s : "Invalid ZLIB data > > format"); > > } > > } > > > > I can see how zero-ing the byte array it is reading to could cause > > problems here. I don't see any doc anywhere saying that either this Reader > > or the interface in general has this constraint, so maybe it is a jdk bug, > > but it is what it is and I think to be safe it would be better to add back > > the ThreadLocal on the write only bufffer in IOUtills or change it to not > > zero the array. I don't see why the zeroing is necessary, but I don't know > > much about [io]. > > > > Sorry, just eliminating the zero-ing probably wont work because concurrent > writes could make buffer segments not inflatable IIUC what is going on. So > I would just restore the protection. > > > > > Phil > > > > > > > >> Best, > >> > >> Tim > >> > >> > >> > >> > >> [0] https://issues.apache.org/jira/browse/TIKA-4065 > >> [1] > >> > >> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java > >> > > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@commons.apache.org For additional commands, e-mail: user-h...@commons.apache.org