Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]
On Mon, 11 Apr 2022 16:04:48 GMT, Alan Bateman wrote: >> Volker Simonis has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Adapted wording and copyrights based on @jaikiran review > > src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 110: > >> 108: * @param len the maximum number of bytes read >> 109: * @return the actual number of bytes read, or -1 if the end of the >> 110: * compressed input stream is reached > > We should probably adjust the `@return` description, as was done for > InflaterInputStream, otherwise the method. description will say "the > returning number of inflated bytes" and the return description will say "the > actual number of bytes read". Fixed (also in `ZipInputStream`). - PR: https://git.openjdk.java.net/jdk/pull/7986
Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]
On Mon, 11 Apr 2022 16:07:15 GMT, Alan Bateman wrote: >> Volker Simonis has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Adapted wording and copyrights based on @jaikiran review > > src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 103: > >> 101: * elements {@code buf[off+}n{@code ]} through {@code >> buf[off+}len{@code -1]} >> 102: * are undefined and an implementation is free to change them >> during the inflate >> 103: * operation. If the return value is -1 or an exception is thrown >> the whole > > The sentence is very long. How about putting "an implementation is free ..." > in parentheses after "undefined". Done (also in `ZipInputStream` and `GZipInputStream` which both have the same sentence). - PR: https://git.openjdk.java.net/jdk/pull/7986
Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]
On Mon, 11 Apr 2022 16:05:33 GMT, Alan Bateman wrote: >> Volker Simonis has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Adapted wording and copyrights based on @jaikiran review > > src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 104: > >> 102: * are undefined and an implementation is free to change them >> during the inflate >> 103: * operation. If the return value is -1 or an exception is thrown >> the whole >> 104: * content of {@code buf} is undefined. > > "whole content" doesn't seem right, it should be buf[off] to but[off + len - > 1] is undefined. Good catch. Fixed and also updated the corresponding part for `InflaterInputStream::read`. - PR: https://git.openjdk.java.net/jdk/pull/7986
Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]
On Tue, 12 Apr 2022 07:10:54 GMT, Jaikiran Pai wrote: >> src/java.base/share/classes/java/util/zip/ZipFile.java line 347: >> >>> 345: * >>> 346: * @implNote This implementation returns an instance of >>> 347: * {@link java.util.zip.InflaterInputStream}. >> >> What is the reasoning for this note, do we really want it in the API docs? > > Hello Alan, this change was done by Volker for the point that I raised in the > comment here https://github.com/openjdk/jdk/pull/7986#issuecomment-1081319691 > Hello Alan, this change was done by Volker for the point that I raised in the > comment here [#7986 > (comment)](https://github.com/openjdk/jdk/pull/7986#issuecomment-1081319691) I think it creates the temptation to cast the returned input stream to InflaterInputStream, it might be better to leave it out. - PR: https://git.openjdk.java.net/jdk/pull/7986
Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]
On Mon, 11 Apr 2022 16:03:08 GMT, Alan Bateman wrote: >> Volker Simonis has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Adapted wording and copyrights based on @jaikiran review > > src/java.base/share/classes/java/util/zip/ZipFile.java line 347: > >> 345: * >> 346: * @implNote This implementation returns an instance of >> 347: * {@link java.util.zip.InflaterInputStream}. > > What is the reasoning for this note, do we really want it in the API docs? Hello Alan, this change was done by Volker for the point that I raised in the comment here https://github.com/openjdk/jdk/pull/7986#issuecomment-1081319691 - PR: https://git.openjdk.java.net/jdk/pull/7986
Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]
On Mon, 11 Apr 2022 10:43:30 GMT, Volker Simonis wrote: >> Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` >> to highlight that it might write more bytes than the returned number of >> inflated bytes into the buffer `b`. >> >> The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, >> int len)` will leave the content beyond the last read byte in the read >> buffer `b` unaffected. However, the overridden `read` method in >> `InflaterInputStream` passes the read buffer `b` to >> `Inflater::inflate(byte[] b, int off, int len)` which doesn't provide this >> guarantee. Depending on implementation details, `Inflater::inflate` might >> write more than the returned number of inflated bytes into the buffer `b`. >> >> ### TL;DR >> >> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater >> functionality. `Inflater::inflate(byte[] output, int off, int len)` >> currently calls zlib's native `inflate(..)` function and passes the address >> of `output[off]` and `len` to it via JNI. >> >> The specification of zlib's `inflate(..)` function (i.e. the [API >> documentation in the original zlib >> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) >> doesn't give any guarantees with regard to usage of the output buffer. It >> only states that upon completion the function will return the number of >> bytes that have been written (i.e. "inflated") into the output buffer. >> >> The original zlib implementation only wrote as many bytes into the output >> buffer as it inflated. However, this is not a hard requirement and newer, >> more performant implementations of the zlib library like >> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) >> or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes >> of the output buffer than they actually inflate as a scratch buffer. See >> https://github.com/simonis/zlib-chromium for a more detailed description of >> their approach and its performance benefit. >> >> These new zlib versions can still be used transparently from Java (e.g. by >> putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because >> they still fully comply to specification of `Inflater::inflate(..)`. >> However, we might run into problems when using the `Inflater` functionality >> from the `InflaterInputStream` class. `InflaterInputStream` is derived from >> from `InputStream` and as such, its `read(byte[] b, int off, int len)` >> method is quite constrained. It specifically specifies that if *k* bytes >> have been read, then "these bytes will be stored in elements `b[off]` >> through `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through >> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] b, int >> off, int len)` (which is constrained by `InputStream::read(..)`'s >> specification) calls `Inflater::inflate(byte[] b, int off, int len)` and >> directly passes its output buffer down to the native zlib `inflate(..)` >> method which is free to change the bytes beyond `b[off+` *k*`]` (where *k* is the number of inflated bytes). >> >> From a practical point of view, I don't see this as a big problem, because >> callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never >> know how many bytes will be written into the output buffer `b` (and in fact >> its content can always be completely overwritten). It therefore makes no >> sense to depend on any data there being untouched after the call. Also, >> having used zlib-cloudflare productively for about two years, we haven't >> seen real-world issues because of this behavior yet. However, from a >> specification point of view it is easy to artificially construct a program >> which violates `InflaterInputStream::read(..)`'s postcondition if using one >> of the alterantive zlib implementations. A recently integrated JTreg test >> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails >> with zlib-chromium but can fixed easily to run with alternative >> implementations as well (see >> [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)). > > Volker Simonis has updated the pull request incrementally with one additional > commit since the last revision: > > Adapted wording and copyrights based on @jaikiran review src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 103: > 101: * elements {@code buf[off+}n{@code ]} through {@code > buf[off+}len{@code -1]} > 102: * are undefined and an implementation is free to change them during > the inflate > 103: * operation. If the return value is -1 or an exception is thrown > the whole The sentence is very long. How about putting "an implementation is free ..." in parentheses after "undefined". src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 104: > 102: * are undefined
Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]
> Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` to > highlight that it might write more bytes than the returned number of > inflated bytes into the buffer `b`. > > The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, > int len)` will leave the content beyond the last read byte in the read buffer > `b` unaffected. However, the overridden `read` method in > `InflaterInputStream` passes the read buffer `b` to `Inflater::inflate(byte[] > b, int off, int len)` which doesn't provide this guarantee. Depending on > implementation details, `Inflater::inflate` might write more than the > returned number of inflated bytes into the buffer `b`. > > ### TL;DR > > `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater > functionality. `Inflater::inflate(byte[] output, int off, int len)` currently > calls zlib's native `inflate(..)` function and passes the address of > `output[off]` and `len` to it via JNI. > > The specification of zlib's `inflate(..)` function (i.e. the [API > documentation in the original zlib > implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) > doesn't give any guarantees with regard to usage of the output buffer. It > only states that upon completion the function will return the number of bytes > that have been written (i.e. "inflated") into the output buffer. > > The original zlib implementation only wrote as many bytes into the output > buffer as it inflated. However, this is not a hard requirement and newer, > more performant implementations of the zlib library like > [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) > or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes > of the output buffer than they actually inflate as a scratch buffer. See > https://github.com/simonis/zlib-chromium for a more detailed description of > their approach and its performance benefit. > > These new zlib versions can still be used transparently from Java (e.g. by > putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because > they still fully comply to specification of `Inflater::inflate(..)`. However, > we might run into problems when using the `Inflater` functionality from the > `InflaterInputStream` class. `InflaterInputStream` is derived from from > `InputStream` and as such, its `read(byte[] b, int off, int len)` method is > quite constrained. It specifically specifies that if *k* bytes have been > read, then "these bytes will be stored in elements `b[off]` through > `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through `b[off+len-1]` > **unaffected**". But `InflaterInputStream::read(byte[] b, int off, int len)` > (which is constrained by `InputStream::read(..)`'s specification) calls > `Inflater::inflate(byte[] b, int off, int len)` and directly passes its > output buffer down to the native zlib `inflate(..)` method which is free to > change the bytes beyond `b[off+`* k*`]` (where *k* is the number of inflated bytes). > > From a practical point of view, I don't see this as a big problem, because > callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never > know how many bytes will be written into the output buffer `b` (and in fact > its content can always be completely overwritten). It therefore makes no > sense to depend on any data there being untouched after the call. Also, > having used zlib-cloudflare productively for about two years, we haven't seen > real-world issues because of this behavior yet. However, from a specification > point of view it is easy to artificially construct a program which violates > `InflaterInputStream::read(..)`'s postcondition if using one of the > alterantive zlib implementations. A recently integrated JTreg test > (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails > with zlib-chromium but can fixed easily to run with alternative > implementations as well (see > [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)). Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Adapted wording and copyrights based on @jaikiran review - Changes: - all: https://git.openjdk.java.net/jdk/pull/7986/files - new: https://git.openjdk.java.net/jdk/pull/7986/files/7c671aa5..1bf6bc1b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7986=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7986=03-04 Stats: 12 lines in 4 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/7986.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7986/head:pull/7986 PR: https://git.openjdk.java.net/jdk/pull/7986