Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]

2022-04-12 Thread Volker Simonis
On Mon, 11 Apr 2022 16:04:48 GMT, Alan Bateman  wrote:

>> Volker Simonis has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Adapted wording and copyrights based on @jaikiran review
>
> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 110:
> 
>> 108:  * @param len the maximum number of bytes read
>> 109:  * @return  the actual number of bytes read, or -1 if the end of the
>> 110:  *  compressed input stream is reached
> 
> We should probably adjust the `@return` description, as was done for 
> InflaterInputStream, otherwise the method. description will say "the 
> returning number of inflated bytes" and the return description will say "the 
> actual number of bytes read".

Fixed (also in `ZipInputStream`).

-

PR: https://git.openjdk.java.net/jdk/pull/7986


Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]

2022-04-12 Thread Volker Simonis
On Mon, 11 Apr 2022 16:07:15 GMT, Alan Bateman  wrote:

>> Volker Simonis has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Adapted wording and copyrights based on @jaikiran review
>
> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 103:
> 
>> 101:  * elements {@code buf[off+}n{@code ]} through {@code 
>> buf[off+}len{@code -1]}
>> 102:  * are undefined and an implementation is free to change them 
>> during the inflate
>> 103:  * operation. If the return value is -1 or an exception is thrown 
>> the whole
> 
> The sentence is very long. How about putting "an implementation is free ..." 
> in parentheses after "undefined".

Done (also in `ZipInputStream` and `GZipInputStream` which both have the same 
sentence).

-

PR: https://git.openjdk.java.net/jdk/pull/7986


Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]

2022-04-12 Thread Volker Simonis
On Mon, 11 Apr 2022 16:05:33 GMT, Alan Bateman  wrote:

>> Volker Simonis has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Adapted wording and copyrights based on @jaikiran review
>
> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 104:
> 
>> 102:  * are undefined and an implementation is free to change them 
>> during the inflate
>> 103:  * operation. If the return value is -1 or an exception is thrown 
>> the whole
>> 104:  * content of {@code buf} is undefined.
> 
> "whole content" doesn't seem right, it should be buf[off] to but[off + len - 
> 1] is undefined.

Good catch. Fixed and also updated the corresponding part for 
`InflaterInputStream::read`.

-

PR: https://git.openjdk.java.net/jdk/pull/7986


Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]

2022-04-12 Thread Alan Bateman
On Tue, 12 Apr 2022 07:10:54 GMT, Jaikiran Pai  wrote:

>> src/java.base/share/classes/java/util/zip/ZipFile.java line 347:
>> 
>>> 345:  *
>>> 346:  * @implNote This implementation returns an instance of
>>> 347:  * {@link java.util.zip.InflaterInputStream}.
>> 
>> What is the reasoning for this note, do we really want it in the API docs?
>
> Hello Alan, this change was done by Volker for the point that I raised in the 
> comment here https://github.com/openjdk/jdk/pull/7986#issuecomment-1081319691

> Hello Alan, this change was done by Volker for the point that I raised in the 
> comment here [#7986 
> (comment)](https://github.com/openjdk/jdk/pull/7986#issuecomment-1081319691)

I think it creates the temptation to cast the returned input stream to 
InflaterInputStream, it might be better to leave it out.

-

PR: https://git.openjdk.java.net/jdk/pull/7986


Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]

2022-04-12 Thread Jaikiran Pai
On Mon, 11 Apr 2022 16:03:08 GMT, Alan Bateman  wrote:

>> Volker Simonis has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Adapted wording and copyrights based on @jaikiran review
>
> src/java.base/share/classes/java/util/zip/ZipFile.java line 347:
> 
>> 345:  *
>> 346:  * @implNote This implementation returns an instance of
>> 347:  * {@link java.util.zip.InflaterInputStream}.
> 
> What is the reasoning for this note, do we really want it in the API docs?

Hello Alan, this change was done by Volker for the point that I raised in the 
comment here https://github.com/openjdk/jdk/pull/7986#issuecomment-1081319691

-

PR: https://git.openjdk.java.net/jdk/pull/7986


Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]

2022-04-11 Thread Alan Bateman
On Mon, 11 Apr 2022 10:43:30 GMT, Volker Simonis  wrote:

>> Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` 
>> to highlight that it might  write more bytes than the returned number of 
>> inflated bytes into the buffer `b`.
>> 
>> The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, 
>> int len)` will leave the content beyond the last read byte in the read 
>> buffer `b` unaffected. However, the overridden `read` method in 
>> `InflaterInputStream` passes the read buffer `b` to 
>> `Inflater::inflate(byte[] b, int off, int len)` which doesn't provide this 
>> guarantee. Depending on implementation details, `Inflater::inflate` might 
>> write more than the returned number of inflated bytes into the buffer `b`.
>> 
>> ### TL;DR
>> 
>> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater 
>> functionality. `Inflater::inflate(byte[] output, int off, int len)` 
>> currently calls zlib's native `inflate(..)` function and passes the address 
>> of `output[off]` and `len` to it via JNI.
>> 
>> The specification of zlib's `inflate(..)` function (i.e. the [API 
>> documentation in the original zlib 
>> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
>>  doesn't give any guarantees with regard to usage of the output buffer. It 
>> only states that upon completion the function will return the number of 
>> bytes that have been written (i.e. "inflated") into the output buffer.
>> 
>> The original zlib implementation only wrote as many bytes into the output 
>> buffer as it inflated. However, this is not a hard requirement and newer, 
>> more performant implementations of the zlib library like 
>> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
>>  or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes 
>> of the output buffer than they actually inflate as a scratch buffer. See 
>> https://github.com/simonis/zlib-chromium for a more detailed description of 
>> their approach and its performance benefit.
>> 
>> These new zlib versions can still be used transparently from Java (e.g. by 
>> putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because 
>> they still fully comply to specification of `Inflater::inflate(..)`. 
>> However, we might run into problems when using the `Inflater` functionality 
>> from the `InflaterInputStream` class. `InflaterInputStream` is derived from 
>> from `InputStream` and as such, its `read(byte[] b, int off, int len)` 
>> method is quite constrained. It specifically specifies that if *k* bytes 
>> have been read, then "these bytes will be stored in elements `b[off]` 
>> through `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through 
>> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] b, int 
>> off, int len)` (which is constrained by `InputStream::read(..)`'s 
>> specification) calls `Inflater::inflate(byte[] b, int off, int len)` and 
>> directly passes its output buffer down to the native zlib `inflate(..)` 
>> method which is free to change the bytes beyond `b[off+`
 *k*`]` (where *k* is the number of inflated bytes).
>> 
>> From a practical point of view, I don't see this as a big problem, because 
>> callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never 
>> know how many bytes will be written into the output buffer `b` (and in fact 
>> its content can always be completely overwritten). It therefore makes no 
>> sense to depend on any data there being untouched after the call. Also, 
>> having used zlib-cloudflare productively for about two years, we haven't 
>> seen real-world issues because of this behavior yet. However, from a 
>> specification point of view it is easy to artificially construct a program 
>> which violates `InflaterInputStream::read(..)`'s postcondition if using one 
>> of the alterantive zlib implementations. A recently integrated JTreg test 
>> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails 
>> with zlib-chromium but can fixed easily to run with alternative 
>> implementations as well (see 
>> [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)).
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adapted wording and copyrights based on @jaikiran review

src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 103:

> 101:  * elements {@code buf[off+}n{@code ]} through {@code 
> buf[off+}len{@code -1]}
> 102:  * are undefined and an implementation is free to change them during 
> the inflate
> 103:  * operation. If the return value is -1 or an exception is thrown 
> the whole

The sentence is very long. How about putting "an implementation is free ..." in 
parentheses after "undefined".

src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 104:

> 102:  * are undefined 

Re: RFR: 8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..) [v5]

2022-04-11 Thread Volker Simonis
> Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` to 
> highlight that it might  write more bytes than the returned number of 
> inflated bytes into the buffer `b`.
> 
> The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, 
> int len)` will leave the content beyond the last read byte in the read buffer 
> `b` unaffected. However, the overridden `read` method in 
> `InflaterInputStream` passes the read buffer `b` to `Inflater::inflate(byte[] 
> b, int off, int len)` which doesn't provide this guarantee. Depending on 
> implementation details, `Inflater::inflate` might write more than the 
> returned number of inflated bytes into the buffer `b`.
> 
> ### TL;DR
> 
> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater 
> functionality. `Inflater::inflate(byte[] output, int off, int len)` currently 
> calls zlib's native `inflate(..)` function and passes the address of 
> `output[off]` and `len` to it via JNI.
> 
> The specification of zlib's `inflate(..)` function (i.e. the [API 
> documentation in the original zlib 
> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
>  doesn't give any guarantees with regard to usage of the output buffer. It 
> only states that upon completion the function will return the number of bytes 
> that have been written (i.e. "inflated") into the output buffer.
> 
> The original zlib implementation only wrote as many bytes into the output 
> buffer as it inflated. However, this is not a hard requirement and newer, 
> more performant implementations of the zlib library like 
> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
>  or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes 
> of the output buffer than they actually inflate as a scratch buffer. See 
> https://github.com/simonis/zlib-chromium for a more detailed description of 
> their approach and its performance benefit.
> 
> These new zlib versions can still be used transparently from Java (e.g. by 
> putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because 
> they still fully comply to specification of `Inflater::inflate(..)`. However, 
> we might run into problems when using the `Inflater` functionality from the 
> `InflaterInputStream` class. `InflaterInputStream` is derived from from 
> `InputStream` and as such, its `read(byte[] b, int off, int len)` method is 
> quite constrained. It specifically specifies that if *k* bytes have been 
> read, then "these bytes will be stored in elements `b[off]` through 
> `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through `b[off+len-1]` 
> **unaffected**". But `InflaterInputStream::read(byte[] b, int off, int len)` 
> (which is constrained by `InputStream::read(..)`'s specification) calls 
> `Inflater::inflate(byte[] b, int off, int len)` and directly passes its 
> output buffer down to the native zlib `inflate(..)` method which is free to 
> change the bytes beyond `b[off+`*
 k*`]` (where *k* is the number of inflated bytes).
> 
> From a practical point of view, I don't see this as a big problem, because 
> callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never 
> know how many bytes will be written into the output buffer `b` (and in fact 
> its content can always be completely overwritten). It therefore makes no 
> sense to depend on any data there being untouched after the call. Also, 
> having used zlib-cloudflare productively for about two years, we haven't seen 
> real-world issues because of this behavior yet. However, from a specification 
> point of view it is easy to artificially construct a program which violates 
> `InflaterInputStream::read(..)`'s postcondition if using one of the 
> alterantive zlib implementations. A recently integrated JTreg test 
> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails 
> with zlib-chromium but can fixed easily to run with alternative 
> implementations as well (see 
> [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)).

Volker Simonis has updated the pull request incrementally with one additional 
commit since the last revision:

  Adapted wording and copyrights based on @jaikiran review

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7986/files
  - new: https://git.openjdk.java.net/jdk/pull/7986/files/7c671aa5..1bf6bc1b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7986=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7986=03-04

  Stats: 12 lines in 4 files changed: 0 ins; 0 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7986.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7986/head:pull/7986

PR: https://git.openjdk.java.net/jdk/pull/7986