Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-28 Thread Volker Simonis
On Mon, Mar 28, 2022 at 7:33 PM Alan Bateman  wrote:
>
> On 28/03/2022 11:02, Volker Simonis wrote:
> > :
> > As I wrote before, the extra data written into the output buffer isn't
> > sensitive because it can only originate from the history buffer (aka
> > "sliding window"). Also, this data is already exposed today if the
> > `Inflater` class is being used stand-alone, because in contrast to
> > `InflaterInputStream::read(..)`, `Inflater::inflate(..)` doesn't
> > guarantee to leave the content beyond the last read byte unaffected.
> > Finally, the referenced zlib-chromium implementation with the
> > mentioned behavior is the default zlib implementation in on Android
> > and Chrome browsers.
> If you are satisfied that flipping bits during an inflate operation
> cannot never lead to something bad happening then okay. I'ts important
> to ask about such as matters.
>

Sure. Thanks for your comments.

> -Alan


Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-28 Thread Alan Bateman

On 28/03/2022 11:02, Volker Simonis wrote:

:
As I wrote before, the extra data written into the output buffer isn't
sensitive because it can only originate from the history buffer (aka
"sliding window"). Also, this data is already exposed today if the
`Inflater` class is being used stand-alone, because in contrast to
`InflaterInputStream::read(..)`, `Inflater::inflate(..)` doesn't
guarantee to leave the content beyond the last read byte unaffected.
Finally, the referenced zlib-chromium implementation with the
mentioned behavior is the default zlib implementation in on Android
and Chrome browsers.
If you are satisfied that flipping bits during an inflate operation 
cannot never lead to something bad happening then okay. I'ts important 
to ask about such as matters.


-Alan


Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-28 Thread Volker Simonis
On Mon, Mar 28, 2022 at 10:53 AM Alan Bateman  wrote:
>
> On 22/03/2022 12:28, Volker Simonis wrote:
> > :
> > I don't really understand this concern? Do you mean what happens if
> > another thread is changing the content of the output buffer during an
> > inflate? I think such a use case has never been well-defined and
> > amending the specification won't change anything for such a situation.
> The setup means that user code has access to temporary storage used by
> the inflater library. It's important that nothing sensitive leaks, also
> important that flipping bits in any of the bytes in that temporary
> buffer doesn't lead to something that is considered a security issue. If
> you are confident that nothing bad can happen they great, I'm just
> pointing out things to consider when allow for the behavior discussed here.

As I wrote before, the extra data written into the output buffer isn't
sensitive because it can only originate from the history buffer (aka
"sliding window"). Also, this data is already exposed today if the
`Inflater` class is being used stand-alone, because in contrast to
`InflaterInputStream::read(..)`, `Inflater::inflate(..)` doesn't
guarantee to leave the content beyond the last read byte unaffected.
Finally, the referenced zlib-chromium implementation with the
mentioned behavior is the default zlib implementation in on Android
and Chrome browsers.

I've created a pull request and an associated CSR  for this issue under:
https://github.com/openjdk/jdk/pull/7986
https://bugs.openjdk.java.net/browse/JDK-8283758

Finally I've also created a PR to fix the ZipFSOutputStreamTest
mentioned in my initial mail:
https://github.com/openjdk/jdk/pull/7984

Can you please kindly take a look and review?

Thank you and best regards,
Volker

PS: just saw you've already approved https://github.com/openjdk/jdk/pull/7984 :)

>
> -Alan


Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-28 Thread Alan Bateman

On 22/03/2022 12:28, Volker Simonis wrote:

:
I don't really understand this concern? Do you mean what happens if
another thread is changing the content of the output buffer during an
inflate? I think such a use case has never been well-defined and
amending the specification won't change anything for such a situation.
The setup means that user code has access to temporary storage used by 
the inflater library. It's important that nothing sensitive leaks, also 
important that flipping bits in any of the bytes in that temporary 
buffer doesn't lead to something that is considered a security issue. If 
you are confident that nothing bad can happen they great, I'm just 
pointing out things to consider when allow for the behavior discussed here.


-Alan


Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-22 Thread Lance Andersen


On Mar 22, 2022, at 12:28 PM, Volker Simonis 
mailto:volker.simo...@gmail.com>> wrote:

On Mon, Mar 21, 2022 at 8:24 PM Lance Andersen
mailto:lance.ander...@oracle.com>> wrote:

Hi Lance,

Thanks for looking into this issue. Please find my answers inline:

Hi Volker,

I have read through what you have provided/pointed to, thank you, and on the 
surface what you are suggesting sounds reasonable.

That being said given that this API dates back to 1997ish, I think we have to 
be careful not introduce any regressions with existing applications with the 
proposal you suggest(even though it is just relaxes the spec), as you mentioned 
their is a jtreg test that  seems to fail.

Have you run the JCK with your ZLIB implementation?  I only skimmed the tests 
but looks like there might be a couple of tests which validate the array’s 
contents.

Yes, I did run the JCK tests with the optimized zlib implementations
and couldn't find any problems related to this issue. We've also using
the optimized version of zlib in production for quite a while without
any problems. However, I did found a problem related to a test which
was copied from "test/jdk/java/util/zip/DeflateIn_InflateOut.java".

That was the test on a quick scan I wondered about actually...
That jtreg test was fixed with "8240226: DeflateIn_InflateOut.java
test incorrectly assumes size of compressed" [2] in JDK 15 but
apparently that fix didn't made it into the JCK version. The test has
now been excluded from JCK 17/18.

Finally, the currently failing "nio/zipfs/ZipFSOutputStreamTest.java"
[3] jtreg test is not failing because it specifically checks that the
bytes in the output buffer beyond the last inflated byte remains
untouched. It's just because they use a poor, "lazy" method of
comparing inflated content to the original content and can easily be
fixed. Instead of:
```
while ((numRead = is.read(buf)) != -1) {
   totalRead += numRead;
   // verify the content
   Assert.assertEquals(Arrays.mismatch(buf, chunk), -1, "Unexpected content");
}
```
just use:
```
while ((numRead = is.read(buf)) != -1) {
   totalRead += numRead;
   // verify the content
   Assert.assertEquals(Arrays.mismatch(buf, 0, numRead, chunk, 0,
numRead), -1, "Unexpected content");
}
```

Why don’t you generate a PR to fix this separate from the change in spec 
proposal?

[1] 
https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/test/jdk/java/util/zip/DeflateIn_InflateOut.java__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuKmMKov2g$
[2] https://bugs.openjdk.java.net/browse/JDK-8240226
[3] 
https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuJSeRPJgw$


We don’t have a lot of test coverage so if we were to consider moving forward 
with your proposal, I believe additional test coverage would be warranted.

Sure, I'll be happy to add some more testing. Do you have specific
ideas? In fact my suggestion relaxes the specification of read(..)
which would be hard to check :)

Just some general coverage of the API would be great not expecting tests for 
the relaxed specification

I think you are probably at a point where you can craft a draft CSR and PR and 
we can continue to review and discuss from there

Best
Lance

Thank you and best regards,
Volker

Best
Lance


On Mar 4, 2022, at 5:04 AM, Volker Simonis 
mailto:volker.simo...@gmail.com>> wrote:

`java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
functionality. `Inflater::inflate(byte[] output, int off, int len)`
currently calls zlib's native `inflate(..)` function and passes the
address of `output[off]` and `len` to it via JNI.

The specification of zlib's `inflate(..)` function (i.e. the [API
documentation in the original zlib
implementation](https://urldefense.com/v3/__https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h*L400__;Iw!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuKV0E1HqA$
 ))
doesn't give any guarantees with regard to usage of the output buffer.
It only states that upon completion the function will return the
number of bytes that have been written (i.e. "inflated") into the
output buffer.

The original zlib implementation only wrote as many bytes into the
output buffer as it inflated. However, this is not a hard requirement
and newer, more performant implementations of the zlib library like
[zlib-chromium](https://urldefense.com/v3/__https://chromium.googlesource.com/chromium/src/third_party/zlib/__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuJVkxWFew$
 )
or 
[zlib-cloudflare](https://urldefense.com/v3/__https://github.com/cloudflare/zlib__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuIpQS_D6w$
 ) can use more
bytes of the output buffer than they actually inflate as a scratch
buffer. See 

Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-22 Thread Volker Simonis
On Mon, Mar 21, 2022 at 8:24 PM Lance Andersen
 wrote:

Hi Lance,

Thanks for looking into this issue. Please find my answers inline:

> Hi Volker,
>
> I have read through what you have provided/pointed to, thank you, and on the 
> surface what you are suggesting sounds reasonable.
>
> That being said given that this API dates back to 1997ish, I think we have to 
> be careful not introduce any regressions with existing applications with the 
> proposal you suggest(even though it is just relaxes the spec), as you 
> mentioned their is a jtreg test that  seems to fail.
>
> Have you run the JCK with your ZLIB implementation?  I only skimmed the tests 
> but looks like there might be a couple of tests which validate the array’s 
> contents.

Yes, I did run the JCK tests with the optimized zlib implementations
and couldn't find any problems related to this issue. We've also using
the optimized version of zlib in production for quite a while without
any problems. However, I did found a problem related to a test which
was copied from "test/jdk/java/util/zip/DeflateIn_InflateOut.java".
That jtreg test was fixed with "8240226: DeflateIn_InflateOut.java
test incorrectly assumes size of compressed" [2] in JDK 15 but
apparently that fix didn't made it into the JCK version. The test has
now been excluded from JCK 17/18.

Finally, the currently failing "nio/zipfs/ZipFSOutputStreamTest.java"
[3] jtreg test is not failing because it specifically checks that the
bytes in the output buffer beyond the last inflated byte remains
untouched. It's just because they use a poor, "lazy" method of
comparing inflated content to the original content and can easily be
fixed. Instead of:
```
while ((numRead = is.read(buf)) != -1) {
totalRead += numRead;
// verify the content
Assert.assertEquals(Arrays.mismatch(buf, chunk), -1, "Unexpected content");
}
```
just use:
```
while ((numRead = is.read(buf)) != -1) {
totalRead += numRead;
// verify the content
Assert.assertEquals(Arrays.mismatch(buf, 0, numRead, chunk, 0,
numRead), -1, "Unexpected content");
}
```

[1] 
https://github.com/openjdk/jdk/blob/master/test/jdk/java/util/zip/DeflateIn_InflateOut.java
[2] https://bugs.openjdk.java.net/browse/JDK-8240226
[3] 
https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java

>
> We don’t have a lot of test coverage so if we were to consider moving forward 
> with your proposal, I believe additional test coverage would be warranted.

Sure, I'll be happy to add some more testing. Do you have specific
ideas? In fact my suggestion relaxes the specification of read(..)
which would be hard to check :)

Thank you and best regards,
Volker

> Best
> Lance
>
>
> On Mar 4, 2022, at 5:04 AM, Volker Simonis  wrote:
>
> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
> functionality. `Inflater::inflate(byte[] output, int off, int len)`
> currently calls zlib's native `inflate(..)` function and passes the
> address of `output[off]` and `len` to it via JNI.
>
> The specification of zlib's `inflate(..)` function (i.e. the [API
> documentation in the original zlib
> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
> doesn't give any guarantees with regard to usage of the output buffer.
> It only states that upon completion the function will return the
> number of bytes that have been written (i.e. "inflated") into the
> output buffer.
>
> The original zlib implementation only wrote as many bytes into the
> output buffer as it inflated. However, this is not a hard requirement
> and newer, more performant implementations of the zlib library like
> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
> or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more
> bytes of the output buffer than they actually inflate as a scratch
> buffer. See https://github.com/simonis/zlib-chromium for a more
> detailed description of their approach and its performance benefit.
>
> These new zlib versions can still be used transparently from Java
> (e.g. by putting them into the `LD_LIBRARY_PATH` or by using
> `LD_PRELOAD`), because they still fully comply to specification of
> `Inflater::inflate(..)`. However, we might run into problems when
> using the `Inflater` functionality from the `InflaterInputStream`
> class. `InflaterInputStream` is derived from from `InputStream` and as
> such, its `read(byte[] b, int off, int len)` method is quite
> constrained. It specifically specifies that if *k* bytes have been
> read, then "these bytes will be stored in elements `b[off]` through
> `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through
> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[]
> b, int off, int len)` (which is constrained by
> `InputStream::read(..)`'s specification) calls
> `Inflater::inflate(byte[] b, int off, int len)` and directly passes
> its output buffer down to the 

Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-22 Thread Volker Simonis
On Mon, Mar 21, 2022 at 8:19 PM Alan Bateman  wrote:
>

Hi Alan,

Thanks for looking at this issue. Please find my answers to your
questions inline:

> On 04/03/2022 10:04, Volker Simonis wrote:
> > :
> >
> > 1. Relax the specification of `InflaterInputStream::read(..)` and
> > specifically note in the API documentation that a call to
> > `InflaterInputStream::read(byte[] b, int off, int len)` may write more
> > than *k* characters into `b` where *k* is the returned number of
> > inflated bytes. Notice that there's a precedence for this approach in
> > `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
> > method of InputStream, returns -1 instead of zero if the end of the
> > stream has been reached and len==0*".
> >
> On the surface this is probably okay. It wouldn't really be relaxing the
> specification, rather than it would say for a return value n, the bytes
> in b[n] to b[len-1] are undefined as the read operate may have clobbered
> their original values.

Yes, that's exactly what I'd like to propose.

> The risk is that it becomes a performance "trick"
> to provide a larger buffer.

A larger buffer is always beneficial because it saves us from multiple
native down-calls to zlib's inflate function. That's already true
today, with the current implementation, where ideally the whole
inflated content fits into the output buffer such that it can be
inflated by a single call to inflate().

The issue I'd like to solve here is really a corner case for the case
where the size of the inflated content is `m` but the buffer size is
`n` with `n < m`. In that case we need `(m + (n-1)) / n)` calls to
inflate and if `(m % n) > 0` the last call will write fewer than ´n`
content bytes (i.e. `m % n` bytes) to the output buffer. Only during
the last call to inflate() some of the remaining bytes in the buffer
after the `m % n` content bytes can be clobbered. Calling that out in
the specification won't change the performance benefits of a larger
output buffer except for the writing of the 16 (i.e. the size of a
vector store) very last inflated bytes. This last write could be
potentially optimized with a 16-byte vector store instruction if the
output buffer was larger than the deflated content `m`. But that's
really a corner case because in reality the output buffer is usually
smaller than `m` and the size of the inflated content `m` is usually
much, much larger than 16 (so optimizing the output of the last 16
bytes won't have any measurable effects).

> That said, I think the main thing we need to
> satisfy ourselves on is security. One part of that is whether anything
> can be gleaned by reading from the byte array during or after an
> inflate.

I can't see any security risks here. The optimized vector store
instructions are only used to copy repeated content from the history
buffer (aka "sliding window") to the output (i.e. inflate LZ77
compressed data). So the "clobber" bytes will always only contain
bytes which have already been written before. See [1, 2] for details.

[1] https://www.zlib.net/manual.html
[2] https://www.euccas.me/zlib/#deflate_sliding_window

> The other part is how the implementation behaves when there is
> a tampering of the array contents during an inflate.

I don't really understand this concern? Do you mean what happens if
another thread is changing the content of the output buffer during an
inflate? I think such a use case has never been well-defined and
amending the specification won't change anything for such a situation.
As I've explained before, the extra clobber bytes won't contain any
data which is required by the inflate algorithm for its correct
operation. It just contains some additional data from the LZ77 history
buffer which can be safely overwritten. The optimized inflate
implementations are actually constantly doing this until they reach
the end of the output buffer or the end of the deflated data (see [3]
for a simple example).

[3] 
https://github.com/simonis/zlib-chromium#inflater-improvements-in-zlib-chromium

>
> -Alan

If you don't mind I'll send out a pull request with a draft of the
amended API-doc and  open a CSR for this issue.

Thank you and best regards,
Volker


Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-21 Thread Lance Andersen
Hi Volker,

I have read through what you have provided/pointed to, thank you, and on the 
surface what you are suggesting sounds reasonable.

That being said given that this API dates back to 1997ish, I think we have to 
be careful not introduce any regressions with existing applications with the 
proposal you suggest(even though it is just relaxes the spec), as you mentioned 
their is a jtreg test that  seems to fail.

Have you run the JCK with your ZLIB implementation?  I only skimmed the tests 
but looks like there might be a couple of tests which validate the array’s 
contents.

We don’t have a lot of test coverage so if we were to consider moving forward 
with your proposal, I believe additional test coverage would be warranted.



Best
Lance


On Mar 4, 2022, at 5:04 AM, Volker Simonis 
mailto:volker.simo...@gmail.com>> wrote:

`java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
functionality. `Inflater::inflate(byte[] output, int off, int len)`
currently calls zlib's native `inflate(..)` function and passes the
address of `output[off]` and `len` to it via JNI.

The specification of zlib's `inflate(..)` function (i.e. the [API
documentation in the original zlib
implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
doesn't give any guarantees with regard to usage of the output buffer.
It only states that upon completion the function will return the
number of bytes that have been written (i.e. "inflated") into the
output buffer.

The original zlib implementation only wrote as many bytes into the
output buffer as it inflated. However, this is not a hard requirement
and newer, more performant implementations of the zlib library like
[zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more
bytes of the output buffer than they actually inflate as a scratch
buffer. See https://github.com/simonis/zlib-chromium for a more
detailed description of their approach and its performance benefit.

These new zlib versions can still be used transparently from Java
(e.g. by putting them into the `LD_LIBRARY_PATH` or by using
`LD_PRELOAD`), because they still fully comply to specification of
`Inflater::inflate(..)`. However, we might run into problems when
using the `Inflater` functionality from the `InflaterInputStream`
class. `InflaterInputStream` is derived from from `InputStream` and as
such, its `read(byte[] b, int off, int len)` method is quite
constrained. It specifically specifies that if *k* bytes have been
read, then "these bytes will be stored in elements `b[off]` through
`b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through
`b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[]
b, int off, int len)` (which is constrained by
`InputStream::read(..)`'s specification) calls
`Inflater::inflate(byte[] b, int off, int len)` and directly passes
its output buffer down to the native zlib `inflate(..)` method which
is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the
number of inflated bytes).

From a practical point of view, I don't see this as a big problem,
because callers of `InflaterInputStream::read(byte[] b, int off, int
len)` can never know how many bytes will be written into the output
buffer `b` (and in fact its content can always be completely
overwritten). It therefore makes no sense to depend on any data there
being untouched after the call. Also, having used zlib-cloudflare
productively for about two years, we haven't seen real-world issues
because of this behavior yet. However, from a specification point of
view it is easy to artificially construct a program which violates
`InflaterInputStream::read(..)`'s postcondition if using one of the
alterantive zlib implementations. A recently integrated JTreg test
(test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally"
fails with zlib-chromium but could be easily fixed to run with
alternative implementations as well.

What should/can be done to solve this problem? I think we have three options:

1. Relax the specification of `InflaterInputStream::read(..)` and
specifically note in the API documentation that a call to
`InflaterInputStream::read(byte[] b, int off, int len)` may write more
than *k* characters into `b` where *k* is the returned number of
inflated bytes. Notice that there's a precedence for this approach in
`java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
method of InputStream, returns -1 instead of zero if the end of the
stream has been reached and len==0*".

2. Tighten the specification of `Inflater::read(byte[] b, int off, int
len)` to explicitly forbid any writes into the output array `b` beyond
the inflated bytes.

3. Change the implementation of `InflaterInputStream::read(..)` to
call `Inflater::read(..)` with a temporary buffer and only copy the
nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s 

Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-21 Thread Alan Bateman

On 04/03/2022 10:04, Volker Simonis wrote:

:

1. Relax the specification of `InflaterInputStream::read(..)` and
specifically note in the API documentation that a call to
`InflaterInputStream::read(byte[] b, int off, int len)` may write more
than *k* characters into `b` where *k* is the returned number of
inflated bytes. Notice that there's a precedence for this approach in
`java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
method of InputStream, returns -1 instead of zero if the end of the
stream has been reached and len==0*".

On the surface this is probably okay. It wouldn't really be relaxing the 
specification, rather than it would say for a return value n, the bytes 
in b[n] to b[len-1] are undefined as the read operate may have clobbered 
their original values. The risk is that it becomes a performance "trick" 
to provide a larger buffer. That said, I think the main thing we need to 
satisfy ourselves on is security. One part of that is whether anything 
can be gleaned by reading from the byte array during or after an 
inflate. The other part is how the implementation behaves when there is 
a tampering of the array contents during an inflate.


-Alan


Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-21 Thread Volker Simonis
Ping...

On Thu, Mar 10, 2022 at 8:26 PM Lance Andersen 
wrote:

> Hi Volker,
>
> Thank you for the reminder
>
> This is on my radar as well but have not had a chance spend any time on
> this as yet.
>
>
>
> On Mar 9, 2022, at 2:33 PM, Volker Simonis 
> wrote:
>
> @Alan, @Lance,
>
> Sorry for my obtrusiveness, but what's your opinion on this issue?
>
> Thank you and best regards,
> Volker
>
> On Fri, Mar 4, 2022 at 11:04 AM Volker Simonis 
> wrote:
>
>
> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
> functionality. `Inflater::inflate(byte[] output, int off, int len)`
> currently calls zlib's native `inflate(..)` function and passes the
> address of `output[off]` and `len` to it via JNI.
>
> The specification of zlib's `inflate(..)` function (i.e. the [API
> documentation in the original zlib
> implementation](
> https://urldefense.com/v3/__https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h*L400__;Iw!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0w3u0Q7HA$
> ))
> doesn't give any guarantees with regard to usage of the output buffer.
> It only states that upon completion the function will return the
> number of bytes that have been written (i.e. "inflated") into the
> output buffer.
>
> The original zlib implementation only wrote as many bytes into the
> output buffer as it inflated. However, this is not a hard requirement
> and newer, more performant implementations of the zlib library like
> [zlib-chromium](
> https://urldefense.com/v3/__https://chromium.googlesource.com/chromium/src/third_party/zlib/__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0yNMBtuew$
> )
> or [zlib-cloudflare](
> https://urldefense.com/v3/__https://github.com/cloudflare/zlib__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0y7Uhi8nA$
> ) can use more
> bytes of the output buffer than they actually inflate as a scratch
> buffer. See
> https://urldefense.com/v3/__https://github.com/simonis/zlib-chromium__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0z1qVlYPg$
>  for a more
> detailed description of their approach and its performance benefit.
>
> These new zlib versions can still be used transparently from Java
> (e.g. by putting them into the `LD_LIBRARY_PATH` or by using
> `LD_PRELOAD`), because they still fully comply to specification of
> `Inflater::inflate(..)`. However, we might run into problems when
> using the `Inflater` functionality from the `InflaterInputStream`
> class. `InflaterInputStream` is derived from from `InputStream` and as
> such, its `read(byte[] b, int off, int len)` method is quite
> constrained. It specifically specifies that if *k* bytes have been
> read, then "these bytes will be stored in elements `b[off]` through
> `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through
> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[]
> b, int off, int len)` (which is constrained by
> `InputStream::read(..)`'s specification) calls
> `Inflater::inflate(byte[] b, int off, int len)` and directly passes
> its output buffer down to the native zlib `inflate(..)` method which
> is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the
> number of inflated bytes).
>
> From a practical point of view, I don't see this as a big problem,
> because callers of `InflaterInputStream::read(byte[] b, int off, int
> len)` can never know how many bytes will be written into the output
> buffer `b` (and in fact its content can always be completely
> overwritten). It therefore makes no sense to depend on any data there
> being untouched after the call. Also, having used zlib-cloudflare
> productively for about two years, we haven't seen real-world issues
> because of this behavior yet. However, from a specification point of
> view it is easy to artificially construct a program which violates
> `InflaterInputStream::read(..)`'s postcondition if using one of the
> alterantive zlib implementations. A recently integrated JTreg test
> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally"
> fails with zlib-chromium but could be easily fixed to run with
> alternative implementations as well.
>
> What should/can be done to solve this problem? I think we have three
> options:
>
> 1. Relax the specification of `InflaterInputStream::read(..)` and
> specifically note in the API documentation that a call to
> `InflaterInputStream::read(byte[] b, int off, int len)` may write more
> than *k* characters into `b` where *k* is the returned number of
> inflated bytes. Notice that there's a precedence for this approach in
> `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
> method of InputStream, returns -1 instead of zero if the end of the
> stream has been reached and len==0*".
>
> 2. Tighten the specification of `Inflater::read(byte[] b, int off, int
> len)` to explicitly forbid any writes into the output array `b` 

Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-10 Thread Alan Bateman



On 09/03/2022 19:33, Volker Simonis wrote:

@Alan, @Lance,

Sorry for my obtrusiveness, but what's your opinion on this issue?

I saw your mail but I haven't had time to study it yet and see if the 
spec option you prefer is the best. I will try to get time next week.


-Alan


Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

2022-03-09 Thread Volker Simonis
@Alan, @Lance,

Sorry for my obtrusiveness, but what's your opinion on this issue?

Thank you and best regards,
Volker

On Fri, Mar 4, 2022 at 11:04 AM Volker Simonis  wrote:
>
> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
> functionality. `Inflater::inflate(byte[] output, int off, int len)`
> currently calls zlib's native `inflate(..)` function and passes the
> address of `output[off]` and `len` to it via JNI.
>
> The specification of zlib's `inflate(..)` function (i.e. the [API
> documentation in the original zlib
> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
> doesn't give any guarantees with regard to usage of the output buffer.
> It only states that upon completion the function will return the
> number of bytes that have been written (i.e. "inflated") into the
> output buffer.
>
> The original zlib implementation only wrote as many bytes into the
> output buffer as it inflated. However, this is not a hard requirement
> and newer, more performant implementations of the zlib library like
> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
> or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more
> bytes of the output buffer than they actually inflate as a scratch
> buffer. See https://github.com/simonis/zlib-chromium for a more
> detailed description of their approach and its performance benefit.
>
> These new zlib versions can still be used transparently from Java
> (e.g. by putting them into the `LD_LIBRARY_PATH` or by using
> `LD_PRELOAD`), because they still fully comply to specification of
> `Inflater::inflate(..)`. However, we might run into problems when
> using the `Inflater` functionality from the `InflaterInputStream`
> class. `InflaterInputStream` is derived from from `InputStream` and as
> such, its `read(byte[] b, int off, int len)` method is quite
> constrained. It specifically specifies that if *k* bytes have been
> read, then "these bytes will be stored in elements `b[off]` through
> `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through
> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[]
> b, int off, int len)` (which is constrained by
> `InputStream::read(..)`'s specification) calls
> `Inflater::inflate(byte[] b, int off, int len)` and directly passes
> its output buffer down to the native zlib `inflate(..)` method which
> is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the
> number of inflated bytes).
>
> From a practical point of view, I don't see this as a big problem,
> because callers of `InflaterInputStream::read(byte[] b, int off, int
> len)` can never know how many bytes will be written into the output
> buffer `b` (and in fact its content can always be completely
> overwritten). It therefore makes no sense to depend on any data there
> being untouched after the call. Also, having used zlib-cloudflare
> productively for about two years, we haven't seen real-world issues
> because of this behavior yet. However, from a specification point of
> view it is easy to artificially construct a program which violates
> `InflaterInputStream::read(..)`'s postcondition if using one of the
> alterantive zlib implementations. A recently integrated JTreg test
> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally"
> fails with zlib-chromium but could be easily fixed to run with
> alternative implementations as well.
>
> What should/can be done to solve this problem? I think we have three options:
>
> 1. Relax the specification of `InflaterInputStream::read(..)` and
> specifically note in the API documentation that a call to
> `InflaterInputStream::read(byte[] b, int off, int len)` may write more
> than *k* characters into `b` where *k* is the returned number of
> inflated bytes. Notice that there's a precedence for this approach in
> `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
> method of InputStream, returns -1 instead of zero if the end of the
> stream has been reached and len==0*".
>
> 2. Tighten the specification of `Inflater::read(byte[] b, int off, int
> len)` to explicitly forbid any writes into the output array `b` beyond
> the inflated bytes.
>
> 3. Change the implementation of `InflaterInputStream::read(..)` to
> call `Inflater::read(..)` with a temporary buffer and only copy the
> nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s output
> buffer. I think we all agree that this is only a theoretic option
> because of its unacceptable performance regression.
>
> I personally favor option 1 but I'm interested in your opinions?
>
> Thank you and best regards,
> Volker