Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On Mon, Mar 28, 2022 at 7:33 PM Alan Bateman wrote: > > On 28/03/2022 11:02, Volker Simonis wrote: > > : > > As I wrote before, the extra data written into the output buffer isn't > > sensitive because it can only originate from the history buffer (aka > > "sliding window"). Also, this data is already exposed today if the > > `Inflater` class is being used stand-alone, because in contrast to > > `InflaterInputStream::read(..)`, `Inflater::inflate(..)` doesn't > > guarantee to leave the content beyond the last read byte unaffected. > > Finally, the referenced zlib-chromium implementation with the > > mentioned behavior is the default zlib implementation in on Android > > and Chrome browsers. > If you are satisfied that flipping bits during an inflate operation > cannot never lead to something bad happening then okay. I'ts important > to ask about such as matters. > Sure. Thanks for your comments. > -Alan
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On 28/03/2022 11:02, Volker Simonis wrote: : As I wrote before, the extra data written into the output buffer isn't sensitive because it can only originate from the history buffer (aka "sliding window"). Also, this data is already exposed today if the `Inflater` class is being used stand-alone, because in contrast to `InflaterInputStream::read(..)`, `Inflater::inflate(..)` doesn't guarantee to leave the content beyond the last read byte unaffected. Finally, the referenced zlib-chromium implementation with the mentioned behavior is the default zlib implementation in on Android and Chrome browsers. If you are satisfied that flipping bits during an inflate operation cannot never lead to something bad happening then okay. I'ts important to ask about such as matters. -Alan
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On Mon, Mar 28, 2022 at 10:53 AM Alan Bateman wrote: > > On 22/03/2022 12:28, Volker Simonis wrote: > > : > > I don't really understand this concern? Do you mean what happens if > > another thread is changing the content of the output buffer during an > > inflate? I think such a use case has never been well-defined and > > amending the specification won't change anything for such a situation. > The setup means that user code has access to temporary storage used by > the inflater library. It's important that nothing sensitive leaks, also > important that flipping bits in any of the bytes in that temporary > buffer doesn't lead to something that is considered a security issue. If > you are confident that nothing bad can happen they great, I'm just > pointing out things to consider when allow for the behavior discussed here. As I wrote before, the extra data written into the output buffer isn't sensitive because it can only originate from the history buffer (aka "sliding window"). Also, this data is already exposed today if the `Inflater` class is being used stand-alone, because in contrast to `InflaterInputStream::read(..)`, `Inflater::inflate(..)` doesn't guarantee to leave the content beyond the last read byte unaffected. Finally, the referenced zlib-chromium implementation with the mentioned behavior is the default zlib implementation in on Android and Chrome browsers. I've created a pull request and an associated CSR for this issue under: https://github.com/openjdk/jdk/pull/7986 https://bugs.openjdk.java.net/browse/JDK-8283758 Finally I've also created a PR to fix the ZipFSOutputStreamTest mentioned in my initial mail: https://github.com/openjdk/jdk/pull/7984 Can you please kindly take a look and review? Thank you and best regards, Volker PS: just saw you've already approved https://github.com/openjdk/jdk/pull/7984 :) > > -Alan
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On 22/03/2022 12:28, Volker Simonis wrote: : I don't really understand this concern? Do you mean what happens if another thread is changing the content of the output buffer during an inflate? I think such a use case has never been well-defined and amending the specification won't change anything for such a situation. The setup means that user code has access to temporary storage used by the inflater library. It's important that nothing sensitive leaks, also important that flipping bits in any of the bytes in that temporary buffer doesn't lead to something that is considered a security issue. If you are confident that nothing bad can happen they great, I'm just pointing out things to consider when allow for the behavior discussed here. -Alan
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On Mar 22, 2022, at 12:28 PM, Volker Simonis mailto:volker.simo...@gmail.com>> wrote: On Mon, Mar 21, 2022 at 8:24 PM Lance Andersen mailto:lance.ander...@oracle.com>> wrote: Hi Lance, Thanks for looking into this issue. Please find my answers inline: Hi Volker, I have read through what you have provided/pointed to, thank you, and on the surface what you are suggesting sounds reasonable. That being said given that this API dates back to 1997ish, I think we have to be careful not introduce any regressions with existing applications with the proposal you suggest(even though it is just relaxes the spec), as you mentioned their is a jtreg test that seems to fail. Have you run the JCK with your ZLIB implementation? I only skimmed the tests but looks like there might be a couple of tests which validate the array’s contents. Yes, I did run the JCK tests with the optimized zlib implementations and couldn't find any problems related to this issue. We've also using the optimized version of zlib in production for quite a while without any problems. However, I did found a problem related to a test which was copied from "test/jdk/java/util/zip/DeflateIn_InflateOut.java". That was the test on a quick scan I wondered about actually... That jtreg test was fixed with "8240226: DeflateIn_InflateOut.java test incorrectly assumes size of compressed" [2] in JDK 15 but apparently that fix didn't made it into the JCK version. The test has now been excluded from JCK 17/18. Finally, the currently failing "nio/zipfs/ZipFSOutputStreamTest.java" [3] jtreg test is not failing because it specifically checks that the bytes in the output buffer beyond the last inflated byte remains untouched. It's just because they use a poor, "lazy" method of comparing inflated content to the original content and can easily be fixed. Instead of: ``` while ((numRead = is.read(buf)) != -1) { totalRead += numRead; // verify the content Assert.assertEquals(Arrays.mismatch(buf, chunk), -1, "Unexpected content"); } ``` just use: ``` while ((numRead = is.read(buf)) != -1) { totalRead += numRead; // verify the content Assert.assertEquals(Arrays.mismatch(buf, 0, numRead, chunk, 0, numRead), -1, "Unexpected content"); } ``` Why don’t you generate a PR to fix this separate from the change in spec proposal? [1] https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/test/jdk/java/util/zip/DeflateIn_InflateOut.java__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuKmMKov2g$ [2] https://bugs.openjdk.java.net/browse/JDK-8240226 [3] https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuJSeRPJgw$ We don’t have a lot of test coverage so if we were to consider moving forward with your proposal, I believe additional test coverage would be warranted. Sure, I'll be happy to add some more testing. Do you have specific ideas? In fact my suggestion relaxes the specification of read(..) which would be hard to check :) Just some general coverage of the API would be great not expecting tests for the relaxed specification I think you are probably at a point where you can craft a draft CSR and PR and we can continue to review and discuss from there Best Lance Thank you and best regards, Volker Best Lance On Mar 4, 2022, at 5:04 AM, Volker Simonis mailto:volker.simo...@gmail.com>> wrote: `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater functionality. `Inflater::inflate(byte[] output, int off, int len)` currently calls zlib's native `inflate(..)` function and passes the address of `output[off]` and `len` to it via JNI. The specification of zlib's `inflate(..)` function (i.e. the [API documentation in the original zlib implementation](https://urldefense.com/v3/__https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h*L400__;Iw!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuKV0E1HqA$ )) doesn't give any guarantees with regard to usage of the output buffer. It only states that upon completion the function will return the number of bytes that have been written (i.e. "inflated") into the output buffer. The original zlib implementation only wrote as many bytes into the output buffer as it inflated. However, this is not a hard requirement and newer, more performant implementations of the zlib library like [zlib-chromium](https://urldefense.com/v3/__https://chromium.googlesource.com/chromium/src/third_party/zlib/__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuJVkxWFew$ ) or [zlib-cloudflare](https://urldefense.com/v3/__https://github.com/cloudflare/zlib__;!!ACWV5N9M2RV99hQ!d0HFq-TCYrPXsCP8R_ja46jCDmVAXQCpz58ddo4rzXdRdDikJ27vpsKzBuIpQS_D6w$ ) can use more bytes of the output buffer than they actually inflate as a scratch buffer. See
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On Mon, Mar 21, 2022 at 8:24 PM Lance Andersen wrote: Hi Lance, Thanks for looking into this issue. Please find my answers inline: > Hi Volker, > > I have read through what you have provided/pointed to, thank you, and on the > surface what you are suggesting sounds reasonable. > > That being said given that this API dates back to 1997ish, I think we have to > be careful not introduce any regressions with existing applications with the > proposal you suggest(even though it is just relaxes the spec), as you > mentioned their is a jtreg test that seems to fail. > > Have you run the JCK with your ZLIB implementation? I only skimmed the tests > but looks like there might be a couple of tests which validate the array’s > contents. Yes, I did run the JCK tests with the optimized zlib implementations and couldn't find any problems related to this issue. We've also using the optimized version of zlib in production for quite a while without any problems. However, I did found a problem related to a test which was copied from "test/jdk/java/util/zip/DeflateIn_InflateOut.java". That jtreg test was fixed with "8240226: DeflateIn_InflateOut.java test incorrectly assumes size of compressed" [2] in JDK 15 but apparently that fix didn't made it into the JCK version. The test has now been excluded from JCK 17/18. Finally, the currently failing "nio/zipfs/ZipFSOutputStreamTest.java" [3] jtreg test is not failing because it specifically checks that the bytes in the output buffer beyond the last inflated byte remains untouched. It's just because they use a poor, "lazy" method of comparing inflated content to the original content and can easily be fixed. Instead of: ``` while ((numRead = is.read(buf)) != -1) { totalRead += numRead; // verify the content Assert.assertEquals(Arrays.mismatch(buf, chunk), -1, "Unexpected content"); } ``` just use: ``` while ((numRead = is.read(buf)) != -1) { totalRead += numRead; // verify the content Assert.assertEquals(Arrays.mismatch(buf, 0, numRead, chunk, 0, numRead), -1, "Unexpected content"); } ``` [1] https://github.com/openjdk/jdk/blob/master/test/jdk/java/util/zip/DeflateIn_InflateOut.java [2] https://bugs.openjdk.java.net/browse/JDK-8240226 [3] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java > > We don’t have a lot of test coverage so if we were to consider moving forward > with your proposal, I believe additional test coverage would be warranted. Sure, I'll be happy to add some more testing. Do you have specific ideas? In fact my suggestion relaxes the specification of read(..) which would be hard to check :) Thank you and best regards, Volker > Best > Lance > > > On Mar 4, 2022, at 5:04 AM, Volker Simonis wrote: > > `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater > functionality. `Inflater::inflate(byte[] output, int off, int len)` > currently calls zlib's native `inflate(..)` function and passes the > address of `output[off]` and `len` to it via JNI. > > The specification of zlib's `inflate(..)` function (i.e. the [API > documentation in the original zlib > implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) > doesn't give any guarantees with regard to usage of the output buffer. > It only states that upon completion the function will return the > number of bytes that have been written (i.e. "inflated") into the > output buffer. > > The original zlib implementation only wrote as many bytes into the > output buffer as it inflated. However, this is not a hard requirement > and newer, more performant implementations of the zlib library like > [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) > or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more > bytes of the output buffer than they actually inflate as a scratch > buffer. See https://github.com/simonis/zlib-chromium for a more > detailed description of their approach and its performance benefit. > > These new zlib versions can still be used transparently from Java > (e.g. by putting them into the `LD_LIBRARY_PATH` or by using > `LD_PRELOAD`), because they still fully comply to specification of > `Inflater::inflate(..)`. However, we might run into problems when > using the `Inflater` functionality from the `InflaterInputStream` > class. `InflaterInputStream` is derived from from `InputStream` and as > such, its `read(byte[] b, int off, int len)` method is quite > constrained. It specifically specifies that if *k* bytes have been > read, then "these bytes will be stored in elements `b[off]` through > `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through > `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] > b, int off, int len)` (which is constrained by > `InputStream::read(..)`'s specification) calls > `Inflater::inflate(byte[] b, int off, int len)` and directly passes > its output buffer down to the
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On Mon, Mar 21, 2022 at 8:19 PM Alan Bateman wrote: > Hi Alan, Thanks for looking at this issue. Please find my answers to your questions inline: > On 04/03/2022 10:04, Volker Simonis wrote: > > : > > > > 1. Relax the specification of `InflaterInputStream::read(..)` and > > specifically note in the API documentation that a call to > > `InflaterInputStream::read(byte[] b, int off, int len)` may write more > > than *k* characters into `b` where *k* is the returned number of > > inflated bytes. Notice that there's a precedence for this approach in > > `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden > > method of InputStream, returns -1 instead of zero if the end of the > > stream has been reached and len==0*". > > > On the surface this is probably okay. It wouldn't really be relaxing the > specification, rather than it would say for a return value n, the bytes > in b[n] to b[len-1] are undefined as the read operate may have clobbered > their original values. Yes, that's exactly what I'd like to propose. > The risk is that it becomes a performance "trick" > to provide a larger buffer. A larger buffer is always beneficial because it saves us from multiple native down-calls to zlib's inflate function. That's already true today, with the current implementation, where ideally the whole inflated content fits into the output buffer such that it can be inflated by a single call to inflate(). The issue I'd like to solve here is really a corner case for the case where the size of the inflated content is `m` but the buffer size is `n` with `n < m`. In that case we need `(m + (n-1)) / n)` calls to inflate and if `(m % n) > 0` the last call will write fewer than ´n` content bytes (i.e. `m % n` bytes) to the output buffer. Only during the last call to inflate() some of the remaining bytes in the buffer after the `m % n` content bytes can be clobbered. Calling that out in the specification won't change the performance benefits of a larger output buffer except for the writing of the 16 (i.e. the size of a vector store) very last inflated bytes. This last write could be potentially optimized with a 16-byte vector store instruction if the output buffer was larger than the deflated content `m`. But that's really a corner case because in reality the output buffer is usually smaller than `m` and the size of the inflated content `m` is usually much, much larger than 16 (so optimizing the output of the last 16 bytes won't have any measurable effects). > That said, I think the main thing we need to > satisfy ourselves on is security. One part of that is whether anything > can be gleaned by reading from the byte array during or after an > inflate. I can't see any security risks here. The optimized vector store instructions are only used to copy repeated content from the history buffer (aka "sliding window") to the output (i.e. inflate LZ77 compressed data). So the "clobber" bytes will always only contain bytes which have already been written before. See [1, 2] for details. [1] https://www.zlib.net/manual.html [2] https://www.euccas.me/zlib/#deflate_sliding_window > The other part is how the implementation behaves when there is > a tampering of the array contents during an inflate. I don't really understand this concern? Do you mean what happens if another thread is changing the content of the output buffer during an inflate? I think such a use case has never been well-defined and amending the specification won't change anything for such a situation. As I've explained before, the extra clobber bytes won't contain any data which is required by the inflate algorithm for its correct operation. It just contains some additional data from the LZ77 history buffer which can be safely overwritten. The optimized inflate implementations are actually constantly doing this until they reach the end of the output buffer or the end of the deflated data (see [3] for a simple example). [3] https://github.com/simonis/zlib-chromium#inflater-improvements-in-zlib-chromium > > -Alan If you don't mind I'll send out a pull request with a draft of the amended API-doc and open a CSR for this issue. Thank you and best regards, Volker
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
Hi Volker, I have read through what you have provided/pointed to, thank you, and on the surface what you are suggesting sounds reasonable. That being said given that this API dates back to 1997ish, I think we have to be careful not introduce any regressions with existing applications with the proposal you suggest(even though it is just relaxes the spec), as you mentioned their is a jtreg test that seems to fail. Have you run the JCK with your ZLIB implementation? I only skimmed the tests but looks like there might be a couple of tests which validate the array’s contents. We don’t have a lot of test coverage so if we were to consider moving forward with your proposal, I believe additional test coverage would be warranted. Best Lance On Mar 4, 2022, at 5:04 AM, Volker Simonis mailto:volker.simo...@gmail.com>> wrote: `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater functionality. `Inflater::inflate(byte[] output, int off, int len)` currently calls zlib's native `inflate(..)` function and passes the address of `output[off]` and `len` to it via JNI. The specification of zlib's `inflate(..)` function (i.e. the [API documentation in the original zlib implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) doesn't give any guarantees with regard to usage of the output buffer. It only states that upon completion the function will return the number of bytes that have been written (i.e. "inflated") into the output buffer. The original zlib implementation only wrote as many bytes into the output buffer as it inflated. However, this is not a hard requirement and newer, more performant implementations of the zlib library like [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes of the output buffer than they actually inflate as a scratch buffer. See https://github.com/simonis/zlib-chromium for a more detailed description of their approach and its performance benefit. These new zlib versions can still be used transparently from Java (e.g. by putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because they still fully comply to specification of `Inflater::inflate(..)`. However, we might run into problems when using the `Inflater` functionality from the `InflaterInputStream` class. `InflaterInputStream` is derived from from `InputStream` and as such, its `read(byte[] b, int off, int len)` method is quite constrained. It specifically specifies that if *k* bytes have been read, then "these bytes will be stored in elements `b[off]` through `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] b, int off, int len)` (which is constrained by `InputStream::read(..)`'s specification) calls `Inflater::inflate(byte[] b, int off, int len)` and directly passes its output buffer down to the native zlib `inflate(..)` method which is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the number of inflated bytes). From a practical point of view, I don't see this as a big problem, because callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never know how many bytes will be written into the output buffer `b` (and in fact its content can always be completely overwritten). It therefore makes no sense to depend on any data there being untouched after the call. Also, having used zlib-cloudflare productively for about two years, we haven't seen real-world issues because of this behavior yet. However, from a specification point of view it is easy to artificially construct a program which violates `InflaterInputStream::read(..)`'s postcondition if using one of the alterantive zlib implementations. A recently integrated JTreg test (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails with zlib-chromium but could be easily fixed to run with alternative implementations as well. What should/can be done to solve this problem? I think we have three options: 1. Relax the specification of `InflaterInputStream::read(..)` and specifically note in the API documentation that a call to `InflaterInputStream::read(byte[] b, int off, int len)` may write more than *k* characters into `b` where *k* is the returned number of inflated bytes. Notice that there's a precedence for this approach in `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden method of InputStream, returns -1 instead of zero if the end of the stream has been reached and len==0*". 2. Tighten the specification of `Inflater::read(byte[] b, int off, int len)` to explicitly forbid any writes into the output array `b` beyond the inflated bytes. 3. Change the implementation of `InflaterInputStream::read(..)` to call `Inflater::read(..)` with a temporary buffer and only copy the nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On 04/03/2022 10:04, Volker Simonis wrote: : 1. Relax the specification of `InflaterInputStream::read(..)` and specifically note in the API documentation that a call to `InflaterInputStream::read(byte[] b, int off, int len)` may write more than *k* characters into `b` where *k* is the returned number of inflated bytes. Notice that there's a precedence for this approach in `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden method of InputStream, returns -1 instead of zero if the end of the stream has been reached and len==0*". On the surface this is probably okay. It wouldn't really be relaxing the specification, rather than it would say for a return value n, the bytes in b[n] to b[len-1] are undefined as the read operate may have clobbered their original values. The risk is that it becomes a performance "trick" to provide a larger buffer. That said, I think the main thing we need to satisfy ourselves on is security. One part of that is whether anything can be gleaned by reading from the byte array during or after an inflate. The other part is how the implementation behaves when there is a tampering of the array contents during an inflate. -Alan
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
Ping... On Thu, Mar 10, 2022 at 8:26 PM Lance Andersen wrote: > Hi Volker, > > Thank you for the reminder > > This is on my radar as well but have not had a chance spend any time on > this as yet. > > > > On Mar 9, 2022, at 2:33 PM, Volker Simonis > wrote: > > @Alan, @Lance, > > Sorry for my obtrusiveness, but what's your opinion on this issue? > > Thank you and best regards, > Volker > > On Fri, Mar 4, 2022 at 11:04 AM Volker Simonis > wrote: > > > `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater > functionality. `Inflater::inflate(byte[] output, int off, int len)` > currently calls zlib's native `inflate(..)` function and passes the > address of `output[off]` and `len` to it via JNI. > > The specification of zlib's `inflate(..)` function (i.e. the [API > documentation in the original zlib > implementation]( > https://urldefense.com/v3/__https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h*L400__;Iw!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0w3u0Q7HA$ > )) > doesn't give any guarantees with regard to usage of the output buffer. > It only states that upon completion the function will return the > number of bytes that have been written (i.e. "inflated") into the > output buffer. > > The original zlib implementation only wrote as many bytes into the > output buffer as it inflated. However, this is not a hard requirement > and newer, more performant implementations of the zlib library like > [zlib-chromium]( > https://urldefense.com/v3/__https://chromium.googlesource.com/chromium/src/third_party/zlib/__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0yNMBtuew$ > ) > or [zlib-cloudflare]( > https://urldefense.com/v3/__https://github.com/cloudflare/zlib__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0y7Uhi8nA$ > ) can use more > bytes of the output buffer than they actually inflate as a scratch > buffer. See > https://urldefense.com/v3/__https://github.com/simonis/zlib-chromium__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0z1qVlYPg$ > for a more > detailed description of their approach and its performance benefit. > > These new zlib versions can still be used transparently from Java > (e.g. by putting them into the `LD_LIBRARY_PATH` or by using > `LD_PRELOAD`), because they still fully comply to specification of > `Inflater::inflate(..)`. However, we might run into problems when > using the `Inflater` functionality from the `InflaterInputStream` > class. `InflaterInputStream` is derived from from `InputStream` and as > such, its `read(byte[] b, int off, int len)` method is quite > constrained. It specifically specifies that if *k* bytes have been > read, then "these bytes will be stored in elements `b[off]` through > `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through > `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] > b, int off, int len)` (which is constrained by > `InputStream::read(..)`'s specification) calls > `Inflater::inflate(byte[] b, int off, int len)` and directly passes > its output buffer down to the native zlib `inflate(..)` method which > is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the > number of inflated bytes). > > From a practical point of view, I don't see this as a big problem, > because callers of `InflaterInputStream::read(byte[] b, int off, int > len)` can never know how many bytes will be written into the output > buffer `b` (and in fact its content can always be completely > overwritten). It therefore makes no sense to depend on any data there > being untouched after the call. Also, having used zlib-cloudflare > productively for about two years, we haven't seen real-world issues > because of this behavior yet. However, from a specification point of > view it is easy to artificially construct a program which violates > `InflaterInputStream::read(..)`'s postcondition if using one of the > alterantive zlib implementations. A recently integrated JTreg test > (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" > fails with zlib-chromium but could be easily fixed to run with > alternative implementations as well. > > What should/can be done to solve this problem? I think we have three > options: > > 1. Relax the specification of `InflaterInputStream::read(..)` and > specifically note in the API documentation that a call to > `InflaterInputStream::read(byte[] b, int off, int len)` may write more > than *k* characters into `b` where *k* is the returned number of > inflated bytes. Notice that there's a precedence for this approach in > `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden > method of InputStream, returns -1 instead of zero if the end of the > stream has been reached and len==0*". > > 2. Tighten the specification of `Inflater::read(byte[] b, int off, int > len)` to explicitly forbid any writes into the output array `b`
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
On 09/03/2022 19:33, Volker Simonis wrote: @Alan, @Lance, Sorry for my obtrusiveness, but what's your opinion on this issue? I saw your mail but I haven't had time to study it yet and see if the spec option you prefer is the best. I will try to get time next week. -Alan
Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
@Alan, @Lance, Sorry for my obtrusiveness, but what's your opinion on this issue? Thank you and best regards, Volker On Fri, Mar 4, 2022 at 11:04 AM Volker Simonis wrote: > > `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater > functionality. `Inflater::inflate(byte[] output, int off, int len)` > currently calls zlib's native `inflate(..)` function and passes the > address of `output[off]` and `len` to it via JNI. > > The specification of zlib's `inflate(..)` function (i.e. the [API > documentation in the original zlib > implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) > doesn't give any guarantees with regard to usage of the output buffer. > It only states that upon completion the function will return the > number of bytes that have been written (i.e. "inflated") into the > output buffer. > > The original zlib implementation only wrote as many bytes into the > output buffer as it inflated. However, this is not a hard requirement > and newer, more performant implementations of the zlib library like > [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) > or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more > bytes of the output buffer than they actually inflate as a scratch > buffer. See https://github.com/simonis/zlib-chromium for a more > detailed description of their approach and its performance benefit. > > These new zlib versions can still be used transparently from Java > (e.g. by putting them into the `LD_LIBRARY_PATH` or by using > `LD_PRELOAD`), because they still fully comply to specification of > `Inflater::inflate(..)`. However, we might run into problems when > using the `Inflater` functionality from the `InflaterInputStream` > class. `InflaterInputStream` is derived from from `InputStream` and as > such, its `read(byte[] b, int off, int len)` method is quite > constrained. It specifically specifies that if *k* bytes have been > read, then "these bytes will be stored in elements `b[off]` through > `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through > `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] > b, int off, int len)` (which is constrained by > `InputStream::read(..)`'s specification) calls > `Inflater::inflate(byte[] b, int off, int len)` and directly passes > its output buffer down to the native zlib `inflate(..)` method which > is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the > number of inflated bytes). > > From a practical point of view, I don't see this as a big problem, > because callers of `InflaterInputStream::read(byte[] b, int off, int > len)` can never know how many bytes will be written into the output > buffer `b` (and in fact its content can always be completely > overwritten). It therefore makes no sense to depend on any data there > being untouched after the call. Also, having used zlib-cloudflare > productively for about two years, we haven't seen real-world issues > because of this behavior yet. However, from a specification point of > view it is easy to artificially construct a program which violates > `InflaterInputStream::read(..)`'s postcondition if using one of the > alterantive zlib implementations. A recently integrated JTreg test > (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" > fails with zlib-chromium but could be easily fixed to run with > alternative implementations as well. > > What should/can be done to solve this problem? I think we have three options: > > 1. Relax the specification of `InflaterInputStream::read(..)` and > specifically note in the API documentation that a call to > `InflaterInputStream::read(byte[] b, int off, int len)` may write more > than *k* characters into `b` where *k* is the returned number of > inflated bytes. Notice that there's a precedence for this approach in > `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden > method of InputStream, returns -1 instead of zero if the end of the > stream has been reached and len==0*". > > 2. Tighten the specification of `Inflater::read(byte[] b, int off, int > len)` to explicitly forbid any writes into the output array `b` beyond > the inflated bytes. > > 3. Change the implementation of `InflaterInputStream::read(..)` to > call `Inflater::read(..)` with a temporary buffer and only copy the > nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s output > buffer. I think we all agree that this is only a theoretic option > because of its unacceptable performance regression. > > I personally favor option 1 but I'm interested in your opinions? > > Thank you and best regards, > Volker