Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available

2022-04-13 Thread Brian Burkhalter
On Thu, 14 Apr 2022 01:40:50 GMT, Brian Burkhalter  wrote:

> Modify native multi-byte read-write code used by the `java.io` classes to 
> limit the size of the allocated native buffer thereby decreasing off-heap 
> memory footprint and increasing throughput.

Currently for `java.io.FileInputStream.read(byte[],int,int)` and 
`java.io.FileOutputStream.write(byte[],int,int)`, for example, if the number of 
bytes respectively to be read or written exceeds 8192, an array of the total 
length of the read or write is allocated. For large reads or writes this could 
be significant. It is proposed to limit the maximum allocation size to 1 MB and 
perform the read or write by looping with this buffer. For reading or writing 
less than 1 MB, there is no effective change in the implementation.

This change both saves off-heap memory and increases throughput. An allocation 
of 1  MB is only 0.42% the size of the buffer in the JBS issue, 501 x 501 x 501 
x 2 (= 251,503,002), so for this case the memory reduction is drastic. Reading 
throughput is almost doubled and writing throughput improved by about 50%. As 
measured on macOS, the throughput of the methods mentioned above before the 
change was:


Benchmark Mode  Cnt   Score   Error  Units
ReadWrite.read   thrpt5  10.108 ± 0.264  ops/s
ReadWrite.write  thrpt5   7.188 ± 0.431  ops/s

and that after is:

Benchmark Mode  Cnt   Score   Error  Units
ReadWrite.read   thrpt5  20.112 ± 0.262  ops/s
ReadWrite.write  thrpt5  10.644 ± 4.568  ops/s

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available

2022-04-13 Thread Daniel Jeliński
On Thu, 14 Apr 2022 01:40:50 GMT, Brian Burkhalter  wrote:

> Modify native multi-byte read-write code used by the `java.io` classes to 
> limit the size of the allocated native buffer thereby decreasing off-heap 
> memory footprint and increasing throughput.

The benchmark results are quite unexpected. Would we benefit from reducing the 
buffer size even further?

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available

2022-04-13 Thread Alan Bateman
On Thu, 14 Apr 2022 01:40:50 GMT, Brian Burkhalter  wrote:

> Modify native multi-byte read-write code used by the `java.io` classes to 
> limit the size of the allocated native buffer thereby decreasing off-heap 
> memory footprint and increasing throughput.

src/java.base/share/native/libjava/io_util.c line 133:

> 131: if (nread == 0)
> 132: nread = -1;
> 133: break;

Can you prototype doing the loop in Java rather than in native code so that 
there is less native code to maintain?

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available

2022-04-19 Thread Brian Burkhalter
On Thu, 14 Apr 2022 06:23:24 GMT, Alan Bateman  wrote:

>> Modify native multi-byte read-write code used by the `java.io` classes to 
>> limit the size of the allocated native buffer thereby decreasing off-heap 
>> memory footprint and increasing throughput.
>
> src/java.base/share/native/libjava/io_util.c line 133:
> 
>> 131: if (nread == 0)
>> 132: nread = -1;
>> 133: break;
> 
> Can you prototype doing the loop in Java rather than in native code so that 
> there is less native code to maintain?

I prototyped doing the read loop in Java and the performance seemed to be about 
the same. The problem is that the loop needs to exit when the native `read()` 
function performs a short read, i.e., fewer bytes are read than were requested. 
Without this, at least one regression test fails. The reason is not completely 
clear.

To detect such a short read in the Java layer would require some way for the 
native layer to notify the Java layer. One potential such method is

boolean readBytes(byte[] b, int off, int len, int[] nread) {}

where the return value indicates whether all or only some of the `len` bytes 
were read. If not all were read, then `nread[0]` would contain the number 
actually read or `-1`.

Another possibility is

int readBytes(byte[] b, int off, int len, int maxBufferSize) {}

which is identical to the current `readBytes()` except that the maximum 
transfer buffer size is specified as a method parameter instead of being 
defined by a symbolic constant in the native layer. In this case a short read 
would be detected if `len >= maxBufferSize` and the returned value is less than 
`maxBufferSize`.

As for the read loop being in native code, note that the write loop is also 
already in native code, so if the read loop were moved to Java, then probably 
the write loop should be as well.

One advantage of the loops being in native code is that there is only one 
`malloc()` per Java `read(byte[],int,int)` or `write(byte[],int,int)` 
invocation.

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available

2022-04-19 Thread Brian Burkhalter
On Thu, 14 Apr 2022 05:57:54 GMT, Daniel Jeliński  wrote:

> The benchmark results are quite unexpected. Would we benefit from reducing 
> the buffer size even further?

I tested with smaller and smaller buffer sizes until the performance started to 
be affected which was about 64 KB. I have not changed this value in the patch 
just yet.

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v5]

2022-06-09 Thread Brian Burkhalter
> Modify native multi-byte read-write code used by the `java.io` classes to 
> limit the size of the allocated native buffer thereby decreasing off-heap 
> memory footprint and increasing throughput.

Brian Burkhalter has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains seven additional 
commits since the last revision:

 - Merge
 - 6478546: Add break in write loop on ExceptionOccurred
 - Merge
 - 6478546: Clean up io_util.c
 - Merge
 - 6478546: Decrease malloc'ed buffer maximum size to 64 kB
 - 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty 
available

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/8235/files
  - new: https://git.openjdk.org/jdk/pull/8235/files/10f14bb3..00521485

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=03-04

  Stats: 118890 lines in 1877 files changed: 57455 ins; 51098 del; 10337 mod
  Patch: https://git.openjdk.org/jdk/pull/8235.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/8235/head:pull/8235

PR: https://git.openjdk.org/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v2]

2022-04-28 Thread Brian Burkhalter
> Modify native multi-byte read-write code used by the `java.io` classes to 
> limit the size of the allocated native buffer thereby decreasing off-heap 
> memory footprint and increasing throughput.

Brian Burkhalter has updated the pull request incrementally with one additional 
commit since the last revision:

  6478546: Decrease malloc'ed buffer maximum size to 64 kB

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8235/files
  - new: https://git.openjdk.java.net/jdk/pull/8235/files/8bb1e14f..40d46f8f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8235.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8235/head:pull/8235

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v2]

2022-04-28 Thread Uwe Schindler
On Thu, 28 Apr 2022 20:02:36 GMT, Brian Burkhalter  wrote:

>> Modify native multi-byte read-write code used by the `java.io` classes to 
>> limit the size of the allocated native buffer thereby decreasing off-heap 
>> memory footprint and increasing throughput.
>
> Brian Burkhalter has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   6478546: Decrease malloc'ed buffer maximum size to 64 kB

By the way: FileOutputStream has exactly the same problem with `write(byte[])`. 
I see no test for it, but I assume this is now also fixed. That's a 
longstanding issue in Lucene (this is why we use a ChunkedOutputStream when 
writing files.

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v2]

2022-04-28 Thread Brian Burkhalter
On Thu, 28 Apr 2022 20:13:48 GMT, Uwe Schindler  wrote:

> By the way: FileOutputStream has exactly the same problem with 
> `write(byte[])`. I see no test for it, but I assume this is now also fixed. 
> That's a longstanding issue in Lucene (this is why we use a 
> ChunkedOutputStream when writing files.

Yes, `write(byte[])` is also fixed: 
[io_util.c#L164](https://github.com/bplb/jdk/blob/40d46f8f4463dbd7dbe10651910826e90ca4dbdd/src/java.base/share/native/libjava/io_util.c#L164).

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v2]

2022-05-05 Thread Brian Burkhalter
On Thu, 28 Apr 2022 20:02:36 GMT, Brian Burkhalter  wrote:

>> Modify native multi-byte read-write code used by the `java.io` classes to 
>> limit the size of the allocated native buffer thereby decreasing off-heap 
>> memory footprint and increasing throughput.
>
> Brian Burkhalter has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   6478546: Decrease malloc'ed buffer maximum size to 64 kB

Further performance testing was conducted for the case where the native read 
and write functions used a fixed, stack-allocated buffer of size 8192. The 
loops were moved up into the Java code of `FileInputStream`, `FileOutputStream` 
and `RandomAccessFile`. Note that there was code duplication because RAF needs 
both read and write methods as well. The performance of writing with this 
approach was approximately half what it had been, so for writing the approach 
was abandoned.

Here are some updated performance measurements:

https://user-images.githubusercontent.com/71468245/167041493-6d4c421c-c2ec-4a8a-8b32-09b2a902a77c.png";>

https://user-images.githubusercontent.com/71468245/167041541-94e5806c-de86-4e62-a117-4cfafac82e87.png";>

The performance measurements shown are for the following cases:

1. Master: unmodified code as it exists in the mainline
2. Java: fixed-size stack buffer in native read, read loops in Java, write as 
in the mainline but with malloc buffer size limit
3. Native: read loop in native read with malloc buffer size limit, write as in 
the mainline but with malloc buffer size limit

The horizontal axis represents a variety of lengths from 8192 to 1GB; the 
vertical axis is throughput (ops/s) on a log 10 scale. The native lines in the 
charts are for the code proposed to be integrated.

As can be seen, the performance of reading is quite similar up to larger 
lengths. The mainline version presumably starts to suffer the effect of large 
allocations. The native read loop performs the best throughout, being for 
lengths 10 MB and above from 50% to 3X faster than the mainline version. The 
native read loop is about 40% faster than the Java read loop for these larger 
lengths.

Due to the log scale of the charts, the reading performance detail cannot be 
seen exactly and so is given here for the larger lengths:


   Throughput of read(byte[]) (ops/s)
   Length  Master JavaNative
   104857611341.39  6124.48211371.091
  10485760  356.893  376.326  557.906
 251503002   10.036   14.2719.869
 5242880005.0056.8579.552
101.6753.5274.997

The performance of writing is about the same for the Java and Native versions, 
as it should be since the implementations are the same. Any difference is 
likely due to measurement noise. The mainline version again suffers for larger 
lengths.

As the native write loop was already present in the mainline code, the 
principal complexity proposed to be added is the native read loop. Given the 
improved throughput and vastly reduced native memory allocation this seems to 
be justified.

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v3]

2022-05-05 Thread Brian Burkhalter
> Modify native multi-byte read-write code used by the `java.io` classes to 
> limit the size of the allocated native buffer thereby decreasing off-heap 
> memory footprint and increasing throughput.

Brian Burkhalter has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains four additional 
commits since the last revision:

 - 6478546: Clean up io_util.c
 - Merge
 - 6478546: Decrease malloc'ed buffer maximum size to 64 kB
 - 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty 
available

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8235/files
  - new: https://git.openjdk.java.net/jdk/pull/8235/files/40d46f8f..5c3a3446

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=01-02

  Stats: 36827 lines in 1404 files changed: 26452 ins; 4333 del; 6042 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8235.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8235/head:pull/8235

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v3]

2022-05-12 Thread John Hendrikx
On Thu, 5 May 2022 23:39:30 GMT, Brian Burkhalter  wrote:

>> Modify native multi-byte read-write code used by the `java.io` classes to 
>> limit the size of the allocated native buffer thereby decreasing off-heap 
>> memory footprint and increasing throughput.
>
> Brian Burkhalter has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains four additional 
> commits since the last revision:
> 
>  - 6478546: Clean up io_util.c
>  - Merge
>  - 6478546: Decrease malloc'ed buffer maximum size to 64 kB
>  - 6478546: FileInputStream.read() throws OutOfMemoryError when there is 
> plenty available

src/java.base/share/native/libjava/io_util.c line 79:

> 77:   jint off, jint len, jfieldID fid)
> 78: {
> 79: char stackBuf[STACK_BUF_SIZE];

This was in the original code already, but doesn't this always allocate 8k on 
the stack, regardless of whether this buffer will be used (if len > 8k or len 
== 0)?  Wouldn't it be better to allocate this only in the `else` case?

Would apply to the write code as well.

src/java.base/share/native/libjava/io_util.c line 81:

> 79: char stackBuf[STACK_BUF_SIZE];
> 80: char *buf = NULL;
> 81: jint buf_size, read_size;;

minor: double semi colon

src/java.base/share/native/libjava/io_util.c line 129:

> 127: off += n;
> 128: } else if (n == -1) {
> 129: JNU_ThrowIOExceptionWithLastError(env, "Read error");

The original code would have `nread` set to `-1` here, and thus exit with `-1`. 
 The new code would exit with the last value for `nread` which could be 
anything.

src/java.base/share/native/libjava/io_util.c line 201:

> 199: write_size = len < buf_size ? len : buf_size;
> 200: (*env)->GetByteArrayRegion(env, bytes, off, write_size, 
> (jbyte*)buf);
> 201: if (!(*env)->ExceptionOccurred(env)) {

Wouldn't this result in an infinite loop if `GetByteArrayRegion` triggered an 
exception and `len > 0` ?  It would never enter the `if` and `len` is never 
changed then...

-

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v4]

2022-05-18 Thread Brian Burkhalter
> Modify native multi-byte read-write code used by the `java.io` classes to 
> limit the size of the allocated native buffer thereby decreasing off-heap 
> memory footprint and increasing throughput.

Brian Burkhalter has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains six additional 
commits since the last revision:

 - 6478546: Add break in write loop on ExceptionOccurred
 - Merge
 - 6478546: Clean up io_util.c
 - Merge
 - 6478546: Decrease malloc'ed buffer maximum size to 64 kB
 - 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty 
available

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8235/files
  - new: https://git.openjdk.java.net/jdk/pull/8235/files/5c3a3446..10f14bb3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8235&range=02-03

  Stats: 232900 lines in 3152 files changed: 173230 ins; 42904 del; 16766 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8235.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8235/head:pull/8235

PR: https://git.openjdk.java.net/jdk/pull/8235


Re: RFR: 6478546: FileInputStream.read() throws OutOfMemoryError when there is plenty available [v3]

2022-05-18 Thread Brian Burkhalter
On Thu, 12 May 2022 07:59:36 GMT, John Hendrikx  wrote:

>> Brian Burkhalter has updated the pull request with a new target base due to 
>> a merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains four additional 
>> commits since the last revision:
>> 
>>  - 6478546: Clean up io_util.c
>>  - Merge
>>  - 6478546: Decrease malloc'ed buffer maximum size to 64 kB
>>  - 6478546: FileInputStream.read() throws OutOfMemoryError when there is 
>> plenty available
>
> src/java.base/share/native/libjava/io_util.c line 79:
> 
>> 77:   jint off, jint len, jfieldID fid)
>> 78: {
>> 79: char stackBuf[STACK_BUF_SIZE];
> 
> This was in the original code already, but doesn't this always allocate 8k on 
> the stack, regardless of whether this buffer will be used (if len > 8k or len 
> == 0)?  Wouldn't it be better to allocate this only in the `else` case?
> 
> Would apply to the write code as well.

This is intentional. We want to avoid having a malloc + free in every call and 
so avoid it for small buffers.

> src/java.base/share/native/libjava/io_util.c line 81:
> 
>> 79: char stackBuf[STACK_BUF_SIZE];
>> 80: char *buf = NULL;
>> 81: jint buf_size, read_size;;
> 
> minor: double semi colon

FIxed in 10f14bb3faef2b55ab59a85016874d12815f3c87.

> src/java.base/share/native/libjava/io_util.c line 129:
> 
>> 127: off += n;
>> 128: } else if (n == -1) {
>> 129: JNU_ThrowIOExceptionWithLastError(env, "Read error");
> 
> The original code would have `nread` set to `-1` here, and thus exit with 
> `-1`.  The new code would exit with the last value for `nread` which could be 
> anything.

The returned value of `nread` would in this case indicate the number of bytes 
actually read so far, which is intentional.

> src/java.base/share/native/libjava/io_util.c line 201:
> 
>> 199: write_size = len < buf_size ? len : buf_size;
>> 200: (*env)->GetByteArrayRegion(env, bytes, off, write_size, 
>> (jbyte*)buf);
>> 201: if (!(*env)->ExceptionOccurred(env)) {
> 
> Wouldn't this result in an infinite loop if `GetByteArrayRegion` triggered an 
> exception and `len > 0` ?  It would never enter the `if` and `len` is never 
> changed then...

Good catch, you are correct. Fixed in 10f14bb3faef2b55ab59a85016874d12815f3c87.

-

PR: https://git.openjdk.java.net/jdk/pull/8235