Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-03-12 Thread Paul Sandoz
Hi Peter,

Thanks, well spotted. I adjusted to:

  try {
  return finalized.get();
  } finally {
Reference.reachabilityFence(o);
  }
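For context, the try/finally pattern above can be sketched as a tiny standalone program (NativeResource below is a hypothetical stand-in for an object whose finalizer releases backing memory; it is not part of the patch):

```java
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

public class FencedAccess {
    // Hypothetical stand-in for an object whose finalizer frees native memory.
    static class NativeResource {
        final AtomicBoolean freed;
        NativeResource(AtomicBoolean freed) { this.freed = freed; }
        int use() { return freed.get() ? -1 : 42; } // -1 would mean use-after-free
        @SuppressWarnings("deprecation")
        @Override protected void finalize() { freed.set(true); }
    }

    // The pattern from the patch: compute the result, and keep 'r' strongly
    // reachable until the access has completed by fencing in a finally block.
    static int readSafely(NativeResource r) {
        try {
            return r.use();
        } finally {
            Reference.reachabilityFence(r);
        }
    }

    public static void main(String[] args) {
        System.out.println(readSafely(new NativeResource(new AtomicBoolean())));
    }
}
```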

I also created this issue:
 
  https://bugs.openjdk.java.net/browse/JDK-8199462
  http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/

I will push today or tomorrow.

Paul.

> On Mar 11, 2018, at 1:03 AM, Peter Levart  wrote:
> 
> Hi,
> 
> On 03/02/18 18:15, Paul Sandoz wrote:
>> Thanks!
>> Paul.
>> 
>> 
>>> On Mar 2, 2018, at 9:11 AM, Vladimir Ivanov 
>>>  wrote:
>>> 
>>> 
>>> 
>>> On 3/2/18 8:01 PM, Paul Sandoz wrote:
>>> 
 Here’s an update Ben and I tweaked:
   
 http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html
 
 I think this looks good but would still like to double check with Vladimir 
 that the @ForceInline is not problematic.
 
>>> I confirm that my previous analysis [1] still applies when the method is 
>>> marked w/ @ForceInline.
>>> 
>>> Best regards,
>>> Vladimir Ivanov
>>> 
>>> [1] 
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-February/051312.html
> 
> 
> I was going to suggest adding a test for that JIT assumption, but I see 
> there's already a test called ReachabilityFenceTest that should catch a 
> change in JIT behavior that would break reachabilityFence(). I spotted a flaw 
> in that test. See method fenced():
> 
> public static boolean fenced() {
> AtomicBoolean finalized = new AtomicBoolean();
> MyFinalizeable o = new MyFinalizeable(finalized);
> 
> for (int i = 0; i < LOOP_ITERS; i++) {
> if (finalized.get()) break;
> if (i > WARMUP_LOOP_ITERS) {
> System.gc();
> System.runFinalization();
> }
> }
> 
> Reference.reachabilityFence(o);
> 
> return finalized.get();
> }
> 
> 
> The last two statements should be reversed or else the test could produce a 
> false alarm:
> 
> 
> boolean fin = finalized.get(); 
> Reference.reachabilityFence(o);
> return fin;
> 
> 
> Regards, Peter
> 



Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-03-11 Thread Peter Levart

Hi,

On 03/02/18 18:15, Paul Sandoz wrote:

Thanks!
Paul.


On Mar 2, 2018, at 9:11 AM, Vladimir Ivanov  
wrote:



On 3/2/18 8:01 PM, Paul Sandoz wrote:

Here’s an update Ben and I tweaked:
   
http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html
I think this looks good but would still like to double check with Vladimir that 
the @ForceInline is not problematic.

I confirm that my previous analysis [1] still applies when the method is 
marked w/ @ForceInline.

Best regards,
Vladimir Ivanov

[1] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-February/051312.html



I was going to suggest adding a test for that JIT assumption, but I see 
there's already a test called ReachabilityFenceTest that should catch a 
change in JIT behavior that would break reachabilityFence(). I spotted a 
flaw in that test. See method fenced():


    public static boolean fenced() {
    AtomicBoolean finalized = new AtomicBoolean();
    MyFinalizeable o = new MyFinalizeable(finalized);

    for (int i = 0; i < LOOP_ITERS; i++) {
    if (finalized.get()) break;
    if (i > WARMUP_LOOP_ITERS) {
    System.gc();
    System.runFinalization();
    }
    }

    Reference.reachabilityFence(o);

    return finalized.get();
    }


The last two statements should be reversed or else the test could 
produce a false alarm:



        boolean fin = finalized.get();
        Reference.reachabilityFence(o);
        return fin;


Regards, Peter
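Folding Peter's reordering back into the method, the corrected test would read roughly as follows (a sketch only: MyFinalizeable is stubbed here and the loop constants are made up, so this is not the actual jdk test source):

```java
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

public class FencedFixed {
    // Stub standing in for the test's MyFinalizeable: flips the flag on finalization.
    static class MyFinalizeable {
        private final AtomicBoolean finalized;
        MyFinalizeable(AtomicBoolean finalized) { this.finalized = finalized; }
        @SuppressWarnings("deprecation")
        @Override protected void finalize() { finalized.set(true); }
    }

    static final int LOOP_ITERS = 20;        // assumed; the real test uses larger values
    static final int WARMUP_LOOP_ITERS = 10; // assumed

    public static boolean fenced() {
        AtomicBoolean finalized = new AtomicBoolean();
        MyFinalizeable o = new MyFinalizeable(finalized);

        for (int i = 0; i < LOOP_ITERS; i++) {
            if (finalized.get()) break;
            if (i > WARMUP_LOOP_ITERS) {
                System.gc();
                System.runFinalization();
            }
        }

        boolean fin = finalized.get();   // sample the flag first...
        Reference.reachabilityFence(o);  // ...then let 'o' become unreachable
        return fin;
    }

    public static void main(String[] args) {
        // With the fence after the read, the object cannot be finalized early.
        System.out.println(fenced() ? "finalized early" : "ok");
    }
}
```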



Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-03-02 Thread Paul Sandoz
Thanks!
Paul.

> On Mar 2, 2018, at 9:11 AM, Vladimir Ivanov  
> wrote:
> 
> 
> 
> On 3/2/18 8:01 PM, Paul Sandoz wrote:
>> Here’s an update Ben and I tweaked:
>>   
>> http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html
>> I think this looks good but would still like to double check with Vladimir 
>> that the @ForceInline is not problematic.
> 
> I confirm that my previous analysis [1] still applies when the method is 
> marked w/ @ForceInline.
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-February/051312.html
> 
>>> On Feb 26, 2018, at 6:50 PM, Paul Sandoz  wrote:
>>> 
>>> Hi Ben,
>>> 
>>> Here is the webrev online:
>>> 
>>>  
>>> http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html
>>> 



Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-03-02 Thread Vladimir Ivanov



On 3/2/18 8:01 PM, Paul Sandoz wrote:

Here’s an update Ben and I tweaked:

   
http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html

I think this looks good but would still like to double check with Vladimir that 
the @ForceInline is not problematic.


I confirm that my previous analysis [1] still applies when the method is 
marked w/ @ForceInline.


Best regards,
Vladimir Ivanov

[1] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-February/051312.html



On Feb 26, 2018, at 6:50 PM, Paul Sandoz  wrote:

Hi Ben,

Here is the webrev online:

  
http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html






Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-03-02 Thread Paul Sandoz
Here’s an update Ben and I tweaked:

  
http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html

I think this looks good but would still like to double check with Vladimir that 
the @ForceInline is not problematic.

Paul.

> On Feb 26, 2018, at 6:50 PM, Paul Sandoz  wrote:
> 
> Hi Ben,
> 
> Here is the webrev online:
> 
>  
> http://cr.openjdk.java.net/~psandoz/jdk/buffer-reachability-fence/webrev/index.html
> 




Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-26 Thread Paul Sandoz
 ByteBuffer bb = bbC.getByteBuffer();
> 
>bb.position(0);
> 
>int sum = 0;
> 
>for (int i = 0; i < L; i++) {
>sum += bb.getInt();
>}
> 
>return sum;
> 
>}
> 
> }
> 
> Results :
> 
> - Unmodified Build -
> 
> Benchmark(L)   Mode  Cnt Score  
> Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 29677205.748 ± 544721.142  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
> 18219951.454 ± 320724.793  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
> 7767650.826 ± 121798.910  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1646075.010 ±   9804.499  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 183489.418 ±   1355.967  ops/s
> 
> - Build With Reference.reachabilityFences Added -
> 
> Benchmark(L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 15230086.695 ± 390174.190  ops/s  -48.681%
> ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
> 8126310.728 ± 123661.342  ops/s  -55.399%
> ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
> 1582699.233 ±   7278.744  ops/s  -79.624%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 179726.465 ±802.333  ops/s  -89.082%
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 18327.049 
> ±  9.506  ops/s  -90.012%
> 
> - Build With Reference.reachabilityFences Added And DontInline Replaced 
> With ForceInline -
> 
> Benchmark(L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 29839190.147 ± 576585.796  ops/s  +0.546%
> ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
> 18397768.759 ± 338144.327  ops/s  +0.976%
> ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
> 7746079.875 ± 101621.105  ops/s  -0.278%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1629413.444 ±  24163.399  ops/s  -1.012%
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 182250.811 ±   2028.461  ops/s  -0.675%
> 
> - Build With Reference.reachabilityFences Added And DontInline Removed -
> 
> Benchmark(L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 29442980.464 ± 556324.877  ops/s  -0.789%
> ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
> 18401757.539 ± 419383.901  ops/s  +0.998%
> ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
> 7816766.062 ± 100144.611  ops/s  +0.632%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1636811.564 ±  13811.447  ops/s  -0.563%
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 183463.292 ±   2056.016  ops/s  -0.014%
> 
> 
> Regards,
> Ben
> 
> 
> 
> From:   Paul Sandoz <paul.san...@oracle.com>
> To: Ben Walsh <ben_wa...@uk.ibm.com>
> Cc: core-libs-dev <core-libs-dev@openjdk.java.net>
> Date:   08/02/2018 16:54
> Subject:Re: [PATCH] Reduce Chance Of Mistakenly Early Backing 
> Memory Cleanup
> 
> 
> 
> Hi Ben,
> 
> Thanks. I anticipated a performance hit, but not necessarily a 10x one. Without 
> looking at the generated code of the benchmark method it is hard to be 
> sure [*], but I believe the fence is interfering with loop unrolling 
> and/or vectorization; the comparative differences between byte and int may 
> be related to vectorization (for byte there may be less or more limited 
> support for vectorization).
> 
> How about we now try another experiment: comment out the @DontInline on 
> the fence method and re-run the benchmarks. From Peter’s observations and 
> Vladimir’s analysis we should be able to remove that, or even, contrary to 
> what we initially expected when adding this feature, change it to @ForceInline!
> 
> Thanks,
> Paul.
> 
> [*] If you are running on Linux you can use the excellent JMH perfasm 
> feature to dump the hot parts of HotSpot’s generated code.
> 
>> On Feb 8, 2018, at 8:22 AM, Ben Walsh <ben_wa...@uk.ibm.com> wrote:
>> 
>> Hi Paul,
>> 
>> Following up with the requested loop and vectorization benchmarks ...
>> 
>> 
>> (Do the vectorization benchmark results imply that the Hotspot compiler 
>> has been unable to perform the vectorization optimisation due to the 
>> presence of the reachabilityFence ?)

Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-21 Thread Paul Sandoz
Hi Ben,

Much better :-) thanks for doing this.

Before proceeding with replacing @DontInline with @ForceInline I would like to 
appeal to Vladimir to confirm this is ok.

If you send me the patch as an attachment directly in email I can upload it as 
a webrev for you; it should be easier to review. (When you have two sponsored 
contributions you can request the author role [1], which gives you an account 
on JIRA and access to the cr system so you can upload webrevs [2] yourself.)

Paul.

[1] http://openjdk.java.net/projects/#project-author 

[2] http://openjdk.java.net/guide/webrevHelp.html 


> On Feb 19, 2018, at 8:37 AM, Ben Walsh  wrote:
> 
> As requested, here are the results with modifications to the annotations 
> on Reference.reachabilityFence. Much more promising …



Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-19 Thread Ben Walsh
200 
144990.569 ±   1518.877  ops/s  -0.668%

- Build With Reference.reachabilityFences Added And DontInline Removed -

Benchmark(L)   Mode  Cnt Score  
Error  Units  Impact
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
29842696.017 ± 462902.634  ops/s  +1.841%
ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
24842729.069 ± 436174.452  ops/s  +2.398%
ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
8518393.953 ± 129254.536  ops/s  +0.071%
ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
1344772.370 ±  15916.867  ops/s  +1.588%
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
145087.256 ±   1277.491  ops/s  -0.602%


* Benchmark 3 *

Test Code :

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import java.nio.ByteBuffer;

@State(Scope.Benchmark)
public class ByteBufferBenchmark {

@Param({"1", "10", "100", "1000", "1"})
public int L;

@State(Scope.Benchmark)
public static class ByteBufferContainer {

ByteBuffer bb;

@Setup(Level.Invocation)
public void initByteBuffer() {
bb = ByteBuffer.allocateDirect(4 * 1);
 
for (int i = 0; i < 1; i++) {
bb.putInt(i);
}
}

ByteBuffer getByteBuffer() {
return bb;
}

}

@Benchmark
public int benchmark_byte_buffer_put(ByteBufferContainer bbC) {
 
ByteBuffer bb = bbC.getByteBuffer();

bb.position(0);

int sum = 0;

for (int i = 0; i < L; i++) {
sum += bb.getInt();
}

return sum;

}

}

Results :

- Unmodified Build -

Benchmark(L)   Mode  Cnt Score  
Error  Units
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
29677205.748 ± 544721.142  ops/s
ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
18219951.454 ± 320724.793  ops/s
ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
7767650.826 ± 121798.910  ops/s
ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
1646075.010 ±   9804.499  ops/s
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
183489.418 ±   1355.967  ops/s

- Build With Reference.reachabilityFences Added -

Benchmark(L)   Mode  Cnt Score  
Error  Units  Impact
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
15230086.695 ± 390174.190  ops/s  -48.681%
ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
8126310.728 ± 123661.342  ops/s  -55.399%
ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
1582699.233 ±   7278.744  ops/s  -79.624%
ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
179726.465 ±802.333  ops/s  -89.082%
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 18327.049 
±  9.506  ops/s  -90.012%

- Build With Reference.reachabilityFences Added And DontInline Replaced 
With ForceInline -

Benchmark(L)   Mode  Cnt Score  
Error  Units  Impact
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
29839190.147 ± 576585.796  ops/s  +0.546%
ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
18397768.759 ± 338144.327  ops/s  +0.976%
ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
7746079.875 ± 101621.105  ops/s  -0.278%
ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
1629413.444 ±  24163.399  ops/s  -1.012%
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
182250.811 ±   2028.461  ops/s  -0.675%

- Build With Reference.reachabilityFences Added And DontInline Removed -

Benchmark(L)   Mode  Cnt Score  
Error  Units  Impact
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
29442980.464 ± 556324.877  ops/s  -0.789%
ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
18401757.539 ± 419383.901  ops/s  +0.998%
ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
7816766.062 ± 100144.611  ops/s  +0.632%
ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
1636811.564 ±  13811.447  ops/s  -0.563%
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
183463.292 ±   2056.016  ops/s  -0.014%


Regards,
Ben



From:   Paul Sandoz <paul.san...@oracle.com>
To: Ben Walsh <ben_wa...@uk.ibm.com>
Cc: core-libs-dev <core-libs-dev@openjdk.java.net>
Date:   08/02/2018 16:54
Subject:Re: [PATCH] Reduce Chance Of Mistakenly Early Backing 
Memory Cleanup



Hi Ben,

Thanks. I anticipated a performance hit, but not necessarily a 10x one.

Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-08 Thread Paul Sandoz
Hi Ben,

Thanks. I anticipated a performance hit, but not necessarily a 10x one. Without 
looking at the generated code of the benchmark method it is hard to be sure 
[*], but I believe the fence is interfering with loop unrolling and/or 
vectorization; the comparative differences between byte and int may be related 
to vectorization (for byte there may be less or more limited support for 
vectorization).

How about we now try another experiment: comment out the @DontInline on the 
fence method and re-run the benchmarks. From Peter’s observations and 
Vladimir’s analysis we should be able to remove that, or even, contrary to what 
we initially expected when adding this feature, change it to @ForceInline!

Thanks,
Paul.

[*] If you are running on Linux you can use the excellent JMH perfasm feature 
to dump the hot parts of HotSpot’s generated code.
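For example, assuming a JMH uber-jar named benchmarks.jar (a hypothetical name for this thread's benchmark build), the profiler can be enabled like this; it requires Linux perf and the hsdis disassembler plugin on the JVM's library path:

```shell
# Run the benchmark under JMH's perfasm profiler, which annotates the
# hottest regions of HotSpot's generated assembly with perf counters.
java -jar benchmarks.jar -prof perfasm ByteBufferBenchmark
```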

> On Feb 8, 2018, at 8:22 AM, Ben Walsh  wrote:
> 
> Hi Paul,
> 
> Following up with the requested loop and vectorization benchmarks ...
> 
> 
> (Do the vectorization benchmark results imply that the Hotspot compiler 
> has been unable to perform the vectorization optimisation due to the 
> presence of the reachabilityFence?)
> 
> 
> ---
> 
> 
> Loop Benchmarking
>  
> 
> package org.sample;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> import java.nio.ByteBuffer;
> 
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
> 
>@Param({"1", "10", "100", "1000", "1"})
>public int L;
> 
>@State(Scope.Benchmark)
>public static class ByteBufferContainer {
> 
>ByteBuffer bb;
> 
>@Setup(Level.Invocation)
>public void initByteBuffer() {
>bb = ByteBuffer.allocateDirect(1);
>}
> 
>ByteBuffer getByteBuffer() {
>return bb;
>}
>}
> 
>@Benchmark
>public ByteBuffer benchmark_byte_buffer_put(ByteBufferContainer bbC) {
> 
>ByteBuffer bb = bbC.getByteBuffer();
> 
>for (int i = 0; i < L; i++) {
>bb.put((byte)i);
>}
> 
>return bb;
>}
> 
> }
> 
> 
> Without Changes
> 
> Benchmark(L)   Mode  Cnt Score  
> Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 29303145.752 ± 635979.750  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
> 24260859.017 ± 528891.303  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
> 8512366.637 ± 136615.070  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 1323756.037 ±  21485.369  ops/s
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 145965.305 ±   1301.469  ops/s
> 
> 
> With Changes
> 
> Benchmark(L)   Mode  Cnt Score  
> Error  Units  Impact
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 
> 28893540.122 ± 754554.747  ops/s  -1.398%
> ByteBufferBenchmark.benchmark_byte_buffer_put 10  thrpt  200 
> 15317696.355 ± 231621.608  ops/s  -36.863%
> ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
> 2546599.578 ±  32136.873  ops/s  -70.084%
> ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
> 288832.514 ±   3854.522  ops/s  -78.181%
> ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 29747.386 
> ±214.831  ops/s  -79.620%
> 
> 
> ---
> 
> 
> Vectorization Benchmarking
> - 
> 
> package org.sample;
> 
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Level;
> import org.openjdk.jmh.annotations.Param;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> 
> import java.nio.ByteBuffer;
> 
> @State(Scope.Benchmark)
> public class ByteBufferBenchmark {
> 
>@Param({"1", "10", "100", "1000", "1"})
>public int L;
> 
>@State(Scope.Benchmark)
>public static class ByteBufferContainer {
> 
>ByteBuffer bb;
> 
>@Setup(Level.Invocation)
>public void initByteBuffer() {
>bb = ByteBuffer.allocateDirect(4 * 1);
> 
>for (int i = 0; i < 1; i++) {
>bb.putInt(i);
>}
>}
> 
>ByteBuffer getByteBuffer() {
>return bb;
>}
> 
>}
> 
>@Benchmark
>public int benchmark_byte_buffer_put(ByteBufferContainer bbC) {
> 
>ByteBuffer bb = bbC.getByteBuffer();
> 
>bb.position(0);
> 
> 

Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-08 Thread Ben Walsh
9%
ByteBufferBenchmark.benchmark_byte_buffer_put100  thrpt  200 
1582699.233 ±   7278.744  ops/s  -79.624%
ByteBufferBenchmark.benchmark_byte_buffer_put   1000  thrpt  200 
179726.465 ±802.333  ops/s  -89.082%
ByteBufferBenchmark.benchmark_byte_buffer_put  1  thrpt  200 18327.049 
±  9.506  ops/s  -90.012%



NB : For reference - for this and previous benchmarking results ...

"Without Changes" and "With Changes" - java -version ...

openjdk version "10-internal" 2018-03-20
OpenJDK Runtime Environment (build 10-internal+0-adhoc.walshbp.jdk)
OpenJDK 64-Bit Server VM (build 10-internal+0-adhoc.walshbp.jdk, mixed 
mode)


---


Regards,
Ben Walsh




From:   Ben Walsh/UK/IBM
To:     Paul Sandoz <paul.san...@oracle.com>
Cc:     core-libs-dev <core-libs-dev@openjdk.java.net>
Date:   05/02/2018 16:47
Subject:Re: [PATCH] Reduce Chance Of Mistakenly Early Backing 
Memory Cleanup


Running with the following test under the JMH benchmark framework (never 
used this before, so please do point out any issues with the test) ...

---

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import java.nio.ByteBuffer;

public class ByteBufferBenchmark {

@State(Scope.Benchmark)
public static class ByteBufferContainer {

ByteBuffer bb;

@Setup(Level.Invocation)
public void initByteBuffer() {
bb = ByteBuffer.allocateDirect(1);
}

ByteBuffer getByteBuffer() {
return bb;
}
}

@Benchmark
public void benchmark_byte_buffer_put(ByteBufferContainer bbC) {
 
bbC.getByteBuffer().put((byte)42);
}

}

---

... I get the following results with and without the changes ...

[9walsbp@dolphin ~]$ ./build_with/jdk/bin/java -jar benchmarks.jar 
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.openjdk.jmh.util.Utils 
(file:/home/9walsbp/benchmarks.jar) to field java.io.Console.cs
WARNING: Please consider reporting this to the maintainers of 
org.openjdk.jmh.util.Utils
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
# JMH version: 1.20
# VM version: JDK 10-internal, VM 10-internal+0-adhoc.walshbp.jdk
# VM invoker: /home/9walsbp/build_with/jdk/bin/java
# VM options: 
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.sample.ByteBufferBenchmark.benchmark_byte_buffer_put

# Run progress: 0.00% complete, ETA 00:06:40
# Fork: 1 of 10
# Warmup Iteration   1: 28572294.700 ops/s
# Warmup Iteration   2: 33102015.934 ops/s
# Warmup Iteration   3: 32883031.968 ops/s
# Warmup Iteration   4: 32875302.586 ops/s
# Warmup Iteration   5: 33708018.941 ops/s
# Warmup Iteration   6: 34107603.885 ops/s
# Warmup Iteration   7: 31615123.362 ops/s
# Warmup Iteration   8: 34306874.876 ops/s
# Warmup Iteration   9: 32615070.232 ops/s
# Warmup Iteration  10: 34268641.323 ops/s
# Warmup Iteration  11: 32209257.766 ops/s
# Warmup Iteration  12: 34111408.999 ops/s
# Warmup Iteration  13: 33527360.187 ops/s
# Warmup Iteration  14: 34331726.136 ops/s
# Warmup Iteration  15: 34325119.218 ops/s
# Warmup Iteration  16: 31562168.904 ops/s
# Warmup Iteration  17: 33227878.358 ops/s
# Warmup Iteration  18: 34151154.273 ops/s
# Warmup Iteration  19: 29264381.793 ops/s
# Warmup Iteration  20: 30308058.560 ops/s
Iteration   1: 33502237.659 ops/s
Iteration   2: 33591469.068 ops/s
Iteration   3: 29330487.674 ops/s
Iteration   4: 31401627.173 ops/s
Iteration   5: 32346391.980 ops/s
Iteration   6: 34106932.868 ops/s
Iteration   7: 29961218.721 ops/s
Iteration   8: 31166503.731 ops/s
Iteration   9: 34180826.531 ops/s
Iteration  10: 34436432.202 ops/s
Iteration  11: 28003272.167 ops/s
Iteration  12: 30947688.218 ops/s
Iteration  13: 33727404.696 ops/s
Iteration  14: 33794727.241 ops/s
Iteration  15: 34089932.945 ops/s
Iteration  16: 30229615.627 ops/s
Iteration  17: 32747950.250 ops/s
Iteration  18: 34195332.494 ops/s
Iteration  19: 34225966.793 ops/s
Iteration  20: 30253949.366 ops/s

# Run progress: 10.00% complete, ETA 00:27:45
# Fork: 2 of 10
# Warmup Iteration   1: 28189458.524 ops/s
# Warmup Iteration   2: 33385943.637 ops/s
# Warmup Iteration   3: 31474459.740 ops/s

Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-07 Thread Vladimir Ivanov



Vladimir, just to be sure: I presume your analysis applies to both C1 and C2? 
What about compilers such as Graal?


I only looked at C1 & C2, but I'm sure it applies to Graal as well: the GC 
interaction mechanism is the same for all JIT-compilers in HotSpot.


Best regards,
Vladimir Ivanov


Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-07 Thread Paul Sandoz


> On Feb 7, 2018, at 7:22 AM, Vladimir Ivanov  
> wrote:
> 
> Peter,
> 
>>> Objects.requireNonNull() shows zero overhead here.
>>> 
>>> I guess the main question is whether Objects.requireNonNull(this) behavior 
>>> in the former test is a result of chance and current Hotspot behavior, or 
>>> somehow guaranteed by the spec.
>> I haven't looked into what actually happens in JIT-compilers on your 
>> benchmark, but I'm surprised it works at all.
> 
> So, here's why Objects.requireNonNull() keeps the receiver alive in your test 
> case.
> 
> JIT-compilers in HotSpot aggressively prune dead locals [1], but they do that 
> based on method bytecode analysis [2] (and not on optimized IR). So, any 
> usage of a local extends its live range, even if that usage is eliminated in 
> generated code. It means an oop in that local will live past its last usage 
> in generated code and all safepoints (in generated code) till last usage (on 
> bytecode level) will enumerate the local it is held in.
> 
> If there were GC-only safepoints supported, then JITs could still prune 
> unused locals from oop maps, but HotSpot doesn't have them and all safepoints 
> in generated code keep full JVM state, so it's always possible to deoptimize 
> at any one of them (and then run into the code which is eliminated in 
> generated code).
> 
> If there are no safepoints till the end of the method, then nothing will keep 
> the object alive. But there's no way for GC to collect it, since GCs rely on 
> safepoints to mark thread stack. (That's why I mentioned GC-only safepoints 
> earlier.)
> 
> As a conclusion: removing @DontInline on Reference.reachabilityFence() should 
> eliminate most of the overhead (no call anymore, additional spill may be 
> needed) and still keep it working. It's not guaranteed by JVMS, but at least 
> should work on HotSpot JVM (in its current state).
> 
> So, nice discovery, Peter! :-)

Yes, Peter, thank you for pushing on this. We would of course need to be 
careful and revisit this if any changes occur.

Vladimir, just to be sure: I presume your analysis applies to both C1 and C2? 
What about compilers such as Graal?

Ben, I still think additional performance analysis is valuable (such 
performance tests are also useful for another reason: consolidating unsafe 
accesses using the double addressing mode, thereby removing another difference 
between heap and direct buffers).

Paul.

> Want to file an RFE & fix it?
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] 
> http://hg.openjdk.java.net/jdk/hs/file/45b6aae769cc/src/hotspot/share/opto/graphKit.cpp#l736
> 
> [2] 
> http://hg.openjdk.java.net/jdk/hs/file/45b6aae769cc/src/hotspot/share/compiler/methodLiveness.cpp#l37
> 
>> Explicit null check on the receiver is an easy target for elimination and 
>> should be effectively a no-op in generated code. (And that's what you 
>> observe with the benchmark!) Once the check is gone, nothing keeps the 
>> receiver alive anymore (past the last usage).
>> So, I'd say such behavior is a matter of chance in your case and can't be 
>> relied on in general. And definitely not something guaranteed by the JVMS.
>> Best regards,
>> Vladimir Ivanov



Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-07 Thread Vladimir Ivanov

Peter,




Objects.requireNonNull() shows zero overhead here.

I guess the main question is whether Objects.requireNonNull(this) 
behavior in the former test is a result of chance and current Hotspot 
behavior, or somehow guaranteed by the spec.


I haven't looked into what actually happens in JIT-compilers on your 
benchmark, but I'm surprised it works at all.


So, here's why Objects.requireNonNull() keeps the receiver alive in your 
test case.


JIT-compilers in HotSpot aggressively prune dead locals [1], but they do 
that based on method bytecode analysis [2] (and not on optimized IR). 
So, any usage of a local extends its live range, even if that usage is 
eliminated in generated code. It means an oop in that local will live 
past its last usage in generated code and all safepoints (in generated 
code) till last usage (on bytecode level) will enumerate the local it is 
held in.


If there were GC-only safepoints supported, then JITs could still prune 
unused locals from oop maps, but HotSpot doesn't have them and all 
safepoints in generated code keep full JVM state, so it's always 
possible to deoptimize at any one of them (and then run into the code 
which is eliminated in generated code).


If there are no safepoints till the end of the method, then nothing will 
keep the object alive. But there's no way for GC to collect it, since 
GCs rely on safepoints to mark thread stack. (That's why I mentioned 
GC-only safepoints earlier.)


As a conclusion: removing @DontInline on Reference.reachabilityFence() 
should eliminate most of the overhead (no call anymore, additional spill 
may be needed) and still keep it working. It's not guaranteed by JVMS, 
but at least should work on HotSpot JVM (in its current state).


So, nice discovery, Peter! :-) Want to file an RFE & fix it?

Best regards,
Vladimir Ivanov

[1] 
http://hg.openjdk.java.net/jdk/hs/file/45b6aae769cc/src/hotspot/share/opto/graphKit.cpp#l736


[2] 
http://hg.openjdk.java.net/jdk/hs/file/45b6aae769cc/src/hotspot/share/compiler/methodLiveness.cpp#l37


Explicit null check on the receiver is an easy target for elimination 
and should be effectively a no-op in generated code. (And that's what 
you observe with the benchmark!) Once the check is gone, nothing keeps 
the receiver alive anymore (past the last usage).


So, I'd say such behavior is a matter of chance in your case and can't 
be relied on in general. And definitely not something guaranteed by the JVMS.


Best regards,
Vladimir Ivanov


Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-06 Thread Vladimir Ivanov



Objects.requireNonNull() shows zero overhead here.

I guess the main question is whether Objects.requireNonNull(this) 
behavior in the former test is a result of chance and current Hotspot 
behavior, or somehow guaranteed by the spec.


I haven't looked into what actually happens in JIT-compilers on your 
benchmark, but I'm surprised it works at all.


Explicit null check on the receiver is an easy target for elimination 
and should be effectively a no-op in generated code. (And that's what 
you observe with the benchmark!) Once the check is gone, nothing keeps 
the receiver alive anymore (past the last usage).


So, I'd say such behavior is a matter of chance in your case and can't 
be relied on in general. And definitely not something guaranteed by the JVMS.


Best regards,
Vladimir Ivanov


Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-06 Thread Peter Levart

Hi,

I checked the behavior of Objects.requireNonNull(this) at appropriate 
place instead of Reference.reachabilityFence(this) and it does seem to 
work. For example in the following test (see method m()):



import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

public class ReachabilityTest {
    private static final AtomicLong seq = new AtomicLong();
    private static final AtomicLong deallocatedHwm = new AtomicLong();

    private static long allocate() {
    return seq.incrementAndGet();
    }

    private static void deallocate(long address) {
    deallocatedHwm.accumulateAndGet(address, Math::max);
    }

    private static void access(long address) {
    if (deallocatedHwm.get() == address) {
    throw new IllegalStateException(
>     "Address " + address + " has already been deallocated");
    }
    }

    private long address = allocate();

    @SuppressWarnings("deprecation")
    @Override
    protected void finalize() throws Throwable {
    deallocate(address);
    address = 0;
    }

    void m() {
    long a = address;
    if (a != 0) {
    System.gc();
    try { Thread.sleep(1); } catch (InterruptedException e) {}
    access(a);
    // Reference.reachabilityFence(this);
    Objects.requireNonNull(this);
    }
    }

    static void test() {
    new ReachabilityTest().m();
    }

    public static void main(String[] args) {
    while (true) {
    test();
    }
    }
}


...Objects.requireNonNull does prevent the otherwise reliably provoked 
failure, just like Reference.reachabilityFence.


As to the speed of optimized-away Objects.requireNonNull() vs. 
Reference.reachabilityFence() I tried the following benchmark:



package jdk.test;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.lang.ref.Reference;
import java.util.Arrays;
import java.util.Objects;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@State(Scope.Thread)
public class ReachabilityBench {

    static class Buf0 {
    final byte[] buf;

    Buf0(int len) {
    buf = new byte[len];
    }

    @SuppressWarnings("deprecation")
    @Override
    protected void finalize() throws Throwable {
    Arrays.fill(buf, (byte) -1);
    }

    byte get(int i) {
    byte[] b = buf;
    // `this` might get finalized before the element of b is accessed
    byte r = b[i];
    return r;
    }
    }

    static class Buf1 {
    final byte[] buf;

    Buf1(int len) {
    buf = new byte[len];
    }

    @SuppressWarnings("deprecation")
    @Override
    protected void finalize() throws Throwable {
    Arrays.fill(buf, (byte) -1);
    }

    byte get(int i) {
    byte[] b = buf;
    byte r = b[i];
    Reference.reachabilityFence(this);
    return r;
    }
    }

    static class Buf2 {
    final byte[] buf;

    Buf2(int len) {
    buf = new byte[len];
    }

    @SuppressWarnings("deprecation")
    @Override
    protected void finalize() throws Throwable {
    Arrays.fill(buf, (byte) -1);
    }

    byte get(int i) {
    byte[] b = buf;
    byte r = b[i];
    Objects.requireNonNull(this);
    return r;
    }
    }

    Buf0 buf0;
    Buf1 buf1;
    Buf2 buf2;

    @Param({"64"})
    public int len;

    @Setup(Level.Trial)
    public void setup() {
    buf0 = new Buf0(len);
    buf1 = new Buf1(len);
    buf2 = new Buf2(len);
    }

    @Benchmark
    public int sumBuf0() {
    int s = 0;
    for (int i = 0; i < len; i++) {
    s += buf0.get(i);
    }
    return s;
    }

    @Benchmark
    public int sumBuf1() {
    int s = 0;
    for (int i = 0; i < len; i++) {
    s += buf1.get(i);
    }
    return s;
    }

    @Benchmark
    public int sumBuf2() {
    int s = 0;
    for (int i = 0; i < len; i++) {
    s += buf2.get(i);
    }
    return s;
    }
}


The results are as follows:

Benchmark  (len)  Mode  Cnt    Score   Error  Units
ReachabilityBench.sumBuf0 64  avgt   10   20.530 ± 0.052  ns/op
ReachabilityBench.sumBuf1 64  avgt   10  184.402 ± 0.531  ns/op

Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-05 Thread Paul Sandoz
Hi Ben,

Thanks for looking into this.


> On Feb 5, 2018, at 8:52 AM, Ben Walsh  wrote:
> 
> Running with the following test under the JMH benchmark framework (never 
> used this before, so please do point out any issues with the test) …
> 

Your benchmark looks good; I would return the byte buffer, thereby avoiding any 
risk of dead code elimination (generally a best practice even if not absolutely 
required). As a follow-on you might want to try measuring a loop, e.g.:

  ByteBuffer bb = ...
  for (int i = 0; i < L; i++) {
bb.put((byte)i)
  }
  return bb;

where L is parameterized (see JMH's @Param annotation); then you can easily 
vary it from 1 upwards. Performance might be affected in loops if unrolling and/or 
vectorization is perturbed by the fence (I doubt the compiler will hoist the 
fence out of the loop, given the JIT is forced to *not* inline it).

To test for vectorization you could write a test method to get int values from 
the buffer and sum ‘em up returning the sum.
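The suggested loop kernels could be sketched in plain Java as follows (JMH annotations omitted; PutLoopKernel is a hypothetical name, and L stands in for the @Param'ed bound — in the real benchmark each method body would sit inside a @Benchmark method returning its result):

```java
import java.nio.ByteBuffer;

public class PutLoopKernel {
    // The put loop Paul suggests: return the buffer to defeat dead code elimination.
    static ByteBuffer fill(int L) {
        ByteBuffer bb = ByteBuffer.allocateDirect(L);
        for (int i = 0; i < L; i++) {
            bb.put((byte) i);          // the put under measurement
        }
        return bb;
    }

    // The vectorization probe: read int values and return their sum.
    static int sumInts(ByteBuffer bb, int L) {
        int s = 0;
        for (int i = 0; i + 4 <= L; i += 4) {
            s += bb.getInt(i);         // absolute reads, independent of position
        }
        return s;
    }
}
```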

Alas, a 7% hit on simple access is not good :-( We really need the JIT to track 
the argument passed to the method.

Thanks,
Paul.

> 
> ---


> Result "org.sample.ByteBufferBenchmark.benchmark_byte_buffer_put":
>  33100911.857 ±(99.9%) 747461.951 ops/s [Average]
>  (min, avg, max) = (25373082.559, 33100911.857, 38885170.177), stdev = 
> 3164800.705
>  CI (99.9%): [32353449.906, 33848373.808] (assumes normal distribution)
> 
> 
> # Run complete. Total time: 00:27:27
> 
> Benchmark   Mode  Cnt Score   
> Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put  thrpt  200  33100911.857 ± 
> 747461.951  ops/s
> 
> 


> 
> Result "org.sample.ByteBufferBenchmark.benchmark_byte_buffer_put":
>  35604933.518 ±(99.9%) 654975.515 ops/s [Average]
>  (min, avg, max) = (25558172.378, 35604933.518, 39524804.534), stdev = 
> 2773207.341
>  CI (99.9%): [34949958.003, 36259909.033] (assumes normal distribution)
> 
> 
> # Run complete. Total time: 00:27:51
> 
> Benchmark   Mode  Cnt Score   
> Error  Units
> ByteBufferBenchmark.benchmark_byte_buffer_put  thrpt  200  35604933.518 ± 
> 654975.515  ops/s
> 
> 
> ... So a performance degradation of roughly 7%.
> 
> 
> Regards,
> Ben Walsh
> 



Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-05 Thread Ben Walsh
.267 ops/s
Iteration   2: 36719438.475 ops/s
Iteration   3: 33895023.416 ops/s
Iteration   4: 34812011.249 ops/s
Iteration   5: 38906937.001 ops/s
Iteration   6: 38182707.915 ops/s
Iteration   7: 35294815.634 ops/s
Iteration   8: 37426940.293 ops/s
Iteration   9: 37385447.731 ops/s
Iteration  10: 38628608.563 ops/s
Iteration  11: 34323084.222 ops/s
Iteration  12: 35285335.362 ops/s
Iteration  13: 36995943.696 ops/s
Iteration  14: 38944858.911 ops/s
Iteration  15: 38516334.356 ops/s
Iteration  16: 34389934.288 ops/s
Iteration  17: 35448198.837 ops/s
Iteration  18: 38375706.441 ops/s
Iteration  19: 38841034.693 ops/s
Iteration  20: 38600776.337 ops/s


Result "org.sample.ByteBufferBenchmark.benchmark_byte_buffer_put":
  35604933.518 ±(99.9%) 654975.515 ops/s [Average]
  (min, avg, max) = (25558172.378, 35604933.518, 39524804.534), stdev = 
2773207.341
  CI (99.9%): [34949958.003, 36259909.033] (assumes normal distribution)


# Run complete. Total time: 00:27:51

Benchmark   Mode  Cnt Score   
Error  Units
ByteBufferBenchmark.benchmark_byte_buffer_put  thrpt  200  35604933.518 ± 
654975.515  ops/s


... So a performance degradation of roughly 7%.


Regards,
Ben Walsh





From:   Paul Sandoz <paul.san...@oracle.com>
To: Ben Walsh <ben_wa...@uk.ibm.com>
Cc: core-libs-dev <core-libs-dev@openjdk.java.net>
Date:   01/02/2018 22:50
Subject:Re: [PATCH] Reduce Chance Of Mistakenly Early Backing 
Memory Cleanup



Hi Ben,

I don't think you require the fence in all the cases you have currently 
placed it e.g. here for example

$memtype$ y = $toBits$(x);
UNSAFE.put$Memtype$Unaligned(null, a, y, bigEndian);
+Reference.reachabilityFence(this);
return this;

since “this” is being returned it will be kept live during the unsafe 
access.

Would you mind providing a JMH benchmark and results for the performance 
with and without the fence for say a put and/or a get. I would like to 
understand the performance impact on HotSpot, this is one reason why we 
have yet to add such fences as it will likely impact performance. 

At the moment we are relying on the method not being inlined, which is an 
expedient technique to make it functional and keep a reference alive but 
not necessarily optimal for usages in DBB.

For more details see:

  https://bugs.openjdk.java.net/browse/JDK-8169605
  https://bugs.openjdk.java.net/browse/JDK-8149610

Thanks,
Paul.

On Feb 1, 2018, at 6:55 AM, Ben Walsh <ben_wa...@uk.ibm.com> wrote:

This contribution forms a partial solution to the problem detailed here - 
http://thevirtualmachinist.blogspot.ca/2011/07/subtle-issue-of-reachability.html

.

In this context, this contribution provides "markers" such that a suitably 
"aware" compiler can reduce the chances of such a problem occurring with 
each of the DirectBuffer objects and MappedByteBuffer 
object. The reachabilityFences may prevent crashes / exceptions due to 
cleaning up the backing memory before the user has finished using the 
pointer. 

Any compiler not suitably "aware" could be modified to make use of the 
"markers".


I would like to pair with a sponsor to contribute this patch ...

---
diff -r d51e64840b4f 
src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template
--- 
a/src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template 
Wed Jan 31 12:04:53 2018 +0800
+++ 
b/src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template 
Thu Feb 01 11:30:10 2018 +
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights 
reserved.
+ * Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights 
reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
@@ -33,15 +33,21 @@

private $type$ get$Type$(long a) {
$memtype$ x = UNSAFE.get$Memtype$Unaligned(null, a, bigEndian);
-return $fromBits$(x);
+$type$ y = $fromBits$(x);
+Reference.reachabilityFence(this);
+return y;
}

public $type$ get$Type$() {
-return get$Type$(ix(nextGetIndex($BYTES_PER_VALUE$)));
+$type$ y = get$Type$(ix(nextGetIndex($BYTES_PER_VALUE$)));
+Reference.reachabilityFence(this);
+return y;
}

public $type$ get$Type$(int i) {
-return get$Type$(ix(checkIndex(i, $BYTES_PER_VALUE$)));
+$type$ y = get$Type$(ix(checkIndex(i, $BYTES_PER_VALUE$)));
+Reference.reachabilityFence(this);
+return y;
}

#end[rw]
@@ -50,6 +56,7 @@
#if[rw]
$memtype$ y = $toBits$(x);
UNSAFE.put$Memtype$Unaligned(null, a, y, bigEndian);
+Reference.reachabilityFence(this);
return this;
#else[rw]
throw new ReadOnlyBufferException();

Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-02 Thread Peter Levart

Hi,

I have one question, maybe stupid. I'm wondering about one situation. 
Suppose you have this Java code:


void m() {
    // code before...

    Objects.requireNonNull(this);
}

Of course, the non-null check will never throw NPE. The check will most 
likely even be optimized away by JIT. But is JIT allowed to optimize it 
away so that the "code before" observes the effects of 'this' being 
unreachable? Wouldn't that violate the semantics of the program as it 
would first observe that some object is not reachable and after that it 
would observe that 'this' still points to some object which by 
definition must be it?


If JIT is not allowed to optimize the null check so that the object 
becomes unreachable, then Objects.requireNonNull could be used as a 
replacement for Reference.reachabilityFence. It might even perform better.


What do you think?

Regards, Peter


On 02/01/18 23:50, Paul Sandoz wrote:

Hi Ben,

I don’t think you require the fence in all the cases you have currently placed 
it e.g. here for example

 $memtype$ y = $toBits$(x);
 UNSAFE.put$Memtype$Unaligned(null, a, y, bigEndian);
+Reference.reachabilityFence(this);
 return this;

since “this” is being returned it will be kept live during the unsafe access.

Would you mind providing a JMH benchmark and results for the performance with 
and without the fence for say a put and/or a get. I would like to understand 
the performance impact on HotSpot, this is one reason why we have yet to add 
such fences as it will likely impact performance.

At the moment we are relying on the method not being inlined, which is an 
expedient technique to make it functional and keep a reference alive but not 
necessarily optimal for usages in DBB.

For more details see:

   https://bugs.openjdk.java.net/browse/JDK-8169605 

   https://bugs.openjdk.java.net/browse/JDK-8149610 


Thanks,
Paul.


On Feb 1, 2018, at 6:55 AM, Ben Walsh  wrote:

This contribution forms a partial solution to the problem detailed here -
http://thevirtualmachinist.blogspot.ca/2011/07/subtle-issue-of-reachability.html
.

In this context, this contribution provides "markers" such that a suitably
"aware" compiler can reduce the chances of such a problem occurring with
each of the DirectBuffer objects and MappedByteBuffer
object. The reachabilityFences may prevent crashes / exceptions due to
cleaning up the backing memory before the user has finished using the
pointer.

Any compiler not suitably "aware" could be modified to make use of the
"markers".


I would like to pair with a sponsor to contribute this patch ...

---
diff -r d51e64840b4f
src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template
---
a/src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template
Wed Jan 31 12:04:53 2018 +0800
+++
b/src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template
Thu Feb 01 11:30:10 2018 +
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights
reserved.
+ * Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights
reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -33,15 +33,21 @@

 private $type$ get$Type$(long a) {
 $memtype$ x = UNSAFE.get$Memtype$Unaligned(null, a, bigEndian);
-return $fromBits$(x);
+$type$ y = $fromBits$(x);
+Reference.reachabilityFence(this);
+return y;
 }

 public $type$ get$Type$() {
-return get$Type$(ix(nextGetIndex($BYTES_PER_VALUE$)));
+$type$ y = get$Type$(ix(nextGetIndex($BYTES_PER_VALUE$)));
+Reference.reachabilityFence(this);
+return y;
 }

 public $type$ get$Type$(int i) {
-return get$Type$(ix(checkIndex(i, $BYTES_PER_VALUE$)));
+$type$ y = get$Type$(ix(checkIndex(i, $BYTES_PER_VALUE$)));
+Reference.reachabilityFence(this);
+return y;
 }

#end[rw]
@@ -50,6 +56,7 @@
#if[rw]
 $memtype$ y = $toBits$(x);
 UNSAFE.put$Memtype$Unaligned(null, a, y, bigEndian);
+Reference.reachabilityFence(this);
 return this;
#else[rw]
 throw new ReadOnlyBufferException();
diff -r d51e64840b4f
src/java.base/share/classes/java/nio/Direct-X-Buffer.java.template
--- a/src/java.base/share/classes/java/nio/Direct-X-Buffer.java.template
Wed Jan 31 12:04:53 2018 +0800
+++ b/src/java.base/share/classes/java/nio/Direct-X-Buffer.java.template
Thu Feb 01 11:30:10 2018 +
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights
reserved.
+ * Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights
reserved.
  * DO NOT ALTER OR REMOVE 

Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-01 Thread Paul Sandoz


> On Feb 1, 2018, at 4:03 PM, Hans Boehm  wrote:
> 
> 
> On Thu, Feb 1, 2018 at 2:50 PM, Paul Sandoz wrote:
> >
> > Hi Ben,
> >
> > I don’t think you require the fence in all the cases you have currently 
> > placed it e.g. here for example
> >
> > $memtype$ y = $toBits$(x);
> > UNSAFE.put$Memtype$Unaligned(null, a, y, bigEndian);
> > +Reference.reachabilityFence(this);
> > return this;
> >
> > since “this” is being returned it will be kept live during the unsafe 
> > access.
> 
> I don't see how that's guaranteed by the JLS, specifically 12.6.2. What if 
> this function is inlined into the caller, and the caller drops the result?
> 

Ah yes, my apologies, I was thinking too narrowly. Your document reminds us that we 
should really use the try/finally pattern for any updates to the DBB code, for the 
reasons you suggest, even if not strictly needed.
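That try/finally shape for a put that returns `this` might look like the following sketch (PinnedBuf is a hypothetical stand-in for a direct buffer; a plain array replaces the Unsafe store). Even though put() returns the receiver, an inlined caller may discard the result, so the fence still belongs in the finally:

```java
import java.lang.ref.Reference;

// Hypothetical direct-buffer stand-in.
class PinnedBuf {
    private final byte[] mem = new byte[16]; // stands in for native memory
    private int pos;

    PinnedBuf put(byte x) {
        try {
            mem[pos++] = x;                   // the raw store being guarded
            return this;
        } finally {
            Reference.reachabilityFence(this); // pin past the store even if
                                               // the returned `this` is dropped
        }
    }

    byte get(int i) {
        try {
            return mem[i];
        } finally {
            Reference.reachabilityFence(this);
        }
    }
}
```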


> I would still greatly prefer to see an added mechanism (possibly along the 
> lines of 
> https://docs.google.com/document/d/1yMC4VzZobMQTrVAraMy7xBdWPCJ0p7G5r_43yaqsj-s/edit?usp=sharing)
>  that associates the reachability requirements with fields or classes, rather 
> than putting a reachabilityFence() in every method accessing a field. I'm 
> sure there are cases where reachabilityFence() is the preferred mechanism. 
> But for the code I've looked at recently, I haven't actually found one yet. I 
> don't think it is a good mechanism here, either.
> 

IIRC we discussed such approaches before and concluded reachabilityFence was 
the most expedient way to get something functional into developers hands if not 
in the most ideal and expressive way for all cases.


> >
> > Would you mind providing a JMH benchmark and results for the performance 
> > with and without the fence for say a put and/or a get. I would like to 
> > understand the performance impact on HotSpot, this is one reason why we 
> > have yet to add such fences as it will likely impact performance.
> >
> > At the moment we are relying on the method not being inlined, which is an 
> > expedient technique to make it functional and keep a reference alive but 
> > not necessarily optimal for usages in DBB.
> 
> But that should be fixable, right?

I believe so, it comes down to prioritizing and expending some compiler 
engineering effort.

Thanks,
Paul.


> It shouldn't generate any code beyond potentially preventing register or 
> stack slot reuse; the required effect of reachabilityFence (in a conventional 
> implementation) is to ensure that the argument is considered live at any 
> preceding GC points.
> 
> >
> > For more details see:
> >
> >   https://bugs.openjdk.java.net/browse/JDK-8169605
> >   https://bugs.openjdk.java.net/browse/JDK-8149610
> >
> > Thanks,
> > Paul.



Re: [PATCH] Reduce Chance Of Mistakenly Early Backing Memory Cleanup

2018-02-01 Thread Paul Sandoz
Hi Ben,

I don’t think you require the fence in all the cases you have currently placed 
it e.g. here for example

$memtype$ y = $toBits$(x);
UNSAFE.put$Memtype$Unaligned(null, a, y, bigEndian);
+Reference.reachabilityFence(this);
return this;

since “this” is being returned it will be kept live during the unsafe access.

Would you mind providing a JMH benchmark and results for the performance with 
and without the fence for say a put and/or a get. I would like to understand 
the performance impact on HotSpot, this is one reason why we have yet to add 
such fences as it will likely impact performance. 

At the moment we are relying on the method not being inlined, which is an 
expedient technique to make it functional and keep a reference alive but not 
necessarily optimal for usages in DBB.

For more details see:

  https://bugs.openjdk.java.net/browse/JDK-8169605 

  https://bugs.openjdk.java.net/browse/JDK-8149610 


Thanks,
Paul.

> On Feb 1, 2018, at 6:55 AM, Ben Walsh  wrote:
> 
> This contribution forms a partial solution to the problem detailed here - 
> http://thevirtualmachinist.blogspot.ca/2011/07/subtle-issue-of-reachability.html
> .
> 
> In this context, this contribution provides "markers" such that a suitably 
> "aware" compiler can reduce the chances of such a problem occurring with 
> each of the DirectBuffer objects and MappedByteBuffer 
> object. The reachabilityFences may prevent crashes / exceptions due to 
> cleaning up the backing memory before the user has finished using the 
> pointer. 
> 
> Any compiler not suitably "aware" could be modified to make use of the 
> "markers".
> 
> 
> I would like to pair with a sponsor to contribute this patch ...
> 
> ---
> diff -r d51e64840b4f 
> src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template
> --- 
> a/src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template 
> Wed Jan 31 12:04:53 2018 +0800
> +++ 
> b/src/java.base/share/classes/java/nio/Direct-X-Buffer-bin.java.template 
> Thu Feb 01 11:30:10 2018 +
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights 
> reserved.
> + * Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights 
> reserved.
>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>  *
>  * This code is free software; you can redistribute it and/or modify it
> @@ -33,15 +33,21 @@
> 
> private $type$ get$Type$(long a) {
> $memtype$ x = UNSAFE.get$Memtype$Unaligned(null, a, bigEndian);
> -return $fromBits$(x);
> +$type$ y = $fromBits$(x);
> +Reference.reachabilityFence(this);
> +return y;
> }
> 
> public $type$ get$Type$() {
> -return get$Type$(ix(nextGetIndex($BYTES_PER_VALUE$)));
> +$type$ y = get$Type$(ix(nextGetIndex($BYTES_PER_VALUE$)));
> +Reference.reachabilityFence(this);
> +return y;
> }
> 
> public $type$ get$Type$(int i) {
> -return get$Type$(ix(checkIndex(i, $BYTES_PER_VALUE$)));
> +$type$ y = get$Type$(ix(checkIndex(i, $BYTES_PER_VALUE$)));
> +Reference.reachabilityFence(this);
> +return y;
> }
> 
> #end[rw]
> @@ -50,6 +56,7 @@
> #if[rw]
> $memtype$ y = $toBits$(x);
> UNSAFE.put$Memtype$Unaligned(null, a, y, bigEndian);
> +Reference.reachabilityFence(this);
> return this;
> #else[rw]
> throw new ReadOnlyBufferException();
> diff -r d51e64840b4f 
> src/java.base/share/classes/java/nio/Direct-X-Buffer.java.template
> --- a/src/java.base/share/classes/java/nio/Direct-X-Buffer.java.template 
> Wed Jan 31 12:04:53 2018 +0800
> +++ b/src/java.base/share/classes/java/nio/Direct-X-Buffer.java.template 
> Thu Feb 01 11:30:10 2018 +
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights 
> reserved.
> + * Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights 
> reserved.
>  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>  *
>  * This code is free software; you can redistribute it and/or modify it
> @@ -28,6 +28,7 @@
> package java.nio;
> 
> import java.io.FileDescriptor;
> +import java.lang.ref.Reference;
> import jdk.internal.misc.VM;
> import jdk.internal.ref.Cleaner;
> import sun.nio.ch.DirectBuffer;
> @@ -312,6 +313,7 @@
> public $Type$Buffer put($type$ x) {
> #if[rw]
> UNSAFE.put$Swaptype$(ix(nextPutIndex()), $swap$($toBits$(x)));
> +Reference.reachabilityFence(this);
> return this;
> #else[rw]
> throw new ReadOnlyBufferException();
> @@ -321,6 +323,7 @@
> public $Type$Buffer put(int i, $type$ x) {
> #if[rw]
> UNSAFE.put$Swaptype$(ix(checkIndex(i)), $swap$($toBits$(x)));
>