Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-04-16 Thread Andrew Haley
On 02/18/2015 08:59 PM, Vladimir Kozlov wrote:
> The code which eliminates MemBars for scalarized objects was added in jdk8:
>
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/6f3fd5150b67

Right enough, but it only works with boxed objects.  The
Precedent of the MemBarNode is needed by MemBarNode::Ideal,
and it's checked for:

  // Eliminate volatile MemBars for scalar replaced objects.
  if (can_reshape && req() == (Precedent+1)) {
    ... think about eliminating the MemBar

So if there's no Precedent, none of the barrier elimination is done.

The only thing that sets the MemBar's Precedent is here:

In parse::do_put_xxx

// Preserve allocation ptr to create precedent edge to it in membar
// generated on exit from constructor.
if (C->eliminate_boxing() &&
    adr_type->isa_oopptr() && adr_type->is_oopptr()->is_ptr_to_boxed_value() &&
    AllocateNode::Ideal_allocation(obj, &_gvn) != NULL) {
  set_alloc_with_final(obj);
}

The barrier is created in parse1, and uses alloc_with_final:

  if (method()->is_initializer() &&
      (wrote_final() ||
       PPC64_ONLY(wrote_volatile() ||)
       (AlwaysSafeConstructors && wrote_fields()))) {
    _exits.insert_mem_bar(Op_MemBarRelease, alloc_with_final());

So, it looks to me as though even the most trivial user-defined
constructors with final fields will never eliminate barriers.
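
To make that case concrete, here is a minimal sketch of such a trivial class. The names `Pair` and `sum` are hypothetical and not from the JDK. Assuming C2 inlines the constructor and scalar-replaces the allocation, the object never reaches the heap, yet by the reading above the release barrier at constructor exit gets no Precedent edge, because `Pair` is not a boxed value type.

```java
// Hypothetical example: a trivial user-defined class with final fields.
final class Pair {
    final int a;   // a write to a final field in <init> is what triggers the barrier
    final int b;

    Pair(int a, int b) {
        this.a = a;
        this.b = b;
        // The release barrier discussed above is emitted here, at constructor exit.
    }

    // The Pair allocated here never leaves this method, so after inlining it
    // is a candidate for scalar replacement.
    static int sum(int x, int y) {
        Pair p = new Pair(x, y);
        return p.a + p.b;
    }

    public static void main(String[] args) {
        System.out.println(sum(2, 3));
    }
}
```

Whether the barrier actually survives in the generated code depends on the build and on inlining decisions, so this is only a shape to test against, not a guaranteed reproducer.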

I don't know what the thinking is here.  Why does it matter whether
the type being constructed is a boxed value?

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-04-16 Thread Vladimir Kozlov
Because that code was added and tested only for boxed objects (goal of 
6934604) - I wanted to avoid wider effects of those changes.


I think we can remove the limitation now in jdk9 sources since we have
enough time to test it.


Regards,
Vladimir



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Andrew Haley
On 02/18/2015 10:01 AM, Andrew Dinn wrote:
> On 17/02/15 19:21, Vitaly Davidovich wrote:
>> IMO I don't think such barriers should be removed just because EA is able
>> to elide the heap allocation.
>
> Why not? Are you assuming that the programmer might be relying on a
> memory barrier being implied in interpreted/JITted code by the presence
> in the source of an allocation? If so then I am not sure the Java memory
> model justifies that assumption, especially so in the case EA optimises.

It doesn't.

There are essentially two ways to prevent unsafe publication of
objects with final fields: either emit a barrier at the end of the
constructor or track the reference to the newly-constructed object
until it is stored in memory.  That store to memory can be a releasing
store.  If the object does not escape that releasing store can be
eliminated.
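
A minimal sketch of where those two options sit in source terms. The class `Holder` and its methods are hypothetical names, and the comments describe where a JIT could place the ordering, not what any particular build does.

```java
// Hypothetical example of the publication problem the barrier guards against.
final class Holder {
    final int value;       // final: readers must never observe the default 0

    Holder(int value) {
        this.value = value;
        // Option 1: a barrier here, at the end of the constructor.
    }

    static Holder shared;  // plain (non-volatile) field: a racy publication point

    static void publish(int v) {
        // Option 2: make this store, where the reference first reaches
        // memory, a releasing store instead.
        shared = new Holder(v);
    }

    // No reference leaves this method, so neither form of ordering is needed.
    static int local(int v) {
        return new Holder(v).value;
    }

    public static void main(String[] args) {
        publish(7);
        System.out.println(shared.value + " " + local(9));
    }
}
```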

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Andrew Haley
On 02/18/2015 09:15 AM, Andrew Haley wrote:
> On 18/02/15 09:14, Florian Weimer wrote:
>> Wow, looks nice.  What OpenJDK build did you use?  I want to see if this
>> happens on x86_64, too.
>
> I'm working on JDK9.  You don't have this code yet.  I'll do an x86
> build.

  0x7f2948acbf8c: mov    0xc(%rdx),%r10d      ;*synchronization entry
                                              ; - java.nio.HeapByteBuffer::<init>@-1 (line 84)
                                              ; - java.nio.ByteBuffer::wrap@7 (line 373)
                                              ; - java.nio.ByteBuffer::wrap@4 (line 396)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@1 (line 23)
                                              ; implicit exception: dispatches to 0x7f2948acbff5
  ;; B2: #  B5 B3 <- B1  Freq: 0.99

  ;; MEMBAR-release ! (empty encoding)

  0x7f2948acbf90: test   %ecx,%ecx
  0x7f2948acbf92: jl     0x7f2948acbfb5       ;*iflt
                                              ; - java.nio.Buffer::checkIndex@1 (line 545)
                                              ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

  ;; B3: #  B6 B4 <- B2  Freq: 0.99

  0x7f2948acbf94: mov    %r10d,%ebp
  0x7f2948acbf97: sub    %ecx,%ebp            ;*isub
                                              ; - java.nio.Buffer::checkIndex@10 (line 545)
                                              ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

  0x7f2948acbf99: cmp    $0x8,%ebp
  0x7f2948acbf9c: jl     0x7f2948acbfd5       ;*if_icmple
                                              ; - java.nio.Buffer::checkIndex@11 (line 545)
                                              ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

  ;; B4: #  N95 <- B3  Freq: 0.98

  0x7f2948acbf9e: movslq %ecx,%r10
  0x7f2948acbfa1: mov    0x10(%rdx,%r10,1),%rax
  0x7f2948acbfa6: bswap  %rax                 ;*invokestatic reverseBytes
                                              ; - java.nio.Bits::swap@1 (line 61)
                                              ; - java.nio.HeapByteBuffer::getLong@41 (line 466)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

So, just the same except that there is no explicit fence instruction
to remove.  It's a shame for AArch64 because that fence really kills
performance but it's bad for x86 too.  Even on machines that don't
emit fence instructions the fence still acts as a compiler barrier.

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Vladimir Kozlov

The code which eliminates MemBars for scalarized objects was added in jdk8:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/6f3fd5150b67

Another store barrier change was also pushed into jdk8:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/fcf521c3fbc6

I don't remember we did anything special with membars in jdk9.

Regards,
Vladimir

On 2/18/15 6:27 AM, Vitaly Davidovich wrote:

Indeed, that's quite nice and not what I saw in java 7 so good to see that
this case is EA'd out.  Does anyone know if there was EA work done in java
9 or is this simply inlining policy change that makes EA work (as John
alluded to)?



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Vitaly Davidovich
Thanks Vladimir.  I was actually asking about the ByteBuffer elimination
itself; when I tried Andrew's example on 7u60 (i.e. a single method with a
ByteBuffer.wrap(...).getLong(...)), the ByteBuffer allocation was not
removed.

On Wed, Feb 18, 2015 at 3:59 PM, Vladimir Kozlov <vladimir.koz...@oracle.com>
wrote:

> The code which eliminates MemBars for scalarized objects was added in jdk8:
>
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/6f3fd5150b67
>
> Another store barrier change was also pushed into jdk8:
>
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/fcf521c3fbc6
>
> I don't remember we did anything special with membars in jdk9.
>
> Regards,
> Vladimir




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Andrew Dinn
On 17/02/15 19:21, Vitaly Davidovich wrote:
> IMO I don't think such barriers should be removed just because EA is able
> to elide the heap allocation.

Why not? Are you assuming that the programmer might be relying on a
memory barrier being implied in interpreted/JITted code by the presence
in the source of an allocation? If so then I am not sure the Java memory
model justifies that assumption, especially so in the case EA optimises.

As I recall, the arguments here and on the concurrency lists for the
presence of a memory barrier at the end of allocation were only /for/ it
as a heuristic to ensure that objects which might be shared would not be
shared before all effects of construction were visible (I may be
misstating that -- you might like to reread it as the arguments on the
concurrency lists I was convinced by :-). In which case, if an object
cannot be shared, indeed need not even be allocated, then there appears
to be no need for such a heuristic.

n.b. if a Java programmer really wants to enforce memory ordering wrt
other threads then Java provides a very simple mechanism for that in
volatile.
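
As a minimal sketch of that mechanism (the class `VolatileFlag` and its fields are hypothetical names): the write to the volatile field `ready` orders the earlier plain write to `data`, so a reader that sees `ready == true` is guaranteed to see `data == 42`.

```java
// Hypothetical example: explicit inter-thread ordering through a volatile field.
final class VolatileFlag {
    static int data;                // plain field
    static volatile boolean ready;  // volatile: its write/read pair orders 'data'

    static int run() {
        data = 0;
        ready = false;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile write becomes visible
            }
            seen[0] = data;         // happens-after the write of 42 below
        });
        reader.start();
        data = 42;                  // plain write ...
        ready = true;               // ... published by the volatile write
        try {
            reader.join();          // join makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```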

regards,


Andrew Dinn



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Vitaly Davidovich
I don't think explicit barriers (i.e. Unsafe.xxxFence) should be removed as
I don't think compiler can prove that it's safe to do so.  When value types
come to be and get scalarized, it may be possible to create cheap
synchronization types that are stack allocated yet are used for
synchronization control.  For example, C# has a SpinLock struct (
https://msdn.microsoft.com/en-us/library/system.threading.spinlock%28v=vs.110%29.aspx
).

Also, I don't think Unsafe induced fences are part of JMM, current or
future (at least I haven't heard that to be the case).

sent from my phone
On Feb 18, 2015 5:11 AM, Andrew Haley a...@redhat.com wrote:

 On 02/18/2015 10:01 AM, Andrew Dinn wrote:
  On 17/02/15 19:21, Vitaly Davidovich wrote:
  IMO I don't think such barriers should be removed just because EA is
 able
  to elide the heap allocation.
 
  Why not? Are you assuming that the programmer might be relying on a
  memory barrier being implied in interpreted/JITted code by the presence
  in the source of an allocation? If so then I am not sure the Java memory
  model justifies that assumption, especially so in the case EA optimises.

 It doesn't.

 There are essentially two ways to prevent unsafe publication of
 objects with final fields: either emit a barrier at the end of the
 constructor or track the reference to the newly-constructed object
 until it is stored in memory.  That store to memory can be a releasing
 store.  If the object does not escape that releasing store can be
 eliminated.

 Andrew.



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Vitaly Davidovich
Indeed, that's quite nice and not what I saw in java 7 so good to see that
this case is EA'd out.  Does anyone know if there was EA work done in java
9 or is this simply inlining policy change that makes EA work (as John
alluded to)?



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread David M. Lloyd

It's an awful lot of pain to avoid what IMO should be an obvious addition:

(Short|Character|Integer|Long).(get|put)(Little|Big)EndianBytes([value,] byte[] b, int offs)


This could (it seems to me) be easily intrinsified, is hugely useful 
both within and outside of the JDK, and fits perfectly well in the 
family of integral bit manipulations such as:


Integer.bitCount()
Integer.highestOneBit()
Integer.rotate*()
etc.
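
For illustration only, a plain-Java sketch of what the `long` flavour of such methods could compute. The class name `EndianBytes` and the method names are hypothetical; no such API exists in the JDK, and an intrinsified version would presumably compile to a single load or store plus a byte swap where needed.

```java
// Hypothetical helper; not a JDK API.
final class EndianBytes {
    // Reads 8 bytes starting at offs, most significant byte first.
    static long getLongBigEndian(byte[] b, int offs) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[offs + i] & 0xFFL);
        }
        return v;
    }

    // Reads 8 bytes starting at offs, least significant byte first.
    static long getLongLittleEndian(byte[] b, int offs) {
        return Long.reverseBytes(getLongBigEndian(b, offs));
    }

    // Writes value into 8 bytes starting at offs, most significant byte first.
    static void putLongBigEndian(long value, byte[] b, int offs) {
        for (int i = 7; i >= 0; i--) {
            b[offs + i] = (byte) value;
            value >>>= 8;
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        putLongBigEndian(0x0102030405060708L, buf, 0);
        System.out.println(Long.toHexString(getLongLittleEndian(buf, 0)));
    }
}
```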


--
- DML


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Andrew Haley
On 02/18/2015 02:32 PM, David M. Lloyd wrote:
> It's an awful lot of pain to avoid what IMO should be an obvious addition:
>
> (Short|Character|Integer|Long).(get|put)(Little|Big)EndianBytes([value,] byte[] b, int offs)
>
> This could (it seems to me) be easily intrinsified, is hugely useful
> both within and outside of the JDK,

Sure, I get that, but it's a new API.  Once I've finished this work,
implementing such an API would be trivial.  I have no objection to it
in principle, but making ByteBuffer.map() operations work well is
worth doing too.

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Vitaly Davidovich
Ok, perhaps I misunderstood then since you mentioned Unsafe.storeFence() in
your earlier post and Vladimir said they were debating whether these fences
should be removed.  If you guys were talking only about the final field
fence, then my bad, I don't disagree with removing those if the object
doesn't escape.

On Wed, Feb 18, 2015 at 10:26 AM, Andrew Haley <a...@redhat.com> wrote:

> On 02/18/2015 02:16 PM, Vitaly Davidovich wrote:
>> I don't think explicit barriers (i.e. Unsafe.xxxFence) should be removed as
>> I don't think compiler can prove that it's safe to do so.
>
> Nobody thinks that explicit barriers (i.e. Unsafe.xxxFence) should be
> removed.
>
> We're talking about fences at the end of constructors which have final
> fields.  These should be removed if the object does not escape.
>
> Andrew.




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-18 Thread Andrew Haley
On 02/18/2015 02:16 PM, Vitaly Davidovich wrote:
> I don't think explicit barriers (i.e. Unsafe.xxxFence) should be removed as
> I don't think compiler can prove that it's safe to do so.

Nobody thinks that explicit barriers (i.e. Unsafe.xxxFence) should be
removed.

We're talking about fences at the end of constructors which have final
fields.  These should be removed if the object does not escape.

Andrew.



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Vitaly Davidovich
FWIW, when I checked, ByteBuffer.wrap(...).getXXX() type of code didn't EA
out the BB; it seems the call chain is too complex for EA.  I checked 7u60
though, so maybe newer versions are different.

sent from my phone
On Feb 17, 2015 5:53 AM, Andrew Haley <a...@redhat.com> wrote:

> On 02/17/2015 10:49 AM, Florian Weimer wrote:
>> On 02/17/2015 11:22 AM, Andrew Haley wrote:
>>>> You'll still have to allocate a wrapping ByteBuffer object to use them.
>>>> I expect that makes them unattractive in many cases.
>>>
>>> Hmm.  I'm having a hard time trying to understand why.  If you need to
>>> do a lot of accesses the allocation of the ByteBuffer won't be
>>> significant; if you don't need to do a lot of accesses it won't
>>> matter either.
>>
>> The typical use case I have in mind is exemplified by
>> com.sun.crypto.provider.GHASH.processBlock(byte[] data, int ofs):
>>
>>     private void processBlock(byte[] data, int ofs) {
>>         if (data.length - ofs < AES_BLOCK_SIZE) {
>>             throw new RuntimeException("need complete block");
>>         }
>>         state0 ^= getLong(data, ofs);
>>         state1 ^= getLong(data, ofs + 8);
>>         blockMult(subkeyH0, subkeyH1);
>>     }
>>
>> That is, the byte array is supplied by the caller, and if we wanted to
>> use a ByteBuffer, we would have to allocate a fresh one on every
>> iteration.  In this case, neither of the two alternatives you list apply.
>
> I see.  So the question could also be whether escape analysis would
> notice that a ByteBuffer does not escape.  I hope to know that soon.
>
> Andrew.





Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Andrew Haley
On 02/17/2015 10:53 AM, Andrew Haley wrote:
> I see.  So the question could also be whether escape analysis would
> notice that a ByteBuffer does not escape.  I hope to know that soon.

Close but no cigar.

long getLong(byte[] bytes, int i) {
return ByteBuffer.wrap(bytes).getLong(i);
}

Everything gets inlined nicely and the ByteBuffer is not created, but
a store fence remains because of the final fields in HeapByteBuffer.

So the resulting code for getLong (minus the prologue and epilogue) looks like
this:

  0x03ff7426dc34: ldr   w11, [x2,#12]         ;*arraylength
                                              ; - java.nio.ByteBuffer::wrap@3 (line 396)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@1 (line 23)
                                              ; implicit exception: dispatches to 0x03ff7426dca4
  ;; B2: #  B5 B3 <- B1  Freq: 0.99

  0x03ff7426dc38: dmb   ish                   ;*synchronization entry
                                              ; - java.nio.HeapByteBuffer::<init>@-1 (line 84)
                                              ; - java.nio.ByteBuffer::wrap@7 (line 373)
                                              ; - java.nio.ByteBuffer::wrap@4 (line 396)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@1 (line 23)

  0x03ff7426dc3c: sub   w12, w11, w3          ;*isub
                                              ; - java.nio.Buffer::checkIndex@10 (line 545)
                                              ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

  0x03ff7426dc40: cmp   w3, #0x0
  0x03ff7426dc44: b.lt  0x03ff7426dc70        ;*iflt
                                              ; - java.nio.Buffer::checkIndex@1 (line 545)
                                              ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

  ;; B3: #  B6 B4 <- B2  Freq: 0.99

  0x03ff7426dc48: cmp   w12, #0x8
  0x03ff7426dc4c: b.lt  0x03ff7426dc88        ;*if_icmple
                                              ; - java.nio.Buffer::checkIndex@11 (line 545)
                                              ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

  ;; B4: #  N92 <- B3  Freq: 0.98

  0x03ff7426dc50: add   x10, x2, w3, sxtw
  0x03ff7426dc54: ldr   x10, [x10,#16]
  0x03ff7426dc58: rev   x0, x10               ;*invokestatic reverseBytes
                                              ; - java.nio.Bits::swap@1 (line 61)
                                              ; - java.nio.HeapByteBuffer::getLong@41 (line 466)
                                              ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

If it weren't for the stray DMB ISH it'd be almost perfect.
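
For anyone wanting to reproduce this, a minimal harness around the getLong method above might look like the following sketch. The class name `WrapGetLong` and the warm-up loop are assumptions, not the ByteBufferTests3 source used for the listing; the loop is only there to make the method hot enough for C2 to compile.

```java
import java.nio.ByteBuffer;

// Hypothetical harness; the original test class is not shown in this thread.
final class WrapGetLong {
    // Same shape as the method above: wrap a caller-supplied array and read
    // one long. ByteBuffer defaults to big-endian byte order.
    static long getLong(byte[] bytes, int i) {
        return ByteBuffer.wrap(bytes).getLong(i);
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        data[7] = 1;    // big-endian long at offset 0 has value 1
        data[15] = 2;   // big-endian long at offset 8 has value 2
        long sum = 0;
        // Call the method often enough that the JIT compiles it.
        for (int n = 0; n < 1_000_000; n++) {
            sum += getLong(data, (n & 1) * 8);
        }
        System.out.println(sum);
    }
}
```

Listings like the one above are what `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` prints when the hsdis disassembler plugin is installed; that this is how the listing was produced is an assumption.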

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Ulf Zibis


On 17.02.2015 at 11:53, Andrew Haley wrote:

On 02/17/2015 10:49 AM, Florian Weimer wrote:

That is, the byte array is supplied by the caller, and if we wanted to
use a ByteBuffer, we would have to allocate a fresh one on every
iteration.  In this case, neither of the two alternatives you list apply.

I see.  So the question could also be whether escape analysis would
notice that a ByteBuffer does not escape.  I hope to know that soon.


See:
https://bugs.openjdk.java.net/browse/JDK-6908239
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6914113

-Ulf



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Ulf Zibis

On 17.02.2015 at 04:35, John Rose wrote:

On Feb 14, 2015, at 12:01 AM, Andrew Haley a...@redhat.com wrote:

On 02/14/2015 12:09 AM, John Rose wrote:

We also need Unsafe.getIntMisaligned, etc., which wire through to whatever 
second-best mechanism the platform offers.

Indeed.  I'm intending to prototype a design for those next week.  OK?

Yes, please.  — John


+1
I guess, also sun.nio.cs coders could benefit from that.

-Ulf



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Vitaly Davidovich
IMO I don't think such barriers should be removed just because EA is able
to elide the heap allocation.

On Tue, Feb 17, 2015 at 2:15 PM, Vladimir Kozlov <vladimir.koz...@oracle.com>
wrote:

> There was discussion should we remove such barriers or not because they
> create memory operations ordering which could be different if we remove
> them.
>
> To eliminate them we need to add 'precedent' edge to store's membar as we
> do, for example, for loads:
>
>   if (field->is_volatile()) {
>     // Memory barrier includes bogus read of value to force load BEFORE membar
>     insert_mem_bar(Op_MemBarAcquire, ld);
>   }
>
> MemBarNode::Ideal() will do elimination.
>
> Regards,
> Vladimir
>
> On 2/17/15 10:58 AM, Andrew Haley wrote:
>> On 02/17/2015 06:42 PM, John Rose wrote:
>>> The remaining store fence is probably a bug.  A store fence for
>>> scalarized (lifted-out-of-memory) final fields should go away, since the
>>> fields are not actually stored in heap memory.
>>
>> After inlining how would escape analysis know that the store fence is
>> associated with final fields rather than, say, an explicit
>> Unsafe.storeFence() ?
>>
>> Andrew.




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread John Rose
On Feb 17, 2015, at 6:22 AM, Andrew Haley <a...@redhat.com> wrote:
>
> Everything gets inlined nicely and the ByteBuffer is not created, but
> a store fence remains because of the final fields in HeapByteBuffer.

Wow, that got closer to the goal than I expected.  In general, the EA analysis 
can fail at random because of vagaries of inlining policy.

The remaining store fence is probably a bug.  A store fence for scalarized 
(lifted-out-of-memory) final fields should go away, since the fields are not 
actually stored in heap memory.

I filed JDK-8073358 to track.

BTW, we already elide synch. ops on scalarized (non-stored) objects.  Fence 
elision is a similar optimization.

— John

P.S.  Value types will come with scalarization always-on, so even if a call 
goes out of line, the value's fields can be kept out of the heap.

One of the projected use cases of values is safe encapsulation for complex 
pointers (native or in-object).

Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Andrew Haley
On 02/17/2015 06:42 PM, John Rose wrote:
> The remaining store fence is probably a bug.  A store fence for scalarized
> (lifted-out-of-memory) final fields should go away, since the fields are not
> actually stored in heap memory.

After inlining how would escape analysis know that the store fence is
associated with final fields rather than, say, an explicit
Unsafe.storeFence() ?

Andrew.



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Vitaly Davidovich
What do you mean exactly? I don't think inlining hides anything, so the
explicit fence should still be there for EA to see (and preserve).

sent from my phone
On Feb 17, 2015 1:58 PM, Andrew Haley <a...@redhat.com> wrote:

> On 02/17/2015 06:42 PM, John Rose wrote:
>> The remaining store fence is probably a bug.  A store fence for
>> scalarized (lifted-out-of-memory) final fields should go away, since the
>> fields are not actually stored in heap memory.
>
> After inlining how would escape analysis know that the store fence is
> associated with final fields rather than, say, an explicit
> Unsafe.storeFence() ?
>
> Andrew.




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Vladimir Kozlov
There was discussion about whether we should remove such barriers, because
they create an ordering of memory operations which could be different if we
remove them.

To eliminate them we need to add a 'precedent' edge to the store's membar, as
we do, for example, for loads:

  if (field->is_volatile()) {
    // Memory barrier includes bogus read of value to force load BEFORE membar
    insert_mem_bar(Op_MemBarAcquire, ld);
  }

MemBarNode::Ideal() will do elimination.

Regards,
Vladimir

On 2/17/15 10:58 AM, Andrew Haley wrote:

On 02/17/2015 06:42 PM, John Rose wrote:

The remaining store fence is probably a bug.  A store fence for scalarized 
(lifted-out-of-memory) final fields should go away, since the fields are not 
actually stored in heap memory.


After inlining how would escape analysis know that the store fence is
associated with final fields rather than, say, an explicit
Unsafe.storeFence() ?

Andrew.



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Andrew Haley
On 02/17/2015 10:15 AM, Florian Weimer wrote:
> On 02/17/2015 11:00 AM, Andrew Haley wrote:
>> On 02/17/2015 09:39 AM, Florian Weimer wrote:
>>> On 02/14/2015 01:09 AM, John Rose wrote:
>>>> These queries need to go into Unsafe.
>>>> We also need Unsafe.getIntMisaligned, etc., which wire through to whatever
>>>> second-best mechanism the platform offers.
>>>
>>> The safe variants should go into the java.lang.Integer etc. classes
>>> IMHO.  Even the JDK has quite a few uses for them (particularly the
>>> big endian variant).  Putting that into Unsafe only encourages
>>> further use of Unsafe from application code.
>>
>> They'll all be visible as ByteBuffer methods, which should be enough
>> for application code, shouldn't it?  I'm not sure how much sense it
>> makes to put them into java.lang.Integer etc.
>
> You'll still have to allocate a wrapping ByteBuffer object to use them.
> I expect that makes them unattractive in many cases.

Hmm.  I'm having a hard time trying to understand why.  If you need to
do a lot of accesses the allocation of the ByteBuffer won't be
significant; if you don't need to do a lot of accesses it won't
matter either.

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Florian Weimer
On 02/17/2015 11:22 AM, Andrew Haley wrote:
 You'll still have to allocate a wrapping ByteBuffer object to use them.
  I expect that makes them unattractive in many cases.
 
 Hmm.  I'm having a hard time trying to understand why.  If you need to
 do a lot of accesses the allocation of the ByteBuffer won't be
 significant; if you don't need to do a lot of accesses it won't
 matter either.

The typical use case I have in mind is exemplified by
com.sun.crypto.provider.GHASH.processBlock(byte[] data, int ofs):

    private void processBlock(byte[] data, int ofs) {
        if (data.length - ofs < AES_BLOCK_SIZE) {
            throw new RuntimeException("need complete block");
        }
        state0 ^= getLong(data, ofs);
        state1 ^= getLong(data, ofs + 8);
        blockMult(subkeyH0, subkeyH1);
    }

That is, the byte array is supplied by the caller, and if we wanted to
use a ByteBuffer, we would have to allocate a fresh one on every
iteration.  In this case, neither of the two alternatives you list apply.
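
The trade-off described above can be sketched in Java. This is only an
illustration: the class name GetLongSketch and both helper methods are made
up here, and the byte-wise helper assumes big-endian order, which is also
the ByteBuffer default.

```java
import java.nio.ByteBuffer;

// Illustrative only: these names are hypothetical, not JDK code.
final class GetLongSketch {
    // Big-endian read assembled byte by byte; no wrapper object is needed.
    static long getLong(byte[] data, int ofs) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (data[ofs + i] & 0xffL);
        }
        return v;
    }

    // The same read through ByteBuffer. wrap() creates a new buffer object on
    // every call unless the JIT manages to eliminate the allocation.
    static long getLongViaBuffer(byte[] data, int ofs) {
        return ByteBuffer.wrap(data).getLong(ofs);
    }

    public static void main(String[] args) {
        byte[] block = new byte[16];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) i;
        }
        System.out.println(Long.toHexString(getLong(block, 8)));
        System.out.println(Long.toHexString(getLongViaBuffer(block, 8)));
    }
}
```

Both helpers return the same value; the difference is only in whether a
wrapper object has to be created for each caller-supplied array.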

-- 
Florian Weimer / Red Hat Product Security


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Florian Weimer
On 02/14/2015 01:09 AM, John Rose wrote:
 These queries need to go into Unsafe.
 We also need Unsafe.getIntMisaligned, etc., which wire through to whatever 
 second-best mechanism the platform offers.

The safe variants should go into the java.lang.Integer etc. classes
IMHO.  Even the JDK has quite a few uses for them (particularly the big
endian variant).  Putting that into Unsafe only encourages further use
of Unsafe from application code.

-- 
Florian Weimer / Red Hat Product Security


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Florian Weimer
On 02/14/2015 11:10 PM, Dean Long wrote:

 Even if linux-aarch64 always allows unaligned, checking only for
 aarch64 is not future-proof
 because it doesn't take the OS into account.

Surely a simple test case is sufficient to ensure that the platform
supports misaligned accesses?  Then new ports will see the failure
immediately and can tweak the code.
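
A test of this kind might look roughly like the following. It is a sketch
under assumptions, not an existing JDK test: the class and method names are
invented, and it only exercises the public ByteBuffer API at every offset
within one 8-byte window. On a port where the unaligned check is wrongly
enabled, one would expect a fault or a wrong value here.

```java
import java.nio.ByteBuffer;

// Hypothetical smoke test; not part of the JDK test suite.
final class MisalignedAccessCheck {
    // Writes and reads a long at offsets 0..7 of a direct buffer, so that
    // most of the accesses are misaligned, and checks the stored bytes.
    static boolean roundTrips() {
        ByteBuffer buf = ByteBuffer.allocateDirect(32); // big-endian by default
        for (int ofs = 0; ofs < 8; ofs++) {
            long expected = 0x0102030405060708L + ofs;
            buf.putLong(ofs, expected);
            if (buf.getLong(ofs) != expected) {
                return false;
            }
            // Check the first and last byte individually as well.
            if (buf.get(ofs) != 0x01) {
                return false;
            }
            if (buf.get(ofs + 7) != (byte) (0x08 + ofs)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        if (!roundTrips()) {
            throw new AssertionError("misaligned ByteBuffer access failed");
        }
        System.out.println("misaligned accesses OK");
    }
}
```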

-- 
Florian Weimer / Red Hat Product Security


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Florian Weimer
On 02/17/2015 11:00 AM, Andrew Haley wrote:
 On 02/17/2015 09:39 AM, Florian Weimer wrote:
 On 02/14/2015 01:09 AM, John Rose wrote:
 These queries need to go into Unsafe.
 We also need Unsafe.getIntMisaligned, etc., which wire through to whatever 
 second-best mechanism the platform offers.

 The safe variants should go into the java.lang.Integer etc. classes
 IMHO.  Even the JDK has quite a few uses for them (particularly the
 big endian variant).  Putting that into Unsafe only encourages
 further use of Unsafe from application code.
 
 They'll all be visible as ByteBuffer methods, which should be enough
 for application code, shouldn't it?  I'm not sure how much sense it
 makes to put them into java.lang.Integer etc.

You'll still have to allocate a wrapping ByteBuffer object to use them.
 I expect that makes them unattractive in many cases.

Hmm, maybe I should propose a patch for DataInputStream and see how it's
received. :-)

-- 
Florian Weimer / Red Hat Product Security


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-17 Thread Andrew Haley
On 02/17/2015 09:39 AM, Florian Weimer wrote:
 On 02/14/2015 01:09 AM, John Rose wrote:
 These queries need to go into Unsafe.
 We also need Unsafe.getIntMisaligned, etc., which wire through to whatever 
 second-best mechanism the platform offers.
 
 The safe variants should go into the java.lang.Integer etc. classes
 IMHO.  Even the JDK has quite a few uses for them (particularly the
 big endian variant).  Putting that into Unsafe only encourages
 further use of Unsafe from application code.

They'll all be visible as ByteBuffer methods, which should be enough
for application code, shouldn't it?  I'm not sure how much sense it
makes to put them into java.lang.Integer etc.

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-16 Thread John Rose
On Feb 14, 2015, at 12:01 AM, Andrew Haley a...@redhat.com wrote:
 
 On 02/14/2015 12:09 AM, John Rose wrote:
 We also need Unsafe.getIntMisaligned, etc., which wire through to whatever 
 second-best mechanism the platform offers.
 
 Indeed.  I'm intending to prototype a design for those next week.  OK?

Yes, please.  — John

Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-16 Thread Alan Bateman

On 14/02/2015 22:10, Dean Long wrote:


Even if linux-aarch64 always allows unaligned, checking only for aarch64
is not future-proof because it doesn't take the OS into account.  However,
I really don't like having to enumerate all relevant platforms in multiple
places in shared code, so I disagree with the existing code and with
perpetuating the pattern.  As long as the decision is in platform-specific
code, a build-time decision may be entirely appropriate.

This alignment test in Bits.java has been there for a long time (JDK
1.4). It's technical debt that hasn't surfaced very often as it's so
rare to add architectures. If Unsafe gets a method to test the alignment
then it would be great to get Bits changed.


-Alan


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-16 Thread Andrew Haley
On 02/16/2015 11:02 AM, Alan Bateman wrote:
 On 14/02/2015 22:10, Dean Long wrote:

 Even if linux-aarch64 always allows unaligned, checking only for 
 aarch64 is not future-proof
 because it doesn't take the OS into account.  However, I really don't 
 like having to enumerate
 all relevant platforms in multiple places in shared code, so I 
 disagree with the existing code
 and with perpetuating the pattern.  As long as the decision is in 
 platform-specific code, a build-time
 decision may be entirely appropriate.

 This alignment test in Bits.java has been there for a long time (JDK 
 1.4). It's technical debt that hasn't surfaced very often as it's so
 rare to add architectures. If Unsafe gets a method to test the alignment 
 then it would be great to get Bits changed.

Hopefully it's getting less rare to add architectures!

I'll do that as part of my patch.

Andrew.




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-16 Thread Andrew Haley
On 14/02/15 22:10, Dean Long wrote:
 On 2/14/2015 12:07 AM, Andrew Haley wrote:
 On 02/13/2015 10:52 PM, Dean Long wrote:

 My understanding is that whether or not aarch64 allows unaligned
 accesses is based on a system setting, so this change is too
 simplistic.

 Disabling unaligned access would be a really perverse thing to do, and
 I suspect that GCC and glibc already assume that unaligned accesses
 work so it would require a recompilation of libjvm (and probably the
 whole OS) to make it work.  However, if you really think there's a
 point to making this a runtime flag I won't resist.
 
 Even if linux-aarch64 always allows unaligned, checking only for
 aarch64 is not future-proof because it doesn't take the OS into
 account. 

Sure, but we can't predict all the crazy things that writers of future
operating systems might do.

 However, I really don't like having to enumerate all relevant
 platforms in multiple places in shared code, so I disagree with the
 existing code and with perpetuating the pattern.  As long as the
 decision is in platform-specific code, a build-time decision may be
 entirely appropriate.

That makes sense.  I don't like the way that the decision is hidden in
shared code either: if it had been in a more obvious place I would
have found it earlier.  I'll have a look at writing an Unsafe method
which does the right thing.

Andrew.


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-14 Thread Dean Long

On 2/14/2015 12:07 AM, Andrew Haley wrote:

On 02/13/2015 10:52 PM, Dean Long wrote:


My understanding is that whether or not aarch64 allows unaligned
accesses is based on a system setting, so this change is too
simplistic.

Disabling unaligned access would be a really perverse thing to do, and
I suspect that GCC and glibc already assume that unaligned accesses
work so it would require a recompilation of libjvm (and probably the
whole OS) to make it work.  However, if you really think there's a
point to making this a runtime flag I won't resist.

Andrew.


Even if linux-aarch64 always allows unaligned, checking only for aarch64
is not future-proof because it doesn't take the OS into account.  However,
I really don't like having to enumerate all relevant platforms in multiple
places in shared code, so I disagree with the existing code and with
perpetuating the pattern.  As long as the decision is in platform-specific
code, a build-time decision may be entirely appropriate.

dl


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-14 Thread Andrew Haley
On 02/14/2015 12:09 AM, John Rose wrote:
 We also need Unsafe.getIntMisaligned, etc., which wire through to whatever 
 second-best mechanism the platform offers.

Indeed.  I'm intending to prototype a design for those next week.  OK?

Andrew.



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-14 Thread Andrew Haley
On 02/13/2015 10:52 PM, Dean Long wrote:

 My understanding is that whether or not aarch64 allows unaligned
 accesses is based on a system setting, so this change is too
 simplistic.

Disabling unaligned access would be a really perverse thing to do, and
I suspect that GCC and glibc already assume that unaligned accesses
work so it would require a recompilation of libjvm (and probably the
whole OS) to make it work.  However, if you really think there's a
point to making this a runtime flag I won't resist.

Andrew.


RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Andrew Haley
java.nio.DirectByteBuffer.getXXX is slow for types larger than byte
because the runtime does not know that AArch64 can perform unaligned
memory accesses.

The problem is due to this code in java.nio.Bits.unaligned():

    unaligned = arch.equals("i386") || arch.equals("x86")
        || arch.equals("amd64") || arch.equals("x86_64");

If we add AArch64 to this list code quality is very much improved.

http://cr.openjdk.java.net/~aph/8073093/

Thanks,
Andrew.
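
A minimal sketch of the proposed change, assuming the check stays a simple
string comparison on os.arch. The class name is made up, and the "aarch64"
spelling is an assumption about what os.arch reports on that platform; the
actual patch is at the URL above.

```java
// Illustrative stand-in for the check in java.nio.Bits.unaligned();
// the class name and the "aarch64" string are assumptions.
final class UnalignedArchCheck {
    static boolean unaligned(String arch) {
        return arch.equals("i386") || arch.equals("x86")
            || arch.equals("amd64") || arch.equals("x86_64")
            || arch.equals("aarch64"); // the proposed addition
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + " unaligned: " + unaligned(arch));
    }
}
```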


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Andrew Haley
On 02/13/2015 04:05 PM, Alan Bateman wrote:
 On 13/02/2015 13:38, Andrew Haley wrote:
 java.nio.DirectByteBuffer.getXXX is slow for types larger than byte
 because the runtime does not know that AArch64 can perform unaligned
 memory accesses.

 The problem is due to this code in java.nio.Bits.unaligned():

  unaligned = arch.equals("i386") || arch.equals("x86")
  || arch.equals("amd64") || arch.equals("x86_64");

 If we add AArch64 to this list code quality is very much improved.

 http://cr.openjdk.java.net/~aph/8073093/

 Makes sense. I assume this will go in when JEP 237 is pushed.

It will, but I need approval to push to the JEP 237 staging repo.
'Cos them's the rules.  :-)

Andrew.



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Alan Bateman

On 13/02/2015 13:38, Andrew Haley wrote:

java.nio.DirectByteBuffer.getXXX is slow for types larger than byte
because the runtime does not know that AArch64 can perform unaligned
memory accesses.

The problem is due to this code in java.nio.Bits.unaligned():

 unaligned = arch.equals("i386") || arch.equals("x86")
 || arch.equals("amd64") || arch.equals("x86_64");

If we add AArch64 to this list code quality is very much improved.

http://cr.openjdk.java.net/~aph/8073093/


Makes sense. I assume this will go in when JEP 237 is pushed.

-Alan


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Vladimir Kozlov

Changes are fine.

I agree with Alan. Please wait until we merge the aarch64 stage into
jdk9/dev and then push this fix into jdk9 (by sponsor).
We should finish testing the stage repo soon.


Thanks,
Vladimir

On 2/13/15 8:07 AM, Andrew Haley wrote:

On 02/13/2015 04:05 PM, Alan Bateman wrote:

On 13/02/2015 13:38, Andrew Haley wrote:

java.nio.DirectByteBuffer.getXXX is slow for types larger than byte
because the runtime does not know that AArch64 can perform unaligned
memory accesses.

The problem is due to this code in java.nio.Bits.unaligned():

  unaligned = arch.equals("i386") || arch.equals("x86")
  || arch.equals("amd64") || arch.equals("x86_64");

If we add AArch64 to this list code quality is very much improved.

http://cr.openjdk.java.net/~aph/8073093/


Makes sense. I assume this will go in when JEP 237 is pushed.


It will, but I need approval to push to the JEP 237 staging repo.
'Cos them's the rules.  :-)

Andrew.



Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Dean Long
My understanding is that whether or not aarch64 allows unaligned accesses
is based on a system setting, so this change is too simplistic.  I would
prefer that this was controlled with something more flexible, like
sun.cpu.unaligned.

dl
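
One way such a property could be consulted is sketched below. This is a
guess at the shape of the idea, not an existing mechanism: sun.cpu.unaligned
is only the name suggested above and is not a property the JDK is known to
define, and the class, method, and fallback parameter are hypothetical.

```java
// Hypothetical helper; neither the class nor the property exists in the JDK.
final class UnalignedProperty {
    // Property name as suggested in this message.
    static final String PROPERTY = "sun.cpu.unaligned";

    // Uses the property value when one is set, otherwise falls back to a
    // default supplied by platform-specific code.
    static boolean unaligned(String propertyValue, boolean archDefault) {
        if (propertyValue == null) {
            return archDefault;
        }
        return Boolean.parseBoolean(propertyValue);
    }

    public static void main(String[] args) {
        boolean result = unaligned(System.getProperty(PROPERTY), false);
        System.out.println(PROPERTY + " -> " + result);
    }
}
```

Running with -Dsun.cpu.unaligned=true would then override the default.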

On 2/13/2015 5:38 AM, Andrew Haley wrote:

java.nio.DirectByteBuffer.getXXX is slow for types larger than byte
because the runtime does not know that AArch64 can perform unaligned
memory accesses.

The problem is due to this code in java.nio.Bits.unaligned():

 unaligned = arch.equals("i386") || arch.equals("x86")
 || arch.equals("amd64") || arch.equals("x86_64");

If we add AArch64 to this list code quality is very much improved.

http://cr.openjdk.java.net/~aph/8073093/

Thanks,
Andrew.




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Christos Zoulas
On Feb 13,  2:52pm, dean.l...@oracle.com (Dean Long) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer 

| My understanding is that whether or not aarch64 allows unaligned
| accesses is based on a system setting, so this change is too
| simplistic.  I would prefer that this was controlled with something
| more flexible, like sun.cpu.unaligned.

So does x86_64 and you can ask the CPU if it is enabled... I am not sure
if a variable setting makes sense because if alignment is required you
get a signal (BUS error -- hi linux, SEGV), or incorrect results.

christos


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Dean Long
There is a system register bit to read, but I don't think it can be
accessed by an application, only the kernel.  If the OS won't provide this
information, you could do something similar to safeFetchN and catch the
resulting SIGBUS.

dl

On 2/13/2015 4:05 PM, Vladimir Kozlov wrote:
x86 has the flag UseUnalignedLoadStores, which is set to true depending on
which CPU version the VM runs on. The CPU version is determined from the
results of the CPUID instruction.

Does AArch64 have something similar?

Regards,
Vladimir

On 2/13/15 3:41 PM, Dean Long wrote:

On 2/13/2015 3:04 PM, chris...@zoulas.com wrote:

On Feb 13,  2:52pm, dean.l...@oracle.com (Dean Long) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for
ByteBuffer

| My understanding is that whether or not aarch64 allows unaligned
| accesses is based on a system setting, so this change is too
| simplistic.  I would prefer that this was controlled with something
| more flexible, like sun.cpu.unaligned.

So does x86_64 and you can ask the CPU if it is enabled... I am not sure
if a variable setting makes sense because if alignment is required you
get a signal (BUS error -- hi linux, SEGV), or incorrect results.

christos


So it sounds like we need to determine if unaligned accesses are
supported during startup,
in a platform-specific way.  This could be exposed through a property
like I suggested,
or perhaps a new Unsafe method.

Regarding x86_64, there may be places in the JVM that already assume
unaligned accesses
are allowed, so disabling them may completely break the JVM until those
assumptions
are fixed.

dl




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread John Rose
These queries need to go into Unsafe.
We also need Unsafe.getIntMisaligned, etc., which wire through to whatever 
second-best mechanism the platform offers.
— John
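
As a rough illustration of what "wire through to whatever second-best
mechanism" could mean, here is a sketch that falls back to byte-wise
assembly when a wide access is not known to be safe. Everything in it is an
assumption: the class name, the method signature, and the unalignedOk flag
are invented, and a ByteBuffer index stands in for a raw address, so index
alignment only approximates real address alignment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; this is not the Unsafe API.
final class MisalignedReadSketch {
    static int getIntMisaligned(ByteBuffer buf, int index, boolean unalignedOk) {
        // Fast path: one wide access when the platform allows it
        // or the index happens to be 4-byte aligned.
        if (unalignedOk || (index & 3) == 0) {
            return buf.getInt(index);
        }
        // Second-best path: four byte reads, combined in the buffer's order.
        int b0 = buf.get(index) & 0xff;
        int b1 = buf.get(index + 1) & 0xff;
        int b2 = buf.get(index + 2) & 0xff;
        int b3 = buf.get(index + 3) & 0xff;
        return buf.order() == ByteOrder.BIG_ENDIAN
            ? (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
            : (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        for (int i = 0; i < 16; i++) {
            buf.put(i, (byte) (i * 17));
        }
        System.out.println(Integer.toHexString(getIntMisaligned(buf, 1, false)));
    }
}
```

Both paths are meant to return the same value; only the number and width of
the memory accesses differs.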

Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Vladimir Kozlov

On 2/13/15 4:22 PM, Dean Long wrote:

There is a system register bit to read, but I don't think it can be
accessed by an application, only the kernel.
If the OS won't provide this information, you could do something similar
to safeFetchN and catch the
resulting SIGBUS.


Yes, I agree it could be done this way too.
On x86 we trigger SEGV to verify that the OS's signal handler correctly
saves/restores AVX registers so we can use them.


Vladimir



dl

On 2/13/2015 4:05 PM, Vladimir Kozlov wrote:

x86 has the flag UseUnalignedLoadStores, which is set to true depending on
which CPU version the VM runs on. The CPU version is determined from the
results of the CPUID instruction.

Does AArch64 have something similar?

Regards,
Vladimir

On 2/13/15 3:41 PM, Dean Long wrote:

On 2/13/2015 3:04 PM, chris...@zoulas.com wrote:

On Feb 13,  2:52pm, dean.l...@oracle.com (Dean Long) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for
ByteBuffer

| My understanding is that whether or not aarch64 allows unaligned
| accesses is based on a system setting, so this change is too
| simplistic.  I would prefer that this was controlled with something
| more flexible, like sun.cpu.unaligned.

So does x86_64 and you can ask the CPU if it is enabled... I am not sure
if a variable setting makes sense because if alignment is required you
get a signal (BUS error -- hi linux, SEGV), or incorrect results.

christos


So it sounds like we need to determine if unaligned accesses are
supported during startup,
in a platform-specific way.  This could be exposed through a property
like I suggested,
or perhaps a new Unsafe method.

Regarding x86_64, there may be places in the JVM that already assume
unaligned accesses
are allowed, so disabling them may completely break the JVM until those
assumptions
are fixed.

dl




Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Dean Long

On 2/13/2015 3:04 PM, chris...@zoulas.com wrote:

On Feb 13,  2:52pm, dean.l...@oracle.com (Dean Long) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer

| My understanding is that whether or not aarch64 allows unaligned
| accesses is based on a system setting, so this change is too
| simplistic.  I would prefer that this was controlled with something
| more flexible, like sun.cpu.unaligned.

So does x86_64 and you can ask the CPU if it is enabled... I am not sure
if a variable setting makes sense because if alignment is required you
get a signal (BUS error -- hi linux, SEGV), or incorrect results.

christos


So it sounds like we need to determine if unaligned accesses are 
supported during startup,
in a platform-specific way.  This could be exposed through a property 
like I suggested,

or perhaps a new Unsafe method.

Regarding x86_64, there may be places in the JVM that already assume 
unaligned accesses
are allowed, so disabling them may completely break the JVM until those 
assumptions

are fixed.

dl


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Vladimir Kozlov
x86 has the flag UseUnalignedLoadStores, which is set to true depending on
which CPU version the VM runs on. The CPU version is determined from the
results of the CPUID instruction.

Does AArch64 have something similar?

Regards,
Vladimir

On 2/13/15 3:41 PM, Dean Long wrote:

On 2/13/2015 3:04 PM, chris...@zoulas.com wrote:

On Feb 13,  2:52pm, dean.l...@oracle.com (Dean Long) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for
ByteBuffer

| My understanding is that whether or not aarch64 allows unaligned
| accesses is based on a system setting, so this change is too
| simplistic.  I would prefer that this was controlled with something
| more flexible, like sun.cpu.unaligned.

So does x86_64 and you can ask the CPU if it is enabled... I am not sure
if a variable setting makes sense because if alignment is required you
get a signal (BUS error -- hi linux, SEGV), or incorrect results.

christos


So it sounds like we need to determine if unaligned accesses are
supported during startup,
in a platform-specific way.  This could be exposed through a property
like I suggested,
or perhaps a new Unsafe method.

Regarding x86_64, there may be places in the JVM that already assume
unaligned accesses
are allowed, so disabling them may completely break the JVM until those
assumptions
are fixed.

dl


Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses

2015-02-13 Thread Christos Zoulas
On Feb 13,  4:29pm, vladimir.koz...@oracle.com (Vladimir Kozlov) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer 

| On 2/13/15 4:22 PM, Dean Long wrote:
|  There is a system register bit to read, but I don't think it can be
|  accessed by an application, only the kernel.
|  If the OS won't provide this information, you could do something similar
|  to safeFetchN and catch the
|  resulting SIGBUS.
| 
| Yes, I agree it could be done this way too.
| On x86 we trigger SEGV to verify that the OS's signal handler correctly
| saves/restores AVX registers so we can use them.

It is PSL_AC (0x40000) and it is accessible by applications. Now
if it works or not depends on the flavor of the x86... As I mentioned
before there are implementations (for example pre-arm-v6 flavors)
where unaligned accesses don't signal (but don't work). There is
even a third category where unaligned accesses trap, but the kernel
can fix them if the binary is marked specially (sparc with misaligned
for example).

The portable way to verify what's going on is to do the misaligned
access and see if it works (dealing with SIGBUS/SIGSEGV).  Even
then (even when it works) you might not want to do it for
performance reasons (for example when the kernel fixes it).

christos