Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/18/2015 08:59 PM, Vladimir Kozlov wrote:
> The code which eliminates MemBars for scalarized objects was added in jdk8:
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/6f3fd5150b67

Right enough, but it only works with boxed objects. The Precedent of the MemBarNode is needed by MemBarNode::Ideal, and it's checked for:

    // Eliminate volatile MemBars for scalar replaced objects.
    if (can_reshape && req() == (Precedent+1)) {
      ... think about eliminating the MemBar

So if there's no Precedent, none of the barrier elimination is done. The only thing that sets the MemBar's Precedent is here, in Parse::do_put_xxx:

    // Preserve allocation ptr to create precedent edge to it in membar
    // generated on exit from constructor.
    if (C->eliminate_boxing() &&
        adr_type->isa_oopptr() && adr_type->is_oopptr()->is_ptr_to_boxed_value() &&
        AllocateNode::Ideal_allocation(obj, &_gvn) != NULL) {
      set_alloc_with_final(obj);
    }

The barrier is created in parse1, and uses alloc_with_final:

    if (method()->is_initializer() &&
        (wrote_final() ||
         PPC64_ONLY(wrote_volatile() ||)
         (AlwaysSafeConstructors && wrote_fields()))) {
      _exits.insert_mem_bar(Op_MemBarRelease, alloc_with_final());

So, it looks to me as though even the most trivial user-defined constructors with final fields will never eliminate barriers.

I don't know what the thinking is here. Why does it matter whether the type being constructed is a boxed value?

Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
Because that code was added and tested only for boxed objects (goal of 6934604) - I wanted to avoid wider effects of those changes. I think we can remove the limitation now in the jdk9 sources since we have enough time to test it.

Regards,
Vladimir

On 4/16/15 10:07 AM, Andrew Haley wrote:
> On 02/18/2015 08:59 PM, Vladimir Kozlov wrote:
>> The code which eliminates MemBars for scalarized objects was added in jdk8:
>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/6f3fd5150b67
>
> Right enough, but it only works with boxed objects.
> [...]
> So, it looks to me as though even the most trivial user-defined constructors with final fields will never eliminate barriers.
>
> I don't know what the thinking is here. Why does it matter whether the type being constructed is a boxed value?
>
> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/18/2015 10:01 AM, Andrew Dinn wrote:
> On 17/02/15 19:21, Vitaly Davidovich wrote:
>> IMO I don't think such barriers should be removed just because EA is able to elide the heap allocation.
>
> Why not? Are you assuming that the programmer might be relying on a memory barrier being implied in interpreted/JITted code by the presence in the source of an allocation? If so then I am not sure the Java memory model justifies that assumption, especially so in the case EA optimises.

It doesn't. There are essentially two ways to prevent unsafe publication of objects with final fields: either emit a barrier at the end of the constructor or track the reference to the newly-constructed object until it is stored in memory. That store to memory can be a releasing store. If the object does not escape, that releasing store can be eliminated.

Andrew.
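To make the two cases concrete, here is a minimal Java sketch. The class and method names are made up for illustration and are not taken from the JDK or HotSpot. Wrapper has a final field, so a JIT using the "barrier at the end of the constructor" strategy would place a release barrier at constructor exit. In sum() the object never leaves the method, so no other thread can observe it; in publish() the reference is stored to a static field, which is where the final-field guarantee actually matters.

```java
// Hypothetical illustration; these names do not come from the JDK or HotSpot.
final class Wrapper {
    final int value; // final field: readers must never see it uninitialised

    Wrapper(int value) {
        this.value = value;
        // A JIT using the "barrier at end of constructor" strategy would
        // place its release barrier at this point.
    }
}

final class EscapeDemo {
    static Wrapper shared; // storing here makes the object reachable by other threads

    // The Wrapper never escapes this method, so escape analysis may
    // scalar-replace it; a barrier for it would order nothing.
    static int sum(int a, int b) {
        Wrapper w = new Wrapper(a);
        return w.value + b;
    }

    // The Wrapper escapes through the static field, so its final field
    // must be visible to any thread that later reads 'shared'.
    static Wrapper publish(int a) {
        Wrapper w = new Wrapper(a);
        shared = w;
        return w;
    }

    public static void main(String[] args) {
        System.out.println(sum(2, 3));
        System.out.println(publish(7).value);
    }
}
```

Whether a given compiler actually removes the barrier in sum() is exactly the question under discussion in this thread; the sketch only shows the source-level shape of the two cases.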
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/18/2015 09:15 AM, Andrew Haley wrote:
> On 18/02/15 09:14, Florian Weimer wrote:
>> Wow, looks nice. What OpenJDK build did you use? I want to see if this happens on x86_64, too.
>
> I'm working on JDK9. You don't have this code yet. I'll do an x86 build.

  0x7f2948acbf8c: mov    0xc(%rdx),%r10d    ;*synchronization entry
                                            ; - java.nio.HeapByteBuffer::<init>@-1 (line 84)
                                            ; - java.nio.ByteBuffer::wrap@7 (line 373)
                                            ; - java.nio.ByteBuffer::wrap@4 (line 396)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@1 (line 23)
                                            ; implicit exception: dispatches to 0x7f2948acbff5
  ;; B2: # B5 B3 <- B1 Freq: 0.99

  ;; MEMBAR-release ! (empty encoding)

  0x7f2948acbf90: test   %ecx,%ecx
  0x7f2948acbf92: jl     0x7f2948acbfb5     ;*iflt
                                            ; - java.nio.Buffer::checkIndex@1 (line 545)
                                            ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)
  ;; B3: # B6 B4 <- B2 Freq: 0.99

  0x7f2948acbf94: mov    %r10d,%ebp
  0x7f2948acbf97: sub    %ecx,%ebp          ;*isub
                                            ; - java.nio.Buffer::checkIndex@10 (line 545)
                                            ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)
  0x7f2948acbf99: cmp    $0x8,%ebp
  0x7f2948acbf9c: jl     0x7f2948acbfd5     ;*if_icmple
                                            ; - java.nio.Buffer::checkIndex@11 (line 545)
                                            ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)
  ;; B4: # N95 <- B3 Freq: 0.98

  0x7f2948acbf9e: movslq %ecx,%r10
  0x7f2948acbfa1: mov    0x10(%rdx,%r10,1),%rax
  0x7f2948acbfa6: bswap  %rax               ;*invokestatic reverseBytes
                                            ; - java.nio.Bits::swap@1 (line 61)
                                            ; - java.nio.HeapByteBuffer::getLong@41 (line 466)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

So, just the same except that there is no explicit fence instruction to remove.

It's a shame for AArch64 because that fence really kills performance, but it's bad for x86 too. Even on machines that don't emit fence instructions the fence still acts as a compiler barrier.

Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
The code which eliminates MemBars for scalarized objects was added in jdk8:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/6f3fd5150b67

Another store barrier change was also pushed into jdk8:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/fcf521c3fbc6

I don't remember that we did anything special with membars in jdk9.

Regards,
Vladimir

On 2/18/15 6:27 AM, Vitaly Davidovich wrote:
> Indeed, that's quite nice and not what I saw in java 7 so good to see that this case is EA'd out. Does anyone know if there was EA work done in java 9 or is this simply inlining policy change that makes EA work (as John alluded to)?
>
> On Feb 18, 2015 6:13 AM, Andrew Haley a...@redhat.com wrote:
>> I'm working on JDK9. You don't have this code yet. I'll do an x86 build.
>> [...]
>> So, just the same except that there is no explicit fence instruction to remove.
>>
>> It's a shame for AArch64 because that fence really kills performance, but it's bad for x86 too. Even on machines that don't emit fence instructions the fence still acts as a compiler barrier.
>>
>> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
Thanks Vladimir. I was actually asking about the ByteBuffer elimination itself; when I tried Andrew's example on 7u60 (i.e. a single method with a ByteBuffer.wrap(...).getLong(...)), the ByteBuffer allocation was not removed.

On Wed, Feb 18, 2015 at 3:59 PM, Vladimir Kozlov vladimir.koz...@oracle.com wrote:
> The code which eliminates MemBars for scalarized objects was added in jdk8:
>
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/6f3fd5150b67
>
> Another store barrier change was also pushed into jdk8:
>
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/fcf521c3fbc6
>
> I don't remember that we did anything special with membars in jdk9.
>
> Regards,
> Vladimir
>
> On 2/18/15 6:27 AM, Vitaly Davidovich wrote:
>> Indeed, that's quite nice and not what I saw in java 7 so good to see that this case is EA'd out. Does anyone know if there was EA work done in java 9 or is this simply inlining policy change that makes EA work (as John alluded to)?
>> [...]
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 17/02/15 19:21, Vitaly Davidovich wrote:
> IMO I don't think such barriers should be removed just because EA is able to elide the heap allocation.

Why not? Are you assuming that the programmer might be relying on a memory barrier being implied in interpreted/JITted code by the presence in the source of an allocation? If so then I am not sure the Java memory model justifies that assumption, especially so in the case EA optimises.

As I recall, the arguments here and on the concurrency lists for the presence of a memory barrier at the end of allocation were only /for/ it as a heuristic to ensure that objects which might be shared would not be shared before all effects of construction were visible (I may be misstating that -- you might like to reread it as the arguments on the concurrency lists I was convinced by :-). In which case, if an object cannot be shared, indeed need not even be allocated, then there appears to be no need for such a heuristic.

n.b. if a Java programmer really wants to enforce memory ordering wrt other threads then Java provides a very simple mechanism for that in volatile.

regards,

Andrew Dinn
---
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)
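As a small illustration of the volatile mechanism mentioned above, here is a sketch of explicit publication through a volatile flag. All names are invented for this example. The write to the plain field payload happens-before the volatile write to ready, so a reader that observes ready == true is guaranteed by the Java memory model to see the payload as well, with no reliance on any barrier implied by an allocation.

```java
// Hypothetical example; names are invented for illustration.
final class VolatilePublish {
    static int payload;             // plain field, ordered by the volatile flag below
    static volatile boolean ready;  // volatile write/read pair establishes happens-before

    static int publishAndRead() {
        payload = 0;
        ready = false;
        final int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) {        // volatile read: spins until the writer publishes
                Thread.yield();
            }
            seen[0] = payload;      // guaranteed to observe the value written before 'ready'
        });
        reader.start();

        payload = 42;               // ordinary write
        ready = true;               // volatile write publishes everything written before it

        try {
            reader.join();          // join() makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead());
    }
}
```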
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
I don't think explicit barriers (i.e. Unsafe.xxxFence) should be removed as I don't think compiler can prove that it's safe to do so. When value types come to be and get scalarized, it may be possible to create cheap synchronization types that are stack allocated yet are used for synchronization control. For example, C# has a SpinLock struct ( https://msdn.microsoft.com/en-us/library/system.threading.spinlock%28v=vs.110%29.aspx ). Also, I don't think Unsafe induced fences are part of JMM, current or future (at least I haven't heard that to be the case).

On Feb 18, 2015 5:11 AM, Andrew Haley a...@redhat.com wrote:
> On 02/18/2015 10:01 AM, Andrew Dinn wrote:
>> On 17/02/15 19:21, Vitaly Davidovich wrote:
>>> IMO I don't think such barriers should be removed just because EA is able to elide the heap allocation.
>>
>> Why not? Are you assuming that the programmer might be relying on a memory barrier being implied in interpreted/JITted code by the presence in the source of an allocation? If so then I am not sure the Java memory model justifies that assumption, especially so in the case EA optimises.
>
> It doesn't. There are essentially two ways to prevent unsafe publication of objects with final fields: either emit a barrier at the end of the constructor or track the reference to the newly-constructed object until it is stored in memory. That store to memory can be a releasing store. If the object does not escape, that releasing store can be eliminated.
>
> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
Indeed, that's quite nice and not what I saw in java 7 so good to see that this case is EA'd out. Does anyone know if there was EA work done in java 9 or is this simply inlining policy change that makes EA work (as John alluded to)?

On Feb 18, 2015 6:13 AM, Andrew Haley a...@redhat.com wrote:
> On 02/18/2015 09:15 AM, Andrew Haley wrote:
>> On 18/02/15 09:14, Florian Weimer wrote:
>>> Wow, looks nice. What OpenJDK build did you use? I want to see if this happens on x86_64, too.
>>
>> I'm working on JDK9. You don't have this code yet. I'll do an x86 build.
>
> [...]
> So, just the same except that there is no explicit fence instruction to remove.
>
> It's a shame for AArch64 because that fence really kills performance, but it's bad for x86 too. Even on machines that don't emit fence instructions the fence still acts as a compiler barrier.
>
> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
It's an awful lot of pain to avoid what IMO should be an obvious addition:

    (Short|Character|Integer|Long).(get|put)(Little|Big)EndianBytes([value,] byte[] b, int offs)

This could (it seems to me) be easily intrinsified, is hugely useful both within and outside of the JDK, and fits perfectly well in the family of integral bit manipulations such as:

    Integer.bitCount()
    Integer.highestOneBit()
    Integer.rotate*()
    etc.

On 02/18/2015 08:16 AM, Vitaly Davidovich wrote:
> I don't think explicit barriers (i.e. Unsafe.xxxFence) should be removed as I don't think compiler can prove that it's safe to do so. When value types come to be and get scalarized, it may be possible to create cheap synchronization types that are stack allocated yet are used for synchronization control. For example, C# has a SpinLock struct ( https://msdn.microsoft.com/en-us/library/system.threading.spinlock%28v=vs.110%29.aspx ). Also, I don't think Unsafe induced fences are part of JMM, current or future (at least I haven't heard that to be the case).
> [...]

--
- DML
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/18/2015 02:32 PM, David M. Lloyd wrote:
> It's an awful lot of pain to avoid what IMO should be an obvious addition:
>
>     (Short|Character|Integer|Long).(get|put)(Little|Big)EndianBytes([value,] byte[] b, int offs)
>
> This could (it seems to me) be easily intrinsified, is hugely useful both within and outside of the JDK,

Sure, I get that, but it's a new API. Once I've finished this work, implementing such an API would be trivial. I have no objection to it in principle, but making ByteBuffer.map() operations work well is worth doing too.

Andrew.
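For readers who want to see the shape of the API being proposed, here is a rough pure-Java sketch for the int case. The class EndianBytes and its method names are hypothetical: they follow the naming pattern in David's message, but nothing in this thread says such methods exist in the JDK, and an intrinsified version would presumably compile to a single load or store plus an optional byte swap rather than these shifts.

```java
// Hypothetical helper following the naming pattern proposed above;
// this is not an existing JDK API.
final class EndianBytes {
    private EndianBytes() {}

    // Reads 4 bytes starting at offs, most significant byte first.
    static int getBigEndianInt(byte[] b, int offs) {
        return ((b[offs]     & 0xff) << 24)
             | ((b[offs + 1] & 0xff) << 16)
             | ((b[offs + 2] & 0xff) << 8)
             |  (b[offs + 3] & 0xff);
    }

    // Reads 4 bytes starting at offs, least significant byte first.
    static int getLittleEndianInt(byte[] b, int offs) {
        return  (b[offs]     & 0xff)
             | ((b[offs + 1] & 0xff) << 8)
             | ((b[offs + 2] & 0xff) << 16)
             | ((b[offs + 3] & 0xff) << 24);
    }

    // Writes value into 4 bytes starting at offs, most significant byte first.
    static void putBigEndianInt(int value, byte[] b, int offs) {
        b[offs]     = (byte) (value >>> 24);
        b[offs + 1] = (byte) (value >>> 16);
        b[offs + 2] = (byte) (value >>> 8);
        b[offs + 3] = (byte) value;
    }

    // Writes value into 4 bytes starting at offs, least significant byte first.
    static void putLittleEndianInt(int value, byte[] b, int offs) {
        b[offs]     = (byte) value;
        b[offs + 1] = (byte) (value >>> 8);
        b[offs + 2] = (byte) (value >>> 16);
        b[offs + 3] = (byte) (value >>> 24);
    }
}
```

The point of the proposal is that callers get these operations on a plain byte[] at any offset, without allocating a wrapper object.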
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
Ok, perhaps I misunderstood then since you mentioned Unsafe.storeFence() in your earlier post and Vladimir said they were debating whether these fences should be removed. If you guys were talking only about the final field fence, then my bad, I don't disagree with removing those if the object doesn't escape.

On Wed, Feb 18, 2015 at 10:26 AM, Andrew Haley a...@redhat.com wrote:
> On 02/18/2015 02:16 PM, Vitaly Davidovich wrote:
>> I don't think explicit barriers (i.e. Unsafe.xxxFence) should be removed as I don't think compiler can prove that it's safe to do so.
>
> Nobody thinks that explicit barriers (i.e. Unsafe.xxxFence) should be removed. We're talking about fences at the end of constructors which have final fields. These should be removed if the object does not escape.
>
> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/18/2015 02:16 PM, Vitaly Davidovich wrote:
> I don't think explicit barriers (i.e. Unsafe.xxxFence) should be removed as I don't think compiler can prove that it's safe to do so.

Nobody thinks that explicit barriers (i.e. Unsafe.xxxFence) should be removed. We're talking about fences at the end of constructors which have final fields. These should be removed if the object does not escape.

Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
FWIW, when I checked, ByteBuffer.wrap(...).getXXX() type of code didn't EA out the BB; it seems the call chain is too complex for EA. I checked 7u60 though, so maybe newer versions are different.

On Feb 17, 2015 5:53 AM, Andrew Haley a...@redhat.com wrote:
> On 02/17/2015 10:49 AM, Florian Weimer wrote:
>> [...]
>> That is, the byte array is supplied by the caller, and if we wanted to use a ByteBuffer, we would have to allocate a fresh one on every iteration. In this case, neither of the two alternatives you list apply.
>
> I see. So the question could also be whether escape analysis would notice that a ByteBuffer does not escape. I hope to know that soon.
>
> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/17/2015 10:53 AM, Andrew Haley wrote:
> I see. So the question could also be whether escape analysis would notice that a ByteBuffer does not escape. I hope to know that soon.

Close but no cigar.

    long getLong(byte[] bytes, int i) {
        return ByteBuffer.wrap(bytes).getLong(i);
    }

Everything gets inlined nicely and the ByteBuffer is not created, but a store fence remains because of the final fields in HeapByteBuffer. So the resulting code for getLong (minus the prologue and epilogue) looks like this:

  0x03ff7426dc34: ldr   w11, [x2,#12]       ;*arraylength
                                            ; - java.nio.ByteBuffer::wrap@3 (line 396)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@1 (line 23)
                                            ; implicit exception: dispatches to 0x03ff7426dca4
  ;; B2: # B5 B3 <- B1 Freq: 0.99

  0x03ff7426dc38: dmb   ish                 ;*synchronization entry
                                            ; - java.nio.HeapByteBuffer::<init>@-1 (line 84)
                                            ; - java.nio.ByteBuffer::wrap@7 (line 373)
                                            ; - java.nio.ByteBuffer::wrap@4 (line 396)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@1 (line 23)
  0x03ff7426dc3c: sub   w12, w11, w3        ;*isub
                                            ; - java.nio.Buffer::checkIndex@10 (line 545)
                                            ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)
  0x03ff7426dc40: cmp   w3, #0x0
  0x03ff7426dc44: b.lt  0x03ff7426dc70      ;*iflt
                                            ; - java.nio.Buffer::checkIndex@1 (line 545)
                                            ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)
  ;; B3: # B6 B4 <- B2 Freq: 0.99

  0x03ff7426dc48: cmp   w12, #0x8
  0x03ff7426dc4c: b.lt  0x03ff7426dc88      ;*if_icmple
                                            ; - java.nio.Buffer::checkIndex@11 (line 545)
                                            ; - java.nio.HeapByteBuffer::getLong@18 (line 465)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)
  ;; B4: # N92 <- B3 Freq: 0.98

  0x03ff7426dc50: add   x10, x2, w3, sxtw
  0x03ff7426dc54: ldr   x10, [x10,#16]
  0x03ff7426dc58: rev   x0, x10             ;*invokestatic reverseBytes
                                            ; - java.nio.Bits::swap@1 (line 61)
                                            ; - java.nio.HeapByteBuffer::getLong@41 (line 466)
                                            ; - bytebuffertests.ByteBufferTests3::getLong@5 (line 23)

If it weren't for the stray DMB ISH it'd be almost perfect.

Andrew.
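For anyone who wants to reproduce the experiment, the following is a self-contained version of the method above. Only the getLong body comes from the message; the class name, the little-endian variant and the sample data are added here for illustration. A ByteBuffer created by wrap() is big-endian by default, which is why the compiled code on a little-endian machine ends with a byte reversal (rev on AArch64, bswap on x86).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative wrapper class; only getLong() mirrors the code in the message.
final class ByteBufferGetLong {
    // Reads 8 bytes at index i as a big-endian long (ByteBuffer's default order).
    static long getLong(byte[] bytes, int i) {
        return ByteBuffer.wrap(bytes).getLong(i);
    }

    // Same read with the buffer switched to little-endian order.
    static long getLongLittleEndian(byte[] bytes, int i) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong(i);
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // Index 1 is deliberately not 8-byte aligned within the array.
        System.out.println(Long.toHexString(getLong(data, 1)));
        System.out.println(Long.toHexString(getLongLittleEndian(data, 1)));
    }
}
```

To look at the generated code one would run this in a loop hot enough to be compiled and use the JVM's disassembly output; whether the wrapper allocation and the fence disappear depends on the JDK build, as the rest of this thread shows.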
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 17.02.2015 at 11:53, Andrew Haley wrote:
> On 02/17/2015 10:49 AM, Florian Weimer wrote:
>> That is, the byte array is supplied by the caller, and if we wanted to use a ByteBuffer, we would have to allocate a fresh one on every iteration. In this case, neither of the two alternatives you list apply.
>
> I see. So the question could also be whether escape analysis would notice that a ByteBuffer does not escape. I hope to know that soon.

See:
https://bugs.openjdk.java.net/browse/JDK-6908239
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6914113

-Ulf
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 17.02.2015 at 04:35, John Rose wrote:
> On Feb 14, 2015, at 12:01 AM, Andrew Haley a...@redhat.com wrote:
>> On 02/14/2015 12:09 AM, John Rose wrote:
>>> We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers.
>>
>> Indeed. I'm intending to prototype a design for those next week. OK?
>
> Yes, please.
> -- John

+1

I guess sun.nio.cs coders could also benefit from that.

-Ulf
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
IMO I don't think such barriers should be removed just because EA is able to elide the heap allocation.

On Tue, Feb 17, 2015 at 2:15 PM, Vladimir Kozlov vladimir.koz...@oracle.com wrote:
> There was discussion should we remove such barriers or not because they create memory operations ordering which could be different if we remove them.
>
> To eliminate them we need to add 'precedent' edge to store's membar as we do, for example, for loads:
>
>     if (field->is_volatile()) {
>       // Memory barrier includes bogus read of value to force load BEFORE membar
>       insert_mem_bar(Op_MemBarAcquire, ld);
>     }
>
> MemBarNode::Ideal() will do elimination.
>
> Regards,
> Vladimir
>
> On 2/17/15 10:58 AM, Andrew Haley wrote:
>> On 02/17/2015 06:42 PM, John Rose wrote:
>>> The remaining store fence is probably a bug. A store fence for scalarized (lifted-out-of-memory) final fields should go away, since the fields are not actually stored in heap memory.
>>
>> After inlining how would escape analysis know that the store fence is associated with final fields rather than, say, an explicit Unsafe.storeFence()?
>>
>> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On Feb 17, 2015, at 6:22 AM, Andrew Haley a...@redhat.com wrote:
> Everything gets inlined nicely and the ByteBuffer is not created, but a store fence remains because of the final fields in HeapByteBuffer.

Wow, that got closer to the goal than I expected. In general, the EA analysis can fail at random because of vagaries of inlining policy.

The remaining store fence is probably a bug. A store fence for scalarized (lifted-out-of-memory) final fields should go away, since the fields are not actually stored in heap memory. I filed JDK-8073358 to track.

BTW, we already elide synch. ops on scalarized (non-stored) objects. Fence elision is a similar optimization.

-- John

P.S. Value types will come with scalarization always-on, so even if a call goes out of line, the value's fields can be kept out of the heap. One of the projected use cases of values is safe encapsulation for complex pointers (native or in-object).
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/17/2015 06:42 PM, John Rose wrote:
> The remaining store fence is probably a bug. A store fence for scalarized (lifted-out-of-memory) final fields should go away, since the fields are not actually stored in heap memory.

After inlining how would escape analysis know that the store fence is associated with final fields rather than, say, an explicit Unsafe.storeFence()?

Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
What do you mean exactly? I don't think inlining hides anything, so the explicit fence should still be there for EA to see (and preserve).

On Feb 17, 2015 1:58 PM, Andrew Haley a...@redhat.com wrote:
> On 02/17/2015 06:42 PM, John Rose wrote:
>> The remaining store fence is probably a bug. A store fence for scalarized (lifted-out-of-memory) final fields should go away, since the fields are not actually stored in heap memory.
>
> After inlining how would escape analysis know that the store fence is associated with final fields rather than, say, an explicit Unsafe.storeFence()?
>
> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
There was a discussion about whether we should remove such barriers or not, because they create an ordering of memory operations which could be different if we remove them.

To eliminate them we need to add a 'precedent' edge to the store's membar as we do, for example, for loads:

    if (field->is_volatile()) {
      // Memory barrier includes bogus read of value to force load BEFORE membar
      insert_mem_bar(Op_MemBarAcquire, ld);
    }

MemBarNode::Ideal() will do the elimination.

Regards,
Vladimir

On 2/17/15 10:58 AM, Andrew Haley wrote:
> On 02/17/2015 06:42 PM, John Rose wrote:
>> The remaining store fence is probably a bug. A store fence for scalarized (lifted-out-of-memory) final fields should go away, since the fields are not actually stored in heap memory.
>
> After inlining how would escape analysis know that the store fence is associated with final fields rather than, say, an explicit Unsafe.storeFence()?
>
> Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/17/2015 10:15 AM, Florian Weimer wrote:
> On 02/17/2015 11:00 AM, Andrew Haley wrote:
>> On 02/17/2015 09:39 AM, Florian Weimer wrote:
>>> On 02/14/2015 01:09 AM, John Rose wrote:
>>>> These queries need to go into Unsafe. We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers.
>>>
>>> The safe variants should go into the java.lang.Integer etc. classes IMHO. Even the JDK has quite a few uses for them (particularly the big endian variant). Putting that into Unsafe only encourages further use of Unsafe from application code.
>>
>> They'll all be visible as ByteBuffer methods, which should be enough for application code, shouldn't it? I'm not sure how much sense it makes to put them into java.lang.Integer etc.
>
> You'll still have to allocate a wrapping ByteBuffer object to use them. I expect that makes them unattractive in many cases.

Hmm. I'm having a hard time trying to understand why. If you need to do a lot of accesses the allocation of the ByteBuffer won't be significant; if you don't need to do a lot of accesses it won't matter either.

Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/17/2015 11:22 AM, Andrew Haley wrote:
>> You'll still have to allocate a wrapping ByteBuffer object to use them. I expect that makes them unattractive in many cases.
>
> Hmm. I'm having a hard time trying to understand why. If you need to do a lot of accesses the allocation of the ByteBuffer won't be significant; if you don't need to do a lot of accesses it won't matter either.

The typical use case I have in mind is exemplified by com.sun.crypto.provider.GHASH.processBlock(byte[] data, int ofs):

    private void processBlock(byte[] data, int ofs) {
        if (data.length - ofs < AES_BLOCK_SIZE) {
            throw new RuntimeException("need complete block");
        }
        state0 ^= getLong(data, ofs);
        state1 ^= getLong(data, ofs + 8);
        blockMult(subkeyH0, subkeyH1);
    }

That is, the byte array is supplied by the caller, and if we wanted to use a ByteBuffer, we would have to allocate a fresh one on every iteration. In this case, neither of the two alternatives you list applies.

--
Florian Weimer / Red Hat Product Security
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/14/2015 01:09 AM, John Rose wrote: These queries need to go into Unsafe. We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers. The safe variants should go into the java.lang.Integer etc. classes IMHO. Even the JDK has quite a few uses for them (particularly the big endian variant). Putting that into Unsafe only encourages further use of Unsafe from application code. -- Florian Weimer / Red Hat Product Security
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/14/2015 11:10 PM, Dean Long wrote: Even if linux-aarch64 always allows unaligned, checking only for aarch64 is not future-proof because it doesn't take the OS into account. Surely a simple test case is sufficient to ensure that the platform supports misaligned accesses? Then new ports will see the failure immediately and can tweak the code. -- Florian Weimer / Red Hat Product Security
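A test along those lines could be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the check goes through the public ByteBuffer API, so it verifies that long accesses at every byte offset give correct results, not which instruction sequence the JIT emits or whether the hardware would trap on a raw misaligned access.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class MisalignedAccessCheck {
    // Hypothetical check: write and read back a long at every byte offset
    // 0..7 of a direct buffer, and confirm the individual bytes as well.
    static boolean roundTripsAtAllOffsets(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocateDirect(32).order(order);
        long pattern = 0x0123456789ABCDEFL;
        for (int ofs = 0; ofs < 8; ofs++) {
            buf.putLong(ofs, pattern);
            if (buf.getLong(ofs) != pattern) {
                return false;
            }
            for (int i = 0; i < 8; i++) {
                // Big-endian stores the most significant byte first.
                int shift = (order == ByteOrder.BIG_ENDIAN) ? 56 - 8 * i : 8 * i;
                byte expected = (byte) (pattern >>> shift);
                if (buf.get(ofs + i) != expected) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(roundTripsAtAllOffsets(ByteOrder.BIG_ENDIAN));
        System.out.println(roundTripsAtAllOffsets(ByteOrder.LITTLE_ENDIAN));
    }
}
```

On a port that wrongly claimed unaligned support, a check of this shape would be expected to fail or crash at an odd offset, which is the early signal the suggestion above is after.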
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/17/2015 11:00 AM, Andrew Haley wrote: On 02/17/2015 09:39 AM, Florian Weimer wrote: On 02/14/2015 01:09 AM, John Rose wrote: These queries need to go into Unsafe. We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers. The safe variants should go into the java.lang.Integer etc. classes IMHO. Even the JDK has quite a few uses for them (particularly the big endian variant). Putting that into Unsafe only encourages further use of Unsafe from application code. They'll all be visible as ByteBuffer methods, which should be enough for application code, shouldn't it? I'm not sure how much sense it makes to put them into java.lang.Integer etc. You'll still have to allocate a wrapping ByteBuffer object to use them. I expect that makes them unattractive in many cases. Hmm, maybe I should propose a patch for DataInputStream and see how it's received. :-) -- Florian Weimer / Red Hat Product Security
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/17/2015 09:39 AM, Florian Weimer wrote: On 02/14/2015 01:09 AM, John Rose wrote: These queries need to go into Unsafe. We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers. The safe variants should go into the java.lang.Integer etc. classes IMHO. Even the JDK has quite a few uses for them (particularly the big endian variant). Putting that into Unsafe only encourages further use of Unsafe from application code. They'll all be visible as ByteBuffer methods, which should be enough for application code, shouldn't it? I'm not sure how much sense it makes to put them into java.lang.Integer etc. Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On Feb 14, 2015, at 12:01 AM, Andrew Haley a...@redhat.com wrote: On 02/14/2015 12:09 AM, John Rose wrote: We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers. Indeed. I'm intending to prototype a design for those next week. OK? Yes, please. — John
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 14/02/2015 22:10, Dean Long wrote: Even if linux-aarch64 always allows unaligned, checking only for aarch64 is not future-proof because it doesn't take the OS into account. However, I really don't like having to enumerate all relevant platforms in multiple places in shared code, so I disagree with the existing code and with perpetuating the pattern. As long as the decision is in platform-specific code, a build-time decision may be entirely appropriate. This alignment test in Bits.java has been there for a long time (JDK 1.4). It's technical debt that hasn't surfaced very often as it's so rare to add architectures. If Unsafe gets a method to test the alignment then it would be great to get Bits changed. -Alan
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/16/2015 11:02 AM, Alan Bateman wrote: On 14/02/2015 22:10, Dean Long wrote: Even if linux-aarch64 always allows unaligned, checking only for aarch64 is not future-proof because it doesn't take the OS into account. However, I really don't like having to enumerate all relevant platforms in multiple places in shared code, so I disagree with the existing code and with perpetuating the pattern. As long as the decision is in platform-specific code, a build-time decision may be entirely appropriate. This alignment test in Bits.java has been there for a long time (JDK 1.4). It's technical debt that hasn't surfaced very often as it's so rare to add architectures. If Unsafe gets a method to test the alignment then it would be great to get Bits changed. Hopefully it's getting less rare to add architectures! I'll do that as part of my patch. Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 14/02/15 22:10, Dean Long wrote: On 2/14/2015 12:07 AM, Andrew Haley wrote: On 02/13/2015 10:52 PM, Dean Long wrote: My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. Disabling unaligned access would be a really perverse thing to do, and I suspect that GCC and glibc already assume that unaligned accesses work so it would require a recompilation of libjvm (and probably the whole OS) to make it work. However, if you really think there's a point to making this a runtime flag I won't resist. Even if linux-aarch64 always allows unaligned, checking only for aarch64 is not future-proof because it doesn't take the OS into account. Sure, but we can't predict all the crazy things that writers of future operating systems might do. However, I really don't like having to enumerate all relevant platforms in multiple places in shared code, so I disagree with the existing code and with perpetuating the pattern. As long as the decision is in platform-specific code, a build-time decision may be entirely appropriate. That makes sense. I don't like the way that the decision is hidden in shared code either: if it had been in a more obvious place I would have found it earlier. I'll have a look at writing an Unsafe method which does the right thing. Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 2/14/2015 12:07 AM, Andrew Haley wrote: On 02/13/2015 10:52 PM, Dean Long wrote: My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. Disabling unaligned access would be a really perverse thing to do, and I suspect that GCC and glibc already assume that unaligned accesses work so it would require a recompilation of libjvm (and probably the whole OS) to make it work. However, if you really think there's a point to making this a runtime flag I won't resist. Andrew. Even if linux-aarch64 always allows unaligned, checking only for aarch64 is not future-proof because it doesn't take the OS into account. However, I really don't like having to enumerate all relevant platforms in multiple places in shared code, so I disagree with the existing code and with perpetuating the pattern. As long as the decision is in platform-specific code, a build-time decision may be entirely appropriate. dl
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/14/2015 12:09 AM, John Rose wrote: We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers. Indeed. I'm intending to prototype a design for those next week. OK? Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/13/2015 10:52 PM, Dean Long wrote: My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. Disabling unaligned access would be a really perverse thing to do, and I suspect that GCC and glibc already assume that unaligned accesses work so it would require a recompilation of libjvm (and probably the whole OS) to make it work. However, if you really think there's a point to making this a runtime flag I won't resist. Andrew.
RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
java.nio.DirectByteBuffer.getXXX is slow for types larger than byte because the runtime does not know that AArch64 can perform unaligned memory accesses. The problem is due to this code in java.nio.Bits.unaligned():

    unaligned = arch.equals("i386") || arch.equals("x86")
        || arch.equals("amd64") || arch.equals("x86_64");

If we add AArch64 to this list, code quality is very much improved.

http://cr.openjdk.java.net/~aph/8073093/

Thanks, Andrew.
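For readers without the webrev to hand, the shape of the proposed change can be sketched roughly as below. This is an illustration, not the actual patch: the class and method names are hypothetical, the lookup is simplified to a plain System.getProperty call, and it assumes the os.arch value reported on AArch64 is "aarch64".

```java
class UnalignedArchSketch {
    // Hypothetical stand-in for the architecture test in Bits.unaligned():
    // returns true for architectures assumed to tolerate unaligned accesses.
    static boolean archAllowsUnaligned(String arch) {
        return arch.equals("i386") || arch.equals("x86")
            || arch.equals("amd64") || arch.equals("x86_64")
            || arch.equals("aarch64");   // the proposed addition (assumed name)
    }

    public static void main(String[] args) {
        // Simplified property lookup; an empty default avoids a null check.
        String arch = System.getProperty("os.arch", "");
        System.out.println(arch + " -> " + archAllowsUnaligned(arch));
    }
}
```

When the test returns false, the buffer code falls back to composing wider values from single-byte accesses, which is the slow path described in the report above.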
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 02/13/2015 04:05 PM, Alan Bateman wrote: On 13/02/2015 13:38, Andrew Haley wrote: java.nio.DirectByteBuffer.getXXX is slow for types larger than byte because the runtime does not know that AArch64 can perform unaligned memory accesses. The problem is due to this code in java.nio.Bits.unaligned(): unaligned = arch.equals("i386") || arch.equals("x86") || arch.equals("amd64") || arch.equals("x86_64"); If we add AArch64 to this list code quality is very much improved. http://cr.openjdk.java.net/~aph/8073093/ Makes sense, I assume this will go in when JEP 237 is pushed. It will, but I need approval to push to the JEP 237 staging repo. 'Cos them's the rules. :-) Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 13/02/2015 13:38, Andrew Haley wrote: java.nio.DirectByteBuffer.getXXX is slow for types larger than byte because the runtime does not know that AArch64 can perform unaligned memory accesses. The problem is due to this code in java.nio.Bits.unaligned(): unaligned = arch.equals("i386") || arch.equals("x86") || arch.equals("amd64") || arch.equals("x86_64"); If we add AArch64 to this list code quality is very much improved. http://cr.openjdk.java.net/~aph/8073093/ Makes sense, I assume this will go in when JEP 237 is pushed. -Alan
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
Changes are fine. I agree with Alan. Please wait until we merge the aarch64 stage into jdk9/dev and then push this fix into jdk9 (by sponsor). We should finish testing of the stage repo soon. Thanks, Vladimir On 2/13/15 8:07 AM, Andrew Haley wrote: On 02/13/2015 04:05 PM, Alan Bateman wrote: On 13/02/2015 13:38, Andrew Haley wrote: java.nio.DirectByteBuffer.getXXX is slow for types larger than byte because the runtime does not know that AArch64 can perform unaligned memory accesses. The problem is due to this code in java.nio.Bits.unaligned(): unaligned = arch.equals("i386") || arch.equals("x86") || arch.equals("amd64") || arch.equals("x86_64"); If we add AArch64 to this list code quality is very much improved. http://cr.openjdk.java.net/~aph/8073093/ Makes sense, I assume this will go in when JEP 237 is pushed. It will, but I need approval to push to the JEP 237 staging repo. 'Cos them's the rules. :-) Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. I would prefer that this was controlled with something more flexible, like sun.cpu.unaligned. dl On 2/13/2015 5:38 AM, Andrew Haley wrote: java.nio.DirectByteBuffer.getXXX is slow for types larger than byte because the runtime does not know that AArch64 can perform unaligned memory accesses. The problem is due to this code in java.nio.Bits.unaligned(): unaligned = arch.equals("i386") || arch.equals("x86") || arch.equals("amd64") || arch.equals("x86_64"); If we add AArch64 to this list code quality is very much improved. http://cr.openjdk.java.net/~aph/8073093/ Thanks, Andrew.
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On Feb 13, 2:52pm, dean.l...@oracle.com (Dean Long) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer

| My understanding is that whether or not aarch64 allows unaligned
| accesses is based on a
| system setting, so this change is too simplistic. I would prefer that
| this was controlled with
| something more flexible, like sun.cpu.unaligned.

So does x86_64 and you can ask the CPU if it is enabled... I am not sure if a variable setting makes sense because if alignment is required you get a signal (BUS error -- hi linux, SEGV), or incorrect results.

christos
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
There is a system register bit to read, but I don't think it can be accessed by an application, only the kernel. If the OS won't provide this information, you could do something similar to safeFetchN and catch the resulting SIGBUS. dl On 2/13/2015 4:05 PM, Vladimir Kozlov wrote: x86 has a flag, UseUnalignedLoadStores, which is set to true depending on which CPU version the VM runs on. The CPU version is determined based on CPUID instruction results. Does AArch64 have something similar? Regards, Vladimir On 2/13/15 3:41 PM, Dean Long wrote: On 2/13/2015 3:04 PM, chris...@zoulas.com wrote: On Feb 13, 2:52pm, dean.l...@oracle.com (Dean Long) wrote: -- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer | My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. I would prefer that this was controlled with something more flexible, like sun.cpu.unaligned. So does x86_64 and you can ask the CPU if it is enabled... I am not sure if a variable setting makes sense because if alignment is required you get a signal (BUS error -- hi linux, SEGV), or incorrect results. christos So it sounds like we need to determine if unaligned accesses are supported during startup, in a platform-specific way. This could be exposed through a property like I suggested, or perhaps a new Unsafe method. Regarding x86_64, there may be places in the JVM that already assume unaligned accesses are allowed, so disabling them may completely break the JVM until those assumptions are fixed. dl
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
These queries need to go into Unsafe. We also need Unsafe.getIntMisaligned, etc., which wire through to whatever second-best mechanism the platform offers. — John
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 2/13/15 4:22 PM, Dean Long wrote: There is a system register bit to read, but I don't think it can be accessed by an application, only the kernel. If the OS won't provide this information, you could do something similar to safeFetchN and catch the resulting SIGBUS. Yes, I agree it could be done this way too. On x86 we trigger SEGV to verify that the OS's signal handler correctly saves and restores AVX registers so we can use them. Vladimir dl On 2/13/2015 4:05 PM, Vladimir Kozlov wrote: x86 has a flag, UseUnalignedLoadStores, which is set to true depending on which CPU version the VM runs on. The CPU version is determined based on CPUID instruction results. Does AArch64 have something similar? Regards, Vladimir On 2/13/15 3:41 PM, Dean Long wrote: On 2/13/2015 3:04 PM, chris...@zoulas.com wrote: On Feb 13, 2:52pm, dean.l...@oracle.com (Dean Long) wrote: -- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer | My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. I would prefer that this was controlled with something more flexible, like sun.cpu.unaligned. So does x86_64 and you can ask the CPU if it is enabled... I am not sure if a variable setting makes sense because if alignment is required you get a signal (BUS error -- hi linux, SEGV), or incorrect results. christos So it sounds like we need to determine if unaligned accesses are supported during startup, in a platform-specific way. This could be exposed through a property like I suggested, or perhaps a new Unsafe method. Regarding x86_64, there may be places in the JVM that already assume unaligned accesses are allowed, so disabling them may completely break the JVM until those assumptions are fixed. dl
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On 2/13/2015 3:04 PM, chris...@zoulas.com wrote: On Feb 13, 2:52pm, dean.l...@oracle.com (Dean Long) wrote: -- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer | My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. I would prefer that this was controlled with something more flexible, like sun.cpu.unaligned. So does x86_64 and you can ask the CPU if it is enabled... I am not sure if a variable setting makes sense because if alignment is required you get a signal (BUS error -- hi linux, SEGV), or incorrect results. christos So it sounds like we need to determine if unaligned accesses are supported during startup, in a platform-specific way. This could be exposed through a property like I suggested, or perhaps a new Unsafe method. Regarding x86_64, there may be places in the JVM that already assume unaligned accesses are allowed, so disabling them may completely break the JVM until those assumptions are fixed. dl
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
x86 has a flag, UseUnalignedLoadStores, which is set to true depending on which CPU version the VM runs on. The CPU version is determined based on CPUID instruction results. Does AArch64 have something similar? Regards, Vladimir On 2/13/15 3:41 PM, Dean Long wrote: On 2/13/2015 3:04 PM, chris...@zoulas.com wrote: On Feb 13, 2:52pm, dean.l...@oracle.com (Dean Long) wrote: -- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer | My understanding is that whether or not aarch64 allows unaligned accesses is based on a system setting, so this change is too simplistic. I would prefer that this was controlled with something more flexible, like sun.cpu.unaligned. So does x86_64 and you can ask the CPU if it is enabled... I am not sure if a variable setting makes sense because if alignment is required you get a signal (BUS error -- hi linux, SEGV), or incorrect results. christos So it sounds like we need to determine if unaligned accesses are supported during startup, in a platform-specific way. This could be exposed through a property like I suggested, or perhaps a new Unsafe method. Regarding x86_64, there may be places in the JVM that already assume unaligned accesses are allowed, so disabling them may completely break the JVM until those assumptions are fixed. dl
Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer accesses
On Feb 13, 4:29pm, vladimir.koz...@oracle.com (Vladimir Kozlov) wrote:
-- Subject: Re: RFR: 8073093: AARCH64: C2 generates poor code for ByteBuffer

| On 2/13/15 4:22 PM, Dean Long wrote:
| There is a system register bit to read, but I don't think it can be
| accessed by an application, only the kernel.
| If the OS won't provide this information, you could do something similar
| to safeFetchN and catch the
| resulting SIGBUS.
|
| Yes, I agree it could be done this way too.
| On x86 we trigger SEGV to verify that the OS's signal handler correctly
| saves and restores AVX registers so we can use them.

It is PSL_AC (0x4) and it is accessible by applications. Now if it works or not depends on the flavor of the x86...

As I mentioned before, there are implementations (for example pre-arm-v6 flavors) where unaligned accesses don't signal (but don't work). There is even a third category where unaligned accesses trap, but the kernel can fix them if the binary is marked specially (sparc with misaligned for example).

The portable way to verify what's going on is to do the misaligned access and see if it works (dealing with SIGBUS/SIGSEGV). Even then (even when it works) you might not want to do it for performance reasons (for example when the kernel fixes it).

christos