RE: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend support

2020-09-07 Thread Ningsheng Jian
Hi Goetz,

I am sorry about that and thanks for helping to identify the issue.

As I cannot reproduce this on x86 and AArch64, a quick guess is that I may have 
missed the code like:

diff --git a/src/hotspot/share/opto/type.cpp b/src/hotspot/share/opto/type.cpp
index 2f4047dfa8e..f7d2f5b2320 100644
--- a/src/hotspot/share/opto/type.cpp
+++ b/src/hotspot/share/opto/type.cpp
@@ -62,12 +62,14 @@ const Type::TypeInfo Type::_type_info[Type::lastype] = {
   { Bad, T_ARRAY,  "array:",false, 
Node::NotAMachineReg, relocInfo::none  },  // Array

 #if defined(PPC64)
+  { Bad, T_ILLEGAL,"vectora:",  false, Op_VecA,
  relocInfo::none  },  // VectorA.
   { Bad, T_ILLEGAL,"vectors:",  false, 0,  
  relocInfo::none  },  // VectorS
   { Bad, T_ILLEGAL,"vectord:",  false, Op_RegL,
  relocInfo::none  },  // VectorD
   { Bad, T_ILLEGAL,"vectorx:",  false, Op_VecX,
  relocInfo::none  },  // VectorX
   { Bad, T_ILLEGAL,"vectory:",  false, 0,  
  relocInfo::none  },  // VectorY
   { Bad, T_ILLEGAL,"vectorz:",  false, 0,  
  relocInfo::none  },  // VectorZ
 #elif defined(S390)
+  { Bad, T_ILLEGAL,"vectora:",  false, Op_VecA,
  relocInfo::none  },  // VectorA.
   { Bad, T_ILLEGAL,"vectors:",  false, 0,  
  relocInfo::none  },  // VectorS
   { Bad, T_ILLEGAL,"vectord:",  false, Op_RegL,
  relocInfo::none  },  // VectorD
   { Bad, T_ILLEGAL,"vectorx:",  false, 0,  
  relocInfo::none  },  // VectorX

Could you please help to have a try?

Thanks,
Ningsheng


> -Original Message-
> From: Lindenmaier, Goetz 
> Sent: Monday, September 7, 2020 2:35 PM
> To: Ningsheng Jian ; Andrew Dinn ;
> hotspot-compiler-...@openjdk.java.net; build-dev@openjdk.java.net; Vladimir
> Ivanov ; Erik Österlund 
> 
> Cc: aarch64-port-...@openjdk.java.net; Doerr, Martin 
> Subject: Crashes on ppc/s390 after 8231441: AArch64: Initial SVE backend 
> support
> 
> Hi
> 
> Since that change was pushed, the vm crashes in the build:
> 
> To suppress the following error report, specify this argument # after -XX: or
> in .hotspotrc:  SuppressErrorAt=/type.cpp:1022 # # A fatal error has been 
> detected by
> the Java Runtime Environment:
> #
> #  Internal Error (/usr/work/... /share/opto/type.cpp:1022), pid=28717, 
> tid=28983 #
> assert(_type_info[base()].dual_type != Bad) failed: implement with v-call # # 
> JRE
> version: OpenJDK Runtime Environment (16.0.0.1) (fastdebug build 16.0.0.1-
> internal+0-adhoc.openjdk.jdk)
> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16.0.0.1-internal+0-
> adhoc.openjdk.jdk, mixed mode, tiered, compressed oops, g1 gc, linux-ppc64) #
> Problematic frame:
> # V  [libjvm.so+0x1bfe22c]  Type::xdual() const+0xfc # # No core dump will be 
> written.
> Core dumps have been disabled. To enable core dumping, try "ulimit -c 
> unlimited"
> before starting Java again
> 
> Do you have an ad-hoc idea of the problem?
> 
> I locally backed out the change which fixed the issue.
> 
> Best regards,
>   Goetz.



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-09-02 Thread Ningsheng Jian

On 9/2/20 9:58 PM, Andrew Dinn wrote:

Hi Ningsheng,

On 02/09/2020 07:40, Ningsheng Jian wrote:

Thanks a lot for the reviews! I think I have addressed the review
comments from Andrew, Vladimir and Erik. This is the new webrev:

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.05/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/

Tests:
Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on
AArch64 systems with and without SVE as well as some x86_64 systems.
Mach5 submit test also reported passed.

Could you please help to take a look again? OK for jdk/jdk?

That looks good to me.


Thank you Andrew!

Regards,
Ningsheng


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-09-02 Thread Ningsheng Jian

Hi,

Thanks a lot for the reviews! I think I have addressed the review 
comments from Andrew, Vladimir and Erik. This is the new webrev:


Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.05/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/

Tests:
Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on 
AArch64 systems with and without SVE as well as some x86_64 systems. 
Mach5 submit test also reported passed.


Could you please help to take a look again? OK for jdk/jdk?

Thanks,
Ningsheng

On 8/19/20 5:53 PM, Ningsheng Jian wrote:

Hi Andrew,

I have updated the patch based on the review comments. Would you mind
taking another look? Thanks!

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742





Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-25 Thread Ningsheng Jian

Hi Erik,

On 8/24/20 11:26 PM, Erik Österlund wrote:

Hi Ningsheng,

On 2020-08-24 11:59, Ningsheng Jian wrote:

Hi Erik,

Thanks for the review!

On 8/22/20 12:21 AM, Erik Österlund wrote:

Hi,

Have you tried this with ZGC on AArch64? It has custom code for saving
live registers in the load barrier slow path.
I can't see any code changes there, so assuming this will just crash
instead.
The relevant code is in ZBarrierSetAssembler on aarch64.

Maybe I missed something?



I didn't add ZGC option while running tests. I think I need to update 
push_fp() which is called by ZSaveLiveRegisters. But do we need to get 
size info (float/neon/sve) instead of saving the whole vector 
register? Currently, it just simply saves the whole NEON register.


What we found on x86_64 was that there was a significant cost in saving 
vector registers in load barriers. That is why we perform some analysis 
so that only the exact registers that are affected, and only the parts 
of the registers that are affected, get spilled. It actually mattered. 
It will of course work either way, but that was our observation on 
x86_64. But I am okay with that being deferred to a separate RFE. I just 
wanted to make sure that it at the very least works with the new code, 
for a start, so it doesn't start crashing.




OK, I will make it to save the whole reg in this patch and have a 
separate RFE to optimize as what x86 does.


And in ZBarrierSetAssembler::load_at(), before calling to runtime 
code, we call push_call_clobbered_registers_except(), which just saves 
floating point registers instead of the whole NEON vector registers. 
Similar behavior in x86 implementation. Is that correct (not saving 
vectors)?


Yes. The call contexts are:
1) Interpreter. Does not use vector registers.
2) Method handle intrinsic. Uses only floats that are part of the Java 
calling convention, rest is garbage. No vectors here.

3) Checkcast arraycopy. Does not use vectors.


Thanks for sharing this.


Thanks,
Ningsheng


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-24 Thread Ningsheng Jian

Hi Erik,

Thanks for the review!

On 8/22/20 12:21 AM, Erik Österlund wrote:

Hi,

Have you tried this with ZGC on AArch64? It has custom code for saving
live registers in the load barrier slow path.
I can't see any code changes there, so assuming this will just crash
instead.
The relevant code is in ZBarrierSetAssembler on aarch64.

Maybe I missed something?



I didn't add ZGC option while running tests. I think I need to update 
push_fp() which is called by ZSaveLiveRegisters. But do we need to get 
size info (float/neon/sve) instead of saving the whole vector register? 
Currently, it just simply saves the whole NEON register.


And in ZBarrierSetAssembler::load_at(), before calling to runtime code, 
we call push_call_clobbered_registers_except(), which just saves 
floating point registers instead of the whole NEON vector registers. 
Similar behavior in x86 implementation. Is that correct (not saving 
vectors)?


Thanks,
Ningsheng


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Ningsheng Jian

Hi Andrew,

On 8/19/20 9:01 PM, Andrew Dinn wrote:

Hi Ningsheng,

On 19/08/2020 10:53, Ningsheng Jian wrote:

I have updated the patch based on the review comments. Would you mind
taking another look? Thanks!

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/


That looks ok. A few suggested tweaks:



Thanks!


aarch64.ad:168

I think the following comment explains more clearly what is going on:

// For SVE vector registers, we simply extend vector register size to 8
// 'logical' slots. This is nominally 256 bits but it actually covers
// all possible 'physical' SVE vector register lengths from 128 ~ 2048 bits.
// The 'physical' SVE vector register length is detected during startup
// so the register allocator is able to identify the correct number of
// bytes needed for an SVE spill/unspill.
// Note that a vector register with 4 slots, denotes a 128-bit NEON
// register allowing it to be distinguished from the
//  corresponding SVE vector register when the SVE vector length
// is 128 bits.



This looks better than mine. Thanks! :-)


postaloc.cpp:312 & 322

311 if (lrgs(val_idx).is_scalable()) {
312   assert(val->ideal_reg() == Op_VecA, "scalable vector register");

 . . .

321   if (lrgs(val_idx).is_scalable()) {
322 assert(val->ideal_reg() == Op_VecA, "scalable vector register");

You don't strictly need the asserts here as this is already asserted in
the call to is_scalable().



The assertion in LRG::is_scalable() is different, while this is an 
assertion for ideal_reg of a given node.





JTreg tests are still running, and so far no new failure found.

Ok, well assuming they pass I am happy with this latest patch modulo the
tweaks above.



Will report back once the tests on real hardware passed.

Thanks,
Ningsheng



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Ningsheng Jian

Hi Magnus,

Thanks for the review!

On 8/19/20 6:05 PM, Magnus Ihse Bursie wrote:

On 2020-08-19 11:53, Ningsheng Jian wrote:

Hi Andrew,

I have updated the patch based on the review comments. Would you mind
taking another look? Thanks!

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Build changes look good. Thank you for remembering to cc build-dev!

This is maybe not relevant, but I was surprised to find
src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code,
and b) the name implies that it is a test, even though that it resides
in src. Is this really proper?


This handy script is used to (manually) generate some code in 
assembler_aarch64.cpp. The generated code is for assembler smoke test, 
so it named that. It's helpful to make sure the assembler emits correct 
binary code, but I am not sure whether a python code in the project is 
proper or not.


Thanks,
Ningsheng



/Magnus


Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742

JTreg tests are still running, and so far no new failure found.

Thanks,
Ningsheng

On 8/17/20 5:16 PM, Andrew Dinn wrote:

Hi Pengfei,

On 17/08/2020 07:00, Ningsheng Jian wrote:

Thanks a lot for the review! Sorry for the late reply, as I was on
vacation last week. And thanks to Pengfei and Joshua for helping
clarifying some details in the patch.


Yes, they did a very good job of answering most of the pending
questions.


I also eyeballed /some/ of the generated code to check that it looked
ok. I'd really like to be able to do that systematically for a
comprehensive test suite that exercised every rule but I only had the
machine for a few days. This really ought to be done as a follow-up to
ensure that all the rules are working as expected.


Yes, we would expect Pengfei's OptoAssembly check patch can get merged
in future.


I'm fine with that as a follow-up patch if you raise a JIRA for it.


I am not clear why you are choosing to re-init ptrue after certain JVM
runtime calls (e.g. when Z calls into the runtime) and not others e.g.
when we call a JVM_ENTRY. Could you explain the rationale you have
followed here?


We do the re-init at any possible return points to c2 code, not in any
runtime c++ functions, which will reduce the re-init calls.

Actually I found those entries by some hack of jvm. In the hacky code
below we use gcc option -finstrument-functions to build hotspot. With
this option, each C/C++ function entry/exit will call the instrument
functions we defined. In instrument functions, we clobber p7 (or other
reg for test) register, and in c2 function return we verify that p7 (or
other reg) has been reinitialized.

http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch



Nice work. It's very good to have that documented. I'm willing to accept
i) that this has found all current cases and ii) that the verify will
catch any cases that might get introduced by future changes (e.g. the
callout introduced by ZGC that you mention below). As the above mot say
there is a slim chance this might have missed some cases but I think it
is pretty unlikely.



Specific Comments (register allocator webrev):


aarch64.ad:97-100

Why have you added a reg_def for R8 and R9 here and also to
alloc_class
chunk0 at lines 544-545? They aren't used by C2 so why define them?



I think Pengfei has helped to explain that. I will either add clear
comments or rename the register name as you suggested.


Ok, good.


As Joshua clarified, we are also working on predicate scalable reg,
which is not in this patch. Thanks for the suggestion, I will try to
refactor this a bit.


Ok, I'll wait for an updated patch. Are you planning to include the
scalable predicate reg code as part of this patch? I think that would be
better as it would help to clarify the need to distinguish vector regs
as a subset of scalable regs.


zBarrierSetAssembler_aarch64.cpp:434

Can you explain why we need to check p7 here and not do so in other
places where we call into the JVM? I'm not saying this is wrong. I
just
want to know how you decided where re-init of p7 was needed.



Actually I found this by my hack patch above while running jtreg tests.
The stub slowpath here can be a c++ function.


Yes, good catch.


superword.cpp:97

Does this mean that is someone sets the maximum vector size to a
non-power of two, such as 384, all superword operations will be
bypassed? Including those which can be done using NEON vectors?



Current SLP vectorizer only supports power-of-2 vector size. We are
trying to work out

Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Ningsheng Jian

Hi Andrew,

I have updated the patch based on the review comments. Would you mind 
taking another look? Thanks!


Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742

JTreg tests are still running, and so far no new failure found.

Thanks,
Ningsheng

On 8/17/20 5:16 PM, Andrew Dinn wrote:

Hi Pengfei,

On 17/08/2020 07:00, Ningsheng Jian wrote:

Thanks a lot for the review! Sorry for the late reply, as I was on
vacation last week. And thanks to Pengfei and Joshua for helping
clarifying some details in the patch.


Yes, they did a very good job of answering most of the pending questions.


I also eyeballed /some/ of the generated code to check that it looked
ok. I'd really like to be able to do that systematically for a
comprehensive test suite that exercised every rule but I only had the
machine for a few days. This really ought to be done as a follow-up to
ensure that all the rules are working as expected.


Yes, we would expect Pengfei's OptoAssembly check patch can get merged
in future.


I'm fine with that as a follow-up patch if you raise a JIRA for it.


I am not clear why you are choosing to re-init ptrue after certain JVM
runtime calls (e.g. when Z calls into the runtime) and not others e.g.
when we call a JVM_ENTRY. Could you explain the rationale you have
followed here?


We do the re-init at any possible return points to c2 code, not in any
runtime c++ functions, which will reduce the re-init calls.

Actually I found those entries by some hack of jvm. In the hacky code
below we use gcc option -finstrument-functions to build hotspot. With
this option, each C/C++ function entry/exit will call the instrument
functions we defined. In instrument functions, we clobber p7 (or other
reg for test) register, and in c2 function return we verify that p7 (or
other reg) has been reinitialized.

http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch


Nice work. It's very good to have that documented. I'm willing to accept
i) that this has found all current cases and ii) that the verify will
catch any cases that might get introduced by future changes (e.g. the
callout introduced by ZGC that you mention below). As the above mot say
there is a slim chance this might have missed some cases but I think it
is pretty unlikely.



Specific Comments (register allocator webrev):


aarch64.ad:97-100

Why have you added a reg_def for R8 and R9 here and also to alloc_class
chunk0 at lines 544-545? They aren't used by C2 so why define them?



I think Pengfei has helped to explain that. I will either add clear
comments or rename the register name as you suggested.


Ok, good.


As Joshua clarified, we are also working on predicate scalable reg,
which is not in this patch. Thanks for the suggestion, I will try to
refactor this a bit.


Ok, I'll wait for an updated patch. Are you planning to include the
scalable predicate reg code as part of this patch? I think that would be
better as it would help to clarify the need to distinguish vector regs
as a subset of scalable regs.


zBarrierSetAssembler_aarch64.cpp:434

Can you explain why we need to check p7 here and not do so in other
places where we call into the JVM? I'm not saying this is wrong. I just
want to know how you decided where re-init of p7 was needed.



Actually I found this by my hack patch above while running jtreg tests.
The stub slowpath here can be a c++ function.


Yes, good catch.


superword.cpp:97

Does this mean that is someone sets the maximum vector size to a
non-power of two, such as 384, all superword operations will be
bypassed? Including those which can be done using NEON vectors?



Current SLP vectorizer only supports power-of-2 vector size. We are
trying to work out a new vectorizer to support all SVE vector sizes, so
we would expect a size like 384 could go to that path. I tried current
patch on a 512-bit SVE hardware which does not support 384-bit:

$ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same)
openjdk version "16-internal" 2021-03-16

$ java -XX:MaxVectorSize=48 -version
OpenJDK 64-Bit Server VM warning: Current system only supports max SVE
vector length 32. Set MaxVectorSize to 32

(Fallbacks to 32 and issue a warning, as the prctl() call returns 32
instead of unsupported 48:
https://www.kernel.org/doc/Documentation/arm64/sve.txt)

Do you think we need to exit vm instead of warning and fallbacking to 32
here?


Yes, I think a vm exit would probably be a better choice.

regards

Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes

2020-07-26 Thread Ningsheng Jian

Thank you Erik!

Regards,
Ningsheng

On 7/23/20 9:06 PM, Erik Joelsson wrote:

Hello Ningsheng,

Build change looks good.

/Erik

On 2020-07-23 01:02, Ningsheng Jian wrote:

Hi Vladimir,

Thanks for pointing out this. Yes, I missed that change in shared 
code. I've regenerated the webrev, with GensrcAdlc.gmk file change 
included:


http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ 



Also add build-dev.

Thanks,
Ningsheng

On 7/23/20 5:36 AM, Vladimir Ivanov wrote:
http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ 





FTR there's one more aarch64-specific change in shared code to enable 
aarch64_neon.ad processing:


diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk 
b/make/hotspot/gensrc/GensrcAdlc.gmk

--- a/make/hotspot/gensrc/GensrcAdlc.gmk
+++ b/make/hotspot/gensrc/GensrcAdlc.gmk
@@ -129,6 +129,12 @@

$d/os_cpu/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH).ad 
\

  )))

+  ifeq ($(HOTSPOT_TARGET_CPU_ARCH), aarch64)
+    AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, 
$(AD_SRC_ROOTS), \

+ $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_CPU_ARCH)_neon.ad \
+    )))
+  endif
+
    ifeq ($(call check-jvm-feature, shenandoahgc), true)
  AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, 
$(AD_SRC_ROOTS), \


$d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/shenandoah/shenandoah_$(HOTSPOT_TARGET_CPU).ad 
\


Best regards,
Vladimir Ivanov


On 7/8/20 3:05 PM, Yang Zhang wrote:

Hi Andrew

I have updated this patch. Could you please help to review it again?
In this patch, the following changes are made:
1. Separate newly added NEON instructions to a new ad file
    aarch64_neon.ad
2. Add assembler tests for NEON instructions. Trailing spaces
    in the python script are also removed.

http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/ 



Thanks,
Yang


-Original Message-
From: Andrew Haley 
Sent: Tuesday, June 30, 2020 12:10 AM
To: Yang Zhang ; Viswanathan, Sandhya 
; Paul Sandoz 
Cc: nd ; hotspot-compiler-...@openjdk.java.net; 
hotspot-...@openjdk.java.net; core-libs-...@openjdk.java.net; 
aarch64-port-...@openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of 
Vector API (Incubator): AArch64 backend changes


On 29/06/2020 08:48, Yang Zhang wrote:

1. Instructions that can be matched with NEON instructions directly.
MulVB, SqrtVF and AbsV have been merged into jdk master already.

2. Instructions that jdk master has middle end support for, but 
they cannot be matched with NEON instructions directly.
Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These 
new instructions can be moved into jdk master first, but for 
auto-vectorization, the performance might not get improved.


3. Panama/Vector API specific  instructions such as 
Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, 
MaxV/MinV, VectorBlend etc.
These instructions cannot be moved into jdk master first because 
there isn't middle-end support.


I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also 
update aarch64_asmtest.py and macroassemler.cpp. When the patch is 
ready, I will send it again.


Thank you *very* much for your hard work. Appreciated!

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com> 
https://keybase.io/andrewhaley

EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671









Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes

2020-07-23 Thread Ningsheng Jian

Hi Vladimir,

Thanks for pointing out this. Yes, I missed that change in shared code. 
I've regenerated the webrev, with GensrcAdlc.gmk file change included:


http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/

Also add build-dev.

Thanks,
Ningsheng

On 7/23/20 5:36 AM, Vladimir Ivanov wrote:
http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ 



FTR there's one more aarch64-specific change in shared code to enable 
aarch64_neon.ad processing:


diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk 
b/make/hotspot/gensrc/GensrcAdlc.gmk

--- a/make/hotspot/gensrc/GensrcAdlc.gmk
+++ b/make/hotspot/gensrc/GensrcAdlc.gmk
@@ -129,6 +129,12 @@

$d/os_cpu/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH).ad 
\

  )))

+  ifeq ($(HOTSPOT_TARGET_CPU_ARCH), aarch64)
+    AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, 
$(AD_SRC_ROOTS), \

+ $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_CPU_ARCH)_neon.ad \
+    )))
+  endif
+
    ifeq ($(call check-jvm-feature, shenandoahgc), true)
  AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, 
$(AD_SRC_ROOTS), \


$d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/shenandoah/shenandoah_$(HOTSPOT_TARGET_CPU).ad 
\


Best regards,
Vladimir Ivanov


On 7/8/20 3:05 PM, Yang Zhang wrote:

Hi Andrew

I have updated this patch. Could you please help to review it again?
In this patch, the following changes are made:
1. Separate newly added NEON instructions to a new ad file
    aarch64_neon.ad
2. Add assembler tests for NEON instructions. Trailing spaces
    in the python script are also removed.

http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/ 



Thanks,
Yang


-Original Message-
From: Andrew Haley 
Sent: Tuesday, June 30, 2020 12:10 AM
To: Yang Zhang ; Viswanathan, Sandhya 
; Paul Sandoz 
Cc: nd ; hotspot-compiler-...@openjdk.java.net; 
hotspot-...@openjdk.java.net; core-libs-...@openjdk.java.net; 
aarch64-port-...@openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of 
Vector API (Incubator): AArch64 backend changes


On 29/06/2020 08:48, Yang Zhang wrote:

1. Instructions that can be matched with NEON instructions directly.
MulVB, SqrtVF and AbsV have been merged into jdk master already.

2. Instructions that jdk master has middle end support for, but they 
cannot be matched with NEON instructions directly.
Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These 
new instructions can be moved into jdk master first, but for 
auto-vectorization, the performance might not get improved.


3. Panama/Vector API specific  instructions such as Load/StoreVector 
( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend 
etc.
These instructions cannot be moved into jdk master first because 
there isn't middle-end support.


I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also 
update aarch64_asmtest.py and macroassemler.cpp. When the patch is 
ready, I will send it again.


Thank you *very* much for your hard work. Appreciated!

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.  https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671







Re: [aarch64-port-dev ] RFC: JEP - One AArch64 Port, Not Two

2018-08-27 Thread Ningsheng Jian
On 27 August 2018 at 17:00, Andrew Haley  wrote:
> On 08/27/2018 09:49 AM, Ningsheng Jian wrote:
>> Yes, some benchmark data is unstable. We did analyze the arm64 and
>> aarch64 codegen difference with microbenchmarks. AArch64 backend
>> generates better code on most cases, and I believe that we have
>> addressed those minor worse cases we found. E.g. JDK-8183533,
>> JDK-8185786, JDK-8187601.
>
> Mmm, and it's not really possible to show for sure that changes
> generalize well. I'm still not convinced by tweaks to unrolling,
> but they work spectacularly well in some small benchmarks.
>

I just searched our work log and found those items. At least they were
exposed by microbenchmarks [1] we analyzed where arm64 performed
better than AArch64. Yes, they are just small benchmarks. :-)

[1] https://git.linaro.org/leg/openjdk/jmh-linaro-org.git

Thanks,
Ningsheng


Re: [aarch64-port-dev ] RFC: JEP - One AArch64 Port, Not Two

2018-08-27 Thread Ningsheng Jian
Hi,

On 27 August 2018 at 16:13, Andrew Haley  wrote:
> On 08/27/2018 07:30 AM, Magnus Ihse Bursie wrote:
>> One question: I remember there were some guys from Linaro who compared
>> aarch64 vs arm64 in microbenchmarks, and found that while aarch64 had
>> the superior performance most of the time, there were some benchmarks
>> where arm64 was fastest.
>>
>> Has these been analyzed, and the performance on aarch64 brought in line
>> with arm64?
>
> I doubt that's possible. As we know, JIT heuristics can sometimes go one
> way and then the other, and it can be just a matter of luck. I haven't
> seen any optimizations in the arm64 port that were absent in the aarch64
> port. If anyone can find anything we'll port it over.
>
> Looking here
>
> https://docs.google.com/spreadsheets/d/18iklOrbaL67i46XHsPTrdTObWqUJJK8EO4-k0eqdLKM/edit#gid=909743165
>
> we see that tradebeans is faster on one implementation, slower on another.
> sunflow is spectacularly better on arm64, but it failed to converge most
> of the time on one implementation, so I don't believe the result.
>

Yes, some benchmark data is unstable. We did analyze the arm64 and
aarch64 codegen difference with microbenchmarks. AArch64 backend
generates better code on most cases, and I believe that we have
addressed those minor worse cases we found. E.g. JDK-8183533,
JDK-8185786, JDK-8187601.

Thanks,
Ningsheng