Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-09-02 Thread Ningsheng Jian

On 9/2/20 9:58 PM, Andrew Dinn wrote:

Hi Ningsheng,

On 02/09/2020 07:40, Ningsheng Jian wrote:

Thanks a lot for the reviews! I think I have addressed the review
comments from Andrew, Vladimir and Erik. This is the new webrev:

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.05/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/

Tests:
Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on
AArch64 systems with and without SVE as well as some x86_64 systems.
Mach5 submit test also reported passed.

Could you please help to take a look again? OK for jdk/jdk?

That looks good to me.


Thank you Andrew!

Regards,
Ningsheng


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-09-02 Thread Andrew Dinn
Hi Ningsheng,

On 02/09/2020 07:40, Ningsheng Jian wrote:
> Thanks a lot for the reviews! I think I have addressed the review
> comments from Andrew, Vladimir and Erik. This is the new webrev:
> 
> Full:
> http://cr.openjdk.java.net/~njian/8231441/webrev.05/
> 
> Incremental:
> http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/
> 
> Tests:
> Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on
> AArch64 systems with and without SVE as well as some x86_64 systems.
> Mach5 submit test also reported passed.
> 
> Could you please help to take a look again? OK for jdk/jdk?
That looks good to me.

regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-09-02 Thread Ningsheng Jian

Hi,

Thanks a lot for the reviews! I think I have addressed the review 
comments from Andrew, Vladimir and Erik. This is the new webrev:


Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.05/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.05-vs-04/

Tests:
Tested with jtreg hotspot_all_no_apps, jdk_core and langtools:tier1 on 
AArch64 systems with and without SVE as well as some x86_64 systems. 
Mach5 submit test also reported passed.


Could you please help to take a look again? OK for jdk/jdk?

Thanks,
Ningsheng

On 8/19/20 5:53 PM, Ningsheng Jian wrote:

Hi Andrew,

I have updated the patch based on the review comments. Would you mind
taking another look? Thanks!

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742





Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-25 Thread Ningsheng Jian

Hi Erik,

On 8/24/20 11:26 PM, Erik Österlund wrote:

Hi Ningsheng,

On 2020-08-24 11:59, Ningsheng Jian wrote:

Hi Erik,

Thanks for the review!

On 8/22/20 12:21 AM, Erik Österlund wrote:

Hi,

Have you tried this with ZGC on AArch64? It has custom code for saving
live registers in the load barrier slow path.
I can't see any code changes there, so assuming this will just crash
instead.
The relevant code is in ZBarrierSetAssembler on aarch64.

Maybe I missed something?



I didn't add ZGC option while running tests. I think I need to update 
push_fp() which is called by ZSaveLiveRegisters. But do we need to get 
size info (float/neon/sve) instead of saving the whole vector 
register? Currently, it just simply saves the whole NEON register.


What we found on x86_64 was that there was a significant cost in saving 
vector registers in load barriers. That is why we perform some analysis 
so that only the exact registers that are affected, and only the parts 
of the registers that are affected, get spilled. It actually mattered. 
It will of course work either way, but that was our observation on 
x86_64. But I am okay with that being deferred to a separate RFE. I just 
wanted to make sure that it at the very least works with the new code, 
for a start, so it doesn't start crashing.




OK, I will make it to save the whole reg in this patch and have a 
separate RFE to optimize as what x86 does.


And in ZBarrierSetAssembler::load_at(), before calling to runtime 
code, we call push_call_clobbered_registers_except(), which just saves 
floating point registers instead of the whole NEON vector registers. 
Similar behavior in x86 implementation. Is that correct (not saving 
vectors)?


Yes. The call contexts are:
1) Interpreter. Does not use vector registers.
2) Method handle intrinsic. Uses only floats that are part of the Java 
calling convention, rest is garbage. No vectors here.

3) Checkcast arraycopy. Does not use vectors.


Thanks for sharing this.


Thanks,
Ningsheng


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-24 Thread Erik Österlund

Hi Ningsheng,

On 2020-08-24 11:59, Ningsheng Jian wrote:

Hi Erik,

Thanks for the review!

On 8/22/20 12:21 AM, Erik Österlund wrote:

Hi,

Have you tried this with ZGC on AArch64? It has custom code for saving
live registers in the load barrier slow path.
I can't see any code changes there, so assuming this will just crash
instead.
The relevant code is in ZBarrierSetAssembler on aarch64.

Maybe I missed something?



I didn't add ZGC option while running tests. I think I need to update 
push_fp() which is called by ZSaveLiveRegisters. But do we need to get 
size info (float/neon/sve) instead of saving the whole vector 
register? Currently, it just simply saves the whole NEON register.


What we found on x86_64 was that there was a significant cost in saving 
vector registers in load barriers. That is why we perform some analysis 
so that only the exact registers that are affected, and only the parts 
of the registers that are affected, get spilled. It actually mattered. 
It will of course work either way, but that was our observation on 
x86_64. But I am okay with that being deferred to a separate RFE. I just 
wanted to make sure that it at the very least works with the new code, 
for a start, so it doesn't start crashing.


And in ZBarrierSetAssembler::load_at(), before calling to runtime 
code, we call push_call_clobbered_registers_except(), which just saves 
floating point registers instead of the whole NEON vector registers. 
Similar behavior in x86 implementation. Is that correct (not saving 
vectors)?


Yes. The call contexts are:
1) Interpreter. Does not use vector registers.
2) Method handle intrinsic. Uses only floats that are part of the Java 
calling convention, rest is garbage. No vectors here.

3) Checkcast arraycopy. Does not use vectors.

Thanks,
/Erik


Thanks,
Ningsheng




Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-24 Thread Ningsheng Jian

Hi Erik,

Thanks for the review!

On 8/22/20 12:21 AM, Erik Österlund wrote:

Hi,

Have you tried this with ZGC on AArch64? It has custom code for saving
live registers in the load barrier slow path.
I can't see any code changes there, so assuming this will just crash
instead.
The relevant code is in ZBarrierSetAssembler on aarch64.

Maybe I missed something?



I didn't add ZGC option while running tests. I think I need to update 
push_fp() which is called by ZSaveLiveRegisters. But do we need to get 
size info (float/neon/sve) instead of saving the whole vector register? 
Currently, it just simply saves the whole NEON register.


And in ZBarrierSetAssembler::load_at(), before calling to runtime code, 
we call push_call_clobbered_registers_except(), which just saves 
floating point registers instead of the whole NEON vector registers. 
Similar behavior in x86 implementation. Is that correct (not saving 
vectors)?


Thanks,
Ningsheng


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-21 Thread Erik Österlund

Hi,

Have you tried this with ZGC on AArch64? It has custom code for saving 
live registers in the load barrier slow path.
I can't see any code changes there, so assuming this will just crash 
instead.

The relevant code is in ZBarrierSetAssembler on aarch64.

Maybe I missed something?

Thanks,
/Erik

On 2020-08-19 11:53, Ningsheng Jian wrote:

Hi Andrew,

I have updated the patch based on the review comments. Would you mind 
taking another look? Thanks!


Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742

JTreg tests are still running, and so far no new failure found.

Thanks,
Ningsheng

On 8/17/20 5:16 PM, Andrew Dinn wrote:

Hi Pengfei,

On 17/08/2020 07:00, Ningsheng Jian wrote:

Thanks a lot for the review! Sorry for the late reply, as I was on
vacation last week. And thanks to Pengfei and Joshua for helping
clarifying some details in the patch.


Yes, they did a very good job of answering most of the pending 
questions.



I also eyeballed /some/ of the generated code to check that it looked
ok. I'd really like to be able to do that systematically for a
comprehensive test suite that exercised every rule but I only had the
machine for a few days. This really ought to be done as a follow-up to
ensure that all the rules are working as expected.


Yes, we would expect Pengfei's OptoAssembly check patch can get merged
in future.


I'm fine with that as a follow-up patch if you raise a JIRA for it.


I am not clear why you are choosing to re-init ptrue after certain JVM
runtime calls (e.g. when Z calls into the runtime) and not others e.g.
when we call a JVM_ENTRY. Could you explain the rationale you have
followed here?


We do the re-init at any possible return points to c2 code, not in any
runtime c++ functions, which will reduce the re-init calls.

Actually I found those entries by some hack of jvm. In the hacky code
below we use gcc option -finstrument-functions to build hotspot. With
this option, each C/C++ function entry/exit will call the instrument
functions we defined. In instrument functions, we clobber p7 (or other
reg for test) register, and in c2 function return we verify that p7 (or
other reg) has been reinitialized.

http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch 



Nice work. It's very good to have that documented. I'm willing to accept
i) that this has found all current cases and ii) that the verify will
catch any cases that might get introduced by future changes (e.g. the
callout introduced by ZGC that you mention below). As the above mot say
there is a slim chance this might have missed some cases but I think it
is pretty unlikely.



Specific Comments (register allocator webrev):


aarch64.ad:97-100

Why have you added a reg_def for R8 and R9 here and also to 
alloc_class

chunk0 at lines 544-545? They aren't used by C2 so why define them?



I think Pengfei has helped to explain that. I will either add clear
comments or rename the register name as you suggested.


Ok, good.


As Joshua clarified, we are also working on predicate scalable reg,
which is not in this patch. Thanks for the suggestion, I will try to
refactor this a bit.


Ok, I'll wait for an updated patch. Are you planning to include the
scalable predicate reg code as part of this patch? I think that would be
better as it would help to clarify the need to distinguish vector regs
as a subset of scalable regs.


zBarrierSetAssembler_aarch64.cpp:434

Can you explain why we need to check p7 here and not do so in other
places where we call into the JVM? I'm not saying this is wrong. I 
just

want to know how you decided where re-init of p7 was needed.



Actually I found this by my hack patch above while running jtreg tests.
The stub slowpath here can be a c++ function.


Yes, good catch.


superword.cpp:97

Does this mean that is someone sets the maximum vector size to a
non-power of two, such as 384, all superword operations will be
bypassed? Including those which can be done using NEON vectors?



Current SLP vectorizer only supports power-of-2 vector size. We are
trying to work out a new vectorizer to support all SVE vector sizes, so
we would expect a size like 384 could go to that path. I tried current
patch on a 512-bit SVE hardware which does not support 384-bit:

$ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same)
openjdk version "16-internal" 2021-03-16

$ java -XX:MaxVectorSize=48 -version
OpenJDK 64-Bit Server VM warning: Current system only supports max 

Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Magnus Ihse Bursie

On 2020-08-20 12:38, Andrew Dinn wrote:

On 20/08/2020 11:14, Magnus Ihse Bursie wrote:

On 2020-08-20 11:40, Nick Gasson wrote:

http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/

Then you'd run it with

    make exploded-test TEST="gtest:AssemblerAArch64"

The downside is that it won't run on every startup of a debug build, but
it will run in the tier1 tests, including for release builds, which
arguably gives more coverage. It looks a lot tidier to me, but that's
clearly subjective.

FWIW, it definitely looks tidier to me too.

Well, perhaps this check ought to be done as a standalone test rather
than as debug build validation. I don't really have any deep commitment
either way. However, if we do proceed with this I think it ought to be
in a separate follow-up patch and with its own JIRA. It should not stop
Ningsheng's SVE patch going in as is since that merely corrects the
status quo to allow for SVE instructions.

Yes, I fully agree, and never meant to imply anything else.

/Magnus


regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill





Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Andrew Haley
On 20/08/2020 11:38, Andrew Dinn wrote:
> Well, perhaps this check ought to be done as a standalone test rather
> than as debug build validation. I don't really have any deep commitment
> either way. However, if we do proceed with this I think it ought to be
> in a separate follow-up patch and with its own JIRA. It should not stop
> Ningsheng's SVE patch going in as is since that merely corrects the
> status quo to allow for SVE instructions.

I agree.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Andrew Dinn
On 20/08/2020 11:14, Magnus Ihse Bursie wrote:
> On 2020-08-20 11:40, Nick Gasson wrote:
>> http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/
>>
>> Then you'd run it with
>>
>>    make exploded-test TEST="gtest:AssemblerAArch64"
>>
>> The downside is that it won't run on every startup of a debug build, but
>> it will run in the tier1 tests, including for release builds, which
>> arguably gives more coverage. It looks a lot tidier to me, but that's
>> clearly subjective.
> FWIW, it definitely looks tidier to me too.
Well, perhaps this check ought to be done as a standalone test rather
than as debug build validation. I don't really have any deep commitment
either way. However, if we do proceed with this I think it ought to be
in a separate follow-up patch and with its own JIRA. It should not stop
Ningsheng's SVE patch going in as is since that merely corrects the
status quo to allow for SVE instructions.

regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Magnus Ihse Bursie

On 2020-08-20 11:40, Nick Gasson wrote:

On 08/20/20 17:08 pm, Andrew Haley wrote:

I meant move the test itself - entry() and asm_check() in
assembler_aarch64.cpp - under test/hotspot/gtest. The generator would
move with it.

Hmm. I'm still not sure how this would work. Let's see the patch and
we can talk about it. It still sounds to me rather like pointlessly
moving the furniture around.

http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/

Then you'd run it with

   make exploded-test TEST="gtest:AssemblerAArch64"

The downside is that it won't run on every startup of a debug build, but
it will run in the tier1 tests, including for release builds, which
arguably gives more coverage. It looks a lot tidier to me, but that's
clearly subjective.

FWIW, it definitely looks tidier to me too.

/Magnus


--
Thanks,
Nick




Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Nick Gasson
On 08/20/20 17:08 pm, Andrew Haley wrote:
>>
>> I meant move the test itself - entry() and asm_check() in
>> assembler_aarch64.cpp - under test/hotspot/gtest. The generator would
>> move with it.
>
> Hmm. I'm still not sure how this would work. Let's see the patch and
> we can talk about it. It still sounds to me rather like pointlessly
> moving the furniture around.

http://cr.openjdk.java.net/~ngasson/asmtest/webrev.0/

Then you'd run it with

  make exploded-test TEST="gtest:AssemblerAArch64"

The downside is that it won't run on every startup of a debug build, but
it will run in the tier1 tests, including for release builds, which
arguably gives more coverage. It looks a lot tidier to me, but that's
clearly subjective.

--
Thanks,
Nick


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Andrew Haley
Hi,

On 20/08/2020 09:58, Nick Gasson wrote:
>
> On 08/20/20 16:48 pm, Andrew Dinn wrote:
>>>
>>> It is perhaps a bit strange to have the test code under src/ and
>>> embedded in the assembler implementation. How about we move it under
>>> test/ using the existing gtest framework for native code tests? That
>>> runs in tier1 and also for release builds. I tried this just now and
>>> it's easy to do.
>> I'm not sure that would be an improvement. This python code is used to
>> generate C code run as part of JVM startup in a debug JVM build i.e.
>> code that is linked into the JVM itself. So, the code it generates is
>> really the same as the debug code embedded in the JVM. It doesn't really
>> bear any relation to the code in the test tree.
>
> I meant move the test itself - entry() and asm_check() in
> assembler_aarch64.cpp - under test/hotspot/gtest. The generator would
> move with it.

Hmm. I'm still not sure how this would work. Let's see the patch and
we can talk about it. It still sounds to me rather like pointlessly
moving the furniture around.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Nick Gasson
Hi Andrew,

On 08/20/20 16:48 pm, Andrew Dinn wrote:
>> 
>> It is perhaps a bit strange to have the test code under src/ and
>> embedded in the assembler implementation. How about we move it under
>> test/ using the existing gtest framework for native code tests? That
>> runs in tier1 and also for release builds. I tried this just now and
>> it's easy to do.
> I'm not sure that would be an improvement. This python code is used to
> generate C code run as part of JVM startup in a debug JVM build i.e.
> code that is linked into the JVM itself. So, the code it generates is
> really the same as the debug code embedded in the JVM. It doesn't really
> bear any relation to the code in the test tree.
>

I meant move the test itself - entry() and asm_check() in
assembler_aarch64.cpp - under test/hotspot/gtest. The generator would
move with it.

> If the generator code were to go anywhere else it would perhaps make
> most sense to put it in the make tree. I'm not sure that is required
> though or even appropriate. There is already a precedent for keeping
> generator code in the source tree and, when it is specific to a given
> arch, keeping it next to the related source. The adlc generator code
> sits in the shared source tree. The m4 file used to generate parts of
> aarch64.ad is in the aarch64 source tree.
>
> regards,
>
>
> Andrew Dinn
> ---
> Red Hat Distinguished Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Andrew Haley
On 20/08/2020 05:48, Nick Gasson wrote:
> On 08/19/20 19:10 pm, Andrew Haley wrote:
>> On 19/08/2020 11:05, Magnus Ihse Bursie wrote:
>>> This is maybe not relevant, but I was surprised to find
>>> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code,
>>> and b) the name implies that it is a test, even though that it resides
>>> in src. Is this really proper?
>>
>> I have no idea whether it's really proper, but it allows us to check
>> that instructions are encoded correctly by cross-checking with the
>> system's assembler. There might well be a more hygienic way to do
>> that, but I don't want to be without it.
>
> It is perhaps a bit strange to have the test code under src/ and
> embedded in the assembler implementation. How about we move it under
> test/ using the existing gtest framework for native code tests? That
> runs in tier1 and also for release builds. I tried this just now and
> it's easy to do.

Go on, then!

Bear in mind that the idea of this test is that it checks the encoding
of all instructions, regardless of whether the processor supports them
or not.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Andrew Dinn
On 20/08/2020 05:48, Nick Gasson wrote:
> On 08/19/20 19:10 pm, Andrew Haley wrote:
>> On 19/08/2020 11:05, Magnus Ihse Bursie wrote:
>>> This is maybe not relevant, but I was surprised to find
>>> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code,
>>> and b) the name implies that it is a test, even though that it resides
>>> in src. Is this really proper?
>>
>> I have no idea whether it's really proper, but it allows us to check
>> that instructions are encoded correctly by cross-checking with the
>> system's assembler. There might well be a more hygienic way to do
>> that, but I don't want to be without it.
> 
> It is perhaps a bit strange to have the test code under src/ and
> embedded in the assembler implementation. How about we move it under
> test/ using the existing gtest framework for native code tests? That
> runs in tier1 and also for release builds. I tried this just now and
> it's easy to do.
I'm not sure that would be an improvement. This python code is used to
generate C code run as part of JVM startup in a debug JVM build i.e.
code that is linked into the JVM itself. So, the code it generates is
really the same as the debug code embedded in the JVM. It doesn't really
bear any relation to the code in the test tree.

If the generator code were to go anywhere else it would perhaps make
most sense to put it in the make tree. I'm not sure that is required
though or even appropriate. There is already a precedent for keeping
generator code in the source tree and, when it is specific to a given
arch, keeping it next to the related source. The adlc generator code
sits in the shared source tree. The m4 file used to generate parts of
aarch64.ad is in the aarch64 source tree.

regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Andrew Haley
On 20/08/2020 03:27, Ningsheng Jian wrote:
> // Note that a vector register with 4 slots, denotes a 128-bit NEON

Lose the comma.  :-)

Never known to miss a trivial point,

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-20 Thread Andrew Dinn
Hi Ningsheng,

>> postaloc.cpp:312 & 322
>>
>> 311 if (lrgs(val_idx).is_scalable()) {
>> 312   assert(val->ideal_reg() == Op_VecA, "scalable vector
>> register");
>>
>>  . . .
>>
>> 321   if (lrgs(val_idx).is_scalable()) {
>> 322 assert(val->ideal_reg() == Op_VecA, "scalable vector
>> register");
>>
>> You don't strictly need the asserts here as this is already asserted in
>> the call to is_scalable().
> 
> The assertion in LRG::is_scalable() is different, while this is an
> assertion for ideal_reg of a given node.
Yes, my apologies for misreading that. These assertions should be retained.

regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Nick Gasson
On 08/19/20 19:10 pm, Andrew Haley wrote:
> On 19/08/2020 11:05, Magnus Ihse Bursie wrote:
>> This is maybe not relevant, but I was surprised to find
>> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code,
>> and b) the name implies that it is a test, even though that it resides
>> in src. Is this really proper?
>
> I have no idea whether it's really proper, but it allows us to check
> that instructions are encoded correctly by cross-checking with the
> system's assembler. There might well be a more hygienic way to do
> that, but I don't want to be without it.

It is perhaps a bit strange to have the test code under src/ and
embedded in the assembler implementation. How about we move it under
test/ using the existing gtest framework for native code tests? That
runs in tier1 and also for release builds. I tried this just now and
it's easy to do.

--
Thanks,
Nick


Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Ningsheng Jian

Hi Andrew,

On 8/19/20 9:01 PM, Andrew Dinn wrote:

Hi Ningsheng,

On 19/08/2020 10:53, Ningsheng Jian wrote:

I have updated the patch based on the review comments. Would you mind
taking another look? Thanks!

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/


That looks ok. A few suggested tweaks:



Thanks!


aarch64.ad:168

I think the following comment explains more clearly what is going on:

// For SVE vector registers, we simply extend vector register size to 8
// 'logical' slots. This is nominally 256 bits but it actually covers
// all possible 'physical' SVE vector register lengths from 128 ~ 2048 bits.
// The 'physical' SVE vector register length is detected during startup
// so the register allocator is able to identify the correct number of
// bytes needed for an SVE spill/unspill.
// Note that a vector register with 4 slots, denotes a 128-bit NEON
// register allowing it to be distinguished from the
//  corresponding SVE vector register when the SVE vector length
// is 128 bits.



This looks better than mine. Thanks! :-)


postaloc.cpp:312 & 322

311 if (lrgs(val_idx).is_scalable()) {
312   assert(val->ideal_reg() == Op_VecA, "scalable vector register");

 . . .

321   if (lrgs(val_idx).is_scalable()) {
322 assert(val->ideal_reg() == Op_VecA, "scalable vector register");

You don't strictly need the asserts here as this is already asserted in
the call to is_scalable().



The assertion in LRG::is_scalable() is different, while this is an 
assertion for ideal_reg of a given node.





JTreg tests are still running, and so far no new failure found.

Ok, well assuming they pass I am happy with this latest patch modulo the
tweaks above.



Will report back once the tests on real hardware passed.

Thanks,
Ningsheng



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Andrew Dinn
Hi Ningsheng,

On 19/08/2020 10:53, Ningsheng Jian wrote:
> I have updated the patch based on the review comments. Would you mind
> taking another look? Thanks!
> 
> Full:
> http://cr.openjdk.java.net/~njian/8231441/webrev.04/
> 
> Incremental:
> http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

That looks ok. A few suggested tweaks:

aarch64.ad:168

I think the following comment explains more clearly what is going on:

// For SVE vector registers, we simply extend vector register size to 8
// 'logical' slots. This is nominally 256 bits but it actually covers
// all possible 'physical' SVE vector register lengths from 128 ~ 2048 bits.
// The 'physical' SVE vector register length is detected during startup
// so the register allocator is able to identify the correct number of
// bytes needed for an SVE spill/unspill.
// Note that a vector register with 4 slots, denotes a 128-bit NEON
// register allowing it to be distinguished from the
//  corresponding SVE vector register when the SVE vector length
// is 128 bits.

postaloc.cpp:312 & 322

311 if (lrgs(val_idx).is_scalable()) {
312   assert(val->ideal_reg() == Op_VecA, "scalable vector register");

. . .

321   if (lrgs(val_idx).is_scalable()) {
322 assert(val->ideal_reg() == Op_VecA, "scalable vector register");

You don't strictly need the asserts here as this is already asserted in
the call to is_scalable().


> JTreg tests are still running, and so far no new failure found.
Ok, well assuming they pass I am happy with this latest patch modulo the
tweaks above.

regards,


Andrew Dinn
---
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Andrew Haley
On 19/08/2020 11:05, Magnus Ihse Bursie wrote:
> This is maybe not relevant, but I was surprised to find
> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code,
> and b) the name implies that it is a test, even though that it resides
> in src. Is this really proper?

I have no idea whether it's really proper, but it allows us to check
that instructions are encoded correctly by cross-checking with the
system's assembler. There might well be a more hygienic way to do
that, but I don't want to be without it.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Ningsheng Jian

Hi Magnus,

Thanks for the review!

On 8/19/20 6:05 PM, Magnus Ihse Bursie wrote:

On 2020-08-19 11:53, Ningsheng Jian wrote:

Hi Andrew,

I have updated the patch based on the review comments. Would you mind
taking another look? Thanks!

Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Build changes look good. Thank you for remembering to cc build-dev!

This is maybe not relevant, but I was surprised to find
src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code,
and b) the name implies that it is a test, even though that it resides
in src. Is this really proper?


This handy script is used to (manually) generate some code in 
assembler_aarch64.cpp. The generated code is for assembler smoke test, 
so it named that. It's helpful to make sure the assembler emits correct 
binary code, but I am not sure whether a python code in the project is 
proper or not.


Thanks,
Ningsheng



/Magnus


Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742

JTreg tests are still running, and so far no new failure found.

Thanks,
Ningsheng

On 8/17/20 5:16 PM, Andrew Dinn wrote:

Hi Pengfei,

On 17/08/2020 07:00, Ningsheng Jian wrote:

Thanks a lot for the review! Sorry for the late reply, as I was on
vacation last week. And thanks to Pengfei and Joshua for helping
clarifying some details in the patch.


Yes, they did a very good job of answering most of the pending
questions.


I also eyeballed /some/ of the generated code to check that it looked
ok. I'd really like to be able to do that systematically for a
comprehensive test suite that exercised every rule but I only had the
machine for a few days. This really ought to be done as a follow-up to
ensure that all the rules are working as expected.


Yes, we would expect Pengfei's OptoAssembly check patch can get merged
in future.


I'm fine with that as a follow-up patch if you raise a JIRA for it.


I am not clear why you are choosing to re-init ptrue after certain JVM
runtime calls (e.g. when Z calls into the runtime) and not others e.g.
when we call a JVM_ENTRY. Could you explain the rationale you have
followed here?


We do the re-init at any possible return points to c2 code, not in any
runtime c++ functions, which will reduce the re-init calls.

Actually I found those entries by some hack of jvm. In the hacky code
below we use gcc option -finstrument-functions to build hotspot. With
this option, each C/C++ function entry/exit will call the instrument
functions we defined. In instrument functions, we clobber p7 (or other
reg for test) register, and in c2 function return we verify that p7 (or
other reg) has been reinitialized.

http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch



Nice work. It's very good to have that documented. I'm willing to accept
i) that this has found all current cases and ii) that the verify will
catch any cases that might get introduced by future changes (e.g. the
callout introduced by ZGC that you mention below). As the above mot say
there is a slim chance this might have missed some cases but I think it
is pretty unlikely.



Specific Comments (register allocator webrev):


aarch64.ad:97-100

Why have you added a reg_def for R8 and R9 here and also to
alloc_class
chunk0 at lines 544-545? They aren't used by C2 so why define them?



I think Pengfei has helped to explain that. I will either add clear
comments or rename the register name as you suggested.


Ok, good.


As Joshua clarified, we are also working on predicate scalable reg,
which is not in this patch. Thanks for the suggestion, I will try to
refactor this a bit.


Ok, I'll wait for an updated patch. Are you planning to include the
scalable predicate reg code as part of this patch? I think that would be
better as it would help to clarify the need to distinguish vector regs
as a subset of scalable regs.


zBarrierSetAssembler_aarch64.cpp:434

Can you explain why we need to check p7 here and not do so in other
places where we call into the JVM? I'm not saying this is wrong. I
just
want to know how you decided where re-init of p7 was needed.



Actually I found this by my hack patch above while running jtreg tests.
The stub slowpath here can be a c++ function.


Yes, good catch.


superword.cpp:97

Does this mean that is someone sets the maximum vector size to a
non-power of two, such as 384, all superword operations will be
bypassed? Including those which can be done using NEON vectors?



Current SLP vectorizer only supports power-of-2 vector size. We are
trying to work out 

Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Magnus Ihse Bursie

On 2020-08-19 11:53, Ningsheng Jian wrote:

Hi Andrew,

I have updated the patch based on the review comments. Would you mind 
taking another look? Thanks!


Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Build changes look good. Thank you for remembering to cc build-dev!

This is maybe not relevant, but I was surprised to find 
src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code, 
and b) the name implies that it is a test, even though that it resides 
in src. Is this really proper?


/Magnus


Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742

JTreg tests are still running, and so far no new failure found.

Thanks,
Ningsheng

On 8/17/20 5:16 PM, Andrew Dinn wrote:

Hi Pengfei,

On 17/08/2020 07:00, Ningsheng Jian wrote:

Thanks a lot for the review! Sorry for the late reply, as I was on
vacation last week. And thanks to Pengfei and Joshua for helping
clarifying some details in the patch.


Yes, they did a very good job of answering most of the pending 
questions.



I also eyeballed /some/ of the generated code to check that it looked
ok. I'd really like to be able to do that systematically for a
comprehensive test suite that exercised every rule but I only had the
machine for a few days. This really ought to be done as a follow-up to
ensure that all the rules are working as expected.


Yes, we would expect Pengfei's OptoAssembly check patch can get merged
in future.


I'm fine with that as a follow-up patch if you raise a JIRA for it.


I am not clear why you are choosing to re-init ptrue after certain JVM
runtime calls (e.g. when Z calls into the runtime) and not others e.g.
when we call a JVM_ENTRY. Could you explain the rationale you have
followed here?


We do the re-init at any possible return points to c2 code, not in any
runtime c++ functions, which will reduce the re-init calls.

Actually I found those entries by some hack of jvm. In the hacky code
below we use gcc option -finstrument-functions to build hotspot. With
this option, each C/C++ function entry/exit will call the instrument
functions we defined. In instrument functions, we clobber p7 (or other
reg for test) register, and in c2 function return we verify that p7 (or
other reg) has been reinitialized.

http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch 



Nice work. It's very good to have that documented. I'm willing to accept
i) that this has found all current cases and ii) that the verify will
catch any cases that might get introduced by future changes (e.g. the
callout introduced by ZGC that you mention below). As the above mot say
there is a slim chance this might have missed some cases but I think it
is pretty unlikely.



Specific Comments (register allocator webrev):


aarch64.ad:97-100

Why have you added a reg_def for R8 and R9 here and also to 
alloc_class

chunk0 at lines 544-545? They aren't used by C2 so why define them?



I think Pengfei has helped to explain that. I will either add clear
comments or rename the register name as you suggested.


Ok, good.


As Joshua clarified, we are also working on predicate scalable reg,
which is not in this patch. Thanks for the suggestion, I will try to
refactor this a bit.


Ok, I'll wait for an updated patch. Are you planning to include the
scalable predicate reg code as part of this patch? I think that would be
better as it would help to clarify the need to distinguish vector regs
as a subset of scalable regs.


zBarrierSetAssembler_aarch64.cpp:434

Can you explain why we need to check p7 here and not do so in other
places where we call into the JVM? I'm not saying this is wrong. I 
just

want to know how you decided where re-init of p7 was needed.



Actually I found this by my hack patch above while running jtreg tests.
The stub slowpath here can be a c++ function.


Yes, good catch.


superword.cpp:97

Does this mean that is someone sets the maximum vector size to a
non-power of two, such as 384, all superword operations will be
bypassed? Including those which can be done using NEON vectors?



Current SLP vectorizer only supports power-of-2 vector size. We are
trying to work out a new vectorizer to support all SVE vector sizes, so
we would expect a size like 384 could go to that path. I tried current
patch on a 512-bit SVE hardware which does not support 384-bit:

$ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same)
openjdk version "16-internal" 2021-03-16

$ java -XX:MaxVectorSize=48 -version
OpenJDK 64-Bit Server VM warning: Current system only 

Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

2020-08-19 Thread Ningsheng Jian

Hi Andrew,

I have updated the patch based on the review comments. Would you mind 
taking another look? Thanks!


Full:
http://cr.openjdk.java.net/~njian/8231441/webrev.04/

Incremental:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/

Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature

2) c2 register allocation:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra

3) SVE c2 backend:
http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742

JTreg tests are still running, and so far no new failure found.

Thanks,
Ningsheng

On 8/17/20 5:16 PM, Andrew Dinn wrote:

Hi Pengfei,

On 17/08/2020 07:00, Ningsheng Jian wrote:

Thanks a lot for the review! Sorry for the late reply, as I was on
vacation last week. And thanks to Pengfei and Joshua for helping
clarifying some details in the patch.


Yes, they did a very good job of answering most of the pending questions.


I also eyeballed /some/ of the generated code to check that it looked
ok. I'd really like to be able to do that systematically for a
comprehensive test suite that exercised every rule but I only had the
machine for a few days. This really ought to be done as a follow-up to
ensure that all the rules are working as expected.


Yes, we would expect Pengfei's OptoAssembly check patch can get merged
in future.


I'm fine with that as a follow-up patch if you raise a JIRA for it.


I am not clear why you are choosing to re-init ptrue after certain JVM
runtime calls (e.g. when Z calls into the runtime) and not others e.g.
when we call a JVM_ENTRY. Could you explain the rationale you have
followed here?


We do the re-init at any possible return points to c2 code, not in any
runtime c++ functions, which will reduce the re-init calls.

Actually I found those entries by some hack of jvm. In the hacky code
below we use gcc option -finstrument-functions to build hotspot. With
this option, each C/C++ function entry/exit will call the instrument
functions we defined. In instrument functions, we clobber p7 (or other
reg for test) register, and in c2 function return we verify that p7 (or
other reg) has been reinitialized.

http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch


Nice work. It's very good to have that documented. I'm willing to accept
i) that this has found all current cases and ii) that the verify will
catch any cases that might get introduced by future changes (e.g. the
callout introduced by ZGC that you mention below). As the above mot say
there is a slim chance this might have missed some cases but I think it
is pretty unlikely.



Specific Comments (register allocator webrev):


aarch64.ad:97-100

Why have you added a reg_def for R8 and R9 here and also to alloc_class
chunk0 at lines 544-545? They aren't used by C2 so why define them?



I think Pengfei has helped to explain that. I will either add clear
comments or rename the register name as you suggested.


Ok, good.


As Joshua clarified, we are also working on predicate scalable reg,
which is not in this patch. Thanks for the suggestion, I will try to
refactor this a bit.


Ok, I'll wait for an updated patch. Are you planning to include the
scalable predicate reg code as part of this patch? I think that would be
better as it would help to clarify the need to distinguish vector regs
as a subset of scalable regs.


zBarrierSetAssembler_aarch64.cpp:434

Can you explain why we need to check p7 here and not do so in other
places where we call into the JVM? I'm not saying this is wrong. I just
want to know how you decided where re-init of p7 was needed.



Actually I found this by my hack patch above while running jtreg tests.
The stub slowpath here can be a c++ function.


Yes, good catch.


superword.cpp:97

Does this mean that is someone sets the maximum vector size to a
non-power of two, such as 384, all superword operations will be
bypassed? Including those which can be done using NEON vectors?



Current SLP vectorizer only supports power-of-2 vector size. We are
trying to work out a new vectorizer to support all SVE vector sizes, so
we would expect a size like 384 could go to that path. I tried current
patch on a 512-bit SVE hardware which does not support 384-bit:

$ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same)
openjdk version "16-internal" 2021-03-16

$ java -XX:MaxVectorSize=48 -version
OpenJDK 64-Bit Server VM warning: Current system only supports max SVE
vector length 32. Set MaxVectorSize to 32

(Fallbacks to 32 and issue a warning, as the prctl() call returns 32
instead of unsupported 48:
https://www.kernel.org/doc/Documentation/arm64/sve.txt)

Do you think we need to exit vm instead of warning and fallbacking to 32
here?


Yes, I think a vm exit would probably be a better choice.

regards,


Andrew