Re: [PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic

2024-12-18 Thread Akram Ahmad

Hi Kyrill,

On 17/12/2024 15:15, Kyrylo Tkachov wrote:

We avoid using the __builtin_aarch64_* builtins in test cases as they are 
undocumented and we don’t make any guarantees about their stability to users.
I’d prefer if the saturating operation was open-coded in C. I expect the midend 
machinery is smart enough to recognize the saturating logic for scalars by now?


Thanks for the detailed feedback. It's been really helpful, and I've 
gone ahead and implemented almost all of it. I'm struggling to find a 
pattern that's recognised for signed arithmetic though- the following 
emits branching code:


int64_t  __attribute__((noipa))
sadd64 (int64_t __a, int64_t __b)
{
  if (__a > 0) {
    if (__b > INT64_MAX - __a)
  return INT64_MAX;
  } else if (__b < INT64_MIN - __a) {
    return INT64_MIN;
  }
  return __a + __b;
}

Resulting assembly:

|sadd64: .LFB6: .cfi_startproc mov x3, x0 cmp x0, 0 ble .L9 mov x2, 
9223372036854775807 sub x4, x2, x0 mov x0, x2 cmp x4, x1 blt .L8 .L11: 
add x0, x3, x1 .L8: ret .p2align 2,,3 .L9: mov x2, -9223372036854775808 
sub x0, x2, x0 cmp x0, x1 ble .L11 mov x0, x2 ret Is there a way to 
force this not to use branches by any chance? I'll keep looking and see 
if there are some patterns recently added to match that will work here. 
If I don't find something, would it be sufficient to use the scalar NEON 
intrinsics for this? And if so, would that mean the test should move to 
the Adv. SIMD directory? Many thanks once again, Akram |


Re: [PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic

2024-12-17 Thread Kyrylo Tkachov
Hi Akram,

> On 14 Nov 2024, at 16:53, Akram Ahmad  wrote:
> 
> This renames the existing {s,u}q{add,sub} instructions to use the
> standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
> IFN_SAT_SUB.
> 
> The NEON intrinsics for saturating arithmetic and their corresponding
> builtins are changed to use these standard names too.
> 
> Using the standard names for the instructions causes 32 and 64-bit
> unsigned scalar saturating arithmetic to use the NEON instructions,
> resulting in an additional (and inefficient) FMOV to be generated when
> the original operands are in GP registers. This patch therefore also
> restores the original behaviour of using the adds/subs instructions
> in this circumstance.
> 
> Furthermore, this patch introduces a new optimisation for signed 32
> and 64-bit scalar saturating arithmetic which uses adds/subs in place
> of the NEON instruction.
> 
> Addition, before:
> fmov d0, x0
> fmov d1, x1
> sqadd d0, d0, d1
> fmov x0, d0
> 
> Addition, after:
> asr x2, x1, 63
> adds x0, x0, x1
> eor x2, x2, 0x8000
> csinv x0, x0, x2, vc
> 
> In the above example, subtraction replaces the adds with subs and the
> csinv with csel. The 32-bit case follows the same approach. Arithmetic
> with a constant operand is simplified further by directly storing the
> saturating limit in the temporary register, resulting in only three
> instructions being used. It is important to note that this only works
> when early-ra is disabled due to an early-ra bug which erroneously
> assigns FP registers to the operands; if early-ra is enabled, then the
> original behaviour (NEON instruction) occurs.
> 
> Additional tests are written for the scalar and Adv. SIMD cases to
> ensure that the correct instructions are used. The NEON intrinsics are
> already tested elsewhere. The signed scalar case is also tested with
> an execution test to check the results.
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-builtins.cc: Expand iterators.
> * config/aarch64/aarch64-simd-builtins.def: Use standard names
> * config/aarch64/aarch64-simd.md: Use standard names, split insn
> definitions on signedness of operator and type of operands.
> * config/aarch64/arm_neon.h: Use standard builtin names.
> * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
> simplify splitting of insn for scalar arithmetic.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
> Template file for unsigned vector saturating arithmetic tests.
> * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
> 8-bit vector type tests.
> * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
> 16-bit vector type tests.
> * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
> 32-bit vector type tests.
> * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
> 64-bit vector type tests.
> * gcc.target/aarch64/saturating_arithmetic.inc: Template file
> for scalar saturating arithmetic tests.
> * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
> * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
> * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
> * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
> * gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
> ---
> gcc/config/aarch64/aarch64-builtins.cc|  13 +
> gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
> gcc/config/aarch64/aarch64-simd.md| 209 ++-
> gcc/config/aarch64/arm_neon.h |  96 +++
> gcc/config/aarch64/iterators.md   |   4 +
> .../saturating_arithmetic_autovect.inc|  58 +
> .../saturating_arithmetic_autovect_1.c|  79 ++
> .../saturating_arithmetic_autovect_2.c|  79 ++
> .../saturating_arithmetic_autovect_3.c|  75 ++
> .../saturating_arithmetic_autovect_4.c|  77 ++
> .../aarch64/saturating-arithmetic-signed.c| 244 ++
> .../aarch64/saturating_arithmetic.inc |  39 +++
> .../aarch64/saturating_arithmetic_1.c |  36 +++
> .../aarch64/saturating_arithmetic_2.c |  36 +++
> .../aarch64/saturating_arithmetic_3.c |  30 +++
> .../aarch64/saturating_arithmetic_4.c |  30 +++
> 16 files changed, 1057 insertions(+), 56 deletions(-)
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
> create mode 100644 
> gcc/testsu

[PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic

2024-11-14 Thread Akram Ahmad
This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.

Furthermore, this patch introduces a new optimisation for signed 32
and 64-bit scalar saturating arithmetic which uses adds/subs in place
of the NEON instruction.

Addition, before:
fmovd0, x0
fmovd1, x1
sqadd   d0, d0, d1
fmovx0, d0

Addition, after:
asr x2, x1, 63
addsx0, x0, x1
eor x2, x2, 0x8000
csinv   x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere. The signed scalar case is also tested with
an execution test to check the results.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for scalar arithmetic.

gcc/testsuite/ChangeLog:

* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
* gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
---
 gcc/config/aarch64/aarch64-builtins.cc|  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md| 209 ++-
 gcc/config/aarch64/arm_neon.h |  96 +++
 gcc/config/aarch64/iterators.md   |   4 +
 .../saturating_arithmetic_autovect.inc|  58 +
 .../saturating_arithmetic_autovect_1.c|  79 ++
 .../saturating_arithmetic_autovect_2.c|  79 ++
 .../saturating_arithmetic_autovect_3.c|  75 ++
 .../saturating_arithmetic_autovect_4.c|  77 ++
 .../aarch64/saturating-arithmetic-signed.c| 244 ++
 .../aarch64/saturating_arithmetic.inc |  39 +++
 .../aarch64/saturating_arithmetic_1.c |  36 +++
 .../aarch64/saturating_arithmetic_2.c |  36 +++
 .../aarch64/saturating_arithmetic_3.c |  30 +++
 .../aarch64/saturating_arithmetic_4.c |  30 +++
 16 files changed, 1057 insertions(+), 56 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_auto