Re: [PATCH v3][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.

2020-03-12 Thread Kyrill Tkachov

Hi Srinath,

On 3/10/20 6:19 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

This patch addresses all the comments in patch version v2.
(version v2) 
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540417.html





Hello,

This patch is part of MVE ACLE intrinsics framework.

The patch supports the use of emulation for the single-precision 
arithmetic
operations for MVE. These changes are to support the MVE ACLE 
intrinsics which

operate on vector floating point arithmetic operations.

Please refer to Arm reference manual [1] for more details.
[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on target arm-none-eabi and armeb-none-eabi and 
found no regressions.


Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2020-03-06  Andre Vieira 
    Srinath Parvathaneni 

    * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify 
function to add

    emulator calls for double precision arithmetic operations for MVE.

2020-03-06  Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/mve_libcall1.c: New test.
    * gcc.target/arm/mve/intrinsics/mve_libcall2.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
c28a475629c7fbad48730beed5550e0cffdf2e1b..40db35a2a8b6dedb4f536b4995e80c8b9a38b588 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5754,9 +5754,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
   /* Values from double-precision helper functions are returned 
in core

  registers if the selected core only supports single-precision
  arithmetic, even if we are using the hard-float ABI.  The 
same is

-    true for single-precision helpers, but we will never be using the
-    hard-float ABI on a CPU which doesn't support single-precision
-    operations in hardware.  */
+    true for single-precision helpers except in case of MVE, 
because in
+    MVE we will be using the hard-float ABI on a CPU which 
doesn't support
+    single-precision operations in hardware.  In MVE the 
following check

+    enables use of emulation for the single-precision arithmetic
+    operations.  */
+  if (TARGET_HAVE_MVE)
+   {
+ add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
+   }
   add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));
diff --git 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c

new file mode 100644
index 
..f89301228c577291fc3095420df1937e1a0c7104

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c
@@ -0,0 +1,70 @@
+/* { dg-do compile  } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve 
-mfloat-abi=hard -mthumb -mfpu=auto" } */

+
+float
+foo (float a, float b, float c)
+{
+  return a + b + c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fadd" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fadd" 2 } } */



What is the point of repeating the scan-assembler directives here? The 
first scan-assembler should be redundant given the scan-assembler-times ?


Otherwise ok.

Thanks,

Kyrill



+
+float
+foo1 (float a, float b, float c)
+{
+  return a - b - c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fsub" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fsub" 2 } } */
+
+float
+foo2 (float a, float b, float c)
+{
+  return a * b * c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fmul" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fmul" 2 } } */
+
+float
+foo3 (float b, float c)
+{
+  return b / c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fdiv" }  } */
+
+int
+foo4 (float b, float c)
+{
+  return b < c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fcmplt" }  } */
+
+int
+foo5 (float b, float c)
+{
+  return b > c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fcmpgt" }  } */
+
+int
+foo6 

Re: [PATCH v3][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.

2020-03-12 Thread Kyrill Tkachov

Hi Srinath,

On 3/10/20 6:19 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

This patch addresses all the comments in patch version v2.
(version v2) 
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540416.html





Hello,

This patch is part of MVE ACLE intrinsics framework.
This patch adds support to update (read/write) the APSR (Application 
Program Status Register)
register and FPSCR (Floating-point Status and Control Register) 
register for MVE.

This patch also enables thumb2 mov RTL patterns for MVE.

A new feature bit vfp_base is added. This bit is enabled for all VFP, 
MVE and MVE with floating point
extensions. This bit is used to enable the macro TARGET_VFP_BASE. For 
all the VFP instructions, RTL patterns,
status and control registers are guarded by TARGET_HAVE_FLOAT. But 
this patch modifies that and the
common instructions, RTL patterns, status and control registers 
between MVE and VFP are guarded by

TARGET_VFP_BASE macro.

The RTL patterns set_fpscr and get_fpscr are updated to use 
VFPCC_REGNUM because few MVE intrinsics

set/get carry bit of FPSCR register.

Please refer to Arm reference manual [1] for more details.
[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on target arm-none-eabi and armeb-none-eabi and 
found no regressions.


Ok for trunk?



Ok, but make sure it bootstraps on arm-none-linux-gnueabihf (as with the 
other patches in this series)


Thanks,

Kyrill




Thanks,
Srinath
gcc/ChangeLog:

2020-03-06  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * common/config/arm/arm-common.c (arm_asm_auto_mfpu): When 
vfp_base
    feature bit is on and -mfpu=auto is passed as compiler option, 
do not
    generate error on not finding any match fpu. Because in this 
case fpu

    is not required.
    * config/arm/arm-cpus.in (vfp_base): Define feature bit, this 
bit is

    enabled for MVE and also for all VFP extensions.
    (VFPv2): Modify fgroup to enable vfp_base feature bit when 
ever VFPv2

    is enabled.
    (MVE): Define fgroup to enable feature bits mve, vfp_base and 
armv7em.
    (MVE_FP): Define fgroup to enable feature bits is fgroup MVE 
and FPv5

    along with feature bits mve_float.
    (mve): Modify add options in armv8.1-m.main arch for MVE.
    (mve.fp): Modify add options in armv8.1-m.main arch for MVE with
    floating point.
    * config/arm/arm.c (use_return_insn): Replace the
    check with TARGET_VFP_BASE.
    (thumb2_legitimate_index_p): Replace TARGET_HARD_FLOAT with
    TARGET_VFP_BASE.
    (arm_rtx_costs_internal): Replace "TARGET_HARD_FLOAT || 
TARGET_HAVE_MVE"
    with TARGET_VFP_BASE, to allow cost calculations for copies in 
MVE as

    well.
    (arm_get_vfp_saved_size): Replace TARGET_HARD_FLOAT with
    TARGET_VFP_BASE, to allow space calculation for VFP registers 
in MVE

    as well.
    (arm_compute_frame_layout): Likewise.
    (arm_save_coproc_regs): Likewise.
    (arm_fixed_condition_code_regs): Modify to enable using 
VFPCC_REGNUM

    in MVE as well.
    (arm_hard_regno_mode_ok): Replace "TARGET_HARD_FLOAT || 
TARGET_HAVE_MVE"

    with equivalent macro TARGET_VFP_BASE.
    (arm_expand_epilogue_apcs_frame): Likewise.
    (arm_expand_epilogue): Likewise.
    (arm_conditional_register_usage): Likewise.
    (arm_declare_function_name): Add check to skip printing .fpu 
directive
    in assembly file when TARGET_VFP_BASE is enabled and 
fpu_to_print is

    "softvfp".
    * config/arm/arm.h (TARGET_VFP_BASE): Define.
    * config/arm/arm.md (arch): Add "mve" to arch.
    (eq_attr "arch" "mve"): Enable on TARGET_HAVE_MVE is true.
    (vfp_pop_multiple_with_writeback): Replace "TARGET_HARD_FLOAT
    || TARGET_HAVE_MVE" with equivalent macro TARGET_VFP_BASE.
    * config/arm/constraints.md (Uf): Define to allow modification 
to FPCCR

    in MVE.
    * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Modify 
target guard

    to not allow for MVE.
    * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Move to volatile 
unspecs

    enum.
    (VUNSPEC_GET_FPSCR): Define.
    * config/arm/vfp.md (thumb2_movhi_vfp): Add support for VMSR 
and VMRS
    instructions which move to general-purpose Register from 
Floating-point

    Special register and vice-versa.
    (thumb2_movhi_fp16): Likewise.
    (thumb2_movsi_vfp): Add support for VMSR and VMRS instructions 
along
    with MCR and MRC instructions which set and get Floating-point 
Status

    and Control Register (FPSCR).
    (movdi_vfp): Modify pattern to enable Single-precision scalar 
float move

    in MVE.
    (thumb2_movdf_vfp): Modify pattern to enable Double-precision 
scalar

    float move patterns in MVE.
    (thumb2_movsfcc_vfp): Modify pattern to enable single float 
conditional
    

Re: [PATCH v3][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.

2020-03-12 Thread Kyrill Tkachov

Hi Srinath,

On 3/10/20 6:19 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

This patch addresses all the comments in patch version v2.
(version v2) 
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html




Hello,

This patch creates the required framework for MVE ACLE intrinsics.

The following changes are done in this patch to support MVE ACLE 
intrinsics.


Header file arm_mve.h is added to source code, which contains the 
definitions of MVE ACLE intrinsics
and different data types used in MVE. Machine description file mve.md 
is also added which contains the

RTL patterns defined for MVE.

A new register "p0" is added which is used by MVE predicated 
patterns. A new register class "VPR_REG"

is added and its contents are defined in REG_CLASS_CONTENTS.

The vec-common.md file is modified to support the standard move 
patterns. The prefix of neon functions

which are also used by MVE is changed from "neon_" to "simd_".
eg: neon_immediate_valid_for_move changed to 
simd_immediate_valid_for_move.


In the patch standard patterns mve_move, mve_store and mve_load for 
MVE are added and neon.md and vfp.md

files are modified to support these common patterns.

Please refer to Arm reference manual [1] for more details.

[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on target arm-none-eabi and armeb-none-eabi and 
found no regressions.


Ok for trunk?



This is ok but please bootstrap it on arm-none-linux-gnueabihf as well.

Thanks,

Kyrill




Thanks,
Srinath

gcc/ChangeLog:

2020-03-06  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config.gcc (arm_mve.h): Include mve intrinsics header file.
    * config/arm/aout.h (p0): Add new register name for MVE predicated
    cases.
    * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define 
macro

    common to Neon and MVE.
    (ARM_BUILTIN_NEON_LANE_CHECK): Renamed to 
ARM_BUILTIN_SIMD_LANE_CHECK.

    (arm_init_simd_builtin_types): Disable poly types for MVE.
    (arm_init_neon_builtins): Move a check to arm_init_builtins 
function.

    (arm_init_builtins): Use ARM_BUILTIN_SIMD_LANE_CHECK instead of
    ARM_BUILTIN_NEON_LANE_CHECK.
    (mve_dereference_pointer): Add function.
    (arm_expand_builtin_args): Call to mve_dereference_pointer 
when MVE is

    enabled.
    (arm_expand_neon_builtin): Moved to arm_expand_builtin function.
    (arm_expand_builtin): Moved from arm_expand_neon_builtin function.
    * config/arm/arm-c.c (__ARM_FEATURE_MVE): Define macro for MVE 
and MVE

    with floating point enabled.
    * config/arm/arm-protos.h (neon_immediate_valid_for_move): 
Renamed to

    simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Renamed from
    neon_immediate_valid_for_move function.
    * config/arm/arm.c (arm_options_perform_arch_sanity_checks): 
Generate

    error if vfpv2 feature bit is disabled and mve feature bit is also
    disabled for HARD_FLOAT_ABI.
    (use_return_insn): Check to not push VFP regs for MVE.
    (aapcs_vfp_allocate): Add MVE check to have same Procedure 
Call Standard

    as Neon.
    (aapcs_vfp_allocate_return_reg): Likewise.
    (thumb2_legitimate_address_p): Check to return 0 on valid Thumb-2
    address operand for MVE.
    (arm_rtx_costs_internal): MVE check to determine cost of rtx.
    (neon_valid_immediate): Rename to simd_valid_immediate.
    (simd_valid_immediate): Rename from neon_valid_immediate.
    (simd_valid_immediate): MVE check on size of vector is 128 bits.
    (neon_immediate_valid_for_move): Rename to
    simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Rename from
    neon_immediate_valid_for_move.
    (neon_immediate_valid_for_logic): Modify call to 
neon_valid_immediate

    function.
    (neon_make_constant): Modify call to neon_valid_immediate 
function.
    (neon_vector_mem_operand): Return VFP register for POST_INC or 
PRE_DEC

    for MVE.
    (output_move_neon): Add MVE check to generate vldm/vstm 
instructions.
    (arm_compute_frame_layout): Calculate space for saved VFP 
registers for

    MVE.
    (arm_save_coproc_regs): Save coproc registers for MVE.
    (arm_print_operand): Add case 'E' to print memory operands for 
MVE.
    (arm_print_operand_address): Check to print register number 
for MVE.
    (arm_hard_regno_mode_ok): Check for arm hard regno mode ok for 
MVE.

    (arm_modes_tieable_p): Check to allow structure mode for MVE.
    (arm_regno_class): Add VPR_REGNUM check.
    (arm_expand_epilogue_apcs_frame): MVE check to calculate 
epilogue code

    for APCS frame.
    (arm_expand_epilogue): MVE check for enabling pop instructions in
    epilogue.
    (arm_print_asm_arch_directives): Modify function to disable 
print of

    

Re: [GCC][Patch]Bug fix: cannot convert 'const short int*' to 'const __bf16*'

2020-03-11 Thread Kyrill Tkachov



On 3/11/20 5:59 PM, Kyrill Tkachov wrote:

Hi Delia,

On 3/11/20 5:49 PM, Delia Burduv wrote:

This patch fixes a bug introduced by my earlier patch (
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541680.html ).
It introduces a new scalar builtin type that was missing in the original
patch.

Bootstrapped cleanly on arm-none-linux-gnueabihf.
Tested for regression on arm-none-linux-gnueabihf. No regression from
before the original patch.
Tests that failed or became unsupported because of the original tests
now work as they did before it.

gcc/ChangeLog:

2020-03-11  Delia Burduv  

    * config/arm/arm-builtins.c
 (arm_init_simd_builtin_scalar_types): New
    * config/arm/arm_neon.h (vld2_bf16): Used new builtin type
    (vld2q_bf16): Used new builtin type
    (vld3_bf16): Used new builtin type
    (vld3q_bf16): Used new builtin type
    (vld4_bf16): Used new builtin type
    (vld4q_bf16): Used new builtin type
    (vld2_dup_bf16): Used new builtin type
    (vld2q_dup_bf16): Used new builtin type
    (vld3_dup_bf16): Used new builtin type
    (vld3q_dup_bf16): Used new builtin type
    (vld4_dup_bf16): Used new builtin type
    (vld4q_dup_bf16): Used new builtin type


ChangeLog entries should have a full stop after each entry.

The patch is ok.

Thanks for the quick fix,



To be clear, I've pushed it to master with a fixed ChangeLog as 
1c43ee69f4f6148fff4b5ace80d709d7f8b250d7


Kyrill




Kyrill





Re: [GCC][Patch]Bug fix: cannot convert 'const short int*' to 'const __bf16*'

2020-03-11 Thread Kyrill Tkachov

Hi Delia,

On 3/11/20 5:49 PM, Delia Burduv wrote:

This patch fixes a bug introduced by my earlier patch (
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541680.html ).
It introduces a new scalar builtin type that was missing in the original
patch.

Bootstrapped cleanly on arm-none-linux-gnueabihf.
Tested for regression on arm-none-linux-gnueabihf. No regression from
before the original patch.
Tests that failed or became unsupported because of the original tests
now work as they did before it.

gcc/ChangeLog:

2020-03-11  Delia Burduv  

    * config/arm/arm-builtins.c
 (arm_init_simd_builtin_scalar_types): New
    * config/arm/arm_neon.h (vld2_bf16): Used new builtin type
    (vld2q_bf16): Used new builtin type
    (vld3_bf16): Used new builtin type
    (vld3q_bf16): Used new builtin type
    (vld4_bf16): Used new builtin type
    (vld4q_bf16): Used new builtin type
    (vld2_dup_bf16): Used new builtin type
    (vld2q_dup_bf16): Used new builtin type
    (vld3_dup_bf16): Used new builtin type
    (vld3q_dup_bf16): Used new builtin type
    (vld4_dup_bf16): Used new builtin type
    (vld4q_dup_bf16): Used new builtin type


ChangeLog entries should have a full stop after each entry.

The patch is ok.

Thanks for the quick fix,

Kyrill





Re: [AArch64] Backporting -moutline-atomics to gcc 9.x and 8.x

2020-03-11 Thread Kyrill Tkachov
y expand LSE operations here.
 (atomic_fetch_): Likewise.
 (atomic__fetch): Likewise.
 (aarch64_atomic__lse): Drop atomic_op iterator
 and use ATOMIC_LDOP instead; use register_operand for the input;
 drop the split and emit insns directly.
 (aarch64_atomic_fetch__lse): Likewise.
 (aarch64_atomic__fetch_lse): Remove.
 (@aarch64_atomic_load): Remove.

From-SVN: r265660

 From 53de1ea800db54b47290d578c43892799b66c8dc Mon Sep 17 00:00:00 2001
From: Richard Henderson 
Date: Wed, 31 Oct 2018 23:11:22 +
Subject: [PATCH] aarch64: Remove early clobber from ATOMIC_LDOP scratch

 * config/aarch64/atomics.md (aarch64_atomic__lse):
 The scratch register need not be early-clobber.  Document the reason
 why we cannot use ST.

From-SVN: r265703





On 2/27/20, 12:06 PM, "Kyrill Tkachov"  wrote:

 Hi Sebastian,
 
 On 2/27/20 4:53 PM, Pop, Sebastian wrote:

 >
 > Hi,
 >
 > is somebody already working on backporting -moutline-atomics to gcc
 > 8.x and 9.x branches?
 >
 I'm not aware of such work going on.
 
 Thanks,
 
 Kyrill
 
 > Thanks,

 >
 > Sebastian
 >
 



Re: [PATCH] aarch64: Fix ICE in aarch64_add_offset_1 [PR94121]

2020-03-11 Thread Kyrill Tkachov

Hi Jakub,

On 3/11/20 7:22 AM, Jakub Jelinek wrote:

Hi!

abs_hwi asserts that the argument is not HOST_WIDE_INT_MIN and as the
(invalid) testcase shows, the function can be called with such an offset.
The following patch is IMHO minimal fix, absu_hwi unlike abs_hwi 
allows even

that value and will return (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN
in that case.  The function then uses moffset in two spots which wouldn't
care if the value is (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN or
HOST_WIDE_INT_MIN and wouldn't accept it (!moffset and
aarch64_uimm12_shift (moffset)), then in one spot where the signedness of
moffset does matter and using unsigned is the right thing -
moffset < 0x100 - and finally has code which will handle even this
value right; the assembler doesn't really care for DImode immediates if
    mov x1, -9223372036854775808
or
    mov x1, 9223372036854775808
is used and similarly it doesn't matter if we add or sub it in DImode.

Bootstrapped/regtested on aarch64-linux, ok for trunk?


Ok.

Thanks,

Kyrill



2020-03-10  Jakub Jelinek  

    PR target/94121
    * config/aarch64/aarch64.c (aarch64_add_offset_1): Use absu_hwi
    instead of abs_hwi, change moffset type to unsigned HOST_WIDE_INT.

    * gcc.dg/pr94121.c: New test.

--- gcc/config/aarch64/aarch64.c.jj 2020-02-28 17:33:03.414258503 
+0100
+++ gcc/config/aarch64/aarch64.c    2020-03-10 17:01:39.435302124 
+0100

@@ -3713,7 +3713,7 @@ aarch64_add_offset_1 (scalar_int_mode mo
   gcc_assert (emit_move_imm || temp1 != NULL_RTX);
   gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, 
src));


-  HOST_WIDE_INT moffset = abs_hwi (offset);
+  unsigned HOST_WIDE_INT moffset = absu_hwi (offset);
   rtx_insn *insn;

   if (!moffset)
--- gcc/testsuite/gcc.dg/pr94121.c.jj   2020-03-10 16:58:40.246974306 
+0100
+++ gcc/testsuite/gcc.dg/pr94121.c  2020-03-10 16:58:40.246974306 
+0100

@@ -0,0 +1,16 @@
+/* PR target/94121 */
+/* { dg-do compile { target pie } } */
+/* { dg-options "-O2 -fpie -w" } */
+
+#define DIFF_MAX __PTRDIFF_MAX__
+#define DIFF_MIN (-DIFF_MAX - 1)
+
+extern void foo (char *);
+extern char v[];
+
+void
+bar (void)
+{
+  char *p = v;
+  foo (&v[DIFF_MIN]);
+}

    Jakub



[PATCH][AArch64][SVE] Add missing movprfx attribute to some ternary arithmetic patterns

2020-03-06 Thread Kyrill Tkachov

Hi all,

The two affected SVE2 patterns in this patch output a movprfx'ed 
instruction in their second alternative
but don't set the "movprfx" attribute, which will result in the wrong 
instruction length being assumed by the midend.


This patch fixes that in the same way as the other SVE patterns in the 
backend.


Bootstrapped and tested on aarch64-none-linux-gnu.
Committing to trunk.

Thanks,
Kyrill

2020-03-06  Kyrylo Tkachov  

    * config/aarch64/aarch64-sve2.md (@aarch64_sve_:
    Specify movprfx attribute.
    (@aarch64_sve__lane_): Likewise.

commit b9694320e1bfbfc92255b30cc108a81a243770c6
Author: Kyrylo Tkachov 
Date:   Fri Mar 6 15:26:20 2020 +

[AArch64] Add movprfx attribute to a couple of SVE2 patterns

diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index f82e60e25c7..e18b9fef16e 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -690,6 +690,7 @@
   "@
\t%0., %2., %3.
movprfx\t%0, %1\;\t%0., %2., %3."
+  [(set_attr "movprfx" "*,yes")]
 )
 
 (define_insn "@aarch64_sve__lane_"
@@ -706,6 +707,7 @@
   "@
\t%0., %2., %3.[%4]
movprfx\t%0, %1\;\t%0., %2., %3.[%4]"
+  [(set_attr "movprfx" "*,yes")]
 )
 
 ;; -


Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-03-06 Thread Kyrill Tkachov

Hi Delia,

On 3/5/20 4:38 PM, Delia Burduv wrote:

Hi,

This is the latest version of the patch. I am forcing -mfloat-abi=hard 
because the code generated is slightly differently depending on the 
float-abi used.



Thanks, I've pushed it with an updated ChangeLog.

2020-03-06  Delia Burduv  

    * config/arm/arm_neon.h (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
    (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
    * config/arm/arm_neon_builtins.def
    (vld2): Changed to VAR13 and added v4bf, v8bf
    (vld2_dup): Changed to VAR8 and added v4bf, v8bf
    (vld3): Changed to VAR13 and added v4bf, v8bf
    (vld3_dup): Changed to VAR8 and added v4bf, v8bf
    (vld4): Changed to VAR13 and added v4bf, v8bf
    (vld4_dup): Changed to VAR8 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF2): New iterator.
    *config/arm/neon.md (neon_vld2): Use new iterators.
    (neon_vld2_dup): Likewise.
    (neon_vld3qa): Likewise.
    (neon_vld3qb): Likewise.
    (neon_vld3_dup): Likewise.
    (neon_vld4): Likewise.
    (neon_vld4qa): Likewise.
    (neon_vld4qb): Likewise.
    (neon_vld4_dup): Likewise.
    (neon_vld2_dupv8bf): New.
    (neon_vld3_dupv8bf): Likewise.
    (neon_vld4_dupv8bf): Likewise.

Kyrill




Thanks,
Delia

On 3/4/20 5:20 PM, Kyrill Tkachov wrote:

Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:

Hi,

The previous version of this patch shared part of its code with the
store intrinsics patch
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
any duplicated code. This patch now depends on the previously mentioned
store intrinsics patch.

Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2019-03-04  Delia Burduv  

    * config/arm/arm_neon.h (bfloat16_t): New typedef.
 (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
 (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
 * config/arm/arm_neon_builtins.def
 (vld2): Changed to VAR13 and added v4bf, v8bf
 (vld2_dup): Changed to VAR8 and added v4bf, v8bf
 (vld3): Changed to VAR13 and added v4bf, v8bf
 (vld3_dup): Changed to VAR8 and added v4bf, v8bf
 (vld4): Changed to VAR13 and added v4bf, v8bf
 (vld4_dup): Changed to VAR8 and added v4bf, v8bf
 * config/arm/iterators.md (VDXBF): New iterator.
 (VQ2BF): New iterator.
 *config/arm/neon.md (vld2): Used new iterators.
 (vld2_dup): Used new iterators.
 (vld2_dupv8bf): New.
 (vst3): Used new iterators.
 (vst3qa): Used new iterators.
 (vst3qb): Used new iterators.
 (vld3_dup): Used new iterators.
 (vld3_dupv8bf): New.
 (vst4): Used new iterators.
 (vst4qa): Used new iterators.
 (vst4qb): Used new iterators.
 (vld4_dup): Used new iterators.
 (vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-03-04  Delia Burduv  

    * gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:
>
> Hi,
>
> Here is the latest version of the patch. It just has some minor
> formatting changes that were brought up by Richard Sandiford in the
> AArch64 patches
>
> Thanks,
> Delia
>
> On 1/22/20 5:31 PM, Delia Burduv wrote:
>> Ping.
>>
>> I will change the tests to use the exact input and output 
registers as

>> Richard Sandiford suggested for the AArch64 patches.
>>
>> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> vld{q}_bf16 as part of the BFloat16 extension.
>>> 
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


>>>
>>> The intrinsics are declared in arm_neon.h .
>>> A new test is added to check assembler output.
>>>
>>> This patch depends on the Arm back-end patch.
>>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>
>>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> have commit rights, so if this is ok can someone please commit 
it for

>>> me?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-14  Delia Burduv 
>>>
>>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>  (bfloat16x4x2_t): New typedef.
>>>  (bfloat16x8x2_t): New typedef.
>>>  (bfloat16x4x3_t): New typedef.
>>>  (bfloat16x

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-03-06 Thread Kyrill Tkachov

Hi Delia,

On 3/5/20 3:53 PM, Delia Burduv wrote:

Hi,

This is the latest version of the patch. I am forcing -mfloat-abi=hard 
because the register allocator behaves differently depending on the 
float-abi used.


Thanks, I've pushed it to master with an updated ChangeLog reflecting 
the recent changes. In the future, please send an updated ChangeLog 
whenever something changes in the patches.


Thanks again!

Kyrill


2020-03-06  Delia Burduv  

    * config/arm/arm_neon.h (bfloat16x4x2_t): New typedef.
    (bfloat16x8x2_t): New typedef.
    (bfloat16x4x3_t): New typedef.
    (bfloat16x8x3_t): New typedef.
    (bfloat16x4x4_t): New typedef.
    (bfloat16x8x4_t): New typedef.
    (vst2_bf16): New.
    (vst2q_bf16): New.
    (vst3_bf16): New.
    (vst3q_bf16): New.
    (vst4_bf16): New.
    (vst4q_bf16): New.
    * config/arm/arm-builtins.c (v2bf_UP): Define.
    (VAR13): New.
    (arm_init_simd_builtin_types): Init Bfloat16x2_t eltype.
    * config/arm/arm-modes.def (V2BF): New mode.
    * config/arm/arm-simd-builtin-types.def
    (Bfloat16x2_t): New entry.
    * config/arm/arm_neon_builtins.def
    (vst2): Changed to VAR13 and added v4bf, v8bf
    (vst3): Changed to VAR13 and added v4bf, v8bf
    (vst4): Changed to VAR13 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF): New iterator.
    (VQ2BF): New iterator.
    *config/arm/neon.md (neon_vst2): Used new iterators.
    (neon_vst2): Used new iterators.
    (neon_vst3): Used new iterators.
    (neon_vst3): Used new iterators.
    (neon_vst3qa): Used new iterators.
    (neon_vst3qb): Used new iterators.
    (neon_vst4): Used new iterators.
    (neon_vst4): Used new iterators.
    (neon_vst4qa): Used new iterators.
    (neon_vst4qb): Used new iterators.





Thanks,
Delia

On 3/4/20 5:20 PM, Kyrill Tkachov wrote:

Hi Delia,

On 3/3/20 5:23 PM, Delia Burduv wrote:

Hi,

I noticed that the patch doesn't apply cleanly. I fixed it and this 
is the latest version.


Thanks,
Delia

On 3/3/20 4:23 PM, Delia Burduv wrote:

Sorry, I forgot the attachment.

On 3/3/20 4:20 PM, Delia Burduv wrote:

Hi,

I made a mistake in the previous patch. This is the latest 
version. Please let me know if it is ok.


Thanks,
Delia

On 2/21/20 3:18 PM, Delia Burduv wrote:

Hi Kyrill,

The arm_bf16.h is only used for scalar operations. That is how 
the aarch64 versions are implemented too.


Thanks,
Delia

On 2/21/20 2:06 PM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in 
the

AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output 
registers as

> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst{q}_bf16 as part of the BFloat16 extension.
>> 
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patch.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. 
I don't
>> have commit rights, so if this is ok can someone please 
commit it for me?

>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>  (bfloat16x4x2_t): New typedef.
>>  (bfloat16x8x2_t): New typedef.
>>  (bfloat16x4x3_t): New typedef.
>>  (bfloat16x8x3_t): New typedef.
>>  (bfloat16x4x4_t): New typedef.
>>  (bfloat16x8x4_t): New typedef.
>>  (vst2_bf16): New.
>>  (vst2q_bf16): New.
>>  (vst3_bf16): New.
>>  (vst3q_bf16): New.
>>  (vst4_bf16): New.
>>  (vst4q_bf16): New.
>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>  (VAR13): New.
>>  (arm_simd_types[Bfloat16x2_t]):New type.
>>  * config/arm/arm-modes.def (V2BF): New mode.
>>  * config/arm/arm-simd-builtin-types.def
>>  (Bfloat16x2_t): New entry.
>>  * config/arm/arm_neon_builtins.def
>>  (vst2): Changed to VAR13 and added v4bf, v8bf
>>  (vst3): Changed to VAR13 and added v4bf, v8bf
>>  (vst4): Changed to VAR13 and added v4bf, v8bf
>>  * config/arm/iterators.md (VDXBF): New iterator.
>>  (VQ2BF): New iterator.
>>  (V_elem): Added V4BF, V8BF.
>>  (V_sz_elem): Added V4BF, V8BF.
>>  (V_mode_nunits): Added V4

Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

2020-03-05 Thread Kyrill Tkachov



On 3/5/20 11:22 AM, Kyrill Tkachov wrote:

Hi Delia,

On 3/4/20 5:20 PM, Delia Burduv wrote:

Hi,

This is the latest version of the patch.

Thanks,
Delia

On 2/21/20 11:41 AM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:23 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor 
formatting changes that were brought up by Richard Sandiford in the 
AArch64 patches


Thanks,
Delia

On 1/31/20 3:23 PM, Delia Burduv wrote:
Here is the updated patch. The changes are minor, so let me know 
if there is anything else to fix or if it can be committed.


Thank you,
Delia

On 1/30/20 2:55 PM, Kyrill Tkachov wrote:

Hi Delia,


On 1/28/20 4:44 PM, Delia Burduv wrote:

Ping.
 


*From:* Delia Burduv 
*Sent:* 22 January 2020 17:26
*To:* gcc-patches@gcc.gnu.org 
*Cc:* ni...@redhat.com ; Richard Earnshaw 
; Ramana Radhakrishnan 
; Kyrylo Tkachov 

*Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 
vmmla and vfma for AArch32 AdvSIMD

Ping.

I have read Richard Sandiford's comments on the AArch64 patches 
and I
will apply what is relevant to this patch as well. Particularly, 
I will
change the tests to use the exact input and output registers and 
I will

change the types of the rtl patterns.



Please send the updated patches so that someone can commit them 
for you once they're reviewed.


Thanks,

Kyrill




On 12/20/19 6:44 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab 
and vfmat

> as part of the BFloat16 extension.
> (https://developer.arm.com/docs/101028/latest.)
> The intrinsics are declared in arm_neon.h and the RTL patterns 
are

> defined in neon.md.
> Two new tests are added to check assembler output and lane 
indices.

>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I 
don't have
> commit rights, so if this is ok can someone please commit it 
for me?

>
> gcc/ChangeLog:
>
> 2019-11-12� Delia Burduv 
>
>� ����* config/arm/arm_neon.h (vbfmmlaq_f32): New.
>� ����� (vbfmlalbq_f32): New.
>� ����� (vbfmlaltq_f32): New.
>� ����� (vbfmlalbq_lane_f32): New.
>� ����� (vbfmlaltq_lane_f32): New.
>� ������� (vbfmlalbq_laneq_f32): New.
>� ����� (vbfmlaltq_laneq_f32): New.
>� ����* config/arm/arm_neon_builtins.def (vbfmmla): New.
>� ��������� (vbfmab): New.
>� ��������� (vbfmat): New.
>� ��������� (vbfmab_lane): New.
>� ��������� (vbfmat_lane): New.
>� ��������� (vbfmab_laneq): New.
>� ��������� (vbfmat_laneq): New.
>� ���� * config/arm/iterators.md (BF_MA): New int 
iterator.

>� ��������� (bt): New int attribute.
>� ��������� (VQXBF): Copy of VQX with V8BF.
>� ��������� (V_HALF): Added V8BF.
>� ����� * config/arm/neon.md (neon_vbfmmlav8hi): 
New insn.

>� ��������� (neon_vbfmav8hi): New insn.
>� ��������� (neon_vbfma_lanev8hi): New 
insn.
>� ��������� (neon_vbfma_laneqv8hi): New 
expand.
>� ��������� (neon_vget_high): Changed 
iterator to VQXBF.
>� ����* config/arm/unspecs.md (UNSPEC_BFMMLA): New 
UNSPEC.

>� ��������� (UNSPEC_BFMAB): New UNSPEC.
>� ��������� (UNSPEC_BFMAT): New UNSPEC.
>
> 2019-11-12� Delia Burduv 
>
>� ������� * gcc.target/arm/simd/bf16_ma_1.c: 
New test.
>� ������� * gcc.target/arm/simd/bf16_ma_2.c: 
New test.
>� ������� * gcc.target/arm/simd/bf16_mmla_1.c: 
New test.


This looks good, a few minor things though...


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, 
float32x4_t __a, float32x4_t __b,
 ï¿½ï¿½ return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, 
__index);

 ï¿½}

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+� return __builtin_neon_vbfmmlav8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r,

Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

2020-03-05 Thread Kyrill Tkachov

Hi Delia,

On 3/4/20 5:20 PM, Delia Burduv wrote:

Hi,

This is the latest version of the patch.

Thanks,
Delia

On 2/21/20 11:41 AM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:23 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor 
formatting changes that were brought up by Richard Sandiford in the 
AArch64 patches


Thanks,
Delia

On 1/31/20 3:23 PM, Delia Burduv wrote:
Here is the updated patch. The changes are minor, so let me know if 
there is anything else to fix or if it can be committed.


Thank you,
Delia

On 1/30/20 2:55 PM, Kyrill Tkachov wrote:

Hi Delia,


On 1/28/20 4:44 PM, Delia Burduv wrote:

Ping.
 


*From:* Delia Burduv 
*Sent:* 22 January 2020 17:26
*To:* gcc-patches@gcc.gnu.org 
*Cc:* ni...@redhat.com ; Richard Earnshaw 
; Ramana Radhakrishnan 
; Kyrylo Tkachov 

*Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 
vmmla and vfma for AArch32 AdvSIMD

Ping.

I have read Richard Sandiford's comments on the AArch64 patches 
and I
will apply what is relevant to this patch as well. Particularly, 
I will
change the tests to use the exact input and output registers and 
I will

change the types of the rtl patterns.



Please send the updated patches so that someone can commit them 
for you once they're reviewed.


Thanks,

Kyrill




On 12/20/19 6:44 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab 
and vfmat

> as part of the BFloat16 extension.
> (https://developer.arm.com/docs/101028/latest.)
> The intrinsics are declared in arm_neon.h and the RTL patterns are
> defined in neon.md.
> Two new tests are added to check assembler output and lane 
indices.

>
> This patch depends on the Arm back-end patche.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I 
don't have
> commit rights, so if this is ok can someone please commit it 
for me?

>
> gcc/ChangeLog:
>
> 2019-11-12� Delia Burduv 
>
>� ����* config/arm/arm_neon.h (vbfmmlaq_f32): New.
>� ����� (vbfmlalbq_f32): New.
>� ����� (vbfmlaltq_f32): New.
>� ����� (vbfmlalbq_lane_f32): New.
>� ����� (vbfmlaltq_lane_f32): New.
>� ������� (vbfmlalbq_laneq_f32): New.
>� ����� (vbfmlaltq_laneq_f32): New.
>� ����* config/arm/arm_neon_builtins.def (vbfmmla): New.
>� ��������� (vbfmab): New.
>� ��������� (vbfmat): New.
>� ��������� (vbfmab_lane): New.
>� ��������� (vbfmat_lane): New.
>� ��������� (vbfmab_laneq): New.
>� ��������� (vbfmat_laneq): New.
>� ���� * config/arm/iterators.md (BF_MA): New int 
iterator.

>� ��������� (bt): New int attribute.
>� ��������� (VQXBF): Copy of VQX with V8BF.
>� ��������� (V_HALF): Added V8BF.
>� ����� * config/arm/neon.md (neon_vbfmmlav8hi): New 
insn.

>� ��������� (neon_vbfmav8hi): New insn.
>� ��������� (neon_vbfma_lanev8hi): New 
insn.
>� ��������� (neon_vbfma_laneqv8hi): New 
expand.
>� ��������� (neon_vget_high): Changed 
iterator to VQXBF.
>� ����* config/arm/unspecs.md (UNSPEC_BFMMLA): New 
UNSPEC.

>� ��������� (UNSPEC_BFMAB): New UNSPEC.
>� ��������� (UNSPEC_BFMAT): New UNSPEC.
>
> 2019-11-12� Delia Burduv 
>
>� ������� * gcc.target/arm/simd/bf16_ma_1.c: New 
test.
>� ������� * gcc.target/arm/simd/bf16_ma_2.c: New 
test.
>� ������� * gcc.target/arm/simd/bf16_mmla_1.c: 
New test.


This looks good, a few minor things though...


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, 
float32x4_t __a, float32x4_t __b,
 ï¿½ï¿½ return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, 
__index);

 ï¿½}

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+� return __builtin_neon_vbfmmlav8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+� return 

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-03-04 Thread Kyrill Tkachov

Hi Delia,

On 3/3/20 5:23 PM, Delia Burduv wrote:

Hi,

I noticed that the patch doesn't apply cleanly. I fixed it and this is 
the latest version.


Thanks,
Delia

On 3/3/20 4:23 PM, Delia Burduv wrote:

Sorry, I forgot the attachment.

On 3/3/20 4:20 PM, Delia Burduv wrote:

Hi,

I made a mistake in the previous patch. This is the latest version. 
Please let me know if it is ok.


Thanks,
Delia

On 2/21/20 3:18 PM, Delia Burduv wrote:

Hi Kyrill,

The arm_bf16.h is only used for scalar operations. That is how the 
aarch64 versions are implemented too.


Thanks,
Delia

On 2/21/20 2:06 PM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output 
registers as

> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst{q}_bf16 as part of the BFloat16 extension.
>> 
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patche.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I 
don't
>> have commit rights, so if this is ok can someone please commit 
it for me?

>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>  (bfloat16x4x2_t): New typedef.
>>  (bfloat16x8x2_t): New typedef.
>>  (bfloat16x4x3_t): New typedef.
>>  (bfloat16x8x3_t): New typedef.
>>  (bfloat16x4x4_t): New typedef.
>>  (bfloat16x8x4_t): New typedef.
>>  (vst2_bf16): New.
>>  (vst2q_bf16): New.
>>  (vst3_bf16): New.
>>  (vst3q_bf16): New.
>>  (vst4_bf16): New.
>>  (vst4q_bf16): New.
>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>  (VAR13): New.
>>  (arm_simd_types[Bfloat16x2_t]):New type.
>>  * config/arm/arm-modes.def (V2BF): New mode.
>>  * config/arm/arm-simd-builtin-types.def
>>  (Bfloat16x2_t): New entry.
>>  * config/arm/arm_neon_builtins.def
>>  (vst2): Changed to VAR13 and added v4bf, v8bf
>>  (vst3): Changed to VAR13 and added v4bf, v8bf
>>  (vst4): Changed to VAR13 and added v4bf, v8bf
>>  * config/arm/iterators.md (VDXBF): New iterator.
>>  (VQ2BF): New iterator.
>>  (V_elem): Added V4BF, V8BF.
>>  (V_sz_elem): Added V4BF, V8BF.
>>  (V_mode_nunits): Added V4BF, V8BF.
>>  (q): Added V4BF, V8BF.
>>  *config/arm/neon.md (vst2): Used new iterators.
>>  (vst3): Used new iterators.
>>  (vst3qa): Used new iterators.
>>  (vst3qb): Used new iterators.
>>  (vst4): Used new iterators.
>>  (vst4qa): Used new iterators.
>>  (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * gcc.target/arm/simd/bf16_vstn_1.c: New test.


One thing I just noticed in this and the other arm bfloat16 
patches...


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, 
float32x4_t __a, float32x4_t __b,

    return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;


These should be in a new arm_bf16.h file that gets included in the 
main arm_neon.h file, right?

I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


  +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+



diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c 
b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
new file mode 100644
index 
..b52ecfb959776fd04c7c33908cb7f8898ec3fe0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.

Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-03-04 Thread Kyrill Tkachov

Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:

Hi,

The previous version of this patch shared part of its code with the
store intrinsics patch
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
any duplicated code. This patch now depends on the previously mentioned
store intrinsics patch.

Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2019-03-04  Delia Burduv  

    * config/arm/arm_neon.h (bfloat16_t): New typedef.
 (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
 (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
 * config/arm/arm_neon_builtins.def
 (vld2): Changed to VAR13 and added v4bf, v8bf
 (vld2_dup): Changed to VAR8 and added v4bf, v8bf
 (vld3): Changed to VAR13 and added v4bf, v8bf
 (vld3_dup): Changed to VAR8 and added v4bf, v8bf
 (vld4): Changed to VAR13 and added v4bf, v8bf
 (vld4_dup): Changed to VAR8 and added v4bf, v8bf
 * config/arm/iterators.md (VDXBF): New iterator.
 (VQ2BF): New iterator.
 *config/arm/neon.md (vld2): Used new iterators.
 (vld2_dup): Used new iterators.
 (vld2_dupv8bf): New.
 (vst3): Used new iterators.
 (vst3qa): Used new iterators.
 (vst3qb): Used new iterators.
 (vld3_dup): Used new iterators.
 (vld3_dupv8bf): New.
 (vst4): Used new iterators.
 (vst4qa): Used new iterators.
 (vst4qb): Used new iterators.
 (vld4_dup): Used new iterators.
 (vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-03-04  Delia Burduv  

    * gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:
>
> Hi,
>
> Here is the latest version of the patch. It just has some minor
> formatting changes that were brought up by Richard Sandiford in the
> AArch64 patches
>
> Thanks,
> Delia
>
> On 1/22/20 5:31 PM, Delia Burduv wrote:
>> Ping.
>>
>> I will change the tests to use the exact input and output registers as
>> Richard Sandiford suggested for the AArch64 patches.
>>
>> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> vld{q}_bf16 as part of the BFloat16 extension.
>>> 
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


>>>
>>> The intrinsics are declared in arm_neon.h .
>>> A new test is added to check assembler output.
>>>
>>> This patch depends on the Arm back-end patche.
>>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>
>>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> have commit rights, so if this is ok can someone please commit it for
>>> me?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-14  Delia Burduv 
>>>
>>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>  (bfloat16x4x2_t): New typedef.
>>>  (bfloat16x8x2_t): New typedef.
>>>  (bfloat16x4x3_t): New typedef.
>>>  (bfloat16x8x3_t): New typedef.
>>>  (bfloat16x4x4_t): New typedef.
>>>  (bfloat16x8x4_t): New typedef.
>>>  (vld2_bf16): New.
>>>  (vld2q_bf16): New.
>>>  (vld3_bf16): New.
>>>  (vld3q_bf16): New.
>>>  (vld4_bf16): New.
>>>  (vld4q_bf16): New.
>>>  (vld2_dup_bf16): New.
>>>  (vld2q_dup_bf16): New.
>>>   (vld3_dup_bf16): New.
>>>  (vld3q_dup_bf16): New.
>>>  (vld4_dup_bf16): New.
>>>  (vld4q_dup_bf16): New.
>>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>  (VAR13): New.
>>>  (arm_simd_types[Bfloat16x2_t]):New type.
>>>  * config/arm/arm-modes.def (V2BF): New mode.
>>>  * config/arm/arm-simd-builtin-types.def
>>>  (Bfloat16x2_t): New entry.
>>>  * config/arm/arm_neon_builtins.def
>>>  (vld2): Changed to VAR13 and added v4bf, v8bf
>>>  (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>  (vld3): Changed to VAR13 and added v4bf, v8bf
>>>  (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>  (vld4): Changed to VAR13 and added v4bf, v8bf
>>>  (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>  * config/arm/iterators.md (VDXBF): New iterator.
>>>  (VQ2BF): New iterator.
>>>  (V_elem): Added V4BF, V8BF.
>>>  (V_sz_elem): Added V4BF, V8BF.
>>>  (V_mode_nunits): Added V4BF, V8BF.
>>>  (q): Added V4BF, V8BF.
>>>  *config/arm/neon.md (vld2): Used new iterators.
>>>  (vld2_dup): Used new iterators.
>>>  (vld2_dupv8bf): New.
>>>  (vst3): Used new iterators.
>>>  (vst3qa): Used new iterators.
>>>  (vst3qb): Used new iterators.
>>>  (vld3_dup): Used new iterators.
>>>    

Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-03-04 Thread Kyrill Tkachov



On 3/4/20 2:14 PM, Tamar Christina wrote:

Hi Kyrill,

Ok for backporting this patch to GCC 8 and GCC 9?



Ok assuming bootstrap and test shows no problems.

Thanks,

Kyrill




Thanks,
Tamar


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org 
On Behalf Of Kyrill Tkachov
Sent: Thursday, January 30, 2020 14:55
To: Stam Markianos-Wright ; gcc-
patc...@gcc.gnu.org
Cc: ni...@redhat.com; Ramana Radhakrishnan
; Richard Earnshaw

Subject: Re: [PING][PATCH][GCC][ARM] Arm generates out of range
conditional branches in Thumb2 (PR91816)


On 1/30/20 2:42 PM, Stam Markianos-Wright wrote:


On 1/28/20 10:35 AM, Kyrill Tkachov wrote:

Hi Stam,

On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:

On 12/10/19 5:03 PM, Kyrill Tkachov wrote:

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this
approved first!


Sorry for the delay.

Same here now! Sorry totally forget about this in the lead up to Xmas!

Done the changes marked below and also removed the unnecessary

extra

#defines from the test.


This is ok with a nit on the testcase...


diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c
b/gcc/testsuite/gcc.target/arm/pr91816.c
new file mode 100644
index


..757c897e9c0db32709227b3fdf
1

b4a8033428232
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" }  */ int
+printf(const char *, ...);
+

I think this needs a couple of effective target checks like
arm_hard_vfp_ok and arm_thumb2_ok. See other tests in gcc.target/arm
that add -mthumb to the options.

Hmm, looking back at this now, is there any reason why it can't just be:

/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb2_ok } */
/* { dg-additional-options "-mthumb" }  */

were we don't override the march or fpu options at all, but just use
`require-effective-target arm_thumb2_ok` to make sure that thumb2 is
supported?

The attached new diff does just that.


Works for me, there are plenty of configurations run with fpu that it should
get the right coverage.

Ok (make sure commit the updated, if needed, ChangeLog as well)

Thanks!

Kyrill



Cheers :)

Stam.


Thanks,
Kyrill



Re: [Ping][PATCH][Arm] ACLE intrinsics: AdvSIMD BFloat16 convert instructions

2020-03-03 Thread Kyrill Tkachov

Hi Dennis,

On 3/2/20 5:41 PM, Dennis Zhang wrote:

Hi all,

On 17/01/2020 16:46, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on Arm BFMode patch
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch implements intrinsics to convert between bfloat16 and 
float32

> formats.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested.
>
> Is it OK for trunk please?



Ok.

Thanks,

Kyrill



>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * config/arm/arm_bf16.h (vcvtah_f32_bf16, vcvth_bf16_f32): New.
>  * config/arm/arm_neon.h (vcvt_f32_bf16, vcvtq_low_f32_bf16): New.
>  (vcvtq_high_f32_bf16, vcvt_bf16_f32): New.
>  (vcvtq_low_bf16_f32, vcvtq_high_bf16_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfcvt, vbfcvt_high): New 
entries.

>  (vbfcvtv4sf, vbfcvtv4sf_high): Likewise.
>  * config/arm/iterators.md (VBFCVT, VBFCVTM): New mode iterators.
>  (V_bf_low, V_bf_cvt_m): New mode attributes.
>  * config/arm/neon.md (neon_vbfcvtv4sf): New.
>  (neon_vbfcvtv4sf_highv8bf, neon_vbfcvtsf): New.
>  (neon_vbfcvt, neon_vbfcvt_highv8bf): New.
>  (neon_vbfcvtbf_cvtmode, neon_vbfcvtbf): New
>  * config/arm/unspecs.md (UNSPEC_BFCVT, UNSPEC_BFCVT_HIG): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_cvt_1.c: New test.
>
>

The tests are updated in this patch for assembly test.
Rebased to trunk top.

Is it OK to commit please?

Cheers
Dennis


Re: [GCC][PATCH][ARM] Add multilib mapping for Armv8.1-M+MVE with -mfloat-abi=hard

2020-02-27 Thread Kyrill Tkachov

Hi Mihail,

On 2/20/20 4:15 PM, Mihail Ionescu wrote:

Hi,

This patch adds a new multilib for armv8.1-m.main+mve with hard float 
abi. For

armv8.1-m.main+mve soft and softfp, the v8-M multilibs will be reused.
The following mappings are also updated:
"-mfloat-abi=hard -march=armv8.1-m.main+mve.fp -> armv8-m.main+fp/hard"
"-mfloat-abi=softfp -march=armv8.1-m.main+mve.fp -> 
armv8-m.main+fp/softfp"

"-mfloat-abi=soft -march=armv8.1-m.main+mve.fp -> armv8-m.main/nofp"

The patch also includes a libgcc change to prevent 
cmse_nonsecure_call.S from being
compiled for v8.1-M. v8.1-M doesn't need it since the same behaviour 
is achieved during

code generation by using the new instructions[1].

[1] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html

Tested on arm-none-eabi.


gcc/ChangeLog:

2020-02-20  Mihail Ionescu  

    * config/arm/t-rmprofile: create new multilib for
    armv8.1-m.main+mve hard float and reuse v8-m.main ones for
    v8.1-m.main+mve .

gcc/testsuite/ChangeLog:

2020-02-20  Mihail Ionescu  

    * testsuite/gcc.target/arm/multilib.exp: Add new v8.1-M entry.



No testsuite/ in the prefix here.



2020-02-20  Mihail Ionescu  

libgcc/ChangLog:

    * config/arm/t-arm: Do not compile cmse_nonsecure_call.S for 
v8.1-m.


Ok for trunk?


Ok.

Thanks,

Kyrill




Regards,
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 
0fb3084c8b20f16ccadba632fc55162b196651d5..16e368f25cc2e3ad341adc2752120ad0defdf2a4 
100644

--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -27,8 +27,8 @@

 # Arch and FPU variants to build libraries with

-MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp
-MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base 
v8-m.main v8-m.main+fp v8-m.main+dp
+MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve
+MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base 
v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve


 # Base M-profile (no fp)
 MULTILIB_REQUIRED   += mthumb/march=armv6s-m/mfloat-abi=soft
@@ -48,8 +48,7 @@ MULTILIB_REQUIRED += 
mthumb/march=armv8-m.main+fp/mfloat-abi=hard

 MULTILIB_REQUIRED   += mthumb/march=armv8-m.main+fp/mfloat-abi=softfp
 MULTILIB_REQUIRED   += 
mthumb/march=armv8-m.main+fp.dp/mfloat-abi=hard
 MULTILIB_REQUIRED   += 
mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp

-
-
+MULTILIB_REQUIRED  += mthumb/march=armv8.1-m.main+mve/mfloat-abi=hard

 # Arch Matches
 MULTILIB_MATCHES    += march?armv6s-m=march?armv6-m
@@ -66,12 +65,14 @@ MULTILIB_MATCHES    += 
march?armv7e-m+fp=march?armv7e-m+fpv5
 MULTILIB_REUSE  += $(foreach ARCH, armv6s-m armv7-m armv7e-m 
armv8-m\.base armv8-m\.main, \

mthumb/march.$(ARCH)/mfloat-abi.soft=mthumb/march.$(ARCH)/mfloat-abi.softfp)

+
 # Map v8.1-M to v8-M.
 MULTILIB_MATCHES    += march?armv8-m.main=march?armv8.1-m.main
 MULTILIB_MATCHES    += march?armv8-m.main=march?armv8.1-m.main+dsp
-MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main+mve
+MULTILIB_REUSE += 
mthumb/march.armv8-m\.main/mfloat-abi.soft=mthumb/march.armv8\.1-m\.main+mve/mfloat-abi.soft
+MULTILIB_REUSE += 
mthumb/march.armv8-m\.main/mfloat-abi.soft=mthumb/march.armv8\.1-m\.main+mve/mfloat-abi.softfp


-v8_1m_sp_variants = +fp +dsp+fp +mve.fp
+v8_1m_sp_variants = +fp +dsp+fp +mve.fp +fp+mve
 v8_1m_dp_variants = +fp.dp +dsp+fp.dp +fp.dp+mve +fp.dp+mve.fp

 # Map all v8.1-m.main FP sp variants down to v8-m.
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
b/gcc/testsuite/gcc.target/arm/multilib.exp
index 
67d00266f6b5e69aa2a7831cfb9a4353ac4f4340..42aaebfabdf76c45a1909b2aaa1651d3c42ee4b7 
100644

--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -813,6 +813,9 @@ if {[multilib_config "rmprofile"] } {
 {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=soft} 
"thumb/v8-m.main/nofp"
 {-march=armv8.1-m.main+mve -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main/nofp"
 {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main+fp/softfp"
+   {-march=armv8.1-m.main+mve -mfpu=auto -mfloat-abi=hard} 
"thumb/v8.1-m.main+mve/hard"
+   {-march=armv8.1-m.main+mve+fp -mfpu=auto -mfloat-abi=hard} 
"thumb/v8-m.main+fp/hard"
+   {-march=armv8.1-m.main+mve+fp -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main+fp/softfp"
 {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=hard} 
"thumb/v8-m.main+fp/hard"
 {-march=armv8.1-m.main+mve+fp.dp -mfpu=auto -mfloat-abi=soft} 

Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic

2020-02-27 Thread Kyrill Tkachov

Hi Mihail,

On 2/27/20 2:44 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 02/27/2020 11:09 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 2/27/20 10:27 AM, Mihail Ionescu wrote:

Hi,

This patch adds support for the bf16 vector create, get, set,
duplicate and reinterpret intrinsics.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested on arm-none-eabi.


gcc/ChangeLog:

2020-02-27  Mihail Ionescu  

    * (__ARM_NUM_LANES, __arm_lane, __arm_lane_q): Move to the
    beginning of the file.
    (vcreate_bf16, vcombine_bf16): New.
    (vdup_n_bf16, vdupq_n_bf16): New.
    (vdup_lane_bf16, vdup_laneq_bf16): New.
    (vdupq_lane_bf16, vdupq_laneq_bf16): New.
    (vduph_lane_bf16, vduph_laneq_bf16): New.
    (vset_lane_bf16, vsetq_lane_bf16): New.
    (vget_lane_bf16, vgetq_lane_bf16): New.
    (vget_high_bf16, vget_low_bf16): New.
    (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New.
    (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New.
    (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New.
    (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New.
    (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New.
    (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New.
    (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New.
    (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New.
    (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New.
    (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New.
    (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New.
    (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New.
    (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New.
    (vreinterpretq_bf16_p128): New.
    (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New.
    (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New.
    (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New.
    (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New.
    (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New.
    (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New.
    (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New.
    (vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New.
    (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New.
    (vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New.
    (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New.
    (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New.
    (vreinterpretq_p128_bf16): New.
    * config/arm/arm_neon_builtins.def (VDX): Add V4BF.
    (V_elem): Likewise.
    (V_elem_l): Likewise.
    (VD_LANE): Likewise.
    (VQX) Add V8BF.
    (V_DOUBLE): Likewise.
    (VDQX): Add V4BF and V8BF.
    (V_two_elem, V_three_elem, V_four_elem): Likewise.
    (V_reg): Likewise.
    (V_HALF): Likewise.
    (V_double_vector_mode): Likewise.
    (V_cmp_result): Likewise.
    (V_uf_sclr): Likewise.
    (V_sz_elem): Likewise.
    (Is_d_reg): Likewise.
    (V_mode_nunits): Likewise.
    * config/arm/neon.md (neon_vdup_lane): Enable for BFloat.

gcc/testsuite/ChangeLog:

2020-02-27  Mihail Ionescu  

    * gcc.target/arm/bf16_dup.c: New test.
    * gcc.target/arm/bf16_reinterpret.c: Likewise.

Is it ok for trunk?


This looks mostly ok with a few nits...




Regards,
Mihail


### Attachment also inlined for ease of reply 
###



diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,15 @@ extern "C" {
 #include 
 #include 

+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 
1))
+#define __arm_laneq(__vec, __idx) (__idx ^ 
(__ARM_NUM_LANES(__vec)/2 - 1))

+#else
+#define __arm_lane(__vec, __idx) __idx
+#define __arm_laneq(__vec, __idx) __idx
+#endif
+
 typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
@@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   /* For big-endian, GCC's vector indices are reversed within each 64
  bits compared to the architectural lane indices used by Neon
  intrinsics.  */



Please move this comment as well.



-#ifdef __ARM_BIG_ENDIAN
-#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
-#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 
1))
-#define __arm_laneq(__vec, __idx) (__idx ^ 
(__ARM_NUM_LANES(__vec)/2 - 1))

-#else
-#define __arm_lane(__vec, __idx) __idx
-#define __arm_laneq(__vec, __idx) __idx
-#endif

 #define vget_lane_f16(__v, __idx)   \
__extension__ \
@@ -14476,6 +14477,15 @@ vreinterpr

Re: [GCC] Fix misleading aarch64 mcpu/march warning string

2020-02-27 Thread Kyrill Tkachov

Hi Joel,

On 2/27/20 2:31 PM, Joel Hutton wrote:

The message for conflicting mcpu and march previously printed the
architecture of the CPU instead of the CPU name, as well as omitting the
extensions to the march string. This patch corrects both errors. This
patch fixes PR target/87612.


before:
$ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve
-mcpu=cortex-a76 foo.c

cc1: warning: switch '-mcpu=armv8.2-a' conflicts with '-march=armv8-a'
switch

after:
$ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve
-mcpu=cortex-a76 foo.c

cc1: warning: switch '-mcpu=cortex-a76' conflicts with
'-march=armv8-a+sve' switch


gcc/ChangeLog:

2020-02-27  Joel Hutton  
    PR target/87612
    * config/aarch64/aarch64.c (aarch64_override_options): Fix
misleading warning string.



Newline after the Name/email line in the ChangeLog.

This is okay for trunk.

Do you have commit access?

If not, please follow the steps at 
https://gcc.gnu.org/gitwrite.html#authenticated listing myself as approver.


Then you can commit this yourself.

Thanks,

Kyrill



Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic

2020-02-27 Thread Kyrill Tkachov

Hi Mihail,

On 2/27/20 10:27 AM, Mihail Ionescu wrote:

Hi,

This patch adds support for the bf16 vector create, get, set,
duplicate and reinterpret intrinsics.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested on arm-none-eabi.


gcc/ChangeLog:

2020-02-27  Mihail Ionescu  

    * (__ARM_NUM_LANES, __arm_lane, __arm_lane_q): Move to the
    beginning of the file.
    (vcreate_bf16, vcombine_bf16): New.
    (vdup_n_bf16, vdupq_n_bf16): New.
    (vdup_lane_bf16, vdup_laneq_bf16): New.
    (vdupq_lane_bf16, vdupq_laneq_bf16): New.
    (vduph_lane_bf16, vduph_laneq_bf16): New.
    (vset_lane_bf16, vsetq_lane_bf16): New.
    (vget_lane_bf16, vgetq_lane_bf16): New.
    (vget_high_bf16, vget_low_bf16): New.
    (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New.
    (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New.
    (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New.
    (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New.
    (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New.
    (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New.
    (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New.
    (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New.
    (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New.
    (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New.
    (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New.
    (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New.
    (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New.
    (vreinterpretq_bf16_p128): New.
    (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New.
    (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New.
    (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New.
    (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New.
    (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New.
    (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New.
    (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New.
    (vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New.
    (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New.
    (vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New.
    (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New.
    (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New.
    (vreinterpretq_p128_bf16): New.
    * config/arm/arm_neon_builtins.def (VDX): Add V4BF.
    (V_elem): Likewise.
    (V_elem_l): Likewise.
    (VD_LANE): Likewise.
    (VQX) Add V8BF.
    (V_DOUBLE): Likewise.
    (VDQX): Add V4BF and V8BF.
    (V_two_elem, V_three_elem, V_four_elem): Likewise.
    (V_reg): Likewise.
    (V_HALF): Likewise.
    (V_double_vector_mode): Likewise.
    (V_cmp_result): Likewise.
    (V_uf_sclr): Likewise.
    (V_sz_elem): Likewise.
    (Is_d_reg): Likewise.
    (V_mode_nunits): Likewise.
    * config/arm/neon.md (neon_vdup_lane): Enable for BFloat.

gcc/testsuite/ChangeLog:

2020-02-27  Mihail Ionescu  

    * gcc.target/arm/bf16_dup.c: New test.
    * gcc.target/arm/bf16_reinterpret.c: Likewise.

Is it ok for trunk?


This looks mostly ok with a few nits...




Regards,
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,15 @@ extern "C" {
 #include 
 #include 

+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
+#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 
- 1))

+#else
+#define __arm_lane(__vec, __idx) __idx
+#define __arm_laneq(__vec, __idx) __idx
+#endif
+
 typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
@@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   /* For big-endian, GCC's vector indices are reversed within each 64
  bits compared to the architectural lane indices used by Neon
  intrinsics.  */



Please move this comment as well.



-#ifdef __ARM_BIG_ENDIAN
-#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
-#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
-#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 
- 1))

-#else
-#define __arm_lane(__vec, __idx) __idx
-#define __arm_laneq(__vec, __idx) __idx
-#endif

 #define vget_lane_f16(__v, __idx)   \
__extension__ \
@@ -14476,6 +14477,15 @@ vreinterpret_p16_u32 (uint32x2_t __a)
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined 
(__ARM_FP16_FORMAT_ALTERNATIVE)

 __extension__ extern 

Re: [Ping][PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product

2020-02-25 Thread Kyrill Tkachov

Hi Dennis,

On 2/25/20 5:18 PM, Dennis Zhang wrote:

Hi Kyrill,

On 25/02/2020 12:18, Kyrill Tkachov wrote:

Hi Dennis,

On 2/25/20 11:54 AM, Dennis Zhang wrote:

Hi all,

On 07/01/2020 12:12, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the patch enabling Arm BFmode
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch adds intrinsics for brain half-precision float-point dot
> product.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested for arm-none-linux-gnueabi-armv8-a.
>
> Is it OK for trunk please?
>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
>  (vbfdot_lane_f32, vbfdotq_laneq_f32): New.
>  (vbfdot_laneq_f32, vbfdotq_lane_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfdot): New.
>  (vbfdot_lanev4bf, vbfdot_lanev8bf): New.
>  * config/arm/iterators.md (VSF2BF): New mode attribute.
>  * config/arm/neon.md (neon_vbfdot): New.
>  (neon_vbfdot_lanev4bf): New.
>  (neon_vbfdot_lanev8bf): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_dot_1.c: New test.
>  * gcc.target/arm/simd/bf16_dot_2.c: New test.
>

This patch updates tests in bf16_dot_1.c to make proper assembly check.
Is it OK for trunk, please?

Cheers
Dennis


Looks ok but...


new file mode 100644
index 000..c533f9d0b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_dot_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include "arm_neon.h"
+
+float32x2_t
+test_vbfdot_lane_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (r, a, b, 2); /* { 
dg-error {out of range 0 - 1} } */

+}
+
+float32x4_t
+test_vbfdotq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv4sf (r, a, b, 2); /* { 
dg-error {out of range 0 - 1} } */

+}
+
+float32x2_t
+test_vbfdot_laneq_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv2sf (r, a, b, 4); /* { 
dg-error {out of range 0 - 3} } */

+}
+
+float32x4_t
+test_vbfdotq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv4sf (r, a, b, 4); /* { 
dg-error {out of range 0 - 3} } */

+}

These  tests shouldn't be calling the __builtin* directly, they are 
just an implementation detail.

What we want to test is the intrinsic itself.
Thanks,
Kyrill



Many thanks for the review.
The issue is fixed in the updated patch.
Is it ready please?



Ok.

Thanks,

Kyrill




Dennis
Cheers

gcc/ChangeLog:

2020-02-25  Dennis Zhang  

* config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
(vbfdot_lane_f32, vbfdotq_laneq_f32): New.
(vbfdot_laneq_f32, vbfdotq_lane_f32): New.
* config/arm/arm_neon_builtins.def (vbfdot): New entry.
(vbfdot_lanev4bf, vbfdot_lanev8bf): Likewise.
* config/arm/iterators.md (VSF2BF): New attribute.
* config/arm/neon.md (neon_vbfdot): New entry.
(neon_vbfdot_lanev4bf): Likewise.
(neon_vbfdot_lanev8bf): Likewise.

gcc/testsuite/ChangeLog:

2020-02-25  Dennis Zhang  

* gcc.target/arm/simd/bf16_dot_1.c: New test.
* gcc.target/arm/simd/bf16_dot_2.c: New test.
* gcc.target/arm/simd/bf16_dot_3.c: New test.


Re: [ARM] Fix -mpure-code for v6m

2020-02-25 Thread Kyrill Tkachov

Hi Christophe,

On 2/24/20 2:16 PM, Christophe Lyon wrote:

Ping?

I'd also like to backport this and the main patch (svn r279463,
r10-5505-ge24f6408df1e4c5e8c09785d7b488c492dfb68b3)
to the gcc-9 branch.

I found the problem addressed by this patch while validating the
backport to gcc-9: although the patch applies cleanly except for
testcases dg directives, there were some failures which I could
finally reproduce on trunk with -fdisable-rtl-fwprop2.

Here is a summary of the validations I ran using --target arm-eabi:
* without my patches:
(1) --with-cpu cortex-m0
(2) --with-cpu cortex-m4
(3) --with-cpu cortex-m4 CFLAGS_FOR_TARGET=-mpure-code (to build the
libs with -mpure-code)
(4) --with-cpu cortex-m4 CFLAGS_FOR_TARGET=-mpure-code
--target-board=-mpure-code (to also run the tests with -mpure-code)

* with my patches:
(5) --with-cpu cortex-m0 CFLAGS_FOR_TARGET=-mpure-code
--target-board=-mpure-code
(6) --with-cpu cortex-m4 CFLAGS_FOR_TARGET=-mpure-code
--target-board=-mpure-code

Comparing (4) and (6) ensured that my (v6m) patches introduce no
regression in v7m cases.

Comparison of (1) vs (5) gave results similar to (2) vs (6), there's a
bit of noise because some tests cases don't cope well with -mpure-code
despite my previous testsuite-only patch (svn r277828)

Comparison of (1) vs (2) gave similar results to (5) vs (6).

Ideally, we may also want to backport svn r277828 (testsuite-only
patch, to handle -mpure-code better), but that's not mandatory.

In summary, is this patch OK for trunk?

Are this patch and r279463,
r10-5505-ge24f6408df1e4c5e8c09785d7b488c492dfb68b3 OK to backport to
gcc-9?



This is okay with me.

I don't think any of the branches are frozen at the moment, so it should 
be okay to backport it.


Thanks,

Kyrill



Thanks,

Christophe

On Thu, 13 Feb 2020 at 11:14, Christophe Lyon
 wrote:
>
> On Mon, 10 Feb 2020 at 17:45, Richard Earnshaw (lists)
>  wrote:
> >
> > On 10/02/2020 09:27, Christophe Lyon wrote:
> > > On Fri, 7 Feb 2020 at 17:55, Richard Earnshaw (lists)
> > >  wrote:
> > >>
> > >> On 07/02/2020 16:43, Christophe Lyon wrote:
> > >>> On Fri, 7 Feb 2020 at 14:49, Richard Earnshaw (lists)
> > >>>  wrote:
> > 
> >  On 07/02/2020 13:19, Christophe Lyon wrote:
> > > When running the testsuite with -fdisable-rtl-fwprop2 and 
-mpure-code
> > > for cortex-m0, I noticed that some testcases were failing 
because we
> > > still generate "ldr rX, .LCY", which is what we want to 
avoid with

> > > -mpure-code. This is latent since a recent improvement in fwprop
> > > (PR88833).
> > >
> > > In this patch I change the thumb1_movsi_insn pattern so that 
it emits
> > > the desired instruction sequence when 
arm_disable_literal_pool is set.

> > >
> > > I tried to add a define_split instead, but couldn't make it 
work: the
> > > compiler then complains it cannot split the instruction, 
while my new
> > > define_split accepts the same operand types as 
thumb1_movsi_insn:

> > >
> > > c-c++-common/torture/complex-sign-mixed-add.c:41:1: error: 
could not split insn

> > > (insn 2989 425 4844 (set (reg/f:SI 3 r3 [1342])
> > >    (symbol_ref/u:SI ("*.LC6") [flags 0x2])) 836 
{*thumb1_movsi_insn}
> > > (expr_list:REG_EQUIV (symbol_ref/u:SI ("*.LC6") 
[flags 0x2])

> > >    (nil)))
> > > during RTL pass: final
> > >
> > > (define_split
> > >  [(set (match_operand:SI 0 "register_operand" "")
> > >    (match_operand:SI 1 "general_operand" ""))]
> > >  "TARGET_THUMB1
> > >   && arm_disable_literal_pool
> > >   && GET_CODE (operands[1]) == SYMBOL_REF"
> > >  [(clobber (const_int 0))]
> > >  "
> > > gen_thumb1_movsi_symbol_ref(operands[0], operands[1]);
> > >    DONE;
> > >  "
> > > )
> > > and I put this in thumb1_movsi_insn:
> > > if (GET_CODE (operands[1]) == SYMBOL_REF && 
arm_disable_literal_pool)

> > >  {
> > >    return \"#\";
> > >  }
> > >  return \"ldr\\t%0, %1\";
> > >
> > > 2020-02-07  Christophe Lyon 
> > >
> > >    * config/arm/thumb1.md (thumb1_movsi_insn): Fix 
ldr alternative to

> > >    work with -mpure-code.
> > >
> > 
> >  +    case 0:
> >  +    case 1:
> >  +  return \"movs    %0, %1\";
> >  +    case 2:
> >  +  return \"movw    %0, %1\";
> > 
> >  This is OK, but please replace the hard tab in the strings 
for MOVS/MOVW

> >  with \\t.
> > 
> > >>>
> > >>> OK that was merely a cut & paste from the existing code.
> > >>>
> > >>> I'm concerned that the length attribute is becoming wrong with my
> > >>> patch, isn't this a problem?
> > >>>
> > >>
> > >> Potentially yes.  The branch range code needs this to handle 
overly long

> > >> jumps correctly.
> > >>
> > >
> > > Do you mean that the probability of problems due to that shortcoming
> > > is low enough 

Re: [Ping][PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product

2020-02-25 Thread Kyrill Tkachov

Hi Dennis,

On 2/25/20 11:54 AM, Dennis Zhang wrote:

Hi all,

On 07/01/2020 12:12, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the patch enabling Arm BFmode
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch adds intrinsics for brain half-precision float-point dot
> product.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested for arm-none-linux-gnueabi-armv8-a.
>
> Is it OK for trunk please?
>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
>  (vbfdot_lane_f32, vbfdotq_laneq_f32): New.
>  (vbfdot_laneq_f32, vbfdotq_lane_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfdot): New.
>  (vbfdot_lanev4bf, vbfdot_lanev8bf): New.
>  * config/arm/iterators.md (VSF2BF): New mode attribute.
>  * config/arm/neon.md (neon_vbfdot): New.
>  (neon_vbfdot_lanev4bf): New.
>  (neon_vbfdot_lanev8bf): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_dot_1.c: New test.
>  * gcc.target/arm/simd/bf16_dot_2.c: New test.
>

This patch updates tests in bf16_dot_1.c to make proper assembly check.
Is it OK for trunk, please?

Cheers
Dennis


Looks ok but...


new file mode 100644
index 000..c533f9d0b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_dot_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include "arm_neon.h"
+
+float32x2_t
+test_vbfdot_lane_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (r, a, b, 2); /* { dg-error {out 
of range 0 - 1} } */
+}
+
+float32x4_t
+test_vbfdotq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv4sf (r, a, b, 2); /* { dg-error {out 
of range 0 - 1} } */
+}
+
+float32x2_t
+test_vbfdot_laneq_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv2sf (r, a, b, 4); /* { dg-error {out 
of range 0 - 3} } */
+}
+
+float32x4_t
+test_vbfdotq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv4sf (r, a, b, 4); /* { dg-error {out 
of range 0 - 3} } */
+}

These  tests shouldn't be calling the __builtin* directly, they are just an 
implementation detail.
What we want to test is the intrinsic itself.
Thanks,
Kyrill



Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops

2020-02-21 Thread Kyrill Tkachov

Hi Roman,

On 2/21/20 3:49 PM, Roman Zhuykov wrote:

11.02.2020 14:00, Richard Earnshaw (lists) wrote:

+(define_insn "*doloop_end"
+  [(parallel [(set (pc)
+   (if_then_else
+   (ne (reg:SI LR_REGNUM) (const_int 1))
+ (label_ref (match_operand 0 "" ""))
+ (pc)))
+  (set (reg:SI LR_REGNUM)
+   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
+  "le\tlr, %l0")

Is it deliberate that this pattern name has a '*' prefix?  doloop_end
is a named expansion pattern according to md.texi.

R.

21.02.2020 18:30, Kyrill Tkachov wrote:

+;; Originally expanded by 'doloop_end'.
+(define_insn "doloop_end_internal"

We usually prefer to name these patterns with a '*' in front to
prevent the gen* machinery from generating gen_* unneeded expanders
for them if they're not used.


It seems you and Richard are asking Andrea to do opposite things.
:) LOL.patch



Almost, but not exactly incompatible things ;)

doloop_end is a standard name and if we wanted to use it directly it 
cannot have a '*', which Richard is right to point out.


Once "doloop_end" is moved to its own expander and the define_insn is 
doloop_end_internal, there is no reason for it to not have a '*' as its 
gen_* form is never called.


Thanks,

Kyrill



Roman

PS. I don't have an idea what approach is correct.


Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops

2020-02-21 Thread Kyrill Tkachov

Hi Andrea,

On 2/19/20 1:01 PM, Andrea Corallo wrote:

Hi all,

Second version of the patch here addressing comments.

This patch enables the Armv8.1-M Mainline LOB (low overhead branch) 
extension

low overhead loops (LOL) feature by using the 'loop-doloop' pass.

Given the following function:

void
loop (int *a)
{
  for (int i = 0; i < 1000; i++)
    a[i] = i;
}

'doloop_begin' and 'doloop_end' patterns translates into 'dls' and 'le'
giving:

 loop:
 movw    r2, #1
 movs    r3, #0
 subs    r0, r0, #4
 push    {lr}
 dls lr, r2
 .L2:
 str r3, [r0, #4]!
 adds    r3, r3, #1
 le  lr, .L2
 ldr pc, [sp], #4

SMS is disabled in tests not to break them when SMS does loop versioning.

bootstrapped arm-none-linux-gnueabihf, do not introduce testsuite 
regressions.



This should be aimed at GCC 11 at this point.

Some comments inline...




Andrea

gcc/ChangeLog:

2020-??-??  Andrea Corallo  
    Mihail-Calin Ionescu 
    Iain Apreotesei  

    * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
    (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
    * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
    * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
    Add new patterns.
    * config/arm/unspecs.md: Add new unspec.

gcc/testsuite/ChangeLog:

2020-??-??  Andrea Corallo  
    Mihail-Calin Ionescu 
    Iain Apreotesei  

    * gcc.target/arm/lob.h: New header.
    * gcc.target/arm/lob1.c: New testcase.
    * gcc.target/arm/lob2.c: Likewise.
    * gcc.target/arm/lob3.c: Likewise.
    * gcc.target/arm/lob4.c: Likewise.
    * gcc.target/arm/lob5.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.



lol.patch

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e07cf03538c5..1269f40bd77c 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -586,6 +586,9 @@ extern int arm_arch_bf16;
 
 /* Target machine storage Layout.  */
 
+/* Nonzero if this chip provides Armv8.1-M Mainline

+   LOB (low overhead branch features) extension instructions.  */
+#define TARGET_HAVE_LOB (arm_arch8_1m_main)
 
 /* Define this macro if it is advisable to hold scalars in registers

in a wider mode than that declared by the program.  In such cases,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9cc7bc0e5621..7c2a7b7e9e97 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -833,6 +833,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
 
+#undef TARGET_INVALID_WITHIN_DOLOOP

+#define TARGET_INVALID_WITHIN_DOLOOP arm_invalid_within_doloop
+
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
@@ -32937,6 +32940,27 @@ arm_ge_bits_access (void)

   return true;
 }
 
+/* NULL if INSN is valid within a low-overhead loop.

+   Otherwise return why doloop cannot be applied.  */
+
+static const char *
+arm_invalid_within_doloop (const rtx_insn *insn)
+{
+  if (!TARGET_HAVE_LOB)
+return default_invalid_within_doloop (insn);
+
+  if (CALL_P (insn))
+return "Function call in the loop.";
+
+  if (tablejump_p (insn, NULL, NULL) || computed_jump_p (insn))
+return "Computed branch in the loop.";
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, LR_REGNUM), insn))
+return "LR is used inside loop.";
+
+  return NULL;
+}
+
 #if CHECKING_P
 namespace selftest {
 
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md

index b0d3bd1cf1c4..4aff1a0838d8 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1555,8 +1555,11 @@
   using a certain 'count' register and (2) the loop count can be
   adjusted by modifying this register prior to the loop.
   ??? The possible introduction of a new block to initialize the
-  new IV can potentially affect branch optimizations.  */
-   if (optimize > 0 && flag_modulo_sched)
+  new IV can potentially affect branch optimizations.
+
+  Also used to implement the low over head loops feature, which is part of
+  the Armv8.1-M Mainline Low Overhead Branch (LOB) extension.  */
+   if (optimize > 0 && (flag_modulo_sched || TARGET_HAVE_LOB))
{
  rtx s0;
  rtx bcomp;
@@ -1569,6 +1572,11 @@
FAIL;
 
  s0 = operands [0];

+
+ /* Low over head loop instructions require the first operand to be LR.  */
+ if (TARGET_HAVE_LOB)
+   s0 = gen_rtx_REG (SImode, LR_REGNUM);
+
  if (TARGET_THUMB2)
insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1)));
  else
@@ -1650,3 +1658,30 @@
   "TARGET_HAVE_MVE"
   "lsrl%?\\t%Q0, %R0, %1"
   [(set_attr "predicable" "yes")])
+
+;; Originally expanded by 'doloop_end'.
+(define_insn "doloop_end_internal"

We usually prefer to name these patterns with a '*' in front to prevent the 
gen* 

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-02-21 Thread Kyrill Tkachov

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst{q}_bf16 as part of the BFloat16 extension.
>> 
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patche.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>> have commit rights, so if this is ok can someone please commit it 
for me?

>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>  (bfloat16x4x2_t): New typedef.
>>  (bfloat16x8x2_t): New typedef.
>>  (bfloat16x4x3_t): New typedef.
>>  (bfloat16x8x3_t): New typedef.
>>  (bfloat16x4x4_t): New typedef.
>>  (bfloat16x8x4_t): New typedef.
>>  (vst2_bf16): New.
>>  (vst2q_bf16): New.
>>  (vst3_bf16): New.
>>  (vst3q_bf16): New.
>>  (vst4_bf16): New.
>>  (vst4q_bf16): New.
>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>  (VAR13): New.
>>  (arm_simd_types[Bfloat16x2_t]):New type.
>>  * config/arm/arm-modes.def (V2BF): New mode.
>>  * config/arm/arm-simd-builtin-types.def
>>  (Bfloat16x2_t): New entry.
>>  * config/arm/arm_neon_builtins.def
>>  (vst2): Changed to VAR13 and added v4bf, v8bf
>>  (vst3): Changed to VAR13 and added v4bf, v8bf
>>  (vst4): Changed to VAR13 and added v4bf, v8bf
>>  * config/arm/iterators.md (VDXBF): New iterator.
>>  (VQ2BF): New iterator.
>>  (V_elem): Added V4BF, V8BF.
>>  (V_sz_elem): Added V4BF, V8BF.
>>  (V_mode_nunits): Added V4BF, V8BF.
>>  (q): Added V4BF, V8BF.
>>  *config/arm/neon.md (vst2): Used new iterators.
>>  (vst3): Used new iterators.
>>  (vst3qa): Used new iterators.
>>  (vst3qb): Used new iterators.
>>  (vst4): Used new iterators.
>>  (vst4qa): Used new iterators.
>>  (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * gcc.target/arm/simd/bf16_vstn_1.c: New test.


One thing I just noticed in this and the other arm bfloat16 patches...

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a
 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t 
__a, float32x4_t __b,
   return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
 }
 
+#pragma GCC push_options

+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;


These should be in a new arm_bf16.h file that gets included in the main 
arm_neon.h file, right?
I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


 +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+



Re: [PATCH, GCC/ARM] Fix MVE scalar shift tests

2020-02-21 Thread Kyrill Tkachov



On 2/21/20 11:51 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 2/19/20 4:27 PM, Mihail Ionescu wrote:

Hi Christophe,

On 01/23/2020 09:34 AM, Christophe Lyon wrote:
> On Mon, 20 Jan 2020 at 19:01, Mihail Ionescu
>  wrote:
>>
>> Hi,
>>
>> This patch fixes the scalar shifts tests added in:
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01195.html
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01196.html
>> By adding mthumb and ensuring that the target supports
>> thumb2 instructions.
>>
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2020-01-20  Mihail-Calin Ionescu 
>>
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c: 
Add mthumb and target check.
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c: 
Likewise.

>>
>>
>> Is this ok for trunk?
>>
>
> Why not add a new entry in check_effective_target_arm_arch_FUNC_ok?
> (there are already plenty, including v8m_main for instance)
>

Sorry for the delay, we were going to add the check_effective_target
to the MVE framework patches and then update this one. But I came
across some big endian issues and decided to update this now.

I've added the target check and changed the patch so it also
disables the scalar shift patterns when generating big endian
code. At the moment they are broken because the MVE shift instructions
have the restriction of having an even gp register specified first,
followed by the odd one, which requires swapping the data twice in
big endian. In this case, the previous code gen is preferred.



*** gcc/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * config/arm/arm.md (ashldi3, ashrdi3, lshrdi3): Prevent scalar
    shifts from being used when big endian is enabled.

*** gcc/testsuite/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * gcc.target/arm/armv8_1m-shift-imm-1.c: Add MVE target checks.
    * gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise.
    * lib/target-supports.exp
    (check_effective_target_arm_v8_1m_mve_ok_nocache): New.
    (check_effective_target_arm_v8_1m_mve_ok): New.
    (add_options_for_v8_1m_mve): New.

Is this ok for trunk?



This is ok, but please do a follow up patch to add the new effective 
target check to sourcebuild.texi (I know, we tend to forget to do it!)




I should say that such a patch is pre-approved.

Thanks,

Kyrill







> Christophe
>
>>
>> Regards,
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply    
###

>>
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> index 
5ffa3769e6ba42466242d3038857734e87b2f1fc..9822f59643c662c9302ad43c09057c59f3cbe07a 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>
>>   long long longval1;
>>   long long unsigned longval2;
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> index 
a97e9d687ef66e9642dd1d735125c8ee941fb151..a9aa7ed3ad9204c03d2c15dc6920ca3159403fa0 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok  } */
>>
>>   long long longval2;
>>   int intval2;
>>

Regards,
Mihail


Re: [PATCH, GCC/ARM] Fix MVE scalar shift tests

2020-02-21 Thread Kyrill Tkachov

Hi Mihail,

On 2/19/20 4:27 PM, Mihail Ionescu wrote:

Hi Christophe,

On 01/23/2020 09:34 AM, Christophe Lyon wrote:
> On Mon, 20 Jan 2020 at 19:01, Mihail Ionescu
>  wrote:
>>
>> Hi,
>>
>> This patch fixes the scalar shifts tests added in:
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01195.html
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01196.html
>> By adding mthumb and ensuring that the target supports
>> thumb2 instructions.
>>
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2020-01-20  Mihail-Calin Ionescu 
>>
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c: Add 
mthumb and target check.
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c: 
Likewise.

>>
>>
>> Is this ok for trunk?
>>
>
> Why not add a new entry in check_effective_target_arm_arch_FUNC_ok?
> (there are already plenty, including v8m_main for instance)
>

Sorry for the delay, we were going to add the check_effective_target
to the MVE framework patches and then update this one. But I came
across some big endian issues and decided to update this now.

I've added the target check and changed the patch so it also
disables the scalar shift patterns when generating big endian
code. At the moment they are broken because the MVE shift instructions
have the restriction of having an even gp register specified first,
followed by the odd one, which requires swapping the data twice in
big endian. In this case, the previous code gen is preferred.



*** gcc/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * config/arm/arm.md (ashldi3, ashrdi3, lshrdi3): Prevent scalar
    shifts from being used when big endian is enabled.

*** gcc/testsuite/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * gcc.target/arm/armv8_1m-shift-imm-1.c: Add MVE target checks.
    * gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise.
    * lib/target-supports.exp
    (check_effective_target_arm_v8_1m_mve_ok_nocache): New.
    (check_effective_target_arm_v8_1m_mve_ok): New.
    (add_options_for_v8_1m_mve): New.

Is this ok for trunk?



This is ok, but please do a follow up patch to add the new effective 
target check to sourcebuild.texi (I know, we tend to forget to do it!)


Thanks,

Kyrill




> Christophe
>
>>
>> Regards,
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply    
###

>>
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> index 
5ffa3769e6ba42466242d3038857734e87b2f1fc..9822f59643c662c9302ad43c09057c59f3cbe07a 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve -mfloat-abi=softfp" 
} */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>
>>   long long longval1;
>>   long long unsigned longval2;
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> index 
a97e9d687ef66e9642dd1d735125c8ee941fb151..a9aa7ed3ad9204c03d2c15dc6920ca3159403fa0 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve -mfloat-abi=softfp" 
} */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok  } */
>>
>>   long long longval2;
>>   int intval2;
>>

Regards,
Mihail


Re: [Ping][PATCH][Arm] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-02-21 Thread Kyrill Tkachov

Hi Dennis,

On 2/11/20 12:03 PM, Dennis Zhang wrote:

Hi all,

On 16/12/2019 13:45, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the Arm Armv8.6-A CLI patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html.
> It also depends on the Armv8.6-A effective target checking patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
> It also depends on the ARMv8.6-A I8MM dot product patch for using the
> same builtin qualifier
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00945.html.
>
> This patch adds intrinsics for matrix multiply-accumulate operations
> including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
>
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regtested for arm-none-linux-gnueabi-armv8.2-a.
>
> Is it OK for trunk please?
>


This is ok.

Thanks,

Kyrill




> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2019-12-10  Dennis Zhang  
>
>  * config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): 
New.

>  * config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
>  * config/arm/iterators.md (MATMUL): New.
>  (sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
>  (mmla_sfx): New.
>  * config/arm/neon.md (neon_mmlav16qi): New.
>  * config/arm/unspecs.md (UNSPEC_MATMUL_S): New.
>  (UNSPEC_MATMUL_U, UNSPEC_MATMUL_US): New.
>
> gcc/testsuite/ChangeLog:
>
> 2019-12-10  Dennis Zhang  
>
>  * gcc.target/arm/simd/vmmla_1.c: New test.

This patch has been updated according to the feedback on related AArch64
version at https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01591.html

Regtested. OK to commit please?

Many thanks
Dennis

gcc/ChangeLog:

2020-02-11  Dennis Zhang  

    * config/arm/arm-builtins.c (USTERNOP_QUALIFIERS): New macro.
    * config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, 
vusmmlaq_s32): New.

    * config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
    * config/arm/iterators.md (MATMUL): New iterator.
    (sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
    (mmla_sfx): New attribute.
    * config/arm/neon.md (neon_mmlav16qi): New.
    * config/arm/unspecs.md (UNSPEC_MATMUL_S, UNSPEC_MATMUL_U): New.
    (UNSPEC_MATMUL_US): New.

gcc/testsuite/ChangeLog:

2020-02-11  Dennis Zhang  

    * gcc.target/arm/simd/vmmla_1.c: New test.


Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

2020-02-21 Thread Kyrill Tkachov

Hi Delia,

On 2/19/20 5:23 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor 
formatting changes that were brought up by Richard Sandiford in the 
AArch64 patches


Thanks,
Delia

On 1/31/20 3:23 PM, Delia Burduv wrote:
Here is the updated patch. The changes are minor, so let me know if 
there is anything else to fix or if it can be committed.


Thank you,
Delia

On 1/30/20 2:55 PM, Kyrill Tkachov wrote:

Hi Delia,


On 1/28/20 4:44 PM, Delia Burduv wrote:

Ping.
 


*From:* Delia Burduv 
*Sent:* 22 January 2020 17:26
*To:* gcc-patches@gcc.gnu.org 
*Cc:* ni...@redhat.com ; Richard Earnshaw 
; Ramana Radhakrishnan 
; Kyrylo Tkachov 

*Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla 
and vfma for AArch32 AdvSIMD

Ping.

I have read Richard Sandiford's comments on the AArch64 patches and I
will apply what is relevant to this patch as well. Particularly, I 
will
change the tests to use the exact input and output registers and I 
will

change the types of the rtl patterns.



Please send the updated patches so that someone can commit them for 
you once they're reviewed.


Thanks,

Kyrill




On 12/20/19 6:44 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and 
vfmat

> as part of the BFloat16 extension.
> (https://developer.arm.com/docs/101028/latest.)
> The intrinsics are declared in arm_neon.h and the RTL patterns are
> defined in neon.md.
> Two new tests are added to check assembler output and lane indices.
>
> This patch depends on the Arm back-end patche.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I 
don't have

> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-12  Delia Burduv 
>
>     * config/arm/arm_neon.h (vbfmmlaq_f32): New.
>     (vbfmlalbq_f32): New.
>     (vbfmlaltq_f32): New.
>     (vbfmlalbq_lane_f32): New.
>     (vbfmlaltq_lane_f32): New.
>     (vbfmlalbq_laneq_f32): New.
>     (vbfmlaltq_laneq_f32): New.
>     * config/arm/arm_neon_builtins.def (vbfmmla): New.
>     (vbfmab): New.
>     (vbfmat): New.
>     (vbfmab_lane): New.
>     (vbfmat_lane): New.
>     (vbfmab_laneq): New.
>     (vbfmat_laneq): New.
>     * config/arm/iterators.md (BF_MA): New int iterator.
>     (bt): New int attribute.
>     (VQXBF): Copy of VQX with V8BF.
>     (V_HALF): Added V8BF.
>     * config/arm/neon.md (neon_vbfmmlav8hi): New insn.
>     (neon_vbfmav8hi): New insn.
>     (neon_vbfma_lanev8hi): New insn.
>     (neon_vbfma_laneqv8hi): New expand.
>     (neon_vget_high): Changed iterator to VQXBF.
>     * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC.
>     (UNSPEC_BFMAB): New UNSPEC.
>     (UNSPEC_BFMAT): New UNSPEC.
>
> 2019-11-12  Delia Burduv 
>
>     * gcc.target/arm/simd/bf16_ma_1.c: New test.
>     * gcc.target/arm/simd/bf16_ma_2.c: New test.
>     * gcc.target/arm/simd/bf16_mmla_1.c: New test.


This looks good, a few minor things though...


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248
 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t 
__a, float32x4_t __b,
   return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
 }
 
+#pragma GCC push_options

+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmmlav8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmabv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 

Re: [PATCH v2][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.

2020-02-17 Thread Kyrill Tkachov



On 2/14/20 4:34 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

In this patch (v2) all the review comments mentioned in previous patch 
(v1) are

addressed.

(v1) https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01401.html

#

Hello,

This patch is part of MVE ACLE intrinsics framework.

The patch supports the use of emulation for the single-precision 
arithmetic
operations for MVE. These changes are to support the MVE ACLE 
intrinsics which

operates on vector floating point arithmetic operations.

Please refer to Arm reference manual [1] for more details.
[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?


Ok.

Thanks,

Kyrill



Thanks,
Srinath.

gcc/ChangeLog:

2019-11-11  Andre Vieira 
    Srinath Parvathaneni 

    * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify 
function to add

    emulator calls for double precision arithmetic operations for MVE.


### Attachment also inlined for ease of reply    
###



>From af9d1eb4470c26564b69518bbec3fce297501fdd Mon Sep 17 00:00:00 2001
From: Srinath Parvathaneni 
Date: Tue, 11 Feb 2020 18:42:20 +
Subject: [PATCH] [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework 
patch.


---
 gcc/config/arm/arm.c   | 22 ++-
 .../gcc.target/arm/mve/intrinsics/mve_libcall1.c   | 70 
++
 .../gcc.target/arm/mve/intrinsics/mve_libcall2.c   | 70 
++

 3 files changed, 159 insertions(+), 3 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall2.c


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 037f298..e00024b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5754,9 +5754,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
   /* Values from double-precision helper functions are returned 
in core

  registers if the selected core only supports single-precision
  arithmetic, even if we are using the hard-float ABI.  The 
same is

-    true for single-precision helpers, but we will never be using the
-    hard-float ABI on a CPU which doesn't support single-precision
-    operations in hardware.  */
+    true for single-precision helpers except in case of MVE, 
because in
+    MVE we will be using the hard-float ABI on a CPU which 
doesn't support
+    single-precision operations in hardware.  In MVE the 
following check

+    enables use of emulation for the single-precision arithmetic
+    operations.  */
+  if (TARGET_HAVE_MVE)
+   {
+ add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
+   }
   add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));
diff --git 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c

new file mode 100644
index 000..45f46b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c
@@ -0,0 +1,70 @@
+/* { dg-do compile  } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+
+float
+foo (float a, float b, float c)
+{
+  return a + b + c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fadd" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fadd" 2 } } */
+
+float
+foo1 (float a, float b, float c)
+{
+  return a - b - c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fsub" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fsub" 2 } } */
+
+float
+foo2 (float a, float b, float c)
+{
+  return a * b * c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fmul" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fmul" 2 } } */
+
+float
+foo3 (float b, float c)
+{
+  return b / c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fdiv" }  } */
+
+int
+foo4 (float b, float c)
+{
+  

Re: [PATCH v2][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.

2020-02-17 Thread Kyrill Tkachov

Hi Srinath,

On 2/14/20 4:34 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

In this patch (v2) all the review comments mentioned in previous patch 
(v1) are

addressed.

(v1) https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01395.html

#

Hello,

This patch is part of MVE ACLE intrinsics framework.
This patches add support to update (read/write) the APSR (Application 
Program Status Register)
register and FPSCR (Floating-point Status and Control Register) 
register for MVE.

This patch also enables thumb2 mov RTL patterns for MVE.

A new feature bit vfp_base is added. This bit is enabled for all VFP, 
MVE and MVE with floating point
extensions. This bit is used to enable the macro TARGET_VFP_BASE. For 
all the VFP instructions, RTL patterns,
status and control registers are guarded by TARGET_HAVE_FLOAT. But 
this patch modifies that and the
common instructions, RTL patterns, status and control registers 
between MVE and VFP are guarded by

TARGET_VFP_BASE macro.

The RTL pattern set_fpscr and get_fpscr are updated to use 
VFPCC_REGNUM because few MVE intrinsics

set/get carry bit of FPSCR register.

Please refer to Arm reference manual [1] for more details.
[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?


Ok (please test a big-endian target as well, as per the 1st framework 
patch).


Thanks,

Kyrill




Thanks,
Srinath
gcc/ChangeLog:

2020-02-11  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * common/config/arm/arm-common.c (arm_asm_auto_mfpu): When 
vfp_base
    feature bit is on and -mfpu=auto is passed as compiler option, 
do not
    generate error on not finding any match fpu. Because in this 
case fpu

    is not required.
    * config/arm/arm-cpus.in (vfp_base): Define feature bit, this 
bit is

    enabled for MVE and also for all VFP extensions.
    (VFPv2): Modify fgroup to enable vfp_base feature bit when 
ever VFPv2

    is enabled.
    (MVE): Define fgroup to enable feature bits mve, vfp_base and 
armv7em.
    (MVE_FP): Define fgroup to enable feature bits is fgroup MVE 
and FPv5

    along with feature bits mve_float.
    (mve): Modify add options in armv8.1-m.main arch for MVE.
    (mve.fp): Modify add options in armv8.1-m.main arch for MVE with
    floating point.
    * config/arm/arm.c (use_return_insn): Replace the
    check with TARGET_VFP_BASE.
    (thumb2_legitimate_index_p): Replace TARGET_HARD_FLOAT with
    TARGET_VFP_BASE.
    (arm_rtx_costs_internal): Replace "TARGET_HARD_FLOAT || 
TARGET_HAVE_MVE"
    with TARGET_VFP_BASE, to allow cost calculations for copies in 
MVE as

    well.
    (arm_get_vfp_saved_size): Replace TARGET_HARD_FLOAT with
    TARGET_VFP_BASE, to allow space calculation for VFP registers 
in MVE

    as well.
    (arm_compute_frame_layout): Likewise.
    (arm_save_coproc_regs): Likewise.
    (arm_fixed_condition_code_regs): Modify to enable using 
VFPCC_REGNUM

    in MVE as well.
    (arm_hard_regno_mode_ok): Replace "TARGET_HARD_FLOAT || 
TARGET_HAVE_MVE"

    with equivalent macro TARGET_VFP_BASE.
    (arm_expand_epilogue_apcs_frame): Likewise.
    (arm_expand_epilogue): Likewise.
    (arm_conditional_register_usage): Likewise.
    (arm_declare_function_name): Add check to skip printing .fpu 
directive
    in assembly file when TARGET_VFP_BASE is enabled and 
fpu_to_print is

    "softvfp".
    * config/arm/arm.h (TARGET_VFP_BASE): Define.
    * config/arm/arm.md (arch): Add "mve" to arch.
    (eq_attr "arch" "mve"): Enable on TARGET_HAVE_MVE is true.
    (vfp_pop_multiple_with_writeback): Replace "TARGET_HARD_FLOAT
    || TARGET_HAVE_MVE" with equivalent macro TARGET_VFP_BASE.
    * config/arm/constraints.md (Uf): Define for MVE.
    * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Modify 
target guard

    to not allow for MVE.
    * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Move to volatile 
unspecs

    enum.
    (VUNSPEC_GET_FPSCR): Define.
    * config/arm/vfp.md (thumb2_movhi_vfp): Add support for VMSR 
and VMRS
    instructions which move to general-purpose Register from 
Floating-point

    Special register and vice-versa.
    (thumb2_movhi_fp16): Likewise.
    (thumb2_movsi_vfp): Add support for VMSR and VMRS instructions 
along
    with MCR and MRC instructions which set and get Floating-point 
Status

    and Control Register (FPSCR).
    (movdi_vfp): Modify pattern to enable Single-precision scalar 
float move

    in MVE.
    (thumb2_movdf_vfp): Modify pattern to enable Double-precision 
scalar

    float move patterns in MVE.
    (thumb2_movsfcc_vfp): Modify pattern to enable single float 
conditional
    code move patterns of VFP also in MVE by adding 
TARGET_VFP_BASE 

Re: [PATCH v2][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.

2020-02-17 Thread Kyrill Tkachov

Hi Srinath,

On 2/14/20 4:26 PM, Srinath Parvathaneni wrote:

Hi Kyrill,

> This patch series depends on upstream patches "Armv8.1-M Mainline 
Security Extension" [4],
> "CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] 
and "support for Armv8.1-M

> Mainline scalar shifts" [6].

Patch (version v1) was approved before. The above patches on which 
this patch (version v1) depends are

committed to trunk last month.

(version v1) https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01338.html

This patch (Version v2) is re-based on latest trunk resolving few 
conflicts.


Regression tested on arm-none-eabi and found no regressions.


Can you please also test armeb-none-eabi to make sure big-endian works.




Ok for trunk? If ok, please commit on my behalf. I don't have the 
commit rights.


Ok, thanks.

Please apply for a commit access using the form at 
https://sourceware.org/cgi-bin/pdw/ps_form.cgi using my name/email as 
approver.


More details at https://gcc.gnu.org/gitwrite.html

Then you can commit them yourself :)

Thanks,

Kyrill




Thanks,
Srinath

##

Hello,

This patch creates the required framework for MVE ACLE intrinsics.

The following changes are done in this patch to support MVE ACLE 
intrinsics.


Header file arm_mve.h is added to source code, which contains the 
definitions of MVE ACLE intrinsics
and different data types used in MVE. Machine description file mve.md 
is also added which contains the

RTL patterns defined for MVE.

A new register "p0" is added which is used in by MVE predicated 
patterns. A new register class "VPR_REG"

is added and its contents are defined in REG_CLASS_CONTENTS.

The vec-common.md file is modified to support the standard move 
patterns. The prefix of neon functions

which are also used by MVE is changed from "neon_" to "simd_".
eg: neon_immediate_valid_for_move changed to 
simd_immediate_valid_for_move.


In the patch standard patterns mve_move, mve_store and move_load for 
MVE are added and neon.md and vfp.md

files are modified to support this common patterns.

Please refer to Arm reference manual [1] for more details.

[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath

gcc/ChangeLog:

2020-02-10  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config.gcc (arm_mve.h): Include mve intrinsics header file.
    * config/arm/aout.h (p0): Add new register name for MVE predicated
    cases.
    * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define 
macro

    common to Neon and MVE.
    (ARM_BUILTIN_NEON_LANE_CHECK): Renamed to 
ARM_BUILTIN_SIMD_LANE_CHECK.

    (arm_init_simd_builtin_types): Disable poly types for MVE.
    (arm_init_neon_builtins): Move a check to arm_init_builtins 
function.

    (arm_init_builtins): Use ARM_BUILTIN_SIMD_LANE_CHECK instead of
    ARM_BUILTIN_NEON_LANE_CHECK.
    (mve_dereference_pointer): Add function.
    (arm_expand_builtin_args): Call to mve_dereference_pointer 
when MVE is

    enabled.
    (arm_expand_neon_builtin): Moved to arm_expand_builtin function.
    (arm_expand_builtin): Moved from arm_expand_neon_builtin function.
    * config/arm/arm-c.c (__ARM_FEATURE_MVE): Define macro for MVE 
and MVE

    with floating point enabled.
    * config/arm/arm-protos.h (neon_immediate_valid_for_move): 
Renamed to

    simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Renamed from
    neon_immediate_valid_for_move function.
    * config/arm/arm.c (arm_options_perform_arch_sanity_checks): 
Generate

    error if vfpv2 feature bit is disabled and mve feature bit is also
    disabled for HARD_FLOAT_ABI.
    (use_return_insn): Check to not push VFP regs for MVE.
    (aapcs_vfp_allocate): Add MVE check to have same Procedure 
Call Standard

    as Neon.
    (aapcs_vfp_allocate_return_reg): Likewise.
    (thumb2_legitimate_address_p): Check to return 0 on valid Thumb-2
    address operand for MVE.
    (arm_rtx_costs_internal): MVE check to determine cost of rtx.
    (neon_valid_immediate): Rename to simd_valid_immediate.
    (simd_valid_immediate): Rename from neon_valid_immediate.
    (simd_valid_immediate): MVE check on size of vector is 128 bits.
    (neon_immediate_valid_for_move): Rename to
    simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Rename from
    neon_immediate_valid_for_move.
    (neon_immediate_valid_for_logic): Modify call to 
neon_valid_immediate

    function.
    (neon_make_constant): Modify call to neon_valid_immediate 
function.
    (neon_vector_mem_operand): Return VFP register for POST_INC or 
PRE_DEC

    for MVE.
    (output_move_neon): Add MVE check to generate vldm/vstm 
instrcutions.
    (arm_compute_frame_layout): 

Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-02-11 Thread Kyrill Tkachov

Hi Stam,

On 2/10/20 1:35 PM, Stam Markianos-Wright wrote:



On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:
>
>
> On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
>>
>> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:


 On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>
>
> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot 
product

>> operations (vector/by element) to the ARM back-end.
>>
>> These are:
>> usdot (vector), dot (by element).
>>
>> The functions are optional from ARMv8.2-a as 
-march=armv8.2-a+i8mm and

>> for ARM they remain optional as of ARMv8.6-a.
>>
>> The functions are declared in arm_neon.h, RTL patterns are 
defined to
>> generate assembler and tests are added to verify and perform 
adequate checks.

>>
>> Regression testing on arm-none-eabi passed successfully.
>>
>> This patch depends on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>
>> for ARM CLI updates, and on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>
>> for testsuite effective_target update.
>>
>> Ok for trunk?
>

 New diff addressing review comments from Aarch64 version of the 
patch.


 _Change of order of operands in RTL patterns.
 _Change tests to use check-function-bodies, compile with 
optimisation and

 check for exact registers.
 _Rename tests to remove "-compile-" in filename.

>>>
> .Ping!

Ping :)

Diff re-attached in this ping email is same as the one posted on 10/01

Thank you!



Sorry for the delay.

This is ok.

Thanks,

Kyrill



> .
>>>
>>> Cheers,
>>> Stam
>>>
>>
>>
>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> PS. I don't have commit rights, so if someone could commit on 
my behalf,

>> that would be great :)
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-11-28  Stam Markianos-Wright 
>>
>>  * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>  (USTERNOP_QUALIFIERS): New define.
>>  (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>  (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>  (arm_expand_builtin_args):
>>  Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>  (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>  * config/arm/arm_neon.h (vusdot_s32): New.
>>  (vusdot_lane_s32): New.
>>  (vusdotq_lane_s32): New.
>>  (vsudot_lane_s32): New.
>>  (vsudotq_lane_s32): New.
>>  * config/arm/arm_neon_builtins.def
>> (usdot,usdot_lane,sudot_lane): New.
>>  * config/arm/iterators.md (DOTPROD_I8MM): New.
>>  (sup, opsuffix): Add .
>>     * config/arm/neon.md (neon_usdot, dot_lane: New.
>>  * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-12  Stam Markianos-Wright 
>>
>>  * gcc.target/arm/simd/vdot-2-1.c: New test.
>>  * gcc.target/arm/simd/vdot-2-2.c: New test.
>>  * gcc.target/arm/simd/vdot-2-3.c: New test.
>>  * gcc.target/arm/simd/vdot-2-4.c: New test.
>>
>>



Re: [GCC][PATCH][ARM] Regenerate arm-tables.opt for Armv8.1-M patch

2020-02-06 Thread Kyrill Tkachov



On 2/3/20 5:18 PM, Mihail Ionescu wrote:

Hi all,

I've regenerated arm-tables.opt in config/arm to replace the improperly
generated arm-tables.opt file from "[PATCH, GCC/ARM, 2/10] Add command
line support for Armv8.1-M Mainline" 
(9722215a027b68651c3c7a8af9204d033197e9c0).



2020-02-03  Mihail Ionescu  

    * config/arm/arm-tables.opt: Regenerate.

Ok for trunk?



Ok. I would consider it obvious too.

Thanks,

Kyrill




Regards,
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 
f295a4cffa2bbb3f8163fb9cef784b5af59aee12..a51a131505d184f120a3cfc51273b419bb0cb103 
100644

--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -353,13 +353,16 @@ EnumValue
 Enum(arm_arch) String(armv8-m.main) Value(28)

 EnumValue
-Enum(arm_arch) String(armv8.1-m.main) Value(29)
+Enum(arm_arch) String(armv8-r) Value(29)

 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(30)
+Enum(arm_arch) String(armv8.1-m.main) Value(30)

 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(31)
+Enum(arm_arch) String(iwmmxt) Value(31)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(32)

 Enum
 Name(arm_fpu) Type(enum fpu_type)



Re: [GCC][PATCH][ARM] Set profile to M for Armv8.1-M

2020-02-06 Thread Kyrill Tkachov



On 2/4/20 1:49 PM, Christophe Lyon wrote:
On Mon, 3 Feb 2020 at 18:20, Mihail Ionescu 
 wrote:

>
> Hi,
>
> We noticed that the profile for armv8.1-m.main was not set in 
arm-cpus.in

> , which led to TARGET_ARM_ARCH_PROFILE and _ARM_ARCH_PROFILE not being
> defined properly.
>
>
>
> gcc/ChangeLog:
>
> 2020-02-03  Mihail Ionescu 
>
> * config/arm/arm-cpus.in: Set profile M
> for armv8.1-m.main.
>
>
> Ok for trunk?
>
> Regards,
> Mihail
>
>
> ### Attachment also inlined for ease of reply    
###

>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 
1805b2b1cd8d6f65a967b4e3945257854a7e0fc1..96f584da325172bd1460251e2de0ad679589d312 
100644

> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -692,6 +692,7 @@ begin arch armv8.1-m.main
>   tune for cortex-m7
>   tune flags CO_PROC
>   base 8M_MAIN
> + profile M
>   isa ARMv8_1m_main
>  # fp => FPv5-sp-d16; fp.dp => FPv5-d16
>   option dsp add armv7em
>

I'm wondering whether this is obvious?
OTOH, what's the impact of missing this (or why didn't we notice the
problem via a failing testcase?)


It's only used to set the __ARM_ARCH_PROFILE macro in arm-c.c

I do agree that the patch is obvious, so go ahead and commit this 
please, Mihail.


Thanks,

Kyrill





Christophe


Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

2020-01-30 Thread Kyrill Tkachov

Hi Delia,


On 1/28/20 4:44 PM, Delia Burduv wrote:

Ping.

*From:* Delia Burduv 
*Sent:* 22 January 2020 17:26
*To:* gcc-patches@gcc.gnu.org 
*Cc:* ni...@redhat.com ; Richard Earnshaw 
; Ramana Radhakrishnan 
; Kyrylo Tkachov 
*Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla 
and vfma for AArch32 AdvSIMD

Ping.

I have read Richard Sandiford's comments on the AArch64 patches and I
will apply what is relevant to this patch as well. Particularly, I will
change the tests to use the exact input and output registers and I will
change the types of the rtl patterns.



Please send the updated patches so that someone can commit them for you 
once they're reviewed.


Thanks,

Kyrill




On 12/20/19 6:44 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat
> as part of the BFloat16 extension.
> (https://developer.arm.com/docs/101028/latest.)
> The intrinsics are declared in arm_neon.h and the RTL patterns are
> defined in neon.md.
> Two new tests are added to check assembler output and lane indices.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't 
have

> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-12  Delia Burduv 
>
>  * config/arm/arm_neon.h (vbfmmlaq_f32): New.
>    (vbfmlalbq_f32): New.
>    (vbfmlaltq_f32): New.
>    (vbfmlalbq_lane_f32): New.
>    (vbfmlaltq_lane_f32): New.
>      (vbfmlalbq_laneq_f32): New.
>    (vbfmlaltq_laneq_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfmmla): New.
>    (vbfmab): New.
>    (vbfmat): New.
>    (vbfmab_lane): New.
>    (vbfmat_lane): New.
>    (vbfmab_laneq): New.
>    (vbfmat_laneq): New.
>   * config/arm/iterators.md (BF_MA): New int iterator.
>    (bt): New int attribute.
>    (VQXBF): Copy of VQX with V8BF.
>    (V_HALF): Added V8BF.
>    * config/arm/neon.md (neon_vbfmmlav8hi): New insn.
>    (neon_vbfmav8hi): New insn.
>    (neon_vbfma_lanev8hi): New insn.
>    (neon_vbfma_laneqv8hi): New expand.
>    (neon_vget_high): Changed iterator to VQXBF.
>  * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC.
>    (UNSPEC_BFMAB): New UNSPEC.
>    (UNSPEC_BFMAT): New UNSPEC.
>
> 2019-11-12  Delia Burduv 
>
>      * gcc.target/arm/simd/bf16_ma_1.c: New test.
>      * gcc.target/arm/simd/bf16_ma_2.c: New test.
>      * gcc.target/arm/simd/bf16_mmla_1.c: New test.


Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-30 Thread Kyrill Tkachov



On 1/30/20 2:42 PM, Stam Markianos-Wright wrote:



On 1/28/20 10:35 AM, Kyrill Tkachov wrote:

Hi Stam,

On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:


On 12/10/19 5:03 PM, Kyrill Tkachov wrote:

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this 
approved

first!


Sorry for the delay.

Same here now! Sorry totally forget about this in the lead up to Xmas!

Done the changes marked below and also removed the unnecessary extra 
#defines

from the test.



This is ok with a nit on the testcase...


diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c 
b/gcc/testsuite/gcc.target/arm/pr91816.c

new file mode 100644
index 
..757c897e9c0db32709227b3fdf1b4a8033428232

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" }  */
+int printf(const char *, ...);
+

I think this needs a couple of effective target checks like 
arm_hard_vfp_ok and arm_thumb2_ok. See other tests in gcc.target/arm 
that add -mthumb to the options.


Hmm, looking back at this now, is there any reason why it can't just be:

/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb2_ok } */
/* { dg-additional-options "-mthumb" }  */

were we don't override the march or fpu options at all, but just use 
`require-effective-target arm_thumb2_ok` to make sure that thumb2 is 
supported?


The attached new diff does just that.



Works for me, there are plenty of configurations run with fpu that it 
should get the right coverage.


Ok (make sure commit the updated, if needed, ChangeLog as well)

Thanks!

Kyrill



Cheers :)

Stam.



Thanks,
Kyrill





Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-28 Thread Kyrill Tkachov

Hi Stam,

On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:


On 12/10/19 5:03 PM, Kyrill Tkachov wrote:

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this approved
first!


Sorry for the delay.

Same here now! Sorry totally forget about this in the lead up to Xmas!

Done the changes marked below and also removed the unnecessary extra #defines
from the test.



This is ok with a nit on the testcase...


diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c 
b/gcc/testsuite/gcc.target/arm/pr91816.c
new file mode 100644
index 
..757c897e9c0db32709227b3fdf1b4a8033428232
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" }  */
+int printf(const char *, ...);
+

I think this needs a couple of effective target checks like arm_hard_vfp_ok and 
arm_thumb2_ok. See other tests in gcc.target/arm that add -mthumb to the 
options.

Thanks,
Kyrill



Re: [PATCH, GCC/ARM, 1/2] Add support for ASRL(reg) and LSLL(reg) instructions for Armv8.1-M Mainline

2020-01-17 Thread Kyrill Tkachov



On 12/18/19 1:23 PM, Mihail Ionescu wrote:



Hi Kyrill,

On 12/11/2019 05:50 PM, Kyrill Tkachov wrote:
> Hi Mihail,
>
> On 11/14/19 1:54 PM, Mihail Ionescu wrote:
>> Hi,
>>
>> This patch adds the new scalar shift instructions for Armv8.1-M
>> Mainline to the arm backend.
>> This patch is adding the following instructions:
>>
>> ASRL (reg)
>> LSLL (reg)
>>
>
> Sorry for the delay, very busy time for GCC development :(
>
>
>>
>> ChangeLog entry are as follow:
>>
>> *** gcc/ChangeLog ***
>>
>>
>> 2019-11-14  Mihail-Calin Ionescu 
>> 2019-11-14  Sudakshina Das 
>>
>>     * config/arm/arm.h (TARGET_MVE): New macro for MVE support.
>
>
> I don't see this hunk in the patch... There's a lot of v8.1-M-related
> patches in flight. Is it defined elsewhere?

Thanks for having a look at this.
Yes, I forgot to remove that bit from the ChangeLog and mention that the
patch depends on the Armv8.1-M MVE CLI --
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.htm which introduces
the required TARGET_* macros needed. I've updated the ChangeLog to
reflect that:

*** gcc/ChangeLog ***


2019-12-18  Mihail-Calin Ionescu 
2019-12-18  Sudakshina Das  

    * config/arm/arm.md (ashldi3): Generate thumb2_lsll for 
TARGET_HAVE_MVE.

    (ashrdi3): Generate thumb2_asrl for TARGET_HAVE_MVE.
    * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd
    register pairs for doubleword quantities for ARMv8.1M-Mainline.
    * config/arm/thumb2.md (thumb2_asrl): New.
    (thumb2_lsll): Likewise.

>
>
>>     * config/arm/arm.md (ashldi3): Generate thumb2_lsll for
>> TARGET_MVE.
>>     (ashrdi3): Generate thumb2_asrl for TARGET_MVE.
>>     * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd
>>     register pairs for doubleword quantities for ARMv8.1M-Mainline.
>>     * config/arm/thumb2.md (thumb2_asrl): New.
>>     (thumb2_lsll): Likewise.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2019-11-14  Mihail-Calin Ionescu 
>> 2019-11-14  Sudakshina Das 
>>
>>     * gcc.target/arm/armv8_1m-shift-reg_1.c: New test.
>>
>> Testsuite shows no regression when run for arm-none-eabi targets.
>>
>> Is this ok for trunk?
>>
>> Thanks
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply
>> ###
>>
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index
>> 
be51df7d14738bc1addeab8ac5a3806778106bce..bf788087a30343269b30cf7054ec29212ad9c572 


>> 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -24454,14 +24454,15 @@ arm_hard_regno_mode_ok (unsigned int regno,
>> machine_mode mode)
>>
>>    /* We allow almost any value to be stored in the general registers.
>>   Restrict doubleword quantities to even register pairs in ARM 
state

>> - so that we can use ldrd.  Do not allow very large Neon structure
>> - opaque modes in general registers; they would use too many.  */
>> + so that we can use ldrd and Armv8.1-M Mainline instructions.
>> + Do not allow very large Neon structure opaque modes in general
>> + registers; they would use too many.  */
>
>
> This comment now reads:
>
> "Restrict doubleword quantities to even register pairs in ARM state
>   so that we can use ldrd and Armv8.1-M Mainline instructions."
>
> Armv8.1-M Mainline is not ARM mode though, so please clarify this
> comment further.
>
> Looks ok to me otherwise (I may even have merged this with the second
> patch, but I'm not complaining about keeping it simple :) )
>
> Thanks,
>
> Kyrill
>

I've now updated the comment to read:
"Restrict doubleword quantities to even register pairs in ARM state
so that we can use ldrd. The same restriction applies for MVE."



Ok.

Thanks,

Kyril




Regards,
Mihail

>
>>    if (regno <= LAST_ARM_REGNUM)
>>  {
>>    if (ARM_NUM_REGS (mode) > 4)
>>  return false;
>>
>> -  if (TARGET_THUMB2)
>> +  if (TARGET_THUMB2 && !TARGET_HAVE_MVE)
>>  return true;
>>
>>    return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1)
>> != 0);
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index
>> 
a91a4b941c3f9d2c3d443f9f4639069ae953fb3b..b735f858a6a5c94d02a6765c1b349cdcb5e77ee3 


>> 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -3503,6 +3503,22 @@
>>     (match_ope

Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1

2020-01-17 Thread Kyrill Tkachov

Hi Richard, Wilco,

On 1/17/20 8:43 AM, Richard Sandiford wrote:

Wilco Dijkstra  writes:
> Testing shows the setting of 32:16 for jump alignment has a 
significant codesize
> cost, however it doesn't make a difference in performance. So set 
jump-align

> to 4 to get 1.6% codesize improvement.

I was leaving this to others in case it was obvious to them.  On the
basis that silence suggests it wasn't, :-) could you go into more details?
Is it expected on first principles that jump alignment doesn't matter
for Neoverse N1, or is this purely based on experimentation?  If it's
expected, are we sure that the other "32:16" entries are still worthwhile?
When you say it doesn't make a difference in performance, does that mean
that no individual test's performance changed significantly, or just that
the aggregate score didn't?  Did you experiment with anything inbetween
the current 32:16 and 4, such as 32:8 or even 32:4?



Sorry for dragging my feet on this one, as I put in those numbers last 
year and I've been trying to recall my experiments from then.


The Neoverse N1 Software Optimization guide recommends aligning branch 
targets to 32 bytes within the bounds of code density requirements.


From my benchmarking last year I do seem to remember function and loop 
alignment to matter.


I probably added the jump alignment for completeness as it's a good idea 
from first principles. But if the code size hit is too large we could 
look to decrease it.


I'd also be interested in seeing the impact of 32:8 and 32:4.

Thanks,

Kyrill




The problem with applying the patch only with the explanation above is
that if someone in future has evidence that jump alignment can make a
difference for their testcase, it would be very hard for them to
reproduce the reasoning that led to this change.

Thanks,
Richard

> OK for commit?
>
> ChangeLog
> 2019-12-24  Wilco Dijkstra  
>
> * config/aarch64/aarch64.c (neoversen1_tunings): Set jump_align to 4.
>
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
1646ed1d9a3de8ee2f0abff385a1ea145e234475..209ed8ebbe81104d9d8cff0df31946ab7704fb33 
100644

> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1132,7 +1132,7 @@ static const struct tune_params 
neoversen1_tunings =

>    3, /* issue_rate  */
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* 
fusible_ops  */

>    "32:16",/* function_align.  */
> -  "32:16",/* jump_align.  */
> +  "4",/* jump_align.  */
>    "32:16",/* loop_align.  */
>    2,/* int_reassoc_width.  */
>    4,/* fp_reassoc_width.  */


Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2020-01-14 Thread Kyrill Tkachov



On 1/14/20 1:50 PM, Christophe Lyon wrote:

On Mon, 13 Jan 2020 at 14:49, Kyrill Tkachov
 wrote:

Hi Christophe,

On 12/17/19 3:31 PM, Kyrill Tkachov wrote:

On 12/17/19 2:33 PM, Christophe Lyon wrote:

On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
 wrote:

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:

On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
 wrote:

On 18/10/2019 14:18, Christophe Lyon wrote:

+  bool not_supported = arm_arch_notm || flag_pic ||

TARGET_NEON;

This is a poor name in the context of the function as a whole.
What's
not supported.  Please think of a better name so that I have some
idea
what the intention is.

That's to keep most of the code common when checking if -mpure-code
and -mslow-flash-data are supported.
These 3 cases are common to the two compilation flags, and
-mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.

Would "common_unsupported_modes" work better for you?
Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
the two tests.


Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?

The name looks ok to me. Richard had a concern about Armv8-M Baseline,
but I do see it being supported as you pointed out.

So I believe all the concerns are addressed.

OK, thanks!


Thus the code is ok. However, please also updated the documentation for
-mpure-code in invoke.texi (it currently states that a MOVT instruction
is needed).


I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?

Yes, I think that is simplest correct change to make.


Can you also send a patch to the changes.html page for GCC 10 making
users aware that this restriction is now lifted?


Sure. I should have thought of it when I submitted the GCC patch...

How about the attached? I'm not sure about the right upper/lower case
and  markers

Thanks,

Christophe


commit ba2a354c9ed6c75ec00bf21dd6938b89a113a96e
Author: Christophe Lyon
Date:   Tue Jan 14 13:48:19 2020 +

[arm] Document -mpure-code support for v6m in gcc-10

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index caa9df7..26cdf66 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -417,7 +417,11 @@ a work-in-progress.
   data-processing intrinsics to include 32-bit SIMD, saturating arithmetic,
   16-bit multiplication and other related intrinsics aimed at DSP algorithm
   optimization.
-   
+  
+  Support for -mpure-code in Thumb-1 (v6m) has been
+  added: this M-profile feature is no longer restricted to targets
+  with MOTV. For instance, Cortex-M0 is now
+  supported

Typo in MOVT.
Let's make the last sentence. "For example, -mcpu=cortex-m0 now 
supports this option."

Ok with those changes.
Thanks,
Kyrill

 
 
 AVR




Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2020-01-13 Thread Kyrill Tkachov



On 12/18/19 1:26 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 12/17/2019 10:26 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 11/12/2019 09:55 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling function with the cmse_nonsecure_call attribute by using
CLRM to do all the general purpose registers clearing as well as
clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM 
instruction

when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandated APSR in the 
register

list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Declare.
    * config/arm/arm.c (clear_operation_p): New function.
    (cmse_clear_registers): Generate clear_multiple 
instruction pattern if

    targeting Armv8.1-M Mainline or successor.
    (output_return_instruction): Only output APSR register 
clearing if

    Armv8.1-M Mainline instructions not available.
    (thumb_exit): Likewise.
    * config/arm/predicates.md (clear_multiple_operation): New 
predicate.

    * config/arm/thumb2.md (clear_apsr): New define_insn.
    (clear_multiple): Likewise.
    * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile 
unspec.


*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise. Restrict checks 
for Armv8-M

    GPR clearing when CLRM is not available.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply 
###



diff --git a/gcc/config/arm/arm-protos.h 
b/gcc/config/arm/arm-protos.h
index

Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2020-01-13 Thread Kyrill Tkachov

Hi Christophe,

On 12/17/19 3:31 PM, Kyrill Tkachov wrote:


On 12/17/19 2:33 PM, Christophe Lyon wrote:

On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
 wrote:

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:

On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
 wrote:

On 18/10/2019 14:18, Christophe Lyon wrote:

+  bool not_supported = arm_arch_notm || flag_pic ||

TARGET_NEON;
This is a poor name in the context of the function as a whole.  
What's
not supported.  Please think of a better name so that I have some 
idea

what the intention is.

That's to keep most of the code common when checking if -mpure-code
and -mslow-flash-data are supported.
These 3 cases are common to the two compilation flags, and
-mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.

Would "common_unsupported_modes" work better for you?
Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
the two tests.


Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?


The name looks ok to me. Richard had a concern about Armv8-M Baseline,
but I do see it being supported as you pointed out.

So I believe all the concerns are addressed.

OK, thanks!


Thus the code is ok. However, please also updated the documentation for
-mpure-code in invoke.texi (it currently states that a MOVT instruction
is needed).


I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?


Yes, I think that is simplest correct change to make.



Can you also send a patch to the changes.html page for GCC 10 making 
users aware that this restriction is now lifted?


Thanks,

Kyrill



Thanks,

Kyrill



Christophe


Thanks,

Kyrill




Thanks,

Christophe


Thanks,

Christophe


R.


Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]

2020-01-13 Thread Kyrill Tkachov

Hi Stam,

On 1/10/20 6:47 PM, Stam Markianos-Wright wrote:

Hi all,

This patch is part 2 of Bfloat16_t enablement in the ARM back-end.

This new type is constrained using target hooks TARGET_INVALID_CONVERSION,
TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only 
be used

through ACLE intrinsics (will be provided in later patches).

Regression testing on arm-none-eabi passed successfully.

Ok for trunk?



Ok.

Thanks,

Kyrill




Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a 





gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright 

    * config/arm/arm.c
    (arm_invalid_conversion): New function for target hook.
    (arm_invalid_unary_op): New function for target hook.
    (arm_invalid_binary_op): New function for target hook.

2020-01-10  Stam Markianos-Wright 

    * gcc.target/arm/bfloat16_scalar_typecheck.c: New test.
    * gcc.target/arm/bfloat16_vector_typecheck_1.c: New test.
    * gcc.target/arm/bfloat16_vector_typecheck_2.c: New test.




Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]

2020-01-13 Thread Kyrill Tkachov

Hi Stam,

On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:

Hi all,

This is a respin of patch:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

which has now been split into two (similar to the Aarch64 version).

This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
It also adds a new machine_mode (BFmode) for this type and 
accompanying Vector

modes V4BFmode and V8BFmode.

The second patch in this series uses existing target hooks to restrict 
type use.


Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for test suite effective_target update.

Ok for trunk?


This is ok, thanks.

You can commit it once the git conversion goes through :)

Kyrill




Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a 





gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright 

    * config.gcc: Add arm_bf16.h.
    * config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix 
comment.

    (arm_simd_builtin_std_type): Add BFmode.
    (arm_init_simd_builtin_types): Define element types for vector 
types.

    (arm_init_bf16_types):  New function.
    (arm_init_builtins): Add arm_init_bf16_types function call.
    * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector 
modes.

    * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
    * config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
    (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
    (arm_vector_mode_supported_p): Add V4BF, V8BF.
    (arm_mangle_type):
    * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
  VALID_NEON_QREG_MODE respectively. Add export 
arm_bf16_type_node,

  arm_bf16_ptr_type_node.
    * config/arm/arm.md: New enabled_for_bfmode_scalar,
  enabled_for_bfmode_vector attributes. Add BFmode to movhf 
expand.

  pattern and define_split between ARM registers.
    * config/arm/arm_bf16.h: New file.
    * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
    * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, 
fporbf): New.

  (VQXMOV): Add V8BF.
    * config/arm/neon.md: Add BF vector types to NEON move patterns.
    * config/arm/vfp.md: Add BFmode to movhf patterns.

gcc/testsuite/ChangeLog:

2020-01-10  Stam Markianos-Wright 

    * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
    * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
    * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
    * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
    * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
    * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
    * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
    * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
    * gcc.target/arm/bfloat16_scalar_4.c: New test.
    * gcc.target/arm/bfloat16_simd_1_1.c: New test.
    * gcc.target/arm/bfloat16_simd_1_2.c: New test.
    * gcc.target/arm/bfloat16_simd_2_1.c: New test.
    * gcc.target/arm/bfloat16_simd_2_2.c: New test.
    * gcc.target/arm/bfloat16_simd_3_1.c: New test.
    * gcc.target/arm/bfloat16_simd_3_2.c: New test.





Re: [Patch 0/X] HWASAN v3

2020-01-10 Thread Kyrill Tkachov



On 1/8/20 11:26 AM, Matthew Malcomson wrote:

Hi everyone,

I'm writing this email to summarise & publicise the state of this patch
series, especially the difficulties around approval for GCC 10 mentioned
on IRC.


The main obstacle seems to be that no maintainer feels they have enough
knowledge about hwasan and justification that it's worthwhile to approve
the patch series.

Similarly, Martin has given a review of the parts of the code he can
(thanks!), but doesn't feel he can do a deep review of the code related
to the RTL hooks and stack expansion -- hence that part is as yet not
reviewed in-depth.



The questions around justification raised on IRC are mainly that it
seems like a proof-of-concept for MTE rather than a stand-alone useable
sanitizer.  Especially since in the GNU world hwasan instrumented code
is not really ready for production since we can only use the
less-"interceptor ABI" rather than the "platform ABI".  This restriction
is because there is no version of glibc with the required modifications
to provide the "platform ABI".

(n.b. that since https://reviews.llvm.org/D69574 the code-generation for
these ABI's is the same).


 From my perspective the reasons that make HWASAN useful in itself are:

1) Much less memory usage.

 From a back-of-the-envelope calculation based on the hwasan paper's
table of memory overhead from over-alignment
https://arxiv.org/pdf/1802.09517.pdf I guess hwasan instrumented code
has an overhead of about 1.1x (~4% from overalignment and ~6.25% from
shadow memory), while asan seems to have an overhead somewhere in the
range 1.5x - 3x.

Maybe there's some data out there comparing total overheads that I
haven't found? (I'd appreciate a reference if anyone has that info).



2) Available on more architectures that MTE.

HWASAN only requires TBI, which is a feature of all AArch64 machines,
while MTE will be an optional extension and only available on certain
architectures.


3) This enables using hwasan in the kernel.

While instrumented user-space applications will be using the
"interceptor ABI" and hence are likely not production-quality, the
biggest aim of implementing hwasan in GCC is to allow building the Linux
kernel with tag-based sanitization using GCC.

Instrumented kernel code uses hooks in the kernel itself, so this ABI
distinction is no longer relevant, and this sanitizer should produce a
production-quality kernel binary.




I'm hoping I can find a maintainer willing to review and ACK this patch
series -- especially with stage3 coming to a close soon.  If there's
anything else I could do to help get someone willing up-to-speed then
please just ask.



FWIW I've reviewed the aarch64 parts over the lifetime of the patch 
series and I am okay with them.


Given the reviews of the sanitiser, library and aarch64 backend 
components, and the data at


https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00387.html

how can we move forward with commit approval ? Is this something a 
global reviewer can help with, Jeff ? :)


Thanks,

Kyrill





Cheers,
Matthew



On 07/01/2020 15:14, Martin Liška wrote:
> On 12/12/19 4:18 PM, Matthew Malcomson wrote:
>
> Hello.
>
> I've just sent few comments that are related to the v3 of the patch set.
> Based on the HWASAN (limited) knowledge the patch seems reasonable 
to me.

> I haven't looked much at the newly introduced RTL-hooks.
> But these seems to me isolated to the aarch64 port.
>
> I can also verify that the patchset works on my aarch64 linux 
machine and

> hwasan.exp and asan.exp tests succeed.
>
>> I haven't gotten ASAN_MARK to print as HWASAN_MARK when using memory
>> tagging,
>> since I'm not sure the way I found to implement this would be
>> acceptable.  The
>> inlined patch below works but it requires a special declaration
>> instead of just
>> an ~#include~.
>
> Knowing that, I would not bother with the printing of HWASAN_MARK.
>
> Thanks for the series,
> Martin



[PATCH][wwwdocs] GCC 10 changes.html for arm and aarch64

2020-01-10 Thread Kyrill Tkachov

Hi all,

This patch adds initial entries for notable features that went in to GCC 
10 on the arm and aarch64 front.
The list is by no means complete so if you'd like your contribution 
called out please shout or post a follow-up patch.

It is, nevertheless, a decent start at the relevant sections in changes.html

Thanks,
Kyrill

commit b539d38b322883ed5aa6563ac879af6a5ebabd96
Author: Kyrylo Tkachov 
Date:   Thu Nov 7 17:58:45 2019 +

[arm/aarch64] GCC 10 changes.html

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index d6108269..8f498017 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -322,17 +322,102 @@ a work-in-progress.
 
 New Targets and Target Specific Improvements
 
-
+AArch64  arm
+
+  The AArch64 and arm ports now support condition flag output constraints
+  in inline assembly, as indicated by the __GCC_ASM_FLAG_OUTPUTS__.
+  On arm this feature is only available for A32 and T32 targets.
+  Please refer to the documentation for more details. 
+
+
+AArch64
+
+   The -mbranch-protection=pac-ret option now accepts the
+  optional argument +b-key extension to perform return address
+  signing with the B-key instead of the A-key.
+  
+  The Transactional Memory Extension is now supported through ACLE
+  intrinsics.  It can be enabled through the +tme option
+  extension (for example, -march=armv8.5-a+tme).
+  
+  Initial autovectorization support for SVE2 has been added and can be
+  enabled through the   +sve2 option extension (for example,
+  -march=armv8.5-a+sve2).  Additional extensions can be enabled
+  through +sve2-sm4, +sve2=aes,
+  +sve2-sha3, +sve2-bitperm.
+  
+   A number of features from the Armv8.5-a are now supported through ACLE
+  intrinsics.  These include:
+
+	The random number instructions that can be enabled
+	through the (already present in GCC 9.1) +rng option
+	extension.
+	Floating-point intrinsics to round to integer instructions from
+	Armv8.5-a when targeting -march=armv8.5-a or later.
+	Memory Tagging Extension intrinsics enabled through the
+	+memtag option extension.
+
+  
+   The option -moutline-atomics has been added to aid
+  deployment of the Large System Extensions (LSE) on GNU/Linux systems built
+  with a baseline architecture targeting Armv8-A.  When the option is
+  specified code is emitted to detect the presence of LSE instructions at
+  runtime and use them for standard atomic operations.
+  For more information please refer to the documentation.
+  
+  
+   Support has been added for the following processors
+   (GCC identifiers in parentheses):
+   
+ Arm Cortex-A77 (cortex-a77).
+	 Arm Cortex-A76AE (cortex-a76ae).
+	 Arm Cortex-A65 (cortex-a65).
+	 Arm Cortex-A65AE (cortex-a65ae).
+	 Arm Cortex-A34 (cortex-a34).
+   
+   The GCC identifiers can be used
+   as arguments to the -mcpu or -mtune options,
+   for example: -mcpu=cortex-a77 or
+   -mtune=cortex-a65ae or as arguments to the equivalent target
+   attributes and pragmas.
+  
+
 
 
 
-ARM
+arm
 
   Support for the FDPIC ABI has been added. It uses 64-bit
   function descriptors to represent pointers to functions, and enables
   code sharing on MMU-less systems. The corresponding target triple is
   arm-uclinuxfdpiceabi, and the C library is uclibc-ng.
   
+  Support has been added for the Arm EABI on NetBSD through the
+  arm*-*-netbsdelf-*eabi* triplet.
+  
+  The handling of 64-bit integer operations has been significantly reworked
+  and improved leading to improved performance and reduced stack usage when using
+  64-bit integral data types.  The option -mneon-for-64bits is now
+  deprecated and will be removed in a future release.
+  
+   Support has been added for the following processors
+   (GCC identifiers in parentheses):
+   
+ Arm Cortex-A77 (cortex-a77).
+	 Arm Cortex-A76AE (cortex-a76ae).
+	 Arm Cortex-M35P (cortex-m35p).
+   
+   The GCC identifiers can be used
+   as arguments to the -mcpu or -mtune options,
+   for example: -mcpu=cortex-a77 or
+   -mtune=cortex-m35p.
+  
+  Support has been extended for the ACLE
+  https://developer.arm.com/docs/101028/0009/data-processing-intrinsics;>
+  data-processing intrinsics to include 32-bit SIMD, saturating arithmetic,
+  16-bit multiplication and other related intrinsics aimed at DSP algorithm
+  optimization.
+   
 
 
 


Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-12-20 Thread Kyrill Tkachov

Hi Dennis,

On 12/12/19 5:30 PM, Dennis Zhang wrote:

Hi all,

On 22/11/2019 14:33, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It enables options including -march=armv8.6-a, +i8mm and +bf16.
> The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
> Documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regtested for arm-none-linux-gnueabi-armv8-a.
>

This is an update to rebase the patch to the top.
Some issues are fixed according to the recent CLI patch for AArch64.
ChangeLog is updated as following:

gcc/ChangeLog:

2019-12-12  Dennis Zhang  

    * config/arm/arm-c.c (arm_cpu_builtins): Define
    __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
    __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
    __ARM_BF16_FORMAT_ALTERNATIVE when enabled.
    * config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
    * config/arm/arm-tables.opt: Regenerated.
    * config/arm/arm.c (arm_option_reconfigure_globals): Initialize
    arm_arch_i8mm and arm_arch_bf16 when enabled.
    * config/arm/arm.h (TARGET_I8MM): New macro.
    (TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
    * config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
    * config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
    * config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
    (v8_6_a_simd_variants): New.
    (v8_*_a_simd_variants): Add i8mm and bf16.
    * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.

gcc/testsuite/ChangeLog:

2019-12-12  Dennis Zhang  

    * gcc.target/arm/multilib.exp: Add combination tests for 
armv8.6-a.


Is it OK for trunk?



This is ok for trunk.

Please follow the steps at https://gcc.gnu.org/svnwrite.html to get 
write permission to the repo (listing me as approver).


You can then commit it yourself :)

Thanks,

Kyrill




Many thanks!
Dennis


Re: [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports following MVE ACLE intrinsics with binary operand.

vsubq_n_f16, vsubq_n_f32, vbrsrq_n_f16, vbrsrq_n_f32, vcvtq_n_f16_s16,
vcvtq_n_f32_s32, vcvtq_n_f16_u16, vcvtq_n_f32_u32, vcreateq_f16, 
vcreateq_f32.


Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics


In this patch new constraint "Rd" is added, which checks the constant 
is with in the range of 1 to 16.
Also a new predicate "mve_imm_16" is added, to check the matching 
constraint Rd.


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (BINOP_NONE_NONE_NONE_QUALIFIERS): 
Define

    qualifier for binary operands.
    (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
    * config/arm/arm_mve.h (vsubq_n_f16): Define macro.
    (vsubq_n_f32): Likewise.
    (vbrsrq_n_f16): Likewise.
    (vbrsrq_n_f32): Likewise.
    (vcvtq_n_f16_s16): Likewise.
    (vcvtq_n_f32_s32): Likewise.
    (vcvtq_n_f16_u16): Likewise.
    (vcvtq_n_f32_u32): Likewise.
    (vcreateq_f16): Likewise.
    (vcreateq_f32): Likewise.
    (__arm_vsubq_n_f16): Define intrinsic.
    (__arm_vsubq_n_f32): Likewise.
    (__arm_vbrsrq_n_f16): Likewise.
    (__arm_vbrsrq_n_f32): Likewise.
    (__arm_vcvtq_n_f16_s16): Likewise.
    (__arm_vcvtq_n_f32_s32): Likewise.
    (__arm_vcvtq_n_f16_u16): Likewise.
    (__arm_vcvtq_n_f32_u32): Likewise.
    (__arm_vcreateq_f16): Likewise.
    (__arm_vcreateq_f32): Likewise.
    (vsubq): Define polymorphic variant.
    (vbrsrq): Likewise.
    (vcvtq_n): Likewise.
    * config/arm/arm_mve_builtins.def 
(BINOP_NONE_NONE_NONE_QUALIFIERS): Use

    it.
    (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
    * config/arm/constraints.md (Rd): Define constraint to check 
constant is

    in the range of 1 to 16.
    * config/arm/mve.md (mve_vsubq_n_f): Define RTL pattern.
    mve_vbrsrq_n_f: Likewise.
    mve_vcvtq_n_to_f_: Likewise.
    mve_vcreateq_f: Likewise.
    * config/arm/predicates.md (mve_imm_16): Define predicate to check
    the matching constraint Rd.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
cd82aa159089c288607e240de02a85dcbb134a14..c2dad057d1365914477c64d559aa1fd1c32bbf19 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -349,6 +349,30 @@ arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_UNONE_IMM_QUALIFIERS \
   (arm_unop_unone_imm_qualifiers)

+static enum arm_type_qualifiers
+arm_binop_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none };
+#define BINOP_NONE_NONE_NONE_QUALIFIERS \
+  (arm_binop_none_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_immediate };
+#define BINOP_NONE_NONE_IMM_QUALIFIERS \
+  (arm_binop_none_none_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate };
+#define BINOP_NONE_UNONE_IMM_QUALIFIERS \
+  (arm_binop_none_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_NONE_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_none_unone_unone_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */

    /* void ([T 

Re: [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports following MVE ACLE intrinsics with unary operand.

vctp16q, vctp32q, vctp64q, vctp8q, vpnot.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics


There are few conflicts in defining the machine registers, resolved by 
re-ordering

VPR_REGNUM, APSRQ_REGNUM and APSRGE_REGNUM.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (hi_UP): Define mode.
    * config/arm/arm.h (IS_VPR_REGNUM): Move.
    * config/arm/arm.md (VPR_REGNUM): Define before APSRQ_REGNUM.
    (APSRQ_REGNUM): Modify.
    (APSRGE_REGNUM): Modify.
    * config/arm/arm_mve.h (vctp16q): Define macro.
    (vctp32q): Likewise.
    (vctp64q): Likewise.
    (vctp8q): Likewise.
    (vpnot): Likewise.
    (__arm_vctp16q): Define intrinsic.
    (__arm_vctp32q): Likewise.
    (__arm_vctp64q): Likewise.
    (__arm_vctp8q): Likewise.
    (__arm_vpnot): Likewise.
    * config/arm/arm_mve_builtins.def (UNOP_UNONE_UNONE): Use builtin
    qualifier.
    * config/arm/mve.md (mve_vctpqhi): Define RTL pattern.
    (mve_vpnothi): Likewise.

gcc/testsuite/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vctp16q.c: New test.
    * gcc.target/arm/mve/intrinsics/vctp32q.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vctp64q.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vctp8q.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vpnot.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
21b213d8e1bc99a3946f15e97161e01d73832799..cd82aa159089c288607e240de02a85dcbb134a14 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -387,6 +387,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define oi_UP    E_OImode
 #define hf_UP    E_HFmode
 #define si_UP    E_SImode
+#define hi_UP    E_HImode
 #define void_UP  E_VOIDmode

 #define UP(X) X##_UP
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
485db72f05f16ca389227289a35c232dc982bf9d..95ec7963a57a1a5652a0a9dc30391a0ce6348242 
100644

--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -955,6 +955,9 @@ extern int arm_arch_cmse;
 #define IS_IWMMXT_GR_REGNUM(REGNUM) \
   (((REGNUM) >= FIRST_IWMMXT_GR_REGNUM) && ((REGNUM) <= 
LAST_IWMMXT_GR_REGNUM))


+#define IS_VPR_REGNUM(REGNUM) \
+  ((REGNUM) == VPR_REGNUM)
+
 /* Base register for access to local variables of the function.  */
 #define FRAME_POINTER_REGNUM    102

@@ -999,7 +1002,7 @@ extern int arm_arch_cmse;
    && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))

 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   +1 VPR + 1 APSRQ + 1 APSRGE.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
 #define FIRST_PSEUDO_REGISTER   107
@@ -1101,13 +1104,10 @@ extern int arm_regs_in_sequence[];
   /* Registers not for general use.  */    \
   CC_REGNUM, VFPCC_REGNUM, \
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,    \
-  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,   \
-  VPR_REGNUM   \
+  SP_REGNUM, PC_REGNUM, VPR_REGNUM, APSRQ_REGNUM,\
+  APSRGE_REGNUM    \
 }

-#define IS_VPR_REGNUM(REGNUM) \
-  ((REGNUM) == VPR_REGNUM)
-
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
689baa0b0ff63ef90f47d2fd844cb98c9a1457a0..2a90482a873f8250a3b2b1dec141669f55e0c58b 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -39,9 +39,9 @@
    (LAST_ARM_REGNUM  15)   ;
    (CC_REGNUM   100)   ; Condition code pseudo register
    (VFPCC_REGNUM    101)   ; VFP Condition code pseudo register
-   (APSRQ_REGNUM    104)   ; Q bit pseudo register
-   (APSRGE_REGNUM   105)   ; GE bits pseudo register
-   (VPR_REGNUM  106)   ; Vector Predication Register - MVE 
register.
+   (VPR_REGNUM  104)   ; Vector Predication Register - MVE 
register.

+   (APSRQ_REGNUM    105)   ; Q bit pseudo register
+   (APSRGE_REGNUM   106)   ; GE bits pseudo register
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 

Re: [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports following MVE ACLE intrinsics with unary operand.

vmvnq_n_s16, vmvnq_n_s32, vrev64q_s8, vrev64q_s16, vrev64q_s32, 
vcvtq_s16_f16, vcvtq_s32_f32,
vrev64q_u8, vrev64q_u16, vrev64q_u32, vmvnq_n_u16, vmvnq_n_u32, 
vcvtq_u16_f16, vcvtq_u32_f32,

vrev64q.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (UNOP_SNONE_SNONE_QUALIFIERS): Define.
    (UNOP_SNONE_NONE_QUALIFIERS): Likewise.
    (UNOP_SNONE_IMM_QUALIFIERS): Likewise.
    (UNOP_UNONE_NONE_QUALIFIERS): Likewise.
    (UNOP_UNONE_UNONE_QUALIFIERS): Likewise.
    (UNOP_UNONE_IMM_QUALIFIERS): Likewise.
    * config/arm/arm_mve.h (vmvnq_n_s16): Define macro.
    (vmvnq_n_s32): Likewise.
    (vrev64q_s8): Likewise.
    (vrev64q_s16): Likewise.
    (vrev64q_s32): Likewise.
    (vcvtq_s16_f16): Likewise.
    (vcvtq_s32_f32): Likewise.
    (vrev64q_u8): Likewise.
    (vrev64q_u16): Likewise.
    (vrev64q_u32): Likewise.
    (vmvnq_n_u16): Likewise.
    (vmvnq_n_u32): Likewise.
    (vcvtq_u16_f16): Likewise.
    (vcvtq_u32_f32): Likewise.
    (__arm_vmvnq_n_s16): Define intrinsic.
    (__arm_vmvnq_n_s32): Likewise.
    (__arm_vrev64q_s8): Likewise.
    (__arm_vrev64q_s16): Likewise.
    (__arm_vrev64q_s32): Likewise.
    (__arm_vrev64q_u8): Likewise.
    (__arm_vrev64q_u16): Likewise.
    (__arm_vrev64q_u32): Likewise.
    (__arm_vmvnq_n_u16): Likewise.
    (__arm_vmvnq_n_u32): Likewise.
    (__arm_vcvtq_s16_f16): Likewise.
    (__arm_vcvtq_s32_f32): Likewise.
    (__arm_vcvtq_u16_f16): Likewise.
    (__arm_vcvtq_u32_f32): Likewise.
    (vrev64q): Define polymorphic variant.
    * config/arm/arm_mve_builtins.def (UNOP_SNONE_SNONE): Use it.
    (UNOP_SNONE_NONE): Likewise.
    (UNOP_SNONE_IMM): Likewise.
    (UNOP_UNONE_UNONE): Likewise.
    (UNOP_UNONE_NONE): Likewise.
    (UNOP_UNONE_IMM): Likewise.
    * config/arm/mve.md (mve_vrev64q_): Define RTL 
pattern.

    (mve_vcvtq_from_f_): Likewise.
    (mve_vmvnq_n_): Likewise.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_s8.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_u8.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
2fee417fe6585f457edd4cf96655366b1d6bd1a0..21b213d8e1bc99a3946f15e97161e01d73832799 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -313,6 +313,42 @@ arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_NONE_UNONE_QUALIFIERS \
   (arm_unop_none_unone_qualifiers)

+static enum arm_type_qualifiers
+arm_unop_snone_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_SNONE_QUALIFIERS \
+  (arm_unop_snone_snone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_NONE_QUALIFIERS \
+  (arm_unop_snone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate };
+#define UNOP_SNONE_IMM_QUALIFIERS \
+  (arm_unop_snone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define UNOP_UNONE_NONE_QUALIFIERS \
+  (arm_unop_unone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, 

Re: [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports MVE ACLE intrinsics vcvtq_f16_s16, vcvtq_f32_s32, 
vcvtq_f16_u16, vcvtq_f32_u32n
vrndxq_f16, vrndxq_f32, vrndq_f16, vrndq_f32, vrndpq_f16, vrndpq_f32, 
vrndnq_f16, vrndnq_f32,
vrndmq_f16, vrndmq_f32, vrndaq_f16, vrndaq_f32, vrev64q_f16, 
vrev64q_f32, vnegq_f16, vnegq_f32,
vdupq_n_f16, vdupq_n_f32, vabsq_f16, vabsq_f32, vrev32q_f16, 
vcvttq_f32_f16, vcvtbq_f32_f16.



Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-17  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (UNOP_NONE_NONE_QUALIFIERS): 
Define macro.

    (UNOP_NONE_SNONE_QUALIFIERS): Likewise.
    (UNOP_NONE_UNONE_QUALIFIERS): Likewise.
    * config/arm/arm_mve.h (vrndxq_f16): Define macro.
    (vrndxq_f32): Likewise.
    (vrndq_f16): Likewise.
    (vrndq_f32): Likewise.
    (vrndpq_f16): Likewise.
    (vrndpq_f32): Likewise.
    (vrndnq_f16): Likewise.
    (vrndnq_f32): Likewise.
    (vrndmq_f16): Likewise.
    (vrndmq_f32): Likewise.
    (vrndaq_f16): Likewise.
    (vrndaq_f32): Likewise.
    (vrev64q_f16): Likewise.
    (vrev64q_f32): Likewise.
    (vnegq_f16): Likewise.
    (vnegq_f32): Likewise.
    (vdupq_n_f16): Likewise.
    (vdupq_n_f32): Likewise.
    (vabsq_f16): Likewise.
    (vabsq_f32): Likewise.
    (vrev32q_f16): Likewise.
    (vcvttq_f32_f16): Likewise.
    (vcvtbq_f32_f16): Likewise.
    (vcvtq_f16_s16): Likewise.
    (vcvtq_f32_s32): Likewise.
    (vcvtq_f16_u16): Likewise.
    (vcvtq_f32_u32): Likewise.
    (__arm_vrndxq_f16): Define intrinsic.
    (__arm_vrndxq_f32): Likewise.
    (__arm_vrndq_f16): Likewise.
    (__arm_vrndq_f32): Likewise.
    (__arm_vrndpq_f16): Likewise.
    (__arm_vrndpq_f32): Likewise.
    (__arm_vrndnq_f16): Likewise.
    (__arm_vrndnq_f32): Likewise.
    (__arm_vrndmq_f16): Likewise.
    (__arm_vrndmq_f32): Likewise.
    (__arm_vrndaq_f16): Likewise.
    (__arm_vrndaq_f32): Likewise.
    (__arm_vrev64q_f16): Likewise.
    (__arm_vrev64q_f32): Likewise.
    (__arm_vnegq_f16): Likewise.
    (__arm_vnegq_f32): Likewise.
    (__arm_vdupq_n_f16): Likewise.
    (__arm_vdupq_n_f32): Likewise.
    (__arm_vabsq_f16): Likewise.
    (__arm_vabsq_f32): Likewise.
    (__arm_vrev32q_f16): Likewise.
    (__arm_vcvttq_f32_f16): Likewise.
    (__arm_vcvtbq_f32_f16): Likewise.
    (__arm_vcvtq_f16_s16): Likewise.
    (__arm_vcvtq_f32_s32): Likewise.
    (__arm_vcvtq_f16_u16): Likewise.
    (__arm_vcvtq_f32_u32): Likewise.
    (vrndxq): Define polymorphic variants.
    (vrndq): Likewise.
    (vrndpq): Likewise.
    (vrndnq): Likewise.
    (vrndmq): Likewise.
    (vrndaq): Likewise.
    (vrev64q): Likewise.
    (vnegq): Likewise.
    (vabsq): Likewise.
    (vrev32q): Likewise.
    (vcvtbq_f32): Likewise.
    (vcvttq_f32): Likewise.
    (vcvtq): Likewise.
    * config/arm/arm_mve_builtins.def (VAR2): Define.
    (VAR1): Define.
    * config/arm/mve.md (mve_vrndxq_f): Add RTL pattern.
    (mve_vrndq_f): Likewise.
    (mve_vrndpq_f): Likewise.
    (mve_vrndnq_f): Likewise.
    (mve_vrndmq_f): Likewise.
    (mve_vrndaq_f): Likewise.
    (mve_vrev64q_f): Likewise.
    (mve_vnegq_f): Likewise.
    (mve_vdupq_n_f): Likewise.
    (mve_vabsq_f): Likewise.
    (mve_vrev32q_fv8hf): Likewise.
    (mve_vcvttq_f32_f16v4sf): Likewise.
    (mve_vcvtbq_f32_f16v4sf): Likewise.
    (mve_vcvtq_to_f_): Likewise.

gcc/testsuite/ChangeLog:

2019-10-17  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vabsq_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vnegq_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vnegq_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev32q_f16.c: Likewise.
    * 

Re: [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch is part of MVE ACLE intrinsics framework.

The patch supports the use of emulation for the double-precision 
arithmetic
operations for MVE. This changes are to support the MVE ACLE 
intrinsics which

operates on vector floating point arithmetic operations.

Please refer to Arm reference manual [1] for more details.
[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-11  Andre Vieira 
    Srinath Parvathaneni 

    * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify 
function to add

    emulator calls for double precision arithmetic operations for MVE.



I'm a bit confused by the changelog and the comment in the patch





### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
6faed76206b93c1a9dea048e2f693dc16ee58072..358b2638b65a2007d1c7e8062844b67682597f45 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5658,9 +5658,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
   /* Values from double-precision helper functions are returned 
in core

  registers if the selected core only supports single-precision
  arithmetic, even if we are using the hard-float ABI.  The 
same is

-    true for single-precision helpers, but we will never be using the
-    hard-float ABI on a CPU which doesn't support single-precision
-    operations in hardware.  */
+    true for single-precision helpers except in case of MVE, 
because in
+    MVE we will be using the hard-float ABI on a CPU which 
doesn't support
+    single-precision operations in hardware.  In MVE the 
following check

+    enables use of emulation for the double-precision arithmetic
+    operations.  */
+  if (TARGET_HAVE_MVE)
+   {
+ add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
+   }


... this adds emulation for SFmode but you say you want double-precision 
emulation?


Can you demonstrate what this patch wants to achieve with a testcase?

Thanks,

Kyrill





   add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));



Re: [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.

2019-12-19 Thread Kyrill Tkachov



On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports MVE ACLE intrinsics vst4q_s8, vst4q_s16, 
vst4q_s32, vst4q_u8,

vst4q_u16, vst4q_u32, vst4q_f16 and vst4q_f32.

In this patch arm_mve_builtins.def file is added to the source code in 
which the

builtins for MVE ACLE intrinsics are defined using builtin qualifiers.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?



Ok.

Thanks,

Kyrill




Thanks,
Srinath.

gcc/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (CF): Define mve_builtin_data.
    (VAR1): Define.
    (ARM_BUILTIN_MVE_PATTERN_START): Define.
    (arm_init_mve_builtins): Define function.
    (arm_init_builtins): Add TARGET_HAVE_MVE check.
    (arm_expand_builtin_1): Check the range of fcode.
    (arm_expand_mve_builtin): Define function to expand MVE builtins.
    (arm_expand_builtin): Check the range of fcode.
    * config/arm/arm_mve.h (__ARM_FEATURE_MVE): Define MVE 
floating point

    types.
    (__ARM_MVE_PRESERVE_USER_NAMESPACE): Define to protect user 
namespace.

    (vst4q_s8): Define macro.
    (vst4q_s16): Likewise.
    (vst4q_s32): Likewise.
    (vst4q_u8): Likewise.
    (vst4q_u16): Likewise.
    (vst4q_u32): Likewise.
    (vst4q_f16): Likewise.
    (vst4q_f32): Likewise.
    (__arm_vst4q_s8): Define inline builtin.
    (__arm_vst4q_s16): Likewise.
    (__arm_vst4q_s32): Likewise.
    (__arm_vst4q_u8): Likewise.
    (__arm_vst4q_u16): Likewise.
    (__arm_vst4q_u32): Likewise.
    (__arm_vst4q_f16): Likewise.
    (__arm_vst4q_f32): Likewise.
    (__ARM_mve_typeid): Define macro with MVE types.
    (__ARM_mve_coerce): Define macro with _Generic feature.
    (vst4q): Define polymorphic variant for different vst4q builtins.
    * config/arm/arm_mve_builtins.def: New file.
    * config/arm/mve.md (MVE_VLD_ST): Define iterator.
    (unspec): Define unspec.
    (mve_vst4q): Define RTL pattern.
    * config/arm/t-arm (arm.o): Add entry for arm_mve_builtins.def.
    (arm-builtins.o): Likewise.

gcc/testsuite/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vst4q_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vst4q_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_s8.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_u8.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
d4cb0ea3deb49b10266d1620c85e243ed34aee4d..a9f76971ef310118bf7edea6a8dd3de1da46b46b 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -401,6 +401,13 @@ static arm_builtin_datum neon_builtin_data[] =
 };

 #undef CF
+#define CF(N,X) CODE_FOR_mve_##N##X
+static arm_builtin_datum mve_builtin_data[] =
+{
+#include "arm_mve_builtins.def"
+};
+
+#undef CF
 #undef VAR1
 #define VAR1(T, N, A) \
   {#N, UP (A), CODE_FOR_arm_##N, 0, T##_QUALIFIERS},
@@ -705,6 +712,13 @@ enum arm_builtins

 #include "arm_acle_builtins.def"

+  ARM_BUILTIN_MVE_BASE,
+
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_MVE_##N##X,
+#include "arm_mve_builtins.def"
+
   ARM_BUILTIN_MAX
 };

@@ -714,6 +728,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)

+#define ARM_BUILTIN_MVE_PATTERN_START \
+  (ARM_BUILTIN_MVE_BASE + 1)
+
 #define ARM_BUILTIN_ACLE_PATTERN_START \
   (ARM_BUILTIN_ACLE_BASE + 1)

@@ -1219,6 +1236,22 @@ arm_init_acle_builtins (void)
 }
 }

+/* Set up all the MVE builtins mentioned in arm_mve_builtins.def 
file.  */

+static void
+arm_init_mve_builtins (void)
+{
+  volatile unsigned int i, fcode = ARM_BUILTIN_MVE_PATTERN_START;
+
+  arm_init_simd_builtin_scalar_types ();
+  arm_init_simd_builtin_types ();
+
+  for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++)
+    {
+  arm_builtin_datum *d = &mve_builtin_data[i];
+  arm_init_builtin (fcode, d, "__builtin_mve");
+    }
+}
+
 /* Set up all the NEON builtins, even builtins for instructions that 
are not
    in the current target ISA to allow the user to compile particular 
modules
    with different target specific options that differ from the 
command line

@@ -1961,8 +1994,10 @@ arm_init_builtins (void)
   = add_builtin_function 

Re: [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch is part of MVE ACLE intrinsics framework.
This patches add support to update (read/write) the APSR (Application 
Program Status Register)
register and FPSCR (Floating-point Status and Control Register) 
register for MVE.

This patch also enables thumb2 mov RTL patterns for MVE.

Please refer to Arm reference manual [1] for more details.
[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath

gcc/ChangeLog:

2019-11-11  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Add check 
to not allow

    TARGET_HAVE_MVE for this pattern.
    (thumb2_cmse_entry_return): Add TARGET_HAVE_MVE check to 
update APSR register.

    * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Define.
    (VUNSPEC_GET_FPSCR): Remove.
    * config/arm/vfp.md (thumb2_movhi_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movhi_fp16): Add TARGET_HAVE_MVE check.
    (thumb2_movsi_vfp): Add TARGET_HAVE_MVE check.
    (movdi_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movdf_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movsfcc_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movdfcc_vfp): Add TARGET_HAVE_MVE check.
    (push_multi_vfp): Add TARGET_HAVE_MVE check.
    (set_fpscr): Add TARGET_HAVE_MVE check.
    (get_fpscr): Add TARGET_HAVE_MVE check.



These pattern changes do more that add a TARGET_HAVE_MVE check. Some add 
new alternatives, some even change the RTL pattern.


I'd like to see them reflected in the ChangeLog so that I know they're 
deliberate.






### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
809461a25da5a8058a8afce972dea0d3131effc0..81afd8fcdc1b0a82493dc0758bce16fa9e5fde20 
100644

--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -435,10 +435,10 @@
 (define_insn "*cmovsi_insn"
   [(set (match_operand:SI 0 "arm_general_register_operand" 
"=r,r,r,r,r,r,r")

 (if_then_else:SI
-    (match_operator 1 "arm_comparison_operator"
- [(match_operand 2 "cc_register" "") (const_int 0)])
-    (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
-    (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, 
r,UM,U1")))]

+   (match_operator 1 "arm_comparison_operator"
+    [(match_operand 2 "cc_register" "") (const_int 0)])
+   (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
+   (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))]
   "TARGET_THUMB2 && TARGET_COND_ARITH
    && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
    || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
@@ -540,7 +540,7 @@
   [(match_operand 4 "cc_register" "") 
(const_int 0)])

  (match_operand:SF 1 "s_register_operand" "0,r")
  (match_operand:SF 2 "s_register_operand" 
"r,0")))]

-  "TARGET_THUMB2 && TARGET_SOFT_FLOAT"
+  "TARGET_THUMB2 && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE"
   "@
    it\\t%D3\;mov%D3\\t%0, %2
    it\\t%d3\;mov%d3\\t%0, %1"
@@ -1226,7 +1226,7 @@
    ; added to clear the APSR and potentially the FPSCR if VFP is 
available, so

    ; we adapt the length accordingly.
    (set (attr "length")
- (if_then_else (match_test "TARGET_HARD_FLOAT")
+ (if_then_else (match_test "TARGET_HARD_FLOAT || TARGET_HAVE_MVE")
   (const_int 34)
   (const_int 8)))
    ; We do not support predicate execution of returns from 
cmse_nonsecure_entry

diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 
b3b4f8ee3e2d1bdad968a9dd8ccbc72ded274f48..ac7fe7d0af19f1965356d47d8327e24d410b99bd 
100644

--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -170,6 +170,7 @@
   UNSPEC_TORC  ; Used by the intrinsic form of the iWMMXt 
TORC instruction.
   UNSPEC_TORVSC    ; Used by the intrinsic form of the 
iWMMXt TORVSC instruction.
   UNSPEC_TEXTRC    ; Used by the intrinsic form of the 
iWMMXt TEXTRC instruction.

+  UNSPEC_GET_FPSCR ; Represent fetch of FPSCR content.
 ])


@@ -216,7 +217,6 @@
   VUNSPEC_SLX  ; Represent a store-register-release-exclusive.
   VUNSPEC_LDA  ; Represent a store-register-acquire.
   VUNSPEC_STL  ; Represent a store-register-release.
-  VUNSPEC_GET_FPSCR    ; Represent fetch of FPSCR content.
   VUNSPEC_SET_FPSCR    ; Represent assign of FPSCR content.
   VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing.
   VUNSPEC_CDP  ; Represent the coprocessor cdp instruction.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 

Re: [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.

2019-12-18 Thread Kyrill Tkachov



On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch creates the required framework for MVE ACLE intrinsics.

The following changes are done in this patch to support MVE ACLE 
intrinsics.


Header file arm_mve.h is added to source code, which contains the 
definitions of MVE ACLE intrinsics
and different data types used in MVE. Machine description file mve.md 
is also added which contains the

RTL patterns defined for MVE.

A new register "p0" is added which is used by MVE predicated 
patterns. A new register class "VPR_REG"

is added and its contents are defined in REG_CLASS_CONTENTS.

The vec-common.md file is modified to support the standard move 
patterns. The prefix of neon functions

which are also used by MVE is changed from "neon_" to "simd_".
eg: neon_immediate_valid_for_move changed to 
simd_immediate_valid_for_move.


In this patch, standard patterns mve_move, mve_store and mve_load for 
MVE are added and neon.md and vfp.md

files are modified to support this common patterns.

Please refer to Arm reference manual [1] for more details.

[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?


Ok.

Thanks,

Kyrill



Thanks,
Srinath

gcc/ChangeLog:

2019-11-11  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config.gcc (arm_mve.h): Add header file.
    * config/arm/aout.h (p0): Add new register name.
    * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define.
    (ARM_BUILTIN_NEON_LANE_CHECK): Remove.
    (arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check.
    (arm_init_neon_builtins): Move a check to arm_init_builtins 
function.
    (arm_init_builtins): Move a check from arm_init_neon_builtins 
function.

    (mve_dereference_pointer): Add new function.
    (arm_expand_builtin_args): Add TARGET_HAVE_MVE check.
    (arm_expand_neon_builtin): Move a check to arm_expand_builtin 
function.
    (arm_expand_builtin): Move a check from 
arm_expand_neon_builtin function.

    * config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE.
    * config/arm/arm-modes.def (INT_MODE): Add three new integer 
modes.
    * config/arm/arm-protos.h (neon_immediate_valid_for_move): 
Rename function.
    (simd_immediate_valid_for_move): Rename 
neon_immediate_valid_for_move function.
    * config/arm/arm.c 
(arm_options_perform_arch_sanity_checks):Enable mve isa bit.

    (use_return_insn): Add TARGET_HAVE_MVE check.
    (aapcs_vfp_allocate): Add TARGET_HAVE_MVE check.
    (aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check.
    (thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check.
    (arm_rtx_costs_internal): Add TARGET_HAVE_MVE check.
    (neon_valid_immediate): Rename to simd_valid_immediate.
    (simd_valid_immediate): Rename from neon_valid_immediate.
    (neon_immediate_valid_for_move): Rename to 
simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Rename from 
neon_immediate_valid_for_move.
    (neon_immediate_valid_for_logic): Modify call to 
neon_valid_immediate function.
    (neon_make_constant): Modify call to neon_valid_immediate 
function.

    (neon_vector_mem_operand): Add TARGET_HAVE_MVE check.
    (output_move_neon): Add TARGET_HAVE_MVE check.
    (arm_compute_frame_layout): Add TARGET_HAVE_MVE check.
    (arm_save_coproc_regs): Add TARGET_HAVE_MVE check.
    (arm_print_operand): Add case 'E' to print memory operands.
    (arm_print_operand_address): Add TARGET_HAVE_MVE check.
    (arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check.
    (arm_modes_tieable_p): Add TARGET_HAVE_MVE check.
    (arm_regno_class): Add VPR_REGNUM check.
    (arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check.
    (arm_expand_epilogue): Add TARGET_HAVE_MVE check.
    (arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for 
MVE vector modes.

    (arm_array_mode_supported_p): Add TARGET_HAVE_MVE check.
    (arm_conditional_register_usage): For TARGET_HAVE_MVE enable 
VPR register.
    * config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR 
register.

    (FIRST_PSEUDO_REGISTER): Modify.
    (VALID_MVE_MODE): Define.
    (VALID_MVE_SI_MODE): Define.
    (VALID_MVE_SF_MODE): Define.
    (VALID_MVE_STRUCT_MODE): Define.
    (REG_ALLOC_ORDER): Add VPR_REGNUM entry.
    (enum reg_class): Add VPR_REG entry.
    (REG_CLASS_NAMES): Add VPR_REG entry.
    * config/arm/arm.md (VPR_REGNUM): Define.
    (arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE.
    (vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check 
to allow writeback.

    (include "mve.md"): Include mve.md file.
    * config/arm/arm_mve.h: New file.
    * 

Re: [PATCH][GCC][arm] Add CLI and multilib support for Armv8.1-M Mainline MVE extensions

2019-12-18 Thread Kyrill Tkachov



On 12/18/19 5:00 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 12/18/2019 02:13 PM, Kyrill Tkachov wrote:
> Hi Mihail,
>
> On 11/8/19 4:52 PM, Mihail Ionescu wrote:
>> Hi,
>>
>> This patch adds CLI and multilib support for Armv8.1-M MVE to the Arm
>> backend.
>> Two new option added for v8.1-m.main: "+mve" for integer MVE
>> instructions only
>> and "+mve.fp" for both integer and single-precision/half-precision
>> floating-point MVE.
>> The patch also maps the Armv8.1-M multilib variants to the
>> corresponding v8-M ones.
>>
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-11-08  Mihail Ionescu 
>> 2019-11-08  Andre Vieira 
>>
>>     * config/arm/arm-cpus.in (mve, mve_float): New features.
>>     (dsp, mve, mve.fp): New options.
>>     * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT):
>> Define.
>>     * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-08  Mihail Ionescu 
>> 2019-11-08  Andre Vieira 
>>
>>     * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries.
>>
>>
>> Is this ok for trunk?
>
>
> This is ok, but please document the new options in invoke.texi.
>

Here it is with the updated invoke.texi and ChangeLog.



Thanks, looks great to me.

Kyrill



gcc/ChangeLog:

2019-12-18  Mihail Ionescu  
2019-12-18  Andre Vieira 

    * config/arm/arm-cpus.in (mve, mve_float): New features.
    (dsp, mve, mve.fp): New options.
    * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT): 
Define.

    * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M.
    * doc/invoke.texi: Document the armv8.1-m mve and dsp options.


gcc/testsuite/ChangeLog:

2019-12-18  Mihail Ionescu  
2019-12-18  Andre Vieira 

    * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries.


Thanks,
Mihail

> Thanks,
>
> Kyrill
>
>
>>
>> Best regards,
>>
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply
>> ###
>>
>>
>> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
>> index
>> 
59aad8f62ee5186cc87d3cefaf40ba2ce049012d..c2f016c75e2d8dd06890295321232bef61cbd234 


>> 100644
>> --- a/gcc/config/arm/arm-cpus.in
>> +++ b/gcc/config/arm/arm-cpus.in
>> @@ -194,6 +194,10 @@ define feature sb
>>  # v8-A architectures, added by default from v8.5-A
>>  define feature predres
>>
>> +# M-profile Vector Extension feature bits
>> +define feature mve
>> +define feature mve_float
>> +
>>  # Feature groups.  Conventionally all (or mostly) upper case.
>>  # ALL_FPU lists all the feature bits associated with the 
floating-point
>>  # unit; these will all be removed if the floating-point unit is 
disabled

>> @@ -654,9 +658,12 @@ begin arch armv8.1-m.main
>>   base 8M_MAIN
>>   isa ARMv8_1m_main
>>  # fp => FPv5-sp-d16; fp.dp => FPv5-d16
>> + option dsp add armv7em
>>   option fp add FPv5 fp16
>>   option fp.dp add FPv5 FP_DBL fp16
>>   option nofp remove ALL_FP
>> + option mve add mve armv7em
>> + option mve.fp add mve FPv5 fp16 mve_float armv7em
>>  end arch armv8.1-m.main
>>
>>  begin arch iwmmxt
>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>> index
>> 
64c292f2862514fb600a4faeaddfeacb2b69180b..9ec38c6af1b84fc92e20e30e8f07ce5360a966c1 


>> 100644
>> --- a/gcc/config/arm/arm.h
>> +++ b/gcc/config/arm/arm.h
>> @@ -310,6 +310,12 @@ emission of floating point pcs attributes.  */
>>     instructions (most are floating-point related).  */
>>  #define TARGET_HAVE_FPCXT_CMSE (arm_arch8_1m_main)
>>
>> +#define TARGET_HAVE_MVE (bitmap_bit_p (arm_active_target.isa, \
>> + isa_bit_mve))
>> +
>> +#define TARGET_HAVE_MVE_FLOAT (bitmap_bit_p (arm_active_target.isa, \
>> + isa_bit_mve_float))
>> +
>>  /* Nonzero if integer division instructions supported.  */
>>  #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
>>   || (TARGET_THUMB && arm_arch_thumb_hwdiv))
>> diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
>> index
>> 
807e69eaf78625f422e2d7ef5936c5c80c5b9073..62e27fd284b21524896430176d64ff5b08c6e0ef 


>> 100644
>> --- a/gcc/config/arm/t-rmprofile
>> +++ b/gcc/config/arm/t-rmprofile
>> @@ -54,7 +54,7 @@ MULTILIB_REQUIRED +=
>> mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp
>>  # Arch Matches
>>  MULTILIB_MATCHES    += mar

Re: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns

2019-12-18 Thread Kyrill Tkachov



On 12/18/19 1:38 PM, Mihail Ionescu wrote:

Hi,

On 11/12/2019 10:23 AM, Kyrill Tkachov wrote:


On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 9/10] Call nscall function with blxns

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to call
functions with the cmse_nonsecure_call attribute directly using blxns
with no undue restriction on the register used for that.

=== Patch description ===

This change to use BLXNS to call a nonsecure function from secure
directly (not using a libcall) is made in 2 steps:
- change nonsecure_call patterns to use blxns instead of calling
  __gnu_cmse_nonsecure_call
- loosen requirement for function address to allow any register when
  doing BLXNS.

The former is a straightforward check over whether instructions 
added in

Armv8.1-M Mainline are available while the latter consist in making the
nonsecure call pattern accept any register by using match_operand and
changing the nonsecure_call_internal expander to no force r4 when
targeting Armv8.1-M Mainline.

The tricky bit is actually in the test update, specifically how to 
check

that register lists for CLRM have all registers except for the one
holding parameters (already done) and the one holding the address used
by BLXNS. This is achieved with 3 scan-assembler directives.

1) The first one lists all registers that can appear in CLRM but makes
   each of them optional.
   Property guaranteed: no wrong register is cleared and none appears
   twice in the register list.
2) The second directive checks that the CLRM is made of a fixed number
   of the right registers to be cleared. The number used is the number
   of registers that could contain a secret minus one (used to hold the
   address of the function to call).
   Property guaranteed: register list has the right number of registers
   Cumulated property guaranteed: only registers with a potential 
secret

   are cleared and they are all listed but one.
3) The last directive checks that we cannot find a CLRM with a register
   in it that also appears in BLXNS. This is checked via the use of a
   back-reference on any of the allowed registers in CLRM, the
   back-reference enforcing that whatever register matches in CLRM 
must be

   the same in the BLXNS.
   Property guaranteed: register used for BLXNS is different from
   registers cleared in CLRM.

Some more care needs to happen for the gcc.target/arm/cmse/cmse-1.c
testcase due to there being two CLRM generated. To ensure the third
directive match the right CLRM to the BLXNS, a negative lookahead is
used between the CLRM register list and the BLXNS. The way negative
lookahead works is by matching the *position* where a given regular
expression does not match. In this case, since it comes after the CLRM
register list it is requesting that what comes after the register list
does not have a CLRM again followed by BLXNS. This guarantees that the
.*blxns after only matches a blxns without another CLRM before.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.md (nonsecure_call_internal): Do not force 
memory

    address in r4 when targeting Armv8.1-M Mainline.
    (nonsecure_call_value_internal): Likewise.
    * config/arm/thumb2.md (nonsecure_call_reg_thumb2): Make 
memory address

    a register match_operand again.  Emit BLXNS when targeting
    Armv8.1-M Mainline.
    (nonsecure_call_value_reg_thumb2): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/cmse-1.c: Add check for BLXNS when 
instructions
    introduced in Armv8.1-M Mainline Security Extensions are 
available and
    restrict checks for libcall to __gnu_cmse_nonsecure_call to 
Armv8-M
    targets only.  Adapt CLRM check to verify register used for 
BLXNS is

    not in the CLRM register list.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise 
and adapt
    check for LSB clearing bit to be using the same register as 
BLXNS when

    targeting Armv8.1-M Mainline.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c

Re: [PATCH][GCC][arm] Add CLI and multilib support for Armv8.1-M Mainline MVE extensions

2019-12-18 Thread Kyrill Tkachov

Hi Mihail,

On 11/8/19 4:52 PM, Mihail Ionescu wrote:

Hi,

This patch adds CLI and multilib support for Armv8.1-M MVE to the Arm 
backend.
Two new option added for v8.1-m.main: "+mve" for integer MVE 
instructions only

and "+mve.fp" for both integer and single-precision/half-precision
floating-point MVE.
The patch also maps the Armv8.1-M multilib variants to the 
corresponding v8-M ones.




gcc/ChangeLog:

2019-11-08  Mihail Ionescu  
2019-11-08  Andre Vieira 

    * config/arm/arm-cpus.in (mve, mve_float): New features.
    (dsp, mve, mve.fp): New options.
    * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT): 
Define.

    * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M.


gcc/testsuite/ChangeLog:

2019-11-08  Mihail Ionescu  
2019-11-08  Andre Vieira 

    * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries.


Is this ok for trunk?



This is ok, but please document the new options in invoke.texi.

Thanks,

Kyrill




Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
59aad8f62ee5186cc87d3cefaf40ba2ce049012d..c2f016c75e2d8dd06890295321232bef61cbd234 
100644

--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -194,6 +194,10 @@ define feature sb
 # v8-A architectures, added by default from v8.5-A
 define feature predres

+# M-profile Vector Extension feature bits
+define feature mve
+define feature mve_float
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -654,9 +658,12 @@ begin arch armv8.1-m.main
  base 8M_MAIN
  isa ARMv8_1m_main
 # fp => FPv5-sp-d16; fp.dp => FPv5-d16
+ option dsp add armv7em
  option fp add FPv5 fp16
  option fp.dp add FPv5 FP_DBL fp16
  option nofp remove ALL_FP
+ option mve add mve armv7em
+ option mve.fp add mve FPv5 fp16 mve_float armv7em
 end arch armv8.1-m.main

 begin arch iwmmxt
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
64c292f2862514fb600a4faeaddfeacb2b69180b..9ec38c6af1b84fc92e20e30e8f07ce5360a966c1 
100644

--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -310,6 +310,12 @@ emission of floating point pcs attributes.  */
    instructions (most are floating-point related).  */
 #define TARGET_HAVE_FPCXT_CMSE  (arm_arch8_1m_main)

+#define TARGET_HAVE_MVE (bitmap_bit_p (arm_active_target.isa, \
+  isa_bit_mve))
+
+#define TARGET_HAVE_MVE_FLOAT (bitmap_bit_p (arm_active_target.isa, \
+ isa_bit_mve_float))
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
  || (TARGET_THUMB && arm_arch_thumb_hwdiv))
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 
807e69eaf78625f422e2d7ef5936c5c80c5b9073..62e27fd284b21524896430176d64ff5b08c6e0ef 
100644

--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -54,7 +54,7 @@ MULTILIB_REQUIRED += 
mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp

 # Arch Matches
 MULTILIB_MATCHES    += march?armv6s-m=march?armv6-m

-# Map all v8-m.main+dsp FP variants down the the variant without DSP.
+# Map all v8-m.main+dsp FP variants down to the variant without DSP.
 MULTILIB_MATCHES    += march?armv8-m.main=march?armv8-m.main+dsp \
    $(foreach FP, +fp +fp.dp, \
march?armv8-m.main$(FP)=march?armv8-m.main+dsp$(FP))
@@ -66,3 +66,18 @@ MULTILIB_MATCHES += 
march?armv7e-m+fp=march?armv7e-m+fpv5
 MULTILIB_REUSE  += $(foreach ARCH, armv6s-m armv7-m armv7e-m 
armv8-m\.base armv8-m\.main, \

mthumb/march.$(ARCH)/mfloat-abi.soft=mthumb/march.$(ARCH)/mfloat-abi.softfp)

+# Map v8.1-M to v8-M.
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main+dsp
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main+mve
+
+v8_1m_sp_variants = +fp +dsp+fp +mve.fp
+v8_1m_dp_variants = +fp.dp +dsp+fp.dp +fp.dp+mve +fp.dp+mve.fp
+
+# Map all v8.1-m.main FP sp variants down to v8-m.
+MULTILIB_MATCHES += $(foreach FP, $(v8_1m_sp_variants), \
+ march?armv8-m.main+fp=march?armv8.1-m.main$(FP))
+
+# Map all v8.1-m.main FP dp variants down to v8-m.
+MULTILIB_MATCHES += $(foreach FP, $(v8_1m_dp_variants), \
+ march?armv8-m.main+fp.dp=march?armv8.1-m.main$(FP))
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
b/gcc/testsuite/gcc.target/arm/multilib.exp
index 
dcea829965eb15e372401e6389df5a1403393ecb..63cca118da2578253740fcd95421eae9ddf219bd 
100644

--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -775,6 +775,27 @@ if {[multilib_config "rmprofile"] } {
 {-march=armv8-r+fp.sp -mfpu=auto -mfloat-abi=hard} 

Re: [PATCH][AArch64] Fixup core tunings

2019-12-18 Thread Kyrill Tkachov

Hi Wilco,

On 12/17/19 4:03 PM, Wilco Dijkstra wrote:

Hi Richard,

> This changelog entry is inadequate.  It's also not in the correct style.
>
> It should say what has changed, not just that it has changed.

Sure, but there is often no useful space for that. We should auto generate
changelogs if they are deemed useful. I find the commit message a lot more
useful in general. Here is the updated version:


Several tuning settings in cores.def are not consistent.
Set the tuning for Cortex-A76AE and Cortex-A77 to neoversen1 so
it is the same as for Cortex-A76 and Neoverse N1.
Set the tuning for Neoverse E1 to cortexa73 so it's the same as for
Cortex-A65. Set the scheduler for Cortex-A65 and Cortex-A65AE to
cortexa53.

Bootstrap OK, OK for commit?



Ok.

Thanks,

Kyrill




ChangeLog:
2019-12-17  Wilco Dijkstra  

* config/aarch64/aarch64-cores.def:
("cortex-a76ae"): Use neoversen1 tuning.
("cortex-a77"): Likewise.
("cortex-a65"): Use cortexa53 scheduler.
("cortex-a65ae"): Likewise.
("neoverse-e1"): Use cortexa73 tuning.
--

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
053c6390e747cb9c818fe29a9b22990143b260ad..d170253c6eddca87f8b9f4f7fcc4692695ef83fb 
100644

--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -101,13 +101,13 @@ AARCH64_CORE("thunderx2t99", thunderx2t99,  
thunderx2t99, 8_1A,  AARCH64_FL_FOR
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
 AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, neoversen1, 0x41, 0xd0b, -1)
-AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0e, -1)
-AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0d, -1)
-AARCH64_CORE("cortex-a65",  cortexa65, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1)
-AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1)
+AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1)
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | 
AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
-AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
+AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd4a, -1)


 /* HiSilicon ('H') cores. */
 AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A, AARCH64_FL_FOR_ARCH8_2 
| AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | 
AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
@@ -127,6 +127,6 @@ AARCH64_CORE("cortex-a73.cortex-a53", 
cortexa73cortexa53, cortexa53, 8A,  AARCH

 /* ARM DynamIQ big.LITTLE configurations.  */

 AARCH64_CORE("cortex-a75.cortex-a55", cortexa75cortexa55, cortexa53, 
8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 
0xd05), -1)
-AARCH64_CORE("cortex-a76.cortex-a55", cortexa76cortexa55, cortexa53, 
8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd0b, 
0xd05), -1)
+AARCH64_CORE("cortex-a76.cortex-a55", 

Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Kyrill Tkachov



On 12/17/19 2:33 PM, Christophe Lyon wrote:

On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
 wrote:

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:

On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
 wrote:

On 18/10/2019 14:18, Christophe Lyon wrote:

+  bool not_supported = arm_arch_notm || flag_pic ||

TARGET_NEON;

This is a poor name in the context of the function as a whole.  What's
not supported.  Please think of a better name so that I have some idea
what the intention is.

That's to keep most of the code common when checking if -mpure-code
and -mslow-flash-data are supported.
These 3 cases are common to the two compilation flags, and
-mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.

Would "common_unsupported_modes" work better for you?
Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
the two tests.


Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?


The name looks ok to me. Richard had a concern about Armv8-M Baseline,
but I do see it being supported as you pointed out.

So I believe all the concerns are addressed.

OK, thanks!


Thus the code is ok. However, please also updated the documentation for
-mpure-code in invoke.texi (it currently states that a MOVT instruction
is needed).


I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?


Yes, I think that is simplest correct change to make.

Thanks,

Kyrill



Christophe


Thanks,

Kyrill




Thanks,

Christophe


Thanks,

Christophe


R.


Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Kyrill Tkachov

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:
>
> On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
>  wrote:
> >
> > On 18/10/2019 14:18, Christophe Lyon wrote:
> > > +  bool not_supported = arm_arch_notm || flag_pic || 
TARGET_NEON;

> > >
> >
> > This is a poor name in the context of the function as a whole.  What's
> > not supported.  Please think of a better name so that I have some idea
> > what the intention is.
>
> That's to keep most of the code common when checking if -mpure-code
> and -mslow-flash-data are supported.
> These 3 cases are common to the two compilation flags, and
> -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.
>
> Would "common_unsupported_modes" work better for you?
> Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
> the two tests.
>

Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?



The name looks ok to me. Richard had a concern about Armv8-M Baseline, 
but I do see it being supported as you pointed out.


So I believe all the concerns are addressed.

Thus the code is ok. However, please also updated the documentation for 
-mpure-code in invoke.texi (it currently states that a MOVT instruction 
is needed).


Thanks,

Kyrill





Thanks,

Christophe

> Thanks,
>
> Christophe
>
> >
> > R.


Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 11/12/2019 09:55 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling function with the cmse_nonsecure_call attribute by using
CLRM to do all the general purpose registers clearing as well as
clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM 
instruction

when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandated APSR in the register
list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Declare.
    * config/arm/arm.c (clear_operation_p): New function.
    (cmse_clear_registers): Generate clear_multiple instruction 
pattern if

    targeting Armv8.1-M Mainline or successor.
    (output_return_instruction): Only output APSR register 
clearing if

    Armv8.1-M Mainline instructions not available.
    (thumb_exit): Likewise.
    * config/arm/predicates.md (clear_multiple_operation): New 
predicate.

    * config/arm/thumb2.md (clear_apsr): New define_insn.
    (clear_multiple): Likewise.
    * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile 
unspec.


*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.  Restrict checks 
for Armv8-M

    GPR clearing when CLRM is not available.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply 
###



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882 
100644

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc

Re: [PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:


Hi Kyrill,

On 11/06/2019 04:12 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable
saving/restoring of nonsecure FP context in function with the
cmse_nonsecure_entry attribute.

=== Motivation ===

In Armv8-M Baseline and Mainline, the FP context is cleared on 
return from

nonsecure entry functions. This means the FP context might change when
calling a nonsecure entry function. This patch uses the new VLDR and
VSTR instructions available in Armv8.1-M Mainline to save/restore 
the FP

context when calling a nonsecure entry function from nonsecure code.

=== Patch description ===

This patch consists mainly of creating 2 new instruction patterns to
push and pop special FP registers via vldm and vstr and using them in
prologue and epilogue. The patterns are defined as push/pop with an
unspecified operation on the memory accessed, with an unspecified
constant indicating what special FP register is being saved/restored.

Other aspects of the patch include:
  * defining the set of special registers that can be saved/restored 
and

    their name
  * reserving space in the stack frames for these push/pop
  * preventing return via pop
  * guarding the clearing of FPSCR to target architecture not having
    Armv8.1-M Mainline instructions.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (fp_sysreg_names): Declare and define.
    (use_return_insn): Also return false for Armv8.1-M Mainline.
    (output_return_instruction): Skip FPSCR clearing if Armv8.1-M
    Mainline instructions are available.
    (arm_compute_frame_layout): Allocate space in frame for FPCXTNS
    when targeting Armv8.1-M Mainline Security Extensions.
    (arm_expand_prologue): Save FPCXTNS if this is an Armv8.1-M
    Mainline entry function.
    (cmse_nonsecure_entry_clear_before_return): Clear IP and r4 if
    targeting Armv8.1-M Mainline or successor.
    (arm_expand_epilogue): Fix indentation of caller-saved register
    clearing.  Restore FPCXTNS if this is an Armv8.1-M Mainline
    entry function.
    * config/arm/arm.h (TARGET_HAVE_FP_CMSE): New macro.
    (FP_SYSREGS): Likewise.
    (enum vfp_sysregs_encoding): Define enum.
    (fp_sysreg_names): Declare.
    * config/arm/unspecs.md (VUNSPEC_VSTR_VLDR): New volatile 
unspec.

    * config/arm/vfp.md (push_fpsysreg_insn): New define_insn.
    (pop_fpsysreg_insn): Likewise.

*** gcc/testsuite/Changelog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: add checks for VSTR and 
VLDR.

    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse.exp: Run existing Armv8-M 
Mainline tests
    from mainline/8m subdirectory and new Armv8.1-M Mainline 
tests from

    mainline/8_1m subdirectory.
    * gcc.target/arm/cmse/mainline/bitfield-4.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-4.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-5.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-6.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-6.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-7.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-8.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-8.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-9.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-9.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Move 
and rename

    into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-and-union.c: This.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-13.c: This. 
Clean up

    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: This. 
Clean up

    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-7.c: This. 
Clean up

    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Move

Re: [PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:28 PM, Mihail Ionescu wrote:

Hi Kyrill

On 11/06/2019 03:59 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 11/4/19 4:49 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:
> [PATCH, GCC/ARM, 2/10] Add command line support
>
> Hi,
>
> === Context ===
>
> This patch is part of a patch series to add support for Armv8.1-M
> Mainline Security Extensions architecture. Its purpose is to add
> command-line support for that new architecture.
>
> === Patch description ===
>
> Besides the expected enabling of the new value for the -march
> command-line option (-march=armv8.1-m.main) and its extensions (see
> below), this patch disables support of the Security Extensions for 
this

> newly added architecture. This is done both by not including the cmse
> bit in the architecture description and by throwing an error message
> when user request Armv8.1-M Mainline Security Extensions. Note that
> Armv8-M Baseline and Mainline Security Extensions are still enabled.
>
> Only extensions for already supported instructions are implemented in
> this patch. Other extensions (MVE integer and float) will be added in
> separate patches. The following configurations are allowed for 
Armv8.1-M

> Mainline with regards to FPU and implemented in this patch:
> + no FPU (+nofp)
> + single precision VFPv5 with FP16 (+fp)
> + double precision VFPv5 with FP16 (+fp.dp)
>
> ChangeLog entry are as follow:
>
> *** gcc/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * config/arm/arm-cpus.in (armv8_1m_main): New feature.
>     (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, 
ARMv6k,
>     ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, 
ARMv7ve,
>     ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, 
ARMv8_3a,
>     ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): 
Reindent.

>     (ARMv8_1m_main): New feature group.
>     (armv8.1-m.main): New architecture.
>     * config/arm/arm-tables.opt: Regenerate.
>     * config/arm/arm.c (arm_arch8_1m_main): Define and default
> initialize.
>     (arm_option_reconfigure_globals): Initialize 
arm_arch8_1m_main.
>     (arm_options_perform_arch_sanity_checks): Error out when 
targeting

>     Armv8.1-M Mainline Security Extensions.
>     * config/arm/arm.h (arm_arch8_1m_main): Declare.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * lib/target-supports.exp
> (check_effective_target_arm_arch_v8_1m_main_ok): Define.
>     (add_options_for_arm_arch_v8_1m_main): Likewise.
> (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.
>
> Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; 
testsuite

> shows no regression.
>
> Is this ok for trunk?
>
Ok.



Something that I remembered last night upon reflection...

New command-line options (or arguments to them) need documentation in 
invoke.texi.


Please add some either as part of this patch or as a separate patch 
if you prefer.



I've added the missing cli options in invoke.texi.

Here's the updated ChangeLog:

2019-12-06  Mihail-Calin Ionescu  
2019-12-16  Thomas Preud'homme  

* config/arm/arm-cpus.in (armv8_1m_main): New feature.
(ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k,
ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve,
ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
(ARMv8_1m_main): New feature group.
(armv8.1-m.main): New architecture.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1m_main): Define and default 
initialize.

(arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
(arm_options_perform_arch_sanity_checks): Error out when targeting
Armv8.1-M Mainline Security Extensions.
* config/arm/arm.h (arm_arch8_1m_main): Declare.
* doc/invoke.texi: Document armv8.1-m.main.

*** gcc/testsuite/ChangeLog ***

2019-12-16  Mihail-Calin Ionescu  
2019-12-16  Thomas Preud'homme  

* lib/target-supports.exp
(check_effective_target_arm_arch_v8_1m_main_ok): Define.
(add_options_for_arm_arch_v8_1m_main): Likewise.
(check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.



Thanks, this is ok.

Kyrill




Regards,
Mihail


Thanks,

Kyrill



Thanks,

Kyrill


> Best regards,
>
> Mihail
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index
> 
f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a4be9388fd7a74f0ec4615a292fd1cf

Re: [PATCH] [AARCH64] Improve vector generation cost model

2019-12-13 Thread Kyrill Tkachov

Hi Andrew,

On 3/15/19 1:18 AM, apin...@marvell.com wrote:

From: Andrew Pinski 

Hi,
  On OcteonTX2, ld1r and ld1 (with a single lane) are split
into two different micro-ops unlike most other targets.
This adds three extra costs to the cost table:
ld1_dup: used for "ld1r {v0.4s}, [x0]"
merge_dup: used for "dup v0.4s, v0.4s[0]" and "ins v0.4s[0], v0.4s[0]"
ld1_merge: used for "ld1 {v0.4s}[0], [x0]"

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.



Sorry for the slow reply, missed it on gcc-patches :(



Thanks,
Andrew Pinski

ChangeLog:
* config/arm/aarch-common-protos.h (vector_cost_table):
Add merge_dup, ld1_merge, and ld1_dup.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs):
Update for the new fields.
(thunderx_extra_costs): Likewise.
(thunderx2t99_extra_costs): Likewise.
(tsv110_extra_costs): Likewise.
* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
(cortexa53_extra_costs): Likewise.
(cortexa57_extra_costs): Likewise.
(exynosm1_extra_costs): Likewise.
(xgene1_extra_costs): Likewise.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle vec_dup of a 
memory.

Handle vec_merge of a memory.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-cost-tables.h | 20 +++
 gcc/config/aarch64/aarch64.c | 22 +
 gcc/config/arm/aarch-common-protos.h |  3 +++
 gcc/config/arm/aarch-cost-tables.h   | 25 +++-
 4 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h

index 5c9442e1b89..9a7c70ba595 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -123,7 +123,10 @@ const struct cpu_cost_table qdf24xx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -227,7 +230,10 @@ const struct cpu_cost_table thunderx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* Alu.  */
+    COSTS_N_INSNS (1), /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -330,7 +336,10 @@ const struct cpu_cost_table 
thunderx2t99_extra_costs =

   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* Alu.  */
+    COSTS_N_INSNS (1), /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -434,7 +443,10 @@ const struct cpu_cost_table tsv110_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1), /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b0872..dc4d3d39af8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10568,6 +10568,28 @@ cost_plus:
 }
   break;

+    case VEC_DUPLICATE:
+  if (!speed)
+   return false;



If I read the code right, before this patch we would be returning true 
for !speed i.e. not recursing.


Do we want to trigger a recursion now?



+
+  if (GET_CODE (XEXP (x, 0)) == MEM)
+   *cost += extra_cost->vect.ld1_dup;



Please use MEM_P here.



+  else
+   *cost += extra_cost->vect.merge_dup;
+  return true;
+
+    case VEC_MERGE:
+  if (speed && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+   {
+ if (GET_CODE (XEXP (XEXP (x, 0), 0)) == MEM)



And here.

Thanks,

Kyrill



+   *cost += extra_cost->vect.ld1_merge;
+ else
+   *cost += extra_cost->vect.merge_dup;
+ return true;
+   }
+  break;
+
+
 case TRUNCATE:

   /* Decompose muldi3_highpart.  */
diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h

index 11cd5145bbc..dbc1282402a 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -131,6 +131,9 @@ struct fp_cost_table
 struct vector_cost_table
 {
   const int alu;
+  const int merge_dup;
+  const int ld1_merge;
+  const int ld1_dup;
 };

 struct cpu_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h 
b/gcc/config/arm/aarch-cost-tables.h

index bc33efadc6c..a51bc668f56 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -121,7 +121,10 @@ const struct cpu_cost_table generic_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -224,7 +227,10 @@ const struct cpu_cost_table 

Re: [PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN

2019-12-12 Thread Kyrill Tkachov

Hi Matthew,

Martin is the authority on this but I have a small comment inline...

On 12/12/19 3:19 PM, Matthew Malcomson wrote:

This is an analogous option to --bootstrap-asan to configure.  It allows
bootstrapping GCC using HWASAN.

For the same reasons as for ASAN we have to avoid using the HWASAN
sanitizer when compiling libiberty and the lto-plugin.

Also add a function to query whether -fsanitize=hwaddress has been
passed.

ChangeLog:

2019-08-29  Matthew Malcomson 

    * configure: Regenerate.
    * configure.ac: Add --bootstrap-hwasan option.

config/ChangeLog:

2019-12-12  Matthew Malcomson 

    * bootstrap-hwasan.mk: New file.

libiberty/ChangeLog:

2019-12-12  Matthew Malcomson 

    * configure: Regenerate.
    * configure.ac: Avoid using sanitizer.

lto-plugin/ChangeLog:

2019-12-12  Matthew Malcomson 

    * Makefile.am: Avoid using sanitizer.
    * Makefile.in: Regenerate.



### Attachment also inlined for ease of reply    
###



diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
new file mode 100644
index 
..4f60bed3fd6e98b47a3a38aea6eba2a7c320da25

--- /dev/null
+++ b/config/bootstrap-hwasan.mk
@@ -0,0 +1,8 @@
+# This option enables -fsanitize=hwaddress for stage2 and stage3.
+
+STAGE2_CFLAGS += -fsanitize=hwaddress
+STAGE3_CFLAGS += -fsanitize=hwaddress
+POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/.libs
diff --git a/configure b/configure
index 
aec9186b2b0123d3088b69eb1ee541567654953e..6f71b111bd18ec053180beecf83dd4549e83c2b9 
100755

--- a/configure
+++ b/configure
@@ -7270,7 +7270,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 
2>&1; then

   case "$BUILD_CONFIG" in
-    *bootstrap-asan* | *bootstrap-ubsan* )
+    *bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/configure.ac b/configure.ac
index 
b8ce2ad20b9d03e42731252a9ec2a8417c13e566..16bfdf164555dad94c789f17b6a63ba1a2e3e9f4 
100644

--- a/configure.ac
+++ b/configure.ac
@@ -2775,7 +2775,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 
2>&1; then

   case "$BUILD_CONFIG" in
-    *bootstrap-asan* | *bootstrap-ubsan* )
+    *bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 
6c9579bfaff955eb43875b404fb7db1a667bf522..da9a8809c3440827ac22ef6936e080820197f4e7 
100644

--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2645,6 +2645,13 @@ Some examples of build configurations designed 
for developers of GCC are:
 Compiles GCC itself using Address Sanitization in order to catch 
invalid memory

 accesses within the GCC code.

+@item @samp{bootstrap-hwasan}
+Compiles GCC itself using HWAddress Sanitization in order to catch 
invalid
+memory accesses within the GCC code.  This option is only available 
on AArch64

+targets with a very recent linux kernel (5.4 or later).




Using terms like "very recent" in documentation is discouraged. It won't 
be very recent in a couple of years time and I doubt any of us will 
remember to come update this snippet :)


I suggest something like "this option requires a Linux kernel that 
supports the right ABI () (5.4 
or later)".


Thanks,

Kyrill




+
+@end table
+
 @section Building a cross compiler

 When building a cross compiler, it is not generally possible to do a
diff --git a/libiberty/configure b/libiberty/configure
index 
7a34dabec32b0b383bd33f07811757335f4dd39c..cb2dd4ff5295598343cc18b3a79a86a778f2261d 
100755

--- a/libiberty/configure
+++ b/libiberty/configure
@@ -5261,6 +5261,7 @@ fi
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac


diff --git a/libiberty/configure.ac b/libiberty/configure.ac
index 
f1ce76010c9acde79c5dc46686a78b2e2f19244e..043237628b79cbf37d07359b59c5ffe17a7a22ef 
100644

--- a/libiberty/configure.ac
+++ b/libiberty/configure.ac
@@ -240,6 +240,7 @@ AC_SUBST(PICFLAG)
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac
 AC_SUBST(NOASANFLAG)

diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index 
28dc21014b2e86988fa88adabd63ce6092e18e02..34aa397d785e3cc9b6975de460d065900364c3ff 
100644

--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -11,8 +11,8 @@ AM_CPPFLAGS = 

Re: [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics.

2019-12-12 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patches series is to support Arm MVE ACLE intrinsics.

Please refer to Arm reference manual [1] and MVE intrinsics [2] for 
more details.

Please refer to Chapter 13 MVE ACLE [3] for MVE intrinsics concepts.

This patch series depends on upstream patches "Armv8.1-M Mainline 
Security Extension" [4],
"CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] 
and "support for Armv8.1-M

Mainline scalar shifts" [6].

[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
[2] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
[3] 
https://static.docs.arm.com/101028/0009/Q3-ACLE_2019Q3_release-0009.pdf?_ga=2.239684871.588348166.1573726994-1501600630.1548848914

[4] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html
[5] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html
[6] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01194.html

Srinath Parvathaneni(38):
[PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
[PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with 
unary operand.

[PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
[PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
[PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.
[PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.
[PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, 
halfword, or word from memory.
[PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads 
half word and word or double word from memory.
[PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half 
word or word to memory.
[PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores 
an half word, word and double word to memory.
[PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus 
operator.

[PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
[PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup 
intrinsics with writeback.
[PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store 
intrinsics with writeback.
[PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) 
variant.
[PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across 
beats" and "beat-wise substract".
[PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and 
deinterleaving load intrinsics and also aliases to vstr and vldr 
intrinsics.

[PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
[PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
[PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry 
intrinsics.



Thank you for working on these.

I will reply to individual patches with more targeted comments.

As this is a fairly large amount of code, here's my high-level view:

The MVE intrinsics spec has more complexities than the Neon intrinsics one:

* It needs support for both the user-namespace versions, and the __arm_* 
ones.


* There are also overloaded forms that in C are implemented using _Generic.

The above two facts make for a rather bulky and messy arm_mve.h 
implementation.


In the case of the _Generic usage we hit the performance problems 
reported in PR c/91937.


Ideally, I'd like to see the frontend parts of these intrinsics 
implemented in a similar way to the SVE ACLE 
(https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00413.html)


i.e. have the compiler inject the right functions into the language and 
do overload resolution through the appropriate hooks, thus keeping the 
(unavoidable) complexity in the backend rather than arm_mve.h


That being said, this is a major feature that I would very much like to 
see in GCC 10 and the current implementation, outside of the new .md 

Re: [PATCH, GCC/ARM, 2/2] Add support for ASRL(imm), LSLL(imm) and LSRL(imm) instructions for Armv8.1-M Mainline

2019-12-11 Thread Kyrill Tkachov

Hi Mihail,

On 11/14/19 1:54 PM, Mihail Ionescu wrote:

Hi,

This is part of a series of patches where I am trying to add new
instructions for Armv8.1-M Mainline to the arm backend.
This patch is adding the following instructions:

ASRL (imm)
LSLL (imm)
LSRL (imm)


ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * config/arm/arm.md (ashldi3): Generate thumb2_lsll for both reg
    and valid immediate.
    (ashrdi3): Generate thumb2_asrl for both reg and valid immediate.
    (lshrdi3): Generate thumb2_lsrl for valid immediates.
    * config/arm/constraints.md (Pg): New.
    * config/arm/predicates.md (long_shift_imm): New.
    (arm_reg_or_long_shift_imm): Likewise.
    * config/arm/thumb2.md (thumb2_asrl): New immediate alternative.
    (thumb2_lsll): Likewise.
    (thumb2_lsrl): New.

*** gcc/testsuite/ChangeLog ***

2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * gcc.target/arm/armv8_1m-shift-imm_1.c: New test.

Testsuite shows no regression when run for arm-none-eabi targets.

Is this ok for trunk?



This is ok once the prerequisites are in.

Thanks,

Kyrill



Thanks
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
b735f858a6a5c94d02a6765c1b349cdcb5e77ee3..82f4a5573d43925fb7638b9078a06699df38f88c 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3509,8 +3509,8 @@
 operands[2] = force_reg (SImode, operands[2]);

   /* Armv8.1-M Mainline double shifts are not expanded.  */
-  if (REG_P (operands[2]))
-   {
+  if (arm_reg_or_long_shift_imm (operands[2], GET_MODE 
(operands[2])))

+    {
   if (!reg_overlap_mentioned_p(operands[0], operands[1]))
 emit_insn (gen_movdi (operands[0], operands[1]));

@@ -3547,7 +3547,8 @@
   "TARGET_32BIT"
   "
   /* Armv8.1-M Mainline double shifts are not expanded.  */
-  if (TARGET_HAVE_MVE && REG_P (operands[2]))
+  if (TARGET_HAVE_MVE
+  && arm_reg_or_long_shift_imm (operands[2], GET_MODE (operands[2])))
 {
   if (!reg_overlap_mentioned_p(operands[0], operands[1]))
 emit_insn (gen_movdi (operands[0], operands[1]));
@@ -3580,6 +3581,17 @@
  (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
+  /* Armv8.1-M Mainline double shifts are not expanded.  */
+  if (TARGET_HAVE_MVE
+    && long_shift_imm (operands[2], GET_MODE (operands[2])))
+    {
+  if (!reg_overlap_mentioned_p(operands[0], operands[1]))
+    emit_insn (gen_movdi (operands[0], operands[1]));
+
+  emit_insn (gen_thumb2_lsrl (operands[0], operands[2]));
+  DONE;
+    }
+
   arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
  operands[2], gen_reg_rtx (SImode),
  gen_reg_rtx (SImode));
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 
b76de81b85c8ce7a2ca484a750b908b7ca64600a..d807818c8499a6a65837f1ed0487e45947f68199 
100644

--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -35,7 +35,7 @@
 ;;   Dt, Dp, Dz, Tu
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz
-;; in all states: Pf
+;; in all states: Pf, Pg

 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us
@@ -187,6 +187,11 @@
 && !is_mm_consume (memmodel_from_int (ival))
 && !is_mm_release (memmodel_from_int (ival))")))

+(define_constraint "Pg"
+  "@internal In Thumb-2 state a constant in range 1 to 32"
+  (and (match_code "const_int")
+   (match_test "TARGET_THUMB2 && ival >= 1 && ival <= 32")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 
69c10c06ff405e19efa172217a08a512c66cb902..ef5b0303d4424981347287865efb3cca85e56f36 
100644

--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -322,6 +322,15 @@
   && (UINTVAL (XEXP (op, 1)) < 32)")))
    (match_test "mode == GET_MODE (op)")))

+;; True for Armv8.1-M Mainline long shift instructions.
+(define_predicate "long_shift_imm"
+  (match_test "satisfies_constraint_Pg (op)"))
+
+(define_predicate "arm_reg_or_long_shift_imm"
+  (ior (match_test "TARGET_THUMB2
+   && arm_general_register_operand (op, GET_MODE (op))")
+   (match_test "satisfies_constraint_Pg (op)")))
+
 ;; True for MULT, to identify which variant of shift_operator is in use.
 (define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 

Re: [PATCH, GCC/ARM, 1/2] Add support for ASRL(reg) and LSLL(reg) instructions for Armv8.1-M Mainline

2019-12-11 Thread Kyrill Tkachov

Hi Mihail,

On 11/14/19 1:54 PM, Mihail Ionescu wrote:

Hi,

This patch adds the new scalar shift instructions for Armv8.1-M
Mainline to the arm backend.
This patch is adding the following instructions:

ASRL (reg)
LSLL (reg)



Sorry for the delay, very busy time for GCC development :(




ChangeLog entry are as follow:

*** gcc/ChangeLog ***


2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * config/arm/arm.h (TARGET_MVE): New macro for MVE support.



I don't see this hunk in the patch... There's a lot of v8.1-M-related 
patches in flight. Is it defined elsewhere?



    * config/arm/arm.md (ashldi3): Generate thumb2_lsll for 
TARGET_MVE.

    (ashrdi3): Generate thumb2_asrl for TARGET_MVE.
    * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd
    register pairs for doubleword quantities for ARMv8.1M-Mainline.
    * config/arm/thumb2.md (thumb2_asrl): New.
    (thumb2_lsll): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * gcc.target/arm/armv8_1m-shift-reg_1.c: New test.

Testsuite shows no regression when run for arm-none-eabi targets.

Is this ok for trunk?

Thanks
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
be51df7d14738bc1addeab8ac5a3806778106bce..bf788087a30343269b30cf7054ec29212ad9c572 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24454,14 +24454,15 @@ arm_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)


   /* We allow almost any value to be stored in the general registers.
  Restrict doubleword quantities to even register pairs in ARM state
- so that we can use ldrd.  Do not allow very large Neon structure
- opaque modes in general registers; they would use too many.  */
+ so that we can use ldrd and Armv8.1-M Mainline instructions.
+ Do not allow very large Neon structure  opaque modes in general
+ registers; they would use too many.  */



This comment now reads:

"Restrict doubleword quantities to even register pairs in ARM state
 so that we can use ldrd and Armv8.1-M Mainline instructions."

Armv8.1-M Mainline is not ARM mode though, so please clarify this 
comment further.


Looks ok to me otherwise (I may even have merged this with the second 
patch, but I'm not complaining about keeping it simple :) )


Thanks,

Kyrill



   if (regno <= LAST_ARM_REGNUM)
 {
   if (ARM_NUM_REGS (mode) > 4)
 return false;

-  if (TARGET_THUMB2)
+  if (TARGET_THUMB2 && !TARGET_HAVE_MVE)
 return true;

   return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) 
!= 0);

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
a91a4b941c3f9d2c3d443f9f4639069ae953fb3b..b735f858a6a5c94d02a6765c1b349cdcb5e77ee3 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3503,6 +3503,22 @@
    (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
+  if (TARGET_HAVE_MVE)
+    {
+  if (!reg_or_int_operand (operands[2], SImode))
+    operands[2] = force_reg (SImode, operands[2]);
+
+  /* Armv8.1-M Mainline double shifts are not expanded.  */
+  if (REG_P (operands[2]))
+   {
+ if (!reg_overlap_mentioned_p(operands[0], operands[1]))
+   emit_insn (gen_movdi (operands[0], operands[1]));
+
+ emit_insn (gen_thumb2_lsll (operands[0], operands[2]));
+ DONE;
+   }
+    }
+
   arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
  operands[2], gen_reg_rtx (SImode),
  gen_reg_rtx (SImode));
@@ -3530,6 +3546,16 @@
  (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
+  /* Armv8.1-M Mainline double shifts are not expanded.  */
+  if (TARGET_HAVE_MVE && REG_P (operands[2]))
+    {
+  if (!reg_overlap_mentioned_p(operands[0], operands[1]))
+   emit_insn (gen_movdi (operands[0], operands[1]));
+
+  emit_insn (gen_thumb2_asrl (operands[0], operands[2]));
+  DONE;
+    }
+
   arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
  operands[2], gen_reg_rtx (SImode),
  gen_reg_rtx (SImode));
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
c08dab233784bd1cbaae147ece795058d2ef234f..3a716ea954ac55b2081121248b930b7f11520ffa 
100644

--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1645,3 +1645,19 @@
   }
   [(set_attr "predicable" "yes")]
 )
+
+(define_insn "thumb2_asrl"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+   (ashiftrt:DI (match_dup 0)
+    (match_operand:SI 1 
"arm_general_register_operand" "r")))]

+  "TARGET_HAVE_MVE"
+  "asrl%?\\t%Q0, %R0, %1"
+  [(set_attr "predicable" "yes")])
+
+(define_insn 

Re: Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-11 Thread Kyrill Tkachov

Hi all,

On 12/11/19 9:41 AM, Stam Markianos-Wright wrote:



On 12/11/19 3:48 AM, Jeff Law wrote:
> On Mon, 2019-12-09 at 13:40 +, Stam Markianos-Wright wrote:
>>
>> On 12/3/19 10:31 AM, Stam Markianos-Wright wrote:
>>>
>>> On 12/2/19 9:27 PM, Joseph Myers wrote:
 On Mon, 2 Dec 2019, Jeff Law wrote:

>> 2019-11-13  Stam Markianos-Wright <
>> stam.markianos-wri...@arm.com>
>>
>>  * real.c (struct arm_bfloat_half_format,
>>  encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>  * real.h (arm_bfloat_half_format): New.
>>
>>
> Generally OK.  Please consider using "arm_bfloat_half" instead
> of
> "bfloat_half" for the name field in the arm_bfloat_half_format
> structure.  I'm not sure if that's really visible externally,
> but it
>>> Hi both! Agreed that we want to be conservative. See latest diff
>>> attached with the name field change (also pasted below).
>>
>> .Ping :)
> Sorry if I wasn't clear.  WIth the name change I considered this OK for
> the trunk.  Please install on the trunk.
>
> If you don't have commit privs let me know.

Ahh ok gotcha! Sorry I'm new here, and yes, I don't have commit
privileges, yet!



I've committed this on Stams' behalf with r279216.

Thanks,

Kyrill



Cheers,
Stam
>
>
> Jeff
>


Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-12-10 Thread Kyrill Tkachov

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this approved
first!



Sorry for the delay.



Thank you,
Stam


 Forwarded Message 
Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional
branches in Thumb2 (PR91816)
Date: Mon, 21 Oct 2019 10:37:09 +0100
From: Stam Markianos-Wright 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , nd ,
James Greenhalgh , Richard Earnshaw




On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>
>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>> however, on my native Aarch32 setup the test times out when run as part
>> of a big "make check-gcc" regression, but not when run individually.
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>   * config/arm/arm.md: Update b for Thumb2 range checks.
>>   * config/arm/arm.c: New function arm_gen_far_branch.
>>   * config/arm/arm-protos.h: New function arm_gen_far_branch
>>   prototype.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>   * testsuite/gcc.target/arm/pr91816.c: New test.
>
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index f995974f9bb..1dce333d1c3 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
cpu_arch_option *,

>>
>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>
>> +const char * arm_gen_far_branch (rtx *, int,const char * , const 
char *);

>> +
>> +
>
> Lets get the nits out of the way.
>
> Unnecessary extra new line, need a space between int and const above.
>
>

.Fixed!

>>   #endif /* ! GCC_ARM_PROTOS_H */
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 39e1a1ef9a2..1a693d2ddca 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>   }
>>   } /* Namespace selftest.  */
>>
>> +
>> +/* Generate code to enable conditional branches in functions over 
1 MiB.  */

>> +const char *
>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>> +    const char * branch_format)
>
> Not sure if this is some munging from the attachment but check
> vertical alignment of parameters.
>

.Fixed!

>> +{
>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>> +  char label_buf[256];
>> +  char buffer[128];
>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>> +    CODE_LABEL_NUMBER (tmp_label));
>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>> +  rtx dest_label = operands[pos_label];
>> +  operands[pos_label] = tmp_label;
>> +
>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , 
label_ptr);

>> +  output_asm_insn (buffer, operands);
>> +
>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
label_ptr);

>> +  operands[pos_label] = dest_label;
>> +  output_asm_insn (buffer, operands);
>> +  return "";
>> +}
>> +
>> +
>
> Unnecessary extra newline.
>

.Fixed!

>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>   #endif /* CHECKING_P */
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index f861c72ccfc..634fd0a59da 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -6686,9 +6686,16 @@
>>   ;; And for backward branches we have
>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or 
-4) + 4).

>>   ;;
>> +;; In 16-bit Thumb these ranges are:
>>   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving 
(-2040->2048).
>>   ;; For a 'b' pos_range = 254, neg_range = -256  giving 
(-250 ->256).

>>
>> +;; In 32-bit Thumb these ranges are:
>> +;; For a 'b'   +/- 16MB is not checked for.
>> +;; For a 'b' pos_range = 1048574, neg_range = -1048576  giving
>> +;; (-1048568 -> 1048576).
>> +
>> +
>
> Unnecessary extra newline.
>

.Fixed!

>>   (define_expand "cbranchsi4"
>> [(set (pc) (if_then_else
>> (match_operator 0 "expandable_comparison_operator"
>> @@ -6947,22 +6954,42 @@
>> (pc)))]
>> "TARGET_32BIT"
>> "*
>> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> -    {
>> -  arm_ccfsm_state += 2;
>> -  return \"\";
>> -    }
>> -  return \"b%d1\\t%l0\";
>> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> +  {
>> +    arm_ccfsm_state += 2;
>> +    return \"\";
>> +  }
>> + switch (get_attr_length (insn))
>> +  {
>> +    // Thumb2 16-bit b{cond}
>> +    case 2:
>> +
>> +    // Thumb2 32-bit b{cond}
>> +    case 4: return \"b%d1\\t%l0\";break;
>> +
>> +    // Thumb2 b{cond} out of range.  Use unconditional branch.
>> +    case 8: return arm_gen_far_branch \
>> +    (operands, 0, \"Lbcond\", \"b%D1\t\");
>> +    break;
>> +
>> +    // A32 b{cond}
>> +    

Re: [PATCH][gas] Implement .cfi_negate_ra_state directive

2019-12-05 Thread Kyrill Tkachov

Sorry, wrong list address from my side, please ignore.

Kyrill

On 12/5/19 10:59 AM, Kyrill Tkachov wrote:

Hi all,

This patch implements the .cfi_negate_ra_state to be consistent with
LLVM (https://reviews.llvm.org/D50136). The relevant DWARF code
DW_CFA_AARCH64_negate_ra_state
is multiplexed on top of DW_CFA_GNU_window_save, as per
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00753.html

I believe this is the simplest patch implementing this and is needed to
allow users to build, for example, the Linux kernel with Armv8.3-A
pointer authentication support with Clang while using gas as the
assembler, which is a common usecase.

Tested gas aarch64-none-elf.
Ok for master and the release branches?

Thanks,
Kyrill

gas/
2019-12-05  Kyrylo Tkachov  

 * dw2gencfi.c (cfi_pseudo_table): Add cfi_negate_ra_state.
 * testsuite/gas/aarch64/pac_negate_ra_state.s: New file.
 * testsuite/gas/aarch64/pac_negate_ra_state.d: Likewise.



[PATCH][gas] Implement .cfi_negate_ra_state directive

2019-12-05 Thread Kyrill Tkachov

Hi all,

This patch implements the .cfi_negate_ra_state to be consistent with
LLVM (https://reviews.llvm.org/D50136). The relevant DWARF code 
DW_CFA_AARCH64_negate_ra_state

is multiplexed on top of DW_CFA_GNU_window_save, as per
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00753.html

I believe this is the simplest patch implementing this and is needed to
allow users to build, for example, the Linux kernel with Armv8.3-A
pointer authentication support with Clang while using gas as the
assembler, which is a common usecase.

Tested gas aarch64-none-elf.
Ok for master and the release branches?

Thanks,
Kyrill

gas/
2019-12-05  Kyrylo Tkachov  

    * dw2gencfi.c (cfi_pseudo_table): Add cfi_negate_ra_state.
    * testsuite/gas/aarch64/pac_negate_ra_state.s: New file.
    * testsuite/gas/aarch64/pac_negate_ra_state.d: Likewise.

diff --git a/gas/dw2gencfi.c b/gas/dw2gencfi.c
index 6c0478a72063801f1f91441a11350daa94605843..707830cbe82f860d21c3b9b8f7cbe1999568398b 100644
--- a/gas/dw2gencfi.c
+++ b/gas/dw2gencfi.c
@@ -726,6 +726,7 @@ const pseudo_typeS cfi_pseudo_table[] =
 { "cfi_remember_state", dot_cfi, DW_CFA_remember_state },
 { "cfi_restore_state", dot_cfi, DW_CFA_restore_state },
 { "cfi_window_save", dot_cfi, DW_CFA_GNU_window_save },
+{ "cfi_negate_ra_state", dot_cfi, DW_CFA_AARCH64_negate_ra_state },
 { "cfi_escape", dot_cfi_escape, 0 },
 { "cfi_signal_frame", dot_cfi, CFI_signal_frame },
 { "cfi_personality", dot_cfi_personality, 0 },
diff --git a/gas/testsuite/gas/aarch64/pac_negate_ra_state.d b/gas/testsuite/gas/aarch64/pac_negate_ra_state.d
new file mode 100644
index ..7ab0f2369dece1a71fc064ae38f6e273128bf074
--- /dev/null
+++ b/gas/testsuite/gas/aarch64/pac_negate_ra_state.d
@@ -0,0 +1,26 @@
+#objdump: --dwarf=frames
+
+.+: file .+
+
+Contents of the .eh_frame section:
+
+ 0010  CIE
+  Version:   1
+  Augmentation:  "zR"
+  Code alignment factor: 4
+  Data alignment factor: -8
+  Return address column: 30
+  Augmentation data: 1b
+  DW_CFA_def_cfa: r31 \(sp\) ofs 0
+
+0014 0018 0018 FDE cie= pc=..0008
+  DW_CFA_advance_loc: 4 to 0004
+  DW_CFA_GNU_window_save
+  DW_CFA_advance_loc: 4 to 0008
+  DW_CFA_def_cfa_offset: 16
+  DW_CFA_offset: r29 \(x29\) at cfa-16
+  DW_CFA_offset: r30 \(x30\) at cfa-8
+  DW_CFA_nop
+  DW_CFA_nop
+
+
diff --git a/gas/testsuite/gas/aarch64/pac_negate_ra_state.s b/gas/testsuite/gas/aarch64/pac_negate_ra_state.s
new file mode 100644
index ..36ddbeb43b7002a68eb6787a21599eb20d2b965e
--- /dev/null
+++ b/gas/testsuite/gas/aarch64/pac_negate_ra_state.s
@@ -0,0 +1,20 @@
+	.arch armv8-a
+	.text
+	.align	2
+	.global	_Z5foo_av
+	.type	_Z5foo_av, %function
+_Z5foo_av:
+.LFB0:
+	.cfi_startproc
+	hint	25 // paciasp
+	.cfi_negate_ra_state
+	stp	x29, x30, [sp, -16]!
+	.cfi_def_cfa_offset 16
+	.cfi_offset 29, -16
+	.cfi_offset 30, -8
+	.cfi_endproc
+.LFE0:
+	.size	_Z5foo_av, .-_Z5foo_av
+	.align	2
+	.global	_Z5foo_bv
+	.type	_Z5foo_bv, %function


Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-03 Thread Kyrill Tkachov



On 12/3/19 1:45 PM, Wilco Dijkstra wrote:

Hi,

Part 2, split off from 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00399.html


To enable cores to use the correct max_cond_insns setting, use the 
core-specific

tuning when a CPU/tune is selected unless -mrestrict-it is explicitly set.

On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a
0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?


Ok.

Thanks,

Kyrill



ChangeLog:

2019-12-03  Wilco Dijkstra  

    * config/arm/arm.c (arm_option_override_internal):
    Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
daebe76352d62ad94556762b4e3bc3d0532ad411..5ed9046988996e56f754c5588e4d25d5ecdd6b03 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3041,6 +3041,11 @@ arm_option_override_internal (struct 
gcc_options *opts,

   if (!TARGET_THUMB2_P (opts->x_target_flags) || !arm_arch_notm)
 opts->x_arm_restrict_it = 0;

+  /* Use the IT size from CPU specific tuning unless -mrestrict-it is 
used.  */

+  if (!opts_set->x_arm_restrict_it
+  && (opts_set->x_arm_cpu_string || opts_set->x_arm_tune_string))
+    opts->x_arm_restrict_it = 0;
+
   /* Enable -munaligned-access by default for
  - all ARMv6 architecture-based processors when compiling for a 
32-bit ISA

  i.e. Thumb2 and ARM state only.


Re: [PATCH][GCC8][AArch64] Backport Cortex-A76, Ares and Neoverse N1 cpu names

2019-12-02 Thread Kyrill Tkachov



On 12/2/19 12:14 PM, Wilco Dijkstra wrote:

Add support for Cortex-A76, Ares and Neoverse N1 cpu names in GCC8 branch.

2019-11-29  Wilco Dijkstra  

    * config/aarch64/aarch64-cores.def (ares): Define.
    (cortex-a76): Likewise.
    (neoverse-n1): Likewise.
    * config/aarch64/aarch64-tune.md: Regenerate.
    * doc/invoke.texi (AArch64 Options): Document ares, cortex-a76 and
    neoverse-n1.

Ok as it's very non-invasive and provides a convenience to users of that 
branch.


Thanks,

Kyrill



--
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
33b96ca2861dce506a854cff19cfcaa32f0db23a..f48b7c22b2d261203ac25c010a054e47c291ddfc 
100644

--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -85,6 +85,9 @@ AARCH64_CORE("thunderx2t99", thunderx2t99,  
thunderx2t99, 8_1A,  AARCH64_FL_FOR

 /* ARM ('A') cores. */
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1)
+AARCH64_CORE("ares",    ares,  cortexa57, 8_2A, 
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0c, -1)
+AARCH64_CORE("neoverse-n1", neoversen1,cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0c, -1)


 /* ARMv8.3-A Architecture Processors.  */

diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
7b3a7460561ee87e13799f726919c3f870781f6d..f08b7e44b27beeb41df928cf3aa09e59e734b5d2 
100644

--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
+"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
 (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
c63f5611afb52b2358207a458dd6c275403a5a45..57340cea31df315ce37cfd57e084844da78df9fe 
100644

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14747,6 +14747,7 @@ Specify the name of the target processor for 
which GCC should tune the

 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, 
@samp{cortex-a75},

+@samp{cortex-a76}, @samp{ares}, @samp{neoverse-n1}
 @samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx}, @samp{saphira},
 @samp{xgene1}, @samp{vulcan}, @samp{thunderx},
 @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},


Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-11-26 Thread Kyrill Tkachov

Hi Wilco,

On 11/19/19 3:11 PM, Wilco Dijkstra wrote:

ping

Various CPUs have max_cond_insns set to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2 for all Cortex-A cores.


Hmm, I'm not too confident on that. I'd support such a change for the 
generic arm_cortex_tune, definitely, and the Armv8-a based ones, but I 
don't think the argument is as strong for Cortex-A7, Cortex-A8, Cortex-A9.


So let's make the change for the Armv8-A-based cores now. If you get 
benchmarking data for the older ones (such systems may or may not be 
easy to get a hold of) we can update those separately.




  Set max_cond_insns
to 4 on Thumb-2 architectures given it's already limited to that by
MAX_INSN_PER_IT_BLOCK.  Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.



This can go in as a separate patch from the rest, thanks.




On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-08-19  Wilco Dijkstra  

    * gcc/config/arm/arm.c (arm_option_override_internal):
    Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
    (arm_v6t2_tune): set max_cond_insns to 4.
    (arm_cortex_tune): set max_cond_insns to 2.
    (arm_cortex_a8_tune): Likewise.
    (arm_cortex_a7_tune): Likewise.
    (arm_cortex_a35_tune): Likewise.
    (arm_cortex_a53_tune): Likewise.
    (arm_cortex_a5_tune): Likewise.
    (arm_cortex_a9_tune): Likewise.
    (arm_v6m_tune): set max_cond_insns to 4.


No "gcc/" in the ChangeLog path.

Thanks,

Kyrill


---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2167,9 +2167,6 @@ const struct tune_params 

Re: [GCC][ARM]: Fix the failing ACLE testcase with correct test directive.

2019-11-21 Thread Kyrill Tkachov

Hi Srinath,

On 11/21/19 4:32 PM, Srinath Parvathaneni wrote:

Hello,

This patch fixes arm acle testcase crc_hf_1.c by modifying the 
compiler options directive.


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk? If ok, please commit on my behalf, I don't have the 
commit rights.



This is ok.

I see Matthew has already committed it, which is fine. It's an obvious 
patch.


Thanks,

Kyrill



Thanks,
Srinath.

gcc/testsuite/ChangeLog:

2019-11-21  Srinath Parvathaneni 

    * gcc.target/arm/acle/crc_hf_1.c: Modify the compiler options 
directive from

    dg-options to dg-additional-options.



### Attachment also inlined for ease of reply    
###



diff --git a/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c 
b/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c
index 
e6cbfc0b33e56e4275b96978ca1823d7682792fb..f1de2bdffee41a0f3259e2bf00296e9c3218f548 
100644

--- a/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c
+++ b/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c
@@ -3,7 +3,7 @@

 /* { dg-do compile } */
 /* { dg-require-effective-target arm_hard_vfp_ok }  */
-/* { dg-options "-mfloat-abi=hard -march=armv8-a+simd+crc" } */
+/* { dg-additional-options "-mfloat-abi=hard -march=armv8-a+simd+crc" 
} */


 #include 




Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Kyrill Tkachov



On 11/19/19 1:41 PM, Dennis Zhang wrote:

Hi Kyrill,

On 19/11/2019 11:21, Kyrill Tkachov wrote:

Hi Dennis,

On 11/12/19 5:32 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:

On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

   * config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
Add
   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
   AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
   AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
   (aarch64_init_memtag_builtins): New.
   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
   (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
   (aarch64_expand_builtin_memtag): New.
   (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
   (AARCH64_BUILTIN_SUBCODE): New macro.
   (aarch64_resolve_overloaded_memtag): New.
   (aarch64_resolve_overloaded_builtin_general): New hook. Call
   aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
   __ARM_FEATURE_MEMORY_TAGGING when enabled.
   (aarch64_resolve_overloaded_builtin): Call
   aarch64_resolve_overloaded_builtin_general.
   * config/aarch64/aarch64-protos.h
   (aarch64_resolve_overloaded_builtin_general): New declaration.
   * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
   (TARGET_MEMTAG): Likewise.
   * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
   UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
   (irg, gmi, subp, addg, ldg, stg): New instructions.
   * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
macro.
   (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
   (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
Likewise.
   * config/aarch64/predicates.md (aarch64_memtag_tag_offset):
New.
   (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
   * config/arm/types.md (memtag): New.
   * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

   * gcc.target/aarch64/acle/memtag_1.c: New test.
   * gcc.target/aarch64/acle/memtag_2.c: New test.
   * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading
tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics


The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstraped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with
the SVE
ACLE changes that recently went in. Most conflicts looks trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

    * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
    AARCH64_MEMTAG_BUILTIN_START,
AARCH64_MEMTAG_BUILTIN_IRG,
    AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
    AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
    AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
    (aarch64_init_memtag_builtins): New.
    (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
    (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
    (aarch64_expand_builtin_memtag): New.
    (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
    (AARCH64_BUILTIN_SUBCODE): New macro.
    (aarch64_resolve_overloaded_memtag): New.
    (aarch64_resolve_overloaded_builtin): New 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Kyrill Tkachov

Hi Dennis,

On 11/12/19 5:32 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:

On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin_general): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  (aarch64_resolve_overloaded_builtin): Call
  aarch64_resolve_overloaded_builtin_general.
  * config/aarch64/aarch64-protos.h
  (aarch64_resolve_overloaded_builtin_general): New declaration.
  * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
  (TARGET_MEMTAG): Likewise.
  * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
  UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
  (irg, gmi, subp, addg, ldg, stg): New instructions.
  * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
macro.
  (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
  (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
Likewise.
  * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
  (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
  * config/arm/types.md (memtag): New.
  * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

  * gcc.target/aarch64/acle/memtag_1.c: New test.
  * gcc.target/aarch64/acle/memtag_2.c: New test.
  * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading
tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics

The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with
the SVE
ACLE changes that recently went in. Most conflicts look trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

   * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
   AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
   AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
   (aarch64_init_memtag_builtins): New.
   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
   (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
   (aarch64_expand_builtin_memtag): New.
   (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
   (AARCH64_BUILTIN_SUBCODE): New macro.
   (aarch64_resolve_overloaded_memtag): New.
   (aarch64_resolve_overloaded_builtin): New hook. Call
   aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtin

Re: [GCC][PATCH][AArch64] Update hwcap string for fp16fml in aarch64-option-extensions.def

2019-11-18 Thread Kyrill Tkachov



On 11/18/19 12:54 PM, Tamar Christina wrote:

OK to backport to GCC 9?


Yes.

Thanks,

Kyrill



Thanks,
Tamar


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org 
On Behalf Of Kyrill Tkachov
Sent: Tuesday, September 24, 2019 14:32
To: Stam Markianos-Wright ; gcc-
patc...@gcc.gnu.org
Cc: nd ; Richard Earnshaw ;
James Greenhalgh ; Marcus Shawcroft

Subject: Re: [GCC][PATCH][AArch64] Update hwcap string for fp16fml in
aarch64-option-extensions.def

Hi all,

On 9/10/19 1:34 PM, Stam Markianos-Wright wrote:

Hi all,

This is a minor patch that fixes the entry for the fp16fml feature in
GCC's aarch64-option-extensions.def.

As can be seen in the Linux sources here
https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinf
o.c#L69

the correct string is "asimdfhm", not "asimdfml".

Cross-compiled and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Also, I don't have commit rights, so could someone commit it on my behalf?


James approved it offline so I've committed it on Stam's behalf as
r276097 with a slightly adjusted ChangeLog:

2019-09-24  Stamatis Markianos-Wright 

      * config/aarch64/aarch64-option-extensions.def (fp16fml):
      Update hwcap string for fp16fml.

Thanks,

Kyrill


Thanks,
Stam Markianos-Wright


The diff is:

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def
b/gcc/config/aarch64/aarch64-option-extensions.def
index 9919edd43d0..60e8f28fff5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -135,7 +135,7 @@ AARCH64_OPT_EXTENSION("sm4",

AARCH64_FL_SM4,

AARCH64_FL_SIMD, \
    /* Enabling "fp16fml" also enables "fp" and "fp16".
   Disabling "fp16fml" just disables "fp16fml".  */
    AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, \
-  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfml")
+  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfhm")

    /* Enabling "sve" also enables "fp16", "fp" and "simd".
   Disabling "sve" disables "sve", "sve2", "sve2-aes", "sve2-sha3",
"sve2-sm4"



gcc/ChangeLog:

2019-09-09  Stamatis Markianos-Wright 

   * config/aarch64/aarch64-option-extensions.def: Updated hwcap
string for fp16fml.




Re: [SVE] PR89007 - Implement generic vector average expansion

2019-11-18 Thread Kyrill Tkachov

Hi Prathamesh,

On 11/14/19 6:47 PM, Prathamesh Kulkarni wrote:

Hi,
As suggested in PR, the attached patch falls back to distributing
rshift over plus_expr instead of fallback widening -> arithmetic ->
narrowing sequence, if target support is not available.
Bootstrap+tested on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh



diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c
new file mode 100644
index 000..b682f3f3b74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#define N 1024
+unsigned char dst[N];
+unsigned char in1[N];
+unsigned char in2[N];
+
+void
+foo ()
+{
+  for( int x = 0; x < N; x++ )
+dst[x] = (in1[x] + in2[x] + 1) >> 1;
+}
+
+/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
+/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */


I think you'll want to make the test a bit strong to test the actual 
instructions expected here.
You'll also want to test the IFN_AVG_FLOOR case, as your patch adds support for 
it too.

Thanks,
Kyrill
 
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c

index 8ebbcd76b64..7025a3b4dc2 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info 
last_stmt_info, tree *type_out)
 
   /* Check for target support.  */

   tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
-  if (!new_vectype
-  || !direct_internal_fn_supported_p (ifn, new_vectype,
- OPTIMIZE_FOR_SPEED))
+
+  if (!new_vectype)
 return NULL;
 
+  bool ifn_supported

+= direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED);
+
   /* The IR requires a valid vector type for the cast result, even though
  it's likely to be discarded.  */
   *type_out = get_vectype_for_scalar_type (vinfo, type);
   if (!*type_out)
 return NULL;
 
-  /* Generate the IFN_AVG* call.  */

   tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
   tree new_ops[2];
   vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
   unprom, new_vectype);
+
+  if (!ifn_supported)
+{
+  /* If there is no target support available, generate code
+to distribute rshift over plus and add one depending
+upon floor or ceil rounding.  */
+
+  tree one_cst = build_one_cst (new_type);
+
+  tree tmp1 = vect_recog_temp_ssa_var (new_type, NULL);
+  gassign *g1 = gimple_build_assign (tmp1, RSHIFT_EXPR, new_ops[0], 
one_cst);
+
+  tree tmp2 = vect_recog_temp_ssa_var (new_type, NULL);
+  gassign *g2 = gimple_build_assign (tmp2, RSHIFT_EXPR, new_ops[1], 
one_cst);
+
+  tree tmp3 = vect_recog_temp_ssa_var (new_type, NULL);
+  gassign *g3 = gimple_build_assign (tmp3, PLUS_EXPR, tmp1, tmp2);
+
+  tree tmp4 = vect_recog_temp_ssa_var (new_type, NULL);
+  tree_code c = (ifn == IFN_AVG_CEIL) ? BIT_IOR_EXPR : BIT_AND_EXPR;
+  gassign *g4 = gimple_build_assign (tmp4, c, new_ops[0], new_ops[1]);
+
+  tree tmp5 = vect_recog_temp_ssa_var (new_type, NULL);
+  gassign *g5 = gimple_build_assign (tmp5, BIT_AND_EXPR, tmp4, one_cst);
+
+  gassign *g6 = gimple_build_assign (new_var, PLUS_EXPR, tmp3, tmp5);
+
+  append_pattern_def_seq (last_stmt_info, g1, new_vectype);
+  append_pattern_def_seq (last_stmt_info, g2, new_vectype);
+  append_pattern_def_seq (last_stmt_info, g3, new_vectype);
+  append_pattern_def_seq (last_stmt_info, g4, new_vectype);
+  append_pattern_def_seq (last_stmt_info, g5, new_vectype);
+  return vect_convert_output (last_stmt_info, type, g6, new_vectype);
+}
+
+  /* Generate the IFN_AVG* call.  */
   gcall *average_stmt = gimple_build_call_internal (ifn, 2, new_ops[0],
new_ops[1]);
   gimple_call_set_lhs (average_stmt, new_var);



Re: [PATCH v2 0/6] Implement asm flag outputs for arm + aarch64

2019-11-14 Thread Kyrill Tkachov

Hi Richard,

On 11/14/19 10:07 AM, Richard Henderson wrote:

I've put the implementation into config/arm/aarch-common.c, so
that it can be shared between the two targets.  This required
a little bit of cleanup to the CC modes and constraints to get
the two targets to match up.

Changes for v2:
  * Document overflow flags.
  * Add "hs" and "lo" as aliases of "cs" and "cc".
  * Add unsigned cmp tests to asm-flag-6.c.

Richard Sandiford has given his ack for the aarch64 side.
I'm still looking for an ack for the arm side.

The arm parts look good to me, there's not too much arm-specific stuff 
that's not shared with aarch64 thankfully.


Thanks,

Kyrill




r~


Richard Henderson (6):
  aarch64: Add "c" constraint
  arm: Fix the "c" constraint
  arm: Rename CC_NOOVmode to CC_NZmode
  arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__
  arm: Add testsuite checks for asm-flag
  aarch64: Add testsuite checks for asm-flag

 gcc/config/arm/aarch-common-protos.h  |   6 +
 gcc/config/aarch64/aarch64-c.c    |   2 +
 gcc/config/aarch64/aarch64.c  |   3 +
 gcc/config/arm/aarch-common.c | 136 +
 gcc/config/arm/arm-c.c    |   1 +
 gcc/config/arm/arm.c  |  15 +-
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c |  35 
 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c |  38 
 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c |  30 +++
 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c |  62 ++
 gcc/testsuite/gcc.target/arm/asm-flag-1.c |  36 
 gcc/testsuite/gcc.target/arm/asm-flag-3.c |  38 
 gcc/testsuite/gcc.target/arm/asm-flag-5.c |  30 +++
 gcc/testsuite/gcc.target/arm/asm-flag-6.c |  62 ++
 gcc/config/aarch64/constraints.md |   4 +
 gcc/config/arm/arm-modes.def  |   4 +-
 gcc/config/arm/arm.md | 186 +-
 gcc/config/arm/constraints.md |   5 +-
 gcc/config/arm/predicates.md  |   2 +-
 gcc/config/arm/thumb1.md  |   8 +-
 gcc/config/arm/thumb2.md  |  34 ++--
 gcc/doc/extend.texi   |  39 
 22 files changed, 651 insertions(+), 125 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-6.c

--
2.17.1



Re: [PATCH v2 2/6] arm: Fix the "c" constraint

2019-11-14 Thread Kyrill Tkachov



On 11/14/19 10:07 AM, Richard Henderson wrote:

The existing definition using register class CC_REG does not
work because CC_REGNUM does not support normal modes, and so
fails to match register_operand.  Use a non-register constraint
and the cc_register predicate instead.

    * config/arm/constraints.md (c): Use cc_register predicate.



Ok.

Does this need a backport to the branches?

Thanks,

Kyrill



---
 gcc/config/arm/constraints.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index b76de81b85c..e02b678d26d 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -94,8 +94,9 @@
  "@internal
   Thumb only.  The union of the low registers and the stack register.")

-(define_register_constraint "c" "CC_REG"
- "@internal The condition code register.")
+(define_constraint "c"
+ "@internal The condition code register."
+ (match_operand 0 "cc_register"))

 (define_register_constraint "Cs" "CALLER_SAVE_REGS"
  "@internal The caller save registers.  Useful for sibcalls.")
--
2.17.1



Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Kyrill Tkachov



On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

 * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
 AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
 AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
 AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
 AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
 (aarch64_init_memtag_builtins): New.
 (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
 (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
 (aarch64_expand_builtin_memtag): New.
 (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
 (AARCH64_BUILTIN_SUBCODE): New macro.
 (aarch64_resolve_overloaded_memtag): New.
 (aarch64_resolve_overloaded_builtin_general): New hook. Call
 aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
 * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
 __ARM_FEATURE_MEMORY_TAGGING when enabled.
 (aarch64_resolve_overloaded_builtin): Call
 aarch64_resolve_overloaded_builtin_general.
 * config/aarch64/aarch64-protos.h
 (aarch64_resolve_overloaded_builtin_general): New declaration.
 * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
 (TARGET_MEMTAG): Likewise.
 * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
 UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
 (irg, gmi, subp, addg, ldg, stg): New instructions.
 * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
 (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
 (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
 * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
 (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
 * config/arm/types.md (memtag): New.
 * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

 * gcc.target/aarch64/acle/memtag_1.c: New test.
 * gcc.target/aarch64/acle/memtag_2.c: New test.
 * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with the SVE
ACLE changes that recently went in. Most conflicts look trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  * config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin):
  Add declaration.
  * conf

Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-12 Thread Kyrill Tkachov

Hi Christophe,

On 11/12/19 10:29 AM, Christophe Lyon wrote:

On Thu, 7 Nov 2019 at 11:26, Kyrill Tkachov  wrote:

Hi all,

This patch adds the plumbing for and an implementation of the saturation
intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics.
These intrinsics set the Q sticky bit in APSR if an overflow occurred.
ACLE allows the user to read that bit (within the same function, it's not
defined across function boundaries) using the __saturation_occurred
intrinsic
and reset it using __set_saturation_occurred.
Thus, if the user cares about the Q bit they would be using a flow such as:

__set_saturation_occurred (0); // reset the Q bit
...
__ssat (...) // Do some calculations involving __ssat
...
if (__saturation_occurred ()) // if Q bit set handle overflow
...

For the implementation this has a few implications:
* We must track the Q-setting side-effects of these instructions to make
sure
saturation reading/writing intrinsics are ordered properly.
This is done by introducing a new "apsrq" register (and associated
APSRQ_REGNUM) in a similar way to the "fake"" cc register.

* The RTL patterns coming out of these intrinsics can have two forms:
one where they set the APSRQ_REGNUM and one where they don't.
Which one is used depends on whether the function cares about reading the Q
flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the
__saturation_occurred, __set_saturation_occurred occurrences.
If no Q-flag read is present in the function we'll use the simpler
non-Q-setting form to allow for more aggressive scheduling and such.
If a Q-bit read is present then the Q-setting form is emitted.
To avoid adding two patterns for each intrinsic to the MD file we make
use of define_subst to auto-generate the Q-setting forms

* Some existing patterns already produce instructions that may clobber the
Q bit, but they don't model it (as we didn't care about that bit up till
now).
Since these patterns can be generated from straight-line C code they can
affect
the Q-bit reads from intrinsics. Therefore they have to be disabled when
a Q-bit read is present.  These are mostly patterns in arm-fixed.md that are
not very common anyway, but there are also a couple of widening
multiply-accumulate patterns in arm.md that can set the Q-bit during
accumulation.

There are more Q-setting intrinsics in ACLE, but these will be
implemented in
a more mechanical fashion once the infrastructure in this patch goes in.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill


2019-11-07  Kyrylo Tkachov  

  * config/arm/aout.h (REGISTER_NAMES): Add apsrq.
  * config/arm/arm.md (APSRQ_REGNUM): Define.
  (add_setq): New define_subst.
  (add_clobber_q_name): New define_subst_attr.
  (add_clobber_q_pred): Likewise.
  (maddhisi4): Change to define_expand.  Split into mult and add if
  ARM_Q_BIT_READ.
  (arm_maddhisi4): New define_insn.
  (*maddhisi4tb): Disable for ARM_Q_BIT_READ.
  (*maddhisi4tt): Likewise.
  (arm_ssat): New define_expand.
  (arm_usat): Likewise.
  (arm_get_apsr): New define_insn.
  (arm_set_apsr): Likewise.
  (arm_saturation_occurred): New define_expand.
  (arm_set_saturation): Likewise.
  (*satsi_): Rename to...
  (satsi_): ... This.
  (*satsi__shift): Disable for ARM_Q_BIT_READ.
  * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed.
  (CALL_USED_REGISTERS): Mark apsrq.
  (FIRST_PSEUDO_REGISTER): Update value.
  (REG_ALLOC_ORDER): Add APSRQ_REGNUM.
  (machine_function): Add q_bit_access.
  (ARM_Q_BIT_READ): Define.
  * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define.
  (arm_conditional_register_usage): Clear APSRQ_REGNUM from
  operand_reg_set.
  (arm_q_bit_access): Define.
  * config/arm/arm-builtins.c: Include stringpool.h.
  (arm_sat_binop_imm_qualifiers,
  arm_unsigned_sat_binop_unsigned_imm_qualifiers,
  arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define.
  (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
  UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS,
  SET_SAT_QUALIFIERS): Likewise.
  (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK.
  (arm_init_acle_builtins): Initialize __builtin_sat_imm_check.
  Handle 0 argument expander.
  (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK.
  (arm_check_builtin_call): Define.
  * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3,
  arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ.
  * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype.
  (arm_q_bit_access): Likewise.
  * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation,
  __saturation_occurred, __set_saturation_occurred): Define.
  * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
  saturation_occurred, set_saturation_occurred.

Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-11-12 Thread Kyrill Tkachov

Hi Christophe,

On 10/18/19 2:18 PM, Christophe Lyon wrote:

Hi,

This patch extends support for -mpure-code to all thumb-1 processors,
by removing the need for MOVT.

Symbol addresses are built using upper8_15, upper0_7, lower8_15 and
lower0_7 relocations, and constants are built using sequences of
movs/adds and lsls instructions.

The extension of the *thumb1_movhf pattern uses always the same size
(6) although it can emit a shorter sequence when possible. This is
similar to what *arm32_movhf already does.

CASE_VECTOR_PC_RELATIVE is now false with -mpure-code, to avoid
generating invalid assembly code with differences from symbols from
two different sections (the difference cannot be computed by the
assembler).

Tests pr45701-[12].c needed a small adjustment to avoid matching
upper8_15 when looking for the r8 register.

Test no-literal-pool.c is augmented with __fp16, so it now uses
-mfp16-format=ieee.

Test thumb1-Os-mult.c generates an inline code sequence with
-mpure-code and computes the multiplication by using a sequence of
add/shift rather than using the multiply instruction, so we skip it in
presence of -mpure-code.

With -mcpu=cortex-m0, the pure-code/no-literal-pool.c fails because
code like:
static char *p = "Hello World";
char *
testchar ()
{
  return p + 4;
}
generates 2 indirections (I removed non-essential directives/code)
    .section    .rodata
.LC0:
.ascii  "Hello World\000"
.data
p:
.word   .LC0
.section    .rodata
.LC2:
.word   p
.section .text,"0x2006",%progbits
testchar:
push    {r7, lr}
add r7, sp, #0
movs    r3, #:upper8_15:#.LC2
lsls    r3, #8
adds    r3, #:upper0_7:#.LC2
lsls    r3, #8
adds    r3, #:lower8_15:#.LC2
lsls    r3, #8
adds    r3, #:lower0_7:#.LC2
ldr r3, [r3]
ldr r3, [r3]
adds    r3, r3, #4
movs    r0, r3
mov sp, r7
@ sp needed
pop {r7, pc}

By contrast, when using -mcpu=cortex-m4, the code looks like:
    .section    .rodata
.LC0:
.ascii  "Hello World\000"
.data
p:
.word   .LC0
testchar:
push    {r7}
add r7, sp, #0
movw    r3, #:lower16:p
movt    r3, #:upper16:p
ldr r3, [r3]
adds    r3, r3, #4
mov r0, r3
mov sp, r7
pop {r7}
bx  lr

I haven't found yet how to make code for cortex-m0 apply upper/lower
relocations to "p" instead of .LC2. The current code looks functional,
but could be improved.

OK as-is?

Thanks,

Christophe



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f995974..beb8411 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -66,6 +66,7 @@ extern bool arm_small_register_classes_for_mode_p 
(machine_mode);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
+extern void thumb1_gen_const_int (rtx, HOST_WIDE_INT);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
   HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f0975d..836f147 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2882,13 +2882,19 @@ arm_option_check_internal (struct gcc_options *opts)
 {
   const char *flag = (target_pure_code ? "-mpure-code" :
 "-mslow-flash-data");
+  bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON;
 
-  /* We only support -mpure-code and -mslow-flash-data on M-profile targets

-with MOVT.  */
-  if (!TARGET_HAVE_MOVT || arm_arch_notm || flag_pic || TARGET_NEON)
+  /* We only support -mslow-flash-data on M-profile targets with
+MOVT.  */
+  if (target_slow_flash_data && (!TARGET_HAVE_MOVT || not_supported))
error ("%s only supports non-pic code on M-profile targets with the "
   "MOVT instruction", flag);
 
+  /* We only support -mpure-code-flash-data on M-profile

+targets.  */


Typo in the option name.

+  if (target_pure_code && not_supported)
+   error ("%s only supports non-pic code on M-profile targets", flag);
+
   /* Cannot load addresses: -mslow-flash-data forbids literal pool and
 -mword-relocations forbids relocation of MOVT/MOVW.  */
   if (target_word_relocations)
@@ -4400,6 +4406,38 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code 
code)
 }
 }
 
+/* Emit a sequence of movs/adds/shift to produce a 32-bit constant.

+   Avoid generating useless code when one of the bytes is zero.  */
+void
+thumb1_gen_const_int (rtx op0, HOST_WIDE_INT op1)
+{
+  bool mov_done_p = false;
+  int i;
+
+  /* Emit upper 3 bytes if needed.  */
+  for (i = 0; i < 3; i++)
+{
+  int byte = (op1 >> (8 * (3 - i))) & 0xff;
+
+  if (byte)
+   {
+ emit_set_insn (op0, mov_done_p
+? gen_rtx_PLUS (SImode,op0, GEN_INT (byte))
+: GEN_INT (byte));
+ mov_done_p = 

Re: [PATCH, GCC/ARM, 10/10] Enable -mcmse

2019-11-12 Thread Kyrill Tkachov



On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 10/10] Enable -mcmse

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable the
-mcmse option now that support for Armv8.1-M Security Extension is
complete.

=== Patch description ===

The patch is straightforward: it redefines ARMv8_1m_main as having the
same features as ARMv8m_main (and thus as having the cmse feature) with
the extra features represented by armv8_1m_main.  It also removes the
error for using -mcmse on Armv8.1-M Mainline.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-cpus.in (ARMv8_1m_main): Redefine as an 
extension to

    Armv8-M Mainline.
    * config/arm/arm.c (arm_options_perform_arch_sanity_checks): 
Remove

    error for using -mcmse when targeting Armv8.1-M Mainline.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?


Ok once the rest of the series is in.

Does this need some documentation though?

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
652f2a4be9388fd7a74f0ec4615a292fd1cfcd36..a845dd2f83a38519a1387515a2d4646761fb405f 
100644

--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -259,10 +259,7 @@ define fgroup ARMv8_5a    ARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
-# Feature cmse is omitted to disable Security Extensions support 
while secure
-# code compiled by GCC does not preserve FP context as allowed by 
Armv8.1-M

-# Mainline.
-define fgroup ARMv8_1m_main ARMv7m armv8 armv8_1m_main
+define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main

 # Useful combinations.
 define fgroup VFPv2 vfpv2
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
cabcce8c8bd11c5ff3516c3102c0305b865b00cb..0f19b4eb4ec4fcca2df10e1b8e0b79d1a1e0a93d 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3742,9 +3742,6 @@ arm_options_perform_arch_sanity_checks (void)
   if (!arm_arch4 && arm_fp16_format != ARM_FP16_FORMAT_NONE)
 sorry ("__fp16 and no ldrh");

-  if (use_cmse && arm_arch8_1m_main)
-    error ("ARMv8.1-M Mainline Security Extensions is unsupported");
-
   if (use_cmse && !arm_arch_cmse)
 error ("target CPU does not support ARMv8-M Security Extensions");




Re: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns

2019-11-12 Thread Kyrill Tkachov



On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 9/10] Call nscall function with blxns

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to call
functions with the cmse_nonsecure_call attribute directly using blxns
with no undue restriction on the register used for that.

=== Patch description ===

This change to use BLXNS to call a nonsecure function from secure
directly (not using a libcall) is made in 2 steps:
- change nonsecure_call patterns to use blxns instead of calling
  __gnu_cmse_nonsecure_call
- loosen requirement for function address to allow any register when
  doing BLXNS.

The former is a straightforward check over whether instructions added in
Armv8.1-M Mainline are available while the latter consists of making the
nonsecure call pattern accept any register by using match_operand and
changing the nonsecure_call_internal expander to not force r4 when
targeting Armv8.1-M Mainline.

The tricky bit is actually in the test update, specifically how to check
that register lists for CLRM have all registers except for the one
holding parameters (already done) and the one holding the address used
by BLXNS. This is achieved with 3 scan-assembler directives.

1) The first one lists all registers that can appear in CLRM but makes
   each of them optional.
   Property guaranteed: no wrong register is cleared and none appears
   twice in the register list.
2) The second directive checks that the CLRM is made of a fixed number
   of the right registers to be cleared. The number used is the number
   of registers that could contain a secret minus one (used to hold the
   address of the function to call).
   Property guaranteed: register list has the right number of registers.
   Cumulated property guaranteed: only registers with a potential secret
   are cleared and they are all listed but one.
3) The last directive checks that we cannot find a CLRM with a register
   in it that also appears in BLXNS. This is checked via the use of a
   back-reference on any of the allowed registers in CLRM, the
   back-reference enforcing that whatever register matches in CLRM must be
   the same in the BLXNS.
   Property guaranteed: register used for BLXNS is different from
   registers cleared in CLRM.

Some more care needs to happen for the gcc.target/arm/cmse/cmse-1.c
testcase due to there being two CLRM generated. To ensure the third
directive match the right CLRM to the BLXNS, a negative lookahead is
used between the CLRM register list and the BLXNS. The way negative
lookahead works is by matching the *position* where a given regular
expression does not match. In this case, since it comes after the CLRM
register list it is requesting that what comes after the register list
does not have a CLRM again followed by BLXNS. This guarantees that the
.*blxns after only matches a blxns without another CLRM before.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.md (nonsecure_call_internal): Do not force memory
    address in r4 when targeting Armv8.1-M Mainline.
    (nonsecure_call_value_internal): Likewise.
    * config/arm/thumb2.md (nonsecure_call_reg_thumb2): Make 
memory address

    a register match_operand again.  Emit BLXNS when targeting
    Armv8.1-M Mainline.
    (nonsecure_call_value_reg_thumb2): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/cmse-1.c: Add check for BLXNS when 
instructions
    introduced in Armv8.1-M Mainline Security Extensions are 
available and
    restrict checks for libcall to __gnu_cmse_nonsecure_call to 
Armv8-M
    targets only.  Adapt CLRM check to verify register used for 
BLXNS is

    not in the CLRM register list.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise and 
adapt
    check for LSB clearing bit to be using the same register as 
BLXNS when

    targeting Armv8.1-M Mainline.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * 

Re: [PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 3:24 PM, Mihail Ionescu wrote:
[PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall 
function


Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
lazy store and load instruction inline when calling a function with the
cmse_nonsecure_call attribute with the soft or softfp floating-point
ABI.

=== Patch description ===

This patch adds two new patterns for the VLSTM and VLLDM instructions.
cmse_nonsecure_call_inline_register_clear is then modified to
generate VLSTM and VLLDM respectively before and after calls to
functions with the cmse_nonsecure_call attribute in order to have lazy
saving, clearing and restoring of VFP registers. Since these
instructions do not do writeback of the base register, the stack is 
adjusted

prior to the lazy store and after the lazy load with appropriate frame
debug notes to describe the effect on the CFA register.

As with CLRM, VSCCLRM and VSTR/VLDR, the instruction is modeled as an
unspecified operation to the memory pointed to by the base register.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (arm_add_cfa_adjust_cfa_note): Declare early.
    (cmse_nonsecure_call_inline_register_clear): Define new 
lazy_fpclear

    variable as true when floating-point ABI is not hard.  Replace
    check against TARGET_HARD_FLOAT_ABI by checks against 
lazy_fpclear.

    Generate VLSTM and VLLDM instruction respectively before and
    after a function call to cmse_nonsecure_call function.
    * config/arm/unspecs.md (VUNSPEC_VLSTM): Define unspec.
    (VUNSPEC_VLLDM): Likewise.
    * config/arm/vfp.md (lazy_store_multiple_insn): New define_insn.
    (lazy_load_multiple_insn): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Add check 
for VLSTM and

    VLLDM.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
bcc86d50a10f11d9672258442089a0aa5c450b2f..b10f996c023e830ca24ff83fcbab335caf85d4cb 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -186,6 +186,7 @@ static int arm_register_move_cost (machine_mode, 
reg_class_t, reg_class_t);

 static int arm_memory_move_cost (machine_mode, reg_class_t, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
+static void arm_add_cfa_adjust_cfa_note (rtx, int, rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
 static void arm_emit_multi_reg_pop (unsigned long);
 static int vfp_emit_fstmd (int, int);
@@ -17830,6 +17831,9 @@ cmse_nonsecure_call_inline_register_clear (void)
   FOR_BB_INSNS (bb, insn)
 {
   bool clear_callee_saved = TARGET_HAVE_FPCTX_CMSE;
+ /* frame = VFP regs + FPSCR + VPR.  */
+ unsigned lazy_store_stack_frame_size =
+   (LAST_VFP_REGNUM - FIRST_VFP_REGNUM + 1 + 2) * UNITS_PER_WORD;
   unsigned long callee_saved_mask =
 ((1 << (LAST_HI_REGNUM + 1)) - 1)
 & ~((1 << (LAST_ARG_REGNUM + 1)) - 1);
@@ -17847,7 +17851,7 @@ cmse_nonsecure_call_inline_register_clear (void)
   CUMULATIVE_ARGS args_so_far_v;
   cumulative_args_t args_so_far;
   tree arg_type, fntype;
- bool first_param = true;
+ bool first_param = true, lazy_fpclear = !TARGET_HARD_FLOAT_ABI;
   function_args_iterator args_iter;
   uint32_t padding_bits_to_clear[4] = {0U, 0U, 0U, 0U};

@@ -17881,7 +17885,7 @@ cmse_nonsecure_call_inline_register_clear (void)
  -mfloat-abi=hard.  For -mfloat-abi=softfp we will be 
using the
  lazy store and loads which clear both caller- and 
callee-saved

  registers.  */
- if (TARGET_HARD_FLOAT_ABI)
+ if (!lazy_fpclear)
 {
   auto_sbitmap float_bitmap (maxregno + 1);

@@ -17965,8 +17969,23 @@ cmse_nonsecure_call_inline_register_clear (void)
  disabled for pop (see below).  */
   RTX_FRAME_RELATED_P 

Re: [PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions

2019-11-12 Thread Kyrill Tkachov



On 10/23/19 10:26 AM, Mihail Ionescu wrote:
[PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall 
functions


Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
inline instructions to save, clear and restore callee-saved VFP
registers before doing a call to a function with the cmse_nonsecure_call
attribute.

=== Patch description ===

The patch is fairly straightforward in its approach and consists of the
following 3 logical changes:
- abstract the number of floating-point registers to clear in
  max_fp_regno
- use max_fp_regno to decide how many registers to clear so that the
  same code works for Armv8-M and Armv8.1-M Mainline
- emit vpush and vpop instruction respectively before and after a
  nonsecure call

Note that as in the patch to clear GPRs inline, debug information has to
be disabled for VPUSH and VPOP due to VPOP adding CFA adjustment note
for SP when R7 is sometimes used as CFA.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (vfp_emit_fstmd): Declare early.
    (arm_emit_vfp_multi_reg_pop): Likewise.
    (cmse_nonsecure_call_inline_register_clear): Abstract number 
of VFP
    registers to clear in max_fp_regno.  Emit VPUSH and VPOP to 
save and

    restore callee-saved VFP registers.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Add 
check for

    VPUSH and VPOP and update expectation for VSCCLRM.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?


Ok.

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
c24996897eb21c641914326f7064a26bbb363411..bcc86d50a10f11d9672258442089a0aa5c450b2f 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -188,6 +188,8 @@ static void emit_constant_insn (rtx cond, rtx 
pattern);

 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
 static void arm_emit_multi_reg_pop (unsigned long);
+static int vfp_emit_fstmd (int, int);
+static void arm_emit_vfp_multi_reg_pop (int, int, rtx);
 static int arm_arg_partial_bytes (cumulative_args_t,
   const function_arg_info &);
 static rtx arm_function_arg (cumulative_args_t, const 
function_arg_info &);

@@ -17834,8 +17836,10 @@ cmse_nonsecure_call_inline_register_clear (void)
   unsigned address_regnum, regno;
   unsigned max_int_regno =
 clear_callee_saved ? IP_REGNUM : LAST_ARG_REGNUM;
+ unsigned max_fp_regno =
+   TARGET_HAVE_FPCTX_CMSE ? LAST_VFP_REGNUM : D7_VFP_REGNUM;
   unsigned maxregno =
-   TARGET_HARD_FLOAT_ABI ? D7_VFP_REGNUM : max_int_regno;
+   TARGET_HARD_FLOAT_ABI ? max_fp_regno : max_int_regno;
   auto_sbitmap to_clear_bitmap (maxregno + 1);
   rtx_insn *seq;
   rtx pat, call, unspec, clearing_reg, ip_reg, shift;
@@ -17883,7 +17887,7 @@ cmse_nonsecure_call_inline_register_clear (void)

   bitmap_clear (float_bitmap);
   bitmap_set_range (float_bitmap, FIRST_VFP_REGNUM,
-   D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1);
+   max_fp_regno - FIRST_VFP_REGNUM + 1);
   bitmap_ior (to_clear_bitmap, to_clear_bitmap, 
float_bitmap);

 }

@@ -17960,6 +17964,16 @@ cmse_nonsecure_call_inline_register_clear (void)
   /* Disable frame debug info in push because it needs to be
  disabled for pop (see below).  */
   RTX_FRAME_RELATED_P (push_insn) = 0;
+
+ /* Save VFP callee-saved registers.  */
+ if (TARGET_HARD_FLOAT_ABI)
+   {
+ vfp_emit_fstmd (D7_VFP_REGNUM + 1,
+ (max_fp_regno - D7_VFP_REGNUM) / 2);
+ /* Disable frame debug info in push because it needs 
to be

+    disabled for vpop (see below).  */
+ RTX_FRAME_RELATED_P (get_last_insn ()) = 0;
+   }
 }

   /* Clear caller-saved registers that leak before doing a 
non-secure

@@ -17974,9 +17988,25 @@ cmse_nonsecure_call_inline_register_clear (void)

   if (TARGET_HAVE_FPCTX_CMSE)
 

Re: [PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
inline callee-saved register clearing when calling a function with the
cmse_nonsecure_call attribute with the ultimate goal of having the whole
call sequence inline.

=== Patch description ===

Besides changing the set of registers that needs to be cleared inline,
this patch also generates the push and pop to save and restore
callee-saved registers without trusting the callee inline. To make the
code more future-proof, this (currently) Armv8.1-M specific behavior is
expressed in terms of clearing of callee-saved registers rather than
directly based on the targets.

The patch contains 1 subtlety:

Debug information is disabled for push and pop because the
REG_CFA_RESTORE notes used to describe popping of registers do not stack.
Instead, they just reset the debug state for the register to the one at
the beginning of the function, which is incorrect for a register that is
pushed twice (in prologue and before nonsecure call) and then popped for
the first time. In particular, this occasionally trips CFI note creation
code when there are two codepaths to the epilogue, one of which does not
go through the nonsecure call. Obviously this mean that debugging
between the push and pop is not reliable.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (arm_emit_multi_reg_pop): Declare early.
    (cmse_nonsecure_call_clear_caller_saved): Rename into ...
    (cmse_nonsecure_call_inline_register_clear): This. Save and clear
    callee-saved GPRs as well as clear ip register before doing a 
nonsecure

    call then restore callee-saved GPRs after it when targeting
    Armv8.1-M Mainline.
    (arm_reorg): Adapt to function rename.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/cmse-1.c: Add check for PUSH and POP and 
update

    CLRM check.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?



This is ok.

I think you should get commit access to GCC by now.

Please fill in the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi

listing me as the approver (using my details from the MAINTAINERS file).

Of course, only commit this once the whole series is approved ;)

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
fca10801c87c5e635d573c0fbdc47a1ae229d0ef..12b4b42a66b0c5589690d9a2d8cf8e42712ca2c0 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -187,6 +187,7 @@ static int arm_memory_move_cost (machine_mode, 
reg_class_t, bool);

 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
+static void arm_emit_multi_reg_pop (unsigned long);
 static int arm_arg_partial_bytes 

Re: [PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling function with the cmse_nonsecure_call attribute by using
VSCCLRM to do all the VFP register clearing as well as clearing the VFP
register.

=== Patch description ===

This patch adds a new pattern for the VSCCLRM instruction.
cmse_clear_registers () is then modified to use the new VSCCLRM
instruction when targeting Armv8.1-M Mainline, thus, making the Armv8-M
register clearing code specific to Armv8-M.

Since the VSCCLRM instruction mandates VPR in the register list, the
pattern is encoded with a parallel which only requires an unspecified
VUNSPEC_CLRM_VPR constant modelling the APSR clearing. Other expression
in the parallel are expected to be set expression for clearing the VFP
registers.

I see we don't represent the VPR here as a register and use an UNSPEC to 
represent its clearing.


That's okay for now but when we do add support for it for MVE we'll need 
to adjust the RTL representation here to show its clobbering.




ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Adapt prototype.
    * config/arm/arm.c (clear_operation_p): Extend to be able to 
check a

    clear_vfp_multiple pattern based on a new vfp parameter.
    (cmse_clear_registers): Generate VSCCLRM to clear VFP 
registers when

    targeting Armv8.1-M Mainline.
    (cmse_nonsecure_entry_clear_before_return): Clear VFP registers
    unconditionally when targeting Armv8.1-M Mainline 
architecture.  Check
    whether VFP registers are available before looking 
call_used_regs for a

    VFP register.
    * config/arm/predicates.md (clear_multiple_operation): Adapt 
to change

    of prototype of clear_operation_p.
    (clear_vfp_multiple_operation): New predicate.
    * config/arm/unspecs.md (VUNSPEC_VSCCLRM_VPR): New volatile 
unspec.

    * config/arm/vfp.md (clear_vfp_multiple): New define_insn.


Ok.

Thanks,

kyrill




*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for VSCCLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.

Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
1a948d2c97526ad7e67e8d4a610ac74cfdb13882..37a46982bbc1a8f17abe2fc76ba3cb7d65257c0d 
100644

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,7 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, 
HOST_WIDE_INT);

 extern int thumb1_legitimate_address_p (machine_mode, rtx, int);
 extern bool ldm_stm_operation_p (rtx, bool, machine_mode mode,
  bool, bool);
-extern bool clear_operation_p (rtx);
+extern bool clear_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int vfp3_const_double_rtx (rtx);
 extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, 
int *);

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f1f730cecff0fb3da7115ea1147dc8b9ab7076b7..5f3ce5c4605f609d1a0e31c0f697871266bdf835 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13499,8 +13499,9 @@ ldm_stm_operation_p (rtx op, bool load, 
machine_mode mode,

   return true;
 }

-/* Checks whether OP is a valid parallel pattern for a CLRM insn.  To 
be a

-   valid CLRM pattern, OP must have the following form:
+/* Checks whether OP is a valid parallel pattern for a CLRM (if VFP 
is 

Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling function with the cmse_nonsecure_call attribute by using
CLRM to do all the general purpose registers clearing as well as
clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM instruction
when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandate APSR in the register
list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Declare.
    * config/arm/arm.c (clear_operation_p): New function.
    (cmse_clear_registers): Generate clear_multiple instruction 
pattern if

    targeting Armv8.1-M Mainline or successor.
    (output_return_instruction): Only output APSR register clearing if
    Armv8.1-M Mainline instructions not available.
    (thumb_exit): Likewise.
    * config/arm/predicates.md (clear_multiple_operation): New 
predicate.

    * config/arm/thumb2.md (clear_apsr): New define_insn.
    (clear_multiple): Likewise.
    * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.  Restrict checks for 
Armv8-M

    GPR clearing when CLRM is not available.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882 
100644

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,6 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, 
HOST_WIDE_INT);

 extern int 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Kyrill Tkachov

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.


Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin_general): New hook. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
(aarch64_resolve_overloaded_builtin): Call
aarch64_resolve_overloaded_builtin_general.
* config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin_general): New declaration.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
(__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstraped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with the SVE
ACLE changes that recently went in. Most conflicts looks trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

     * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
     AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
     AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
     AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
     AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
     (aarch64_init_memtag_builtins): New.
     (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
     (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
     (aarch64_expand_builtin_memtag): New.
     (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
     (AARCH64_BUILTIN_SUBCODE): New macro.
     (aarch64_resolve_overloaded_memtag): New.
     (aarch64_resolve_overloaded_builtin): New hook. Call
     aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
     * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
     __ARM_FEATURE_MEMORY_TAGGING when enabled.
     * config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin):
     Add declaration.
     * config/aarch64

Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-11 Thread Kyrill Tkachov

Hi Richard,

On 11/9/19 12:44 PM, Richard Henderson wrote:

On 11/7/19 11:26 AM, Kyrill Tkachov wrote:

-;; The code sequence emitted by this insn pattern uses the Q flag, which GCC
-;; doesn't generally know about, so we don't bother expanding to individual
-;; instructions.  It may be better to just use an out-of-line asm libcall for
-;; this.
+;; The code sequence emitted by this insn pattern uses the Q flag, so we need
+;; to bail out when ARM_Q_BIT_READ and resort to a library sequence instead.
+
+(define_expand "ssmulsa3"
+  [(parallel [(set (match_operand:SA 0 "s_register_operand")
+   (ss_mult:SA (match_operand:SA 1 "s_register_operand")
+   (match_operand:SA 2 "s_register_operand")))
+   (clobber (match_scratch:DI 3))
+   (clobber (match_scratch:SI 4))
+   (clobber (reg:CC CC_REGNUM))])]
+  "TARGET_32BIT && arm_arch6"
+  {
+if (ARM_Q_BIT_READ)
+  FAIL;
+  }
+)

Coming back to this, why would you not just represent the update of the Q bit?
  This is not generated by generic pattern matching, but by the __ssmulsa3
builtin function.  It seems easy to me to simply describe how this older
builtin operates in conjunction with the new acle builtins.

I recognize that ssadd3 etc are more difficult, because they can be
generated by arithmetic operations on TYPE_SATURATING.  Although again it seems
weird to generate expensive out-of-line code for TYPE_SATURATING when used in
conjunction with acle builtins.

I think it would be better to merely expand the documentation.  Even if only so
far as to say "unsupported to mix these".


I'm tempted to agree, as this part of the patch is quite ugly.

Thank you for the comments on these patches, I wasn't aware of some of 
the mechanisms.


I guess I should have posted the series as an RFC first...

I'll send patches to fix up the issues.

Thanks,

Kyrill


+(define_expand "maddhisi4"
+  [(set (match_operand:SI 0 "s_register_operand")
+   (plus:SI (mult:SI (sign_extend:SI
+  (match_operand:HI 1 "s_register_operand"))
+ (sign_extend:SI
+  (match_operand:HI 2 "s_register_operand")))
+(match_operand:SI 3 "s_register_operand")))]
+  "TARGET_DSP_MULTIPLY"
+  {
+/* If this function reads the Q bit from ACLE intrinsics break up the
+   multiplication and accumulation as an overflow during accumulation will
+   clobber the Q flag.  */
+if (ARM_Q_BIT_READ)
+  {
+   rtx tmp = gen_reg_rtx (SImode);
+   emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2]));
+   emit_insn (gen_addsi3 (operands[0], tmp, operands[3]));
+   DONE;
+  }
+  }
+)
+
+(define_insn "*arm_maddhisi4"
[(set (match_operand:SI 0 "s_register_operand" "=r")
(plus:SI (mult:SI (sign_extend:SI
   (match_operand:HI 1 "s_register_operand" "r"))
  (sign_extend:SI
   (match_operand:HI 2 "s_register_operand" "r")))
 (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
"smlabb%?\\t%0, %1, %2, %3"
[(set_attr "type" "smlaxy")
 (set_attr "predicable" "yes")]

I think this case would be better represented with a single
define_insn_and_split and a peephole2.  It is easy to notice during peep2
whether or not the Q bit is actually live at the exact place we want to expand
this operation.  If it is live, then use two insns; if it isn't, use one.


r~


[PATCH][arm][4/X] Add initial support for GE-setting SIMD32 intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This patch adds in plumbing for the ACLE intrinsics that set the GE bits in
APSR.  These are special SIMD instructions in Armv6 that pack bytes or
halfwords into the 32-bit general-purpose registers and set the GE bits in
APSR to indicate if some of the "lanes" of the result have overflowed or 
have

some other instruction-specific property.
These bits can then be used by the SEL instruction (accessed through the 
__sel

intrinsic) to select lanes for further processing.

This situation is similar to the Q-setting intrinsics: we have to track 
the GE

fake register, detect when a function reads it through __sel and restrict
existing patterns that may generate GE-clobbering instruction from
straight-line C code when reading the GE bits matters.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committed to trunk.
Thanks,
Kyrill


2019-11-07  Kyrylo Tkachov  

    * config/arm/aout.h (REGISTER_NAMES): Add apsrge.
    * config/arm/arm.md (APSRGE_REGNUM): Define.
    (arm_): New define_insn.
    (arm_sel): Likewise.
    * config/arm/arm.h (FIXED_REGISTERS): Add entry for apsrge.
    (CALL_USED_REGISTERS): Likewise.
    (REG_ALLOC_ORDER): Likewise.
    (FIRST_PSEUDO_REGISTER): Update value.
    (ARM_GE_BITS_READ): Define.
    * config/arm/arm.c (arm_conditional_register_usage): Clear
    APSRGE_REGNUM from operand_reg_set.
    (arm_ge_bits_access): Define.
    * config/arm/arm-builtins.c (arm_check_builtin_call): Handle
    ARM_BUILTIN_sel.
    * config/arm/arm-protos.h (arm_ge_bits_access): Declare prototype.
    * config/arm/arm-fixed.md (add3): Convert to define_expand.
    FAIL if ARM_GE_BITS_READ.
    (*arm_add3): New define_insn.
    (sub3): Convert to define_expand.  FAIL if ARM_GE_BITS_READ.
    (*arm_sub3): New define_insn.
    * config/arm/arm_acle.h (__sel, __sadd8, __ssub8, __uadd8, __usub8,
    __sadd16, __sasx, __ssax, __ssub16, __uadd16, __uasx, __usax,
    __usub16): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SIMD32_GE): New int_iterator.
    (simd32_op): Handle the above.
    * config/arm/unspecs.md (UNSPEC_GE_SET): Define.
    (UNSPEC_SEL, UNSPEC_SADD8, UNSPEC_SSUB8, UNSPEC_UADD8, UNSPEC_USUB8,
    UNSPEC_SADD16, UNSPEC_SASX, UNSPEC_SSAX, UNSPEC_SSUB16, UNSPEC_UADD16,
    UNSPEC_UASX, UNSPEC_USAX, UNSPEC_USUB16): Define.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.
    * gcc.target/arm/acle/simd32_sel.c: New test.

diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index a5f83cb503f61cc1cab0e61795edde33250610e7..72782758853a869bcb9a9d69f3fa0da979cd711f 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -72,7 +72,7 @@
   "wr8",   "wr9",   "wr10",  "wr11",\
   "wr12",  "wr13",  "wr14",  "wr15",\
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",\
-  "cc", "vfpcc", "sfp", "afp", "apsrq"\
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge"		\
 }
 #endif
 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 995f50785f6ebff7b3cd47185516f7bcb4fd5b81..2d902d0b325bc1fe5e22831ef8a59a2bb37c1225 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -3370,6 +3370,13 @@ arm_check_builtin_call (location_t , vec , tree fndecl,
 	  = tree_cons (get_identifier ("acle qbit"), NULL_TREE,
 		   DECL_ATTRIBUTES (cfun->decl));
 }
+  if (fcode == ARM_BUILTIN_sel)
+{
+  if (cfun && cfun->decl)
+	DECL_ATTRIBUTES (cfun->decl)
+	  = tree_cons (get_identifier ("acle gebits"), NULL_TREE,
+		   DECL_ATTRIBUTES (cfun->decl));
+}
   return true;
 }
 
diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md
index 85dbc5d05c35921bc5115df68d30292a712729cf..6d949ba7064c0587d4c5d7b855f2c04c6d0e08e7 100644
--- a/gcc/config/arm/arm-fixed.md
+++ b/gcc/config/arm/arm-fixed.md
@@ -28,11 +28,22 @@
(set_attr "predicable_short_it" "yes,no")
(set_attr "type" "alu_sreg")])
 
-(define_insn "add3"
+(define_expand "add3"
+  [(set (match_operand:ADDSUB 0 "s_register_operand")
+	(plus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand")
+		 (match_operand:ADDSUB 2 "s_register_operand")))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_GE_BITS_READ)
+  FAIL;
+  }
+)
+
+(define_insn "*arm_add3"
   [(set (match_operand:ADDSUB 0 "s_register_operand" "=r")
 	(plus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand" "r")
 		 (match_operand:ADDSUB 2 "s_register_operand" "r")))]
-  "TARGET_INT_SIMD"
+  "TARGET_INT_SIMD && !ARM_GE_BITS_READ"
   "sadd%?\\t%0, %1, %2"
   [(set_attr "predicable" "yes")
(set_attr "type" "alu_dsp_reg")])
@@ -76,11 +87,22 @@
(set_attr "predicable_short_it" "yes,no")
(set_attr "type" "alu_sreg")])
 
-(define_insn "sub3"
+(define_expand "sub3"
+  [(set (match_operand:ADDSUB 0 "s_register_operand")
+	(minus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand")
+		 (match_operand:ADDSUB 2 "s_register_operand")))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_GE_BITS_READ)
+

[PATCH][arm][5/X] Implement Q-bit-setting SIMD32 intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This patch implements some more Q-setting intrinsics of the 
multiply-accumulate
variety, but these are in the SIMD32 family in that they treat their 
operands

as packed SIMD values, but that's not important at the RTL level.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm__insn):
    New define_insns.
    (arm_): New define_expands.
    * config/arm/arm_acle.h (__smlad, __smladx, __smlsd, __smlsdx,
    __smuad, __smuadx): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SIMD32_TERNOP_Q): New int_iterator.
    (SIMD32_BINOP_Q): Likewise.
    (simd32_op): Handle the above.
    * config/arm/unspecs.md: Define unspecs for the above.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 884a224a991102955787600317581e6468463bea..7717f547ab4706183d2727013496c249edbe7abf 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5865,6 +5865,62 @@
   [(set_attr "predicable" "yes")
(set_attr "type" "alu_sreg")])
 
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")
+	   (match_operand:SI 3 "s_register_operand" "r")] SIMD32_TERNOP_Q))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %1, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "s_register_operand")
+	   (match_operand:SI 3 "s_register_operand")] SIMD32_TERNOP_Q))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2], operands[3]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2], operands[3]));
+DONE;
+  }
+)
+
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")] SIMD32_BINOP_Q))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "s_register_operand")] SIMD32_BINOP_Q))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2]));
+DONE;
+  }
+)
+
 (define_insn "arm_sel"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec:SI
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index b8d02a5502f273fcba492bbeba2542b13334a8ea..c30645e3949f84321fb1dfe3afd06167ef859d62 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -522,6 +522,48 @@ __usub16 (uint16x2_t __a, uint16x2_t __b)
   return __builtin_arm_usub16 (__a, __b);
 }
 
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlad (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlad (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smladx (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smladx (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsd (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlsd (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsdx (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlsdx (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smuad (int16x2_t __a, int16x2_t __b)
+{
+  return __builtin_arm_smuad (__a, __b);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smuadx (int16x2_t __a, int16x2_t __b)
+{
+  return __builtin_arm_smuadx (__a, __b);
+}
+
 #endif
 
 #ifdef __ARM_FEATURE_SAT
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 715c3c94e8c8f6355e880a36eb275be80d1a3912..018d89682c61a963961515823420f1b986cd40db 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -107,3 +107,10 @@ VAR1 (UBINOP, usax, si)
 VAR1 (UBINOP, usub16, si)
 
 VAR1 (UBINOP, sel, si)

[PATCH][arm][6/X] Add support for __[us]sat16 intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This last patch adds the __ssat16 and __usat16 intrinsics that perform
"clipping" to a particular bitwidth on packed SIMD values, setting the Q bit
as appropriate.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm_): New define_expand.
    (arm__insn): New define_insn.
    * config/arm/arm_acle.h (__ssat16, __usat16): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (USSAT16): New int_iterator.
    (simd32_op): Handle UNSPEC_SSAT16, UNSPEC_USAT16.
    (sup): Likewise.
    * config/arm/predicates.md (ssat16_imm): New predicate.
    (usat16_imm): Likewise.
    * config/arm/unspecs.md (UNSPEC_SSAT16, UNSPEC_USAT16): Define.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7717f547ab4706183d2727013496c249edbe7abf..f2f5094f9e2a802557e5c19db1edbc028a91cbd8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5921,6 +5921,33 @@
   }
 )
 
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "sat16_imm" "i")] USSAT16))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %2, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "sat16_imm")] USSAT16))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2]));
+DONE;
+  }
+)
+
 (define_insn "arm_sel"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec:SI
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index c30645e3949f84321fb1dfe3afd06167ef859d62..9ea922f2d096870d2c2d34ac43f03e3bc9dc4741 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -564,6 +564,24 @@ __smuadx (int16x2_t __a, int16x2_t __b)
   return __builtin_arm_smuadx (__a, __b);
 }
 
+#define __ssat16(__a, __sat)	\
+  __extension__			\
+  ({\
+int16x2_t __arg = (__a);	\
+__builtin_sat_imm_check (__sat, 1, 16);			\
+int16x2_t __res = __builtin_arm_ssat16 (__arg, __sat);	\
+__res;			\
+  })
+
+#define __usat16(__a, __sat)	\
+  __extension__			\
+  ({\
+int16x2_t __arg = (__a);	\
+__builtin_sat_imm_check (__sat, 0, 15);			\
+int16x2_t __res = __builtin_arm_usat16 (__arg, __sat);	\
+__res;			\
+  })
+
 #endif
 
 #ifdef __ARM_FEATURE_SAT
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 018d89682c61a963961515823420f1b986cd40db..8a21ff74f41840dd793221e079627055d379c474 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -114,3 +114,6 @@ VAR1 (TERNOP, smlsd, si)
 VAR1 (TERNOP, smlsdx, si)
 VAR1 (BINOP, smuad, si)
 VAR1 (BINOP, smuadx, si)
+
+VAR1 (SAT_BINOP_UNSIGNED_IMM, ssat16, si)
+VAR1 (SAT_BINOP_UNSIGNED_IMM, usat16, si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 72aba5e86fc20216bcba74f5cfa5b9f744497a6e..c412851843f4468c2c18bce264288705e076ac50 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -458,6 +458,8 @@
 
 (define_int_iterator SIMD32_BINOP_Q [UNSPEC_SMUAD UNSPEC_SMUADX])
 
+(define_int_iterator USSAT16 [UNSPEC_SSAT16 UNSPEC_USAT16])
+
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
 (define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
@@ -918,6 +920,7 @@
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
   (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
   (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
+  (UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u")
 ])
 
 (define_int_attr vfml_half
@@ -1083,7 +1086,8 @@
 			(UNSPEC_USUB16 "usub16") (UNSPEC_SMLAD "smlad")
 			(UNSPEC_SMLADX "smladx") (UNSPEC_SMLSD "smlsd")
 			(UNSPEC_SMLSDX "smlsdx") (UNSPEC_SMUAD "smuad")
-			(UNSPEC_SMUADX "smuadx")])
+			(UNSPEC_SMUADX "smuadx") (UNSPEC_SSAT16 "ssat16")
+			(UNSPEC_USAT16 "usat16")])
 
 ;; Both kinds of return insn.
 (define_code_iterator RETURNS [return simple_return])
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 267c446c03e8903c21a0d74e43ae589ffcf689f4..c1f655c704011bbe8bac82c24a3234a23bf6b242 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -193,6 +193,14 @@
   (and (match_code "const_int")
(match_test "IN_RANGE (UINTVAL (op), 1, GET_MODE_BITSIZE (mode))")))
 
+(define_predicate "ssat16_imm"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), 1, 16)")))
+

  1   2   3   4   5   6   7   8   9   10   >