[Ping^2][PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c

2017-09-05 Thread Jiong Wang
2017-08-22 9:18 GMT+01:00 Jiong Wang <jiong.w...@foss.arm.com>:
> On 10/08/17 17:39, Jiong Wang wrote:
>>
>> Hi,
>>
>>   A new vendor CFA, DW_CFA_AARCH64_negate_ra_state, was introduced for
>> ARMv8.3-A return address signing; it is multiplexed with
>> DW_CFA_GNU_window_save in the CFA vendor extension space.
>>
>>   This patch adds the necessary code to make it available to external
>> users; the GDB patch
>> (https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) intends
>> to use it.
>>
>>   A new DW_CFA_DUP entry for it is added in dwarf2.def.  DW_CFA_DUP is
>> used to avoid a duplicate case value when the file is included in
>> libiberty/dwarfnames.c.
>>
>>   A native x86 build completes OK, confirming there are no macro
>> expansion errors.
>>
>>   OK for trunk?
>>
>> 2017-08-10  Jiong Wang  <jiong.w...@arm.com>
>>
>> include/
>> * dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP.
>> * dwarf2.h (DW_CFA_DUP): New define.
>>
>> libiberty/
>> * dwarfnames.c (DW_CFA_DUP): New define.
>>
>
> Ping~

Ping^2


[Ping~][PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c

2017-08-22 Thread Jiong Wang

On 10/08/17 17:39, Jiong Wang wrote:

Hi,

  A new vendor CFA, DW_CFA_AARCH64_negate_ra_state, was introduced for
ARMv8.3-A return address signing; it is multiplexed with
DW_CFA_GNU_window_save in the CFA vendor extension space.

  This patch adds the necessary code to make it available to external
users; the GDB patch
(https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) intends
to use it.

  A new DW_CFA_DUP entry for it is added in dwarf2.def.  DW_CFA_DUP is
used to avoid a duplicate case value when the file is included in
libiberty/dwarfnames.c.

  A native x86 build completes OK, confirming there are no macro
expansion errors.

  OK for trunk?

2017-08-10  Jiong Wang  <jiong.w...@arm.com>

include/
* dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP.
* dwarf2.h (DW_CFA_DUP): New define.

libiberty/
* dwarfnames.c (DW_CFA_DUP): New define.



Ping~


[PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c

2017-08-10 Thread Jiong Wang

Hi,

  A new vendor CFA, DW_CFA_AARCH64_negate_ra_state, was introduced for
ARMv8.3-A return address signing; it is multiplexed with
DW_CFA_GNU_window_save in the CFA vendor extension space.

  This patch adds the necessary code to make it available to external
users; the GDB patch
(https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) intends
to use it.

  A new DW_CFA_DUP entry for it is added in dwarf2.def.  DW_CFA_DUP is
used to avoid a duplicate case value when the file is included in
libiberty/dwarfnames.c.

  A native x86 build completes OK, confirming there are no macro
expansion errors.

  OK for trunk?

2017-08-10  Jiong Wang  <jiong.w...@arm.com>

include/
* dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP.
* dwarf2.h (DW_CFA_DUP): New define.

libiberty/
* dwarfnames.c (DW_CFA_DUP): New define.

diff --git a/include/dwarf2.def b/include/dwarf2.def
index a91e9439cd82f3bb9fdddc14904114e5490c1af6..2a3b23fef873db9bb2498cd28c4fafc72e6a234f 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -778,6 +778,7 @@ DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d)
 /* GNU extensions.
NOTE: DW_CFA_GNU_window_save is multiplexed on Sparc and AArch64.  */
 DW_CFA (DW_CFA_GNU_window_save, 0x2d)
+DW_CFA_DUP (DW_CFA_AARCH64_negate_ra_state, 0x2d)
 DW_CFA (DW_CFA_GNU_args_size, 0x2e)
 DW_CFA (DW_CFA_GNU_negative_offset_extended, 0x2f)
 
diff --git a/include/dwarf2.h b/include/dwarf2.h
index 14b6f22e39e2f2f8cadb05009bfd10fafa9ea07c..a2e022dbdb35c18bb591e0f00930978846b82c01 100644
--- a/include/dwarf2.h
+++ b/include/dwarf2.h
@@ -52,6 +52,7 @@
 #define DW_ATE(name, value) , name = value
 #define DW_ATE_DUP(name, value) , name = value
 #define DW_CFA(name, value) , name = value
+#define DW_CFA_DUP(name, value) , name = value
 #define DW_IDX(name, value) , name = value
 #define DW_IDX_DUP(name, value) , name = value
 
@@ -104,6 +105,7 @@
 #undef DW_ATE
 #undef DW_ATE_DUP
 #undef DW_CFA
+#undef DW_CFA_DUP
 #undef DW_IDX
 #undef DW_IDX_DUP
 
diff --git a/libiberty/dwarfnames.c b/libiberty/dwarfnames.c
index e58d03c3a3d814f3a271edb4689c6306a2f958f0..dacd78dbaa9b33d6e9fdf35330cdc446dcf4f76c 100644
--- a/libiberty/dwarfnames.c
+++ b/libiberty/dwarfnames.c
@@ -75,6 +75,7 @@ Boston, MA 02110-1301, USA.  */
 #define DW_ATE(name, value) case name: return # name ;
 #define DW_ATE_DUP(name, value)
 #define DW_CFA(name, value) case name: return # name ;
+#define DW_CFA_DUP(name, value)
 #define DW_IDX(name, value) case name: return # name ;
 #define DW_IDX_DUP(name, value)
 
@@ -105,5 +106,6 @@ Boston, MA 02110-1301, USA.  */
 #undef DW_ATE
 #undef DW_ATE_DUP
 #undef DW_CFA
+#undef DW_CFA_DUP
 #undef DW_IDX
 #undef DW_IDX_DUP
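
For readers following the mechanics, here is a rough sketch of what the two
expansions boil down to (illustrative only, not the literal generated code;
the helper name follows libiberty's get_DW_*_name convention):

/* In include/dwarf2.h both DW_CFA and DW_CFA_DUP become enumerators, and
   C allows two enumerators to share a value, so this is legal:  */
enum dwarf_call_frame_info
  {
    /* ...  */
    DW_CFA_GNU_window_save = 0x2d,
    DW_CFA_AARCH64_negate_ra_state = 0x2d,
    DW_CFA_GNU_args_size = 0x2e
    /* ...  */
  };

/* In libiberty/dwarfnames.c only DW_CFA produces a case label while
   DW_CFA_DUP expands to nothing, so the name-lookup switch gets a single
   "case 0x2d:" and no duplicate-case compile error:  */
const char *
get_DW_CFA_name (unsigned int cfa)
{
  switch (cfa)
    {
    /* ...  */
    case DW_CFA_GNU_window_save: return "DW_CFA_GNU_window_save";
    case DW_CFA_GNU_args_size: return "DW_CFA_GNU_args_size";
    /* ...  */
    default: return 0;
    }
}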


Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2017-06-15 Thread Jiong Wang

On 15/06/17 15:12, Wilco Dijkstra wrote:

This results in smaller code and unwind info.


I have done a quick test on your updated patch by building the latest Linux
kernel.

DWARF frame size improved (~5% smaller), as using sp to address locals avoids
having to update the CFA register etc.

Though the impact on codegen of using sp to address locals may vary, for the
Linux kernel I saw text size increase slightly (~0.05% bigger).  The reason
appears to be that GCC's hardware register copy propagation doesn't handle
the stack pointer (see regcprop.c), so given the following sequences, the fp
case is optimized into "add x0, x29, 36" while the sp case is left as two
instructions.  A simple testcase is listed below.

sp
===
mov x0, sp
add x0, x0, 36

fp
===
mov x0, x29
add x0, x0, 36

test.c
===
struct K {
 int a;
 int b;
 int c;
 int d;
 char e;
 short f;
 long g;
 float h;
 double i;
};

void foo (int, struct K *);

void test (int i)
{
 struct K k = {
  .a = 5,
  .b = 0,
  .c = i,
 };

 foo (5, &k);
}



Re: [PATCH 1/5] testsuite: attr-alloc_size-11.c (PR79356)

2017-03-15 Thread Jiong Wang

On 15/03/17 15:34, Rainer Orth wrote:

Hi Jiong,


Subject: [PATCH] testsuite, 79356

As stated in the PR (and elsewhere), this test now passes on aarch64,
ia64, mips, powerpc, sparc, and s390x.  This patch disables the xfails
for those targets.


gcc/testsuite/
PR testsuite/79356
* gcc.dg/attr-alloc_size-11.c: Don't xfail on aarch64, ia64, mips,
powerpc, sparc, or s390x.


It's passing on ARM as well.

I will commit the following patch, which adds arm*-*-* to the "Don't xfail" list.

gcc/testsuite/
 PR testsuite/79356
 * gcc.dg/attr-alloc_size-11.c: Don't xfail on arm.

please keep the lists sorted alphabetically.


Thanks, I noticed that just while committing; the committed version has been
corrected.

https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/testsuite/gcc.dg/attr-alloc_size-11.c?r1=246167=246166=246167



Re: [PATCH 1/5] testsuite: attr-alloc_size-11.c (PR79356)

2017-03-15 Thread Jiong Wang

On 10/03/17 15:26, Segher Boessenkool wrote:

On Fri, Mar 10, 2017 at 01:57:31PM +0100, Rainer Orth wrote:

I just noticed that nothing has happened at all in a month, so anything
is better than the tests XPASSing on a number of targets.

So the patch is ok for mainline with sparc*-*-* added to the target
lists and a reference to PR testsuite/79356 in the comment.

I'd still be very grateful if Martin could have a look what's really
going on here, though.

Same here.

Committed as:


Subject: [PATCH] testsuite, 79356

As stated in the PR (and elsewhere), this test now passes on aarch64,
ia64, mips, powerpc, sparc, and s390x.  This patch disables the xfails
for those targets.


gcc/testsuite/
PR testsuite/79356
* gcc.dg/attr-alloc_size-11.c: Don't xfail on aarch64, ia64, mips,
powerpc, sparc, or s390x.


It's passing on ARM as well.

I will commit the following patch, which adds arm*-*-* to the "Don't xfail" list.

gcc/testsuite/
PR testsuite/79356
* gcc.dg/attr-alloc_size-11.c: Don't xfail on arm.


diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
index ccf2c2196c065b3387a91cc764dad3fcc1b4e3ee..3c1867bfb4e1cb762308dc6ac03afc7dc01cc075 100644
--- a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
+++ b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
@@ -47,8 +47,8 @@ typedef __SIZE_TYPE__ size_t;
 
 /* The following tests fail because of missing range information.  The xfail
exclusions are PR79356.  */
-TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */
-TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */
+TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { arm*-*-* aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */
+TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { arm*-*-* aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */
 TEST (int, INT_MIN + 2, ALLOC_MAX);/* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
 TEST (int, -3, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
 TEST (int, -2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */


Re: [PING 6, PATCH] Remove xfail from thread_local-order2.C.

2017-03-10 Thread Jiong Wang

On 07/02/17 16:01, Mike Stump wrote:

On Feb 7, 2017, at 2:20 AM, Rainer Orth <r...@cebitec.uni-bielefeld.de> wrote:

No.  In fact, I'd go for something like this:

2017-02-07  Dominik Vogt  <v...@linux.vnet.ibm.com>
Rainer Orth  <r...@cebitec.uni-bielefeld.de>

* g++.dg/tls/thread_local-order2.C: Only xfail execution on
*-*-solaris*.

# HG changeset patch
# Parent  031bb7a327cc984d387a8ae64e7c65d4b8793731
Only xfail g++.dg/tls/thread_local-order2.C on Solaris

diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order2.C b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
--- a/gcc/testsuite/g++.dg/tls/thread_local-order2.C
+++ b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
@@ -2,10 +2,11 @@
// that isn't reverse order of construction.  We need to move
// __cxa_thread_atexit into glibc to get this right.

-// { dg-do run { xfail *-*-* } }
+// { dg-do run }
// { dg-require-effective-target c++11 }
// { dg-add-options tls }
// { dg-require-effective-target tls_runtime }
+// { dg-xfail-run-if "" { *-*-solaris* } }

extern "C" void abort();
extern "C" int printf (const char *, ...);

This way one can easily add per-target PR references or explanations,
e.g. for darwin10 or others should they come up.

Tested on i386-pc-solaris2.12 and x86_64-pc-linux-gnu.  Ok for mainline?

Ok.

I think that addresses most all known issues.  I'll pre-approve any additional
targets people find as trivial.  For example, if darwin10 doesn't pass, then
*-*-darwin10* would be fine to add if that fixes the issue.  I don't happen to
have one that old to just test on.


I am seeing this failure on arm and aarch64 bare-metal environments where
newlib is used.

This patch also XFAILs this testcase on newlib.

OK for trunk?

Regards,
Jiong

gcc/testsuite/
2017-03-10  Jiong Wang  <jiong.w...@arm.com>

* g++.dg/tls/thread_local-order2.C: XFAIL on newlib.


diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order2.C b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
index 3cbd257b5fab05d9af7aeceb4f97e9a79d2a283e..d274e8c606542893f8a792469e075056793335ea 100644
--- a/gcc/testsuite/g++.dg/tls/thread_local-order2.C
+++ b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
@@ -6,7 +6,7 @@
 // { dg-require-effective-target c++11 }
 // { dg-add-options tls }
 // { dg-require-effective-target tls_runtime }
-// { dg-xfail-run-if "" { hppa*-*-hpux* *-*-solaris* } }
+// { dg-xfail-run-if "" { { hppa*-*-hpux* *-*-solaris* } || { newlib } } }
 
 extern "C" void abort();
 extern "C" int printf (const char *, ...);


Re: [AArch64] Accelerate -fstack-protector through pointer authentication extension

2017-02-15 Thread Jiong Wang



On 15/02/17 15:45, Richard Earnshaw (lists) wrote:

On 18/01/17 17:10, Jiong Wang wrote:

NOTE, this approach however requires a DWARF change, as the original LR is
signed; the binary needs a new libgcc to make sure C++ EH works correctly.
Given this acceleration already needs the user to specify
-mstack-protector-dialect=pauth, which means the target platform should
largely have a new libgcc installed anyway, otherwise you can't utilize
the new pointer authentication features.

gcc/
2016-11-11  Jiong Wang  <jiong.w...@arm.com>

 * config/aarch64/aarch64-opts.h (aarch64_stack_protector_type): New
 enum.
 (aarch64_layout_frame): Swap callees and locals when
 -mstack-protector-dialect=pauth specified.
 (aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN instead
 of AARCH64_ENABLE_RETURN_ADDRESS_SIGN.
 (aarch64_expand_epilogue): Likewise.
 * config/aarch64/aarch64.md (*do_return): Likewise.
 (aarch64_override_options): Sanity check for ILP32 and TARGET_PAUTH.
 * config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION, AARCH64_PAUTH_SSP,
 AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines.
 * config/aarch64/aarch64.opt (-mstack-protector-dialect=): New option.
 * doc/invoke.texi (AArch64 Options): Documents
 -mstack-protector-dialect=.


  Patch updated to migrate to TARGET_STACK_PROTECT_RUNTIME_ENABLED_P.

aarch64 cross check OK with the following options enabled on all testcases.
 -fstack-protector-all -mstack-protector-pauth

OK for trunk?
 gcc/
2017-01-18  Jiong Wang  <jiong.w...@arm.com>
* config/aarch64/aarch64-protos.h
 (aarch64_pauth_stack_protector_enabled): New declaration.
 * config/aarch64/aarch64.c (aarch64_layout_frame): Swap callee-save
 area and locals area when aarch64_pauth_stack_protector_enabled
 returns true.
 (aarch64_stack_protect_runtime_enabled): New function.
 (aarch64_pauth_stack_protector_enabled): New function.
 (aarch64_return_address_signing_enabled): Enabled by
 aarch64_pauth_stack_protector_enabled.
 (aarch64_override_options): Sanity check for -mstack-protector-pauth.
 (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define.
 * config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise.
 * config/aarch64/aarch64.opt (-mstack-protector-pauth): New option.
 * doc/invoke.texi (AArch64 Options): Documents -mstack-protector-pauth.

gcc/testsuite/
 * gcc.target/aarch64/stack_protector_1.c: New test.


1.patch


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 632dd4768d82c340ae4e9b4a93206743756c06e7..a3ad623eef498d00b52d24bf02a5748fad576c3d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -383,6 +383,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, 
const_tree, rtx,
  void aarch64_init_expanders (void);
  void aarch64_init_simd_builtins (void);
  void aarch64_emit_call_insn (rtx);
+bool aarch64_pauth_stack_protector_enabled (void);
  void aarch64_register_pragmas (void);
  void aarch64_relayout_simd_types (void);
  void aarch64_reset_previous_fndecl (void);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 3718ad1b3bf27c6bdb9e74831fd660e617cccbde..dd742d37ab6fc6fb5085e1c6b5d86d5ce1ce5f8a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -958,4 +958,11 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
  extern tree aarch64_fp16_type_node;
  extern tree aarch64_fp16_ptr_type_node;
  
+#ifndef TARGET_LIBC_PROVIDES_SSP
+#define LINK_SSP_SPEC "%{!mstack-protector-pauth:\
+%{fstack-protector|fstack-protector-all\
+  |fstack-protector-strong|fstack-protector-explicit:\
+  -lssp_nonshared -lssp}}"
+#endif
+

I don't think we want to suppress this.  PAUTH pased stack protections
isn't an all-or-nothing solution.  What if some object files are built
with traditional -fstack-protector code?


I had given a description of this in the ping email (the changed subject line
may have caused trouble for email clients):

--
Code compiled with "-mstack-protector-pauth" can co-work with code compiled
without "-mstack-protector-pauth".  The only problem is that when
"-mstack-protector-pauth" is specified, "-lssp/-lssp_nonshared" won't be
implied, as the software runtime supports are not required any more.  So if
the user has some object files compiled using the default stack protector and
wants them to be linked with object files compiled using
"-mstack-protector-pauth", and "-mstack-protector-pauth" appears in the final
command line with "gcc" used as the linker driver, then
"-lssp/-lssp_nonshared" needs to be specified explicitly.

Ping [AArch64] Accelerate -fstack-protector

2017-02-07 Thread Jiong Wang

On 18/01/17 17:10, Jiong Wang wrote:

aarch64 cross check OK with the following options enabled on all testcases.
-fstack-protector-all -mstack-protector-pauth

OK for trunk?
gcc/
2017-01-18  Jiong Wang  <jiong.w...@arm.com>
   * config/aarch64/aarch64-protos.h
(aarch64_pauth_stack_protector_enabled): New declaration.
* config/aarch64/aarch64.c (aarch64_layout_frame): Swap callee-save area
and locals area when aarch64_pauth_stack_protector_enabled returns true.
(aarch64_stack_protect_runtime_enabled): New function.
(aarch64_pauth_stack_protector_enabled): New function.
(aarch64_return_address_signing_enabled): Enabled by
aarch64_pauth_stack_protector_enabled.
(aarch64_override_options): Sanity check for -mstack-protector-pauth.
(TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define.
* config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise.
* config/aarch64/aarch64.opt (-mstack-protector-pauth): New option.
* doc/invoke.texi (AArch64 Options): Documents -mstack-protector-pauth.

gcc/testsuite/
* gcc.target/aarch64/stack_protector_1.c: New test.


I'd like to ping this patch, which accelerates GCC -fstack-protector using
the ARMv8.3-A Pointer Authentication Extension.  The whole acceleration is
only enabled through the new option "-mstack-protector-pauth", which is
disabled by default.

This patch does not touch any generic code and does not change GCC codegen on
AArch64 by default, so it should carry very low risk.  Is it OK to commit
to GCC trunk?

Code compiled with "-mstack-protector-pauth" can co-work with code compiled
without "-mstack-protector-pauth".  The only problem is that when
"-mstack-protector-pauth" is specified, "-lssp/-lssp_nonshared" won't be
implied, as the software runtime supports are not required any more.  So if
the user has some object files compiled using the default stack protector and
wants them to be linked with object files compiled using
"-mstack-protector-pauth", and "-mstack-protector-pauth" appears in the final
command line with "gcc" used as the linker driver, then
"-lssp/-lssp_nonshared" needs to be specified explicitly.
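
For reference, this is the LINK_SSP_SPEC fragment in question, annotated
(the annotations are mine; the spec text itself is from the patch):

/* %{!mstack-protector-pauth:          only if -mstack-protector-pauth was NOT given
     %{fstack-protector|fstack-protector-all
       |fstack-protector-strong|fstack-protector-explicit:
                                       ...and some -fstack-protector* flag WAS given
       -lssp_nonshared -lssp}}         then add the libssp runtime to the link

   With -mstack-protector-pauth present the outer %{!...} group is dropped
   entirely, which is why -lssp/-lssp_nonshared must then be given by hand
   when mixing in objects built with the default stack protector.  */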



Re: Fix profile updating in ifcombine

2017-02-06 Thread Jiong Wang

On 06/02/17 15:26, Jan Hubicka wrote:

I think it is not a regression, just that the testcase is fragile and depends
on the outcome of ifcombine.  It seems it was updated several times in the
past.  I am not quite sure what the test is testing.
sure what the test is testing.


They are trying to make sure optimal stack adjustment decisions are made.

Fixing the testcases by disabling the relevant transformation passes looks
like one way to me.  The other way, which might be more reliable, is to dump
the decisions made during aarch64 frame layout when dump_file is non-null,
prefixing each dump entry with the function name so it is easily caught by
dejagnu.  We would then scan the RTL dump instead of instructions; a sketch
follows below.
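
A minimal sketch of the kind of dump statement meant here (hypothetical, not
an actual patch; the field and helper names follow the aarch64 backend's
existing code):

/* Hypothetical sketch: record the frame layout decision in the dump
   file, prefixed with the function name so dejagnu can scan for it.  */
if (dump_file)
  fprintf (dump_file, "%s: frame_size = " HOST_WIDE_INT_PRINT_DEC "\n",
           current_function_name (),
           cfun->machine->frame.frame_size);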
 



Re: [PATCH][wwwdocs] Mention -march=armv8.3-a -msign-return-address= for GCC 7

2017-02-02 Thread Jiong Wang

On 02/02/17 13:31, Gerald Pfeifer wrote:

On Thu, 2 Feb 2017, Jiong Wang wrote:

This patch adds a short entry for the -march=armv8.3-a and
-msign-return-address= options in GCC 7 to the "AArch64" section.


Thanks, Jiong.

Index: gcc-7/changes.html
===
 
+   The ARMv8.3-A architecture is now supported.  It can be used by
+   specifying the -march=armv8.3-a option.
+
+   The option -msign-return-address= is supported to enable
+   return address protection using ARMv8.3-A Pointer Authentication
+   Extensions.  Please refer to the documentation for more information on
+   the arguments accepted by this option.
+ 

Would it make sense to make these two different items?  The way it
is currently marked up, the blank line will be "gone" once rendered.


OK, separated them into two different items.



Where you "refer to the documentation", what kind of documentation
is that?  ARM reference manuals, GCC's documentation,...?  Being a
bit more explicit here and/or using a link would be good.


It's GCC user manual, have added the link in the updated patch.

Please review, thanks.


Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.54
diff -u -r1.54 changes.html
--- htdocs/gcc-7/changes.html	1 Feb 2017 19:23:00 -	1.54
+++ htdocs/gcc-7/changes.html	2 Feb 2017 14:34:49 -
@@ -711,6 +711,18 @@
 AArch64

  
+   The ARMv8.3-A architecture is now supported.  It can be used by
+   specifying the -march=armv8.3-a option.
+ 
+ 
+   The option -msign-return-address= is supported to enable
+   return address protection using ARMv8.3-A Pointer Authentication
+   Extensions.  For more information on the arguments accepted by this
+   option, please refer to
+	<a href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options">
+	AArch64-Options</a>.
+ 
+ 
The ARMv8.2-A architecture and the ARMv8.2-A 16-bit Floating-Point
Extensions are now supported.  They can be used by specifying the
-march=armv8.2-a or -march=armv8.2-a+fp16


[PATCH][wwwdocs] Mention -march=armv8.3-a -msign-return-address= for GCC 7

2017-02-02 Thread Jiong Wang

Hi all,

This patch adds a short entry for the -march=armv8.3-a and -msign-return-address= options 
in GCC 7 to the "AArch64" section.

Eyeballed the result in Firefox.

Ok to commit?

Thanks,
Jiong
Index: gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.39
diff -u -r1.39 changes.html
--- gcc-7/changes.html	17 Jan 2017 21:26:31 -	1.39
+++ gcc-7/changes.html	20 Jan 2017 14:31:21 -
@@ -384,6 +384,15 @@
 AArch64

  
+   The ARMv8.3-A architecture is now supported.  It can be used by
+   specifying the -march=armv8.3-a option.
+
+   The option -msign-return-address= is supported to enable
+   return address protection using ARMv8.3-A Pointer Authentication
+   Extensions.  Please refer to the documentation for more information on
+   the arguments accepted by this option.
+ 
+ 
The ARMv8.2-A architecture and the ARMv8.2-A 16-bit Floating-Point
Extensions are now supported.  They can be used by specifying the
-march=armv8.2-a or -march=armv8.2-a+fp16


Re: [PATCH v2] aarch64: Add split-stack initial support

2017-01-25 Thread Jiong Wang

On 24/01/17 18:05, Adhemerval Zanella wrote:


On 03/01/2017 13:13, Wilco Dijkstra wrote:


+  /* If function uses stacked arguments save the old stack value so morestack
+ can return it.  */
+  reg11 = gen_rtx_REG (Pmode, R11_REGNUM);
+  if (cfun->machine->frame.saved_regs_size
+  || cfun->machine->frame.saved_varargs_size)
+emit_move_insn (reg11, stack_pointer_rtx);

This doesn't look right - we could have many arguments even without varargs
or saved regs.  This would need to check varargs as well as crtl->args.size
(I believe that is the size of the arguments on the stack).  It's fine to
omit this optimization in the first version - we already emit 2-3 extra
instructions for the check anyway.

I will check for a better solution.


Hi Adhemerval,

  My only concern with this patch is the initialization of R11 (the internal
arg pointer).  The current implementation looks to me to be generating wrong
code for a testcase that simply returns the sum of ten int parameters: I see
the function body using R11 while there is no initialization of it in the
split prologue, so if the execution flow is *not* through __morestack, then
R11 is not initialized.

As Wilco suggested, I feel using crtl->args.size instead of
cfun->machine->frame.saved_regs_size might be the correct approach, after
checking assign_parms in function.c.
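
A hypothetical testcase of the shape described (under the AAPCS64 the first
eight integer arguments go in w0-w7, so i and j are passed on the stack and
must be addressed through the internal arg pointer):

/* Hypothetical reproducer: two of the ten arguments live on the stack,
   so the body addresses them via R11 when split-stack is enabled.  */
int
sum10 (int a, int b, int c, int d, int e,
       int f, int g, int h, int i, int j)
{
  return a + b + c + d + e + f + g + h + i + j;
}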



Re: [1/5][AArch64] Return address protection on AArch64

2017-01-20 Thread Jiong Wang


On 20/01/17 18:23, Jiong Wang wrote:


OK, the attached patch disables the building of pointer signing code in
libgcc in ILP32 mode, except that the macro bit RA_A_SIGNED_BIT is still
defined, as I want to reserve this bit for ILP32 as for LP64 in case we later
enable ILP32 support.

None of the pauth builtins are registered for ILP32 mode either, as these
builtins are supposed to be used by libgcc unwinder code only.

I also gated the three new testcases for return address signing using the
following directive and verified it works under my dejagnu environment.

{ dg-require-effective-target lp64 }

multilib cross build finished (lp64, ilp32), OK for trunk?

BTW, the mode fix patch doesn't have conflict with this patch, we may
still need it if we want to enable ILP32 support later.

Thanks.

gcc/
2017-01-20  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-builtins.c (aarch64_init_builtins): Don't
register pauth builtins for ILP32.

libgcc/
* config/aarch64/aarch64-unwind.h: Restrict this file on LP64 only.
* unwind-dw2.c (execute_cfa_program):  Only multiplexing
DW_CFA_GNU_window_save for AArch64 LP64.




Missing testcase change in Changelog, added:

gcc/
2017-01-20  Jiong Wang  <jiong.w...@arm.com>

 * config/aarch64/aarch64-builtins.c (aarch64_init_builtins): Register
 pauth builtins for LP64 only.
 * testsuite/gcc.target/aarch64/return_address_sign_1.c: Enable on LP64
 only.
 * testsuite/gcc.target/aarch64/return_address_sign_2.c: Likewise.
 * testsuite/gcc.target/aarch64/return_address_sign_3.c: Likewise.

libgcc/
 * config/aarch64/aarch64-unwind.h: Empty this file on ILP32.
 * unwind-dw2.c (execute_cfa_program):  Only multiplexing
 DW_CFA_GNU_window_save for AArch64 and LP64.
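
For reference, a condensed sketch of what this multiplexing amounts to inside
execute_cfa_program (an assumed shape, simplified from the actual libgcc
code; the toggle line matches the one quoted in the build-error reports later
on this page):

	case DW_CFA_GNU_window_save:
#if defined (__aarch64__) && !defined (__ILP32__)
	  /* Multiplexed as DW_CFA_AARCH64_negate_ra_state on AArch64 LP64:
	     toggle the "return address is signed" state for this frame.  */
	  fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
#else
	  /* The original SPARC register-window handling stays here.  */
#endif
	  break;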




Re: [1/5][AArch64] Return address protection on AArch64

2017-01-20 Thread Jiong Wang



Here is the patch.

For the XPACLRI builtin, which drops the signature in a pointer, the
prototype is "void *foo (void *)".
For the PAC/AUT builtins, which sign or authenticate a pointer, the
prototype is "void *foo (void *, uint64)".

This patch adjusted those modes to make sure they strictly follow the C
prototypes.  I also borrowed the type definition from the ARM backend:

   typedef unsigned _uw64 __attribute__((mode(__DI__)));

This is needed to type-cast the salt value, which is always DImode.
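
In C terms, the prototypes being enforced look roughly like this (a sketch;
the __builtin_aarch64_* spellings are my shorthand for the patch's
AARCH64_PAUTH_BUILTIN_* entries, not confirmed names):

/* Sketch of the builtin prototypes (illustrative names only):

     void *__builtin_aarch64_xpaclri (void *);             drop signature
     void *__builtin_aarch64_pacia1716 (void *, _uw64);    sign with A key
     void *__builtin_aarch64_autia1716 (void *, _uw64);    authenticate

   where _uw64 is the ARM-backend-style 64-bit type used to cast the
   DImode salt:  */
typedef unsigned _uw64 __attribute__ ((mode (__DI__)));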

It passed my local ILP32 cross build.

OK for trunk?

gcc/
2017-01-20  Jiong Wang  <jiong.w...@arm.com>
 * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin):
Fix modes
 for AARCH64_PAUTH_BUILTIN_XPACLRI,
AARCH64_PAUTH_BUILTIN_PACIA1716 and
 AARCH64_PAUTH_BUILTIN_AUTIA1716.

libgcc/
 * config/aarch64/aarch64-unwind.h (_uw64): New typedef.
 (aarch64_post_extract_frame_addr):  Cast salt to _uw64.
 (aarch64_post_frob_eh_handler_addr): Likewise.



Hmm, we currently don't support ILP32 at all for pointer signing (sorry
("Return address signing is only supported for -mabi=lp64");), so I
wonder if it would be better for now to simply suppress all the new
hooks in aarch64-unwind.h ifdef __ILP32__.

R.



OK, the attached patch disables the building of pointer signing code in
libgcc in ILP32 mode, except that the macro bit RA_A_SIGNED_BIT is still
defined, as I want to reserve this bit for ILP32 as for LP64 in case we later
enable ILP32 support.

None of the pauth builtins are registered for ILP32 mode either, as these
builtins are supposed to be used by libgcc unwinder code only.

I also gated the three new testcases for return address signing using the
following directive and verified it works under my dejagnu environment.

{ dg-require-effective-target lp64 }

multilib cross build finished (lp64, ilp32), OK for trunk?

BTW, the mode fix patch doesn't have conflict with this patch, we may
still need it if we want to enable ILP32 support later.

Thanks.

gcc/
2017-01-20  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-builtins.c (aarch64_init_builtins): Don't
register pauth builtins for ILP32.

libgcc/
* config/aarch64/aarch64-unwind.h: Restrict this file on LP64 only.
* unwind-dw2.c (execute_cfa_program):  Only multiplexing
DW_CFA_GNU_window_save for AArch64 LP64.


diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 7ef351eb53b21c94a07dbd7c49813276dfcebdb2..66bcb9ad5872d1f6cac4ce9613806eb390be33af 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -983,9 +983,14 @@ aarch64_init_builtins (void)
   aarch64_init_crc32_builtins ();
   aarch64_init_builtin_rsqrt ();
 
-/* Initialize pointer authentication builtins which are backed by instructions
-   in NOP encoding space.  */
-  aarch64_init_pauth_hint_builtins ();
+  /* Initialize pointer authentication builtins which are backed by instructions
+ in NOP encoding space.
+
+ NOTE: these builtins are supposed to be used by libgcc unwinder only, as
+ there is no support on return address signing under ILP32, we don't
+ register them.  */
+  if (!TARGET_ILP32)
+aarch64_init_pauth_hint_builtins ();
 }
 
 tree
diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c b/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c
index fda72a414f1df7e81785864e994681e3695852f1..f87c3d28d1edff473a787a39a436e57076f97508 100644
--- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c
@@ -1,6 +1,7 @@
 /* Testing return address signing where no combined instructions used.  */
 /* { dg-do compile } */
 /* { dg-options "-O2 -msign-return-address=all" } */
+/* { dg-require-effective-target lp64 } */
 
 int foo (int);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c b/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c
index 54fe47a69723d182c65941ddb73e2f1a5aa0af84..c5c1439b92e6637f85c47c6161cd797c0d68df25 100644
--- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c
@@ -1,6 +1,7 @@
 /* Testing return address signing where combined instructions used.  */
 /* { dg-do compile } */
 /* { dg-options "-O2 -msign-return-address=all" } */
+/* { dg-require-effective-target lp64 } */
 
 int foo (int);
 int bar (int, int);
diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c b/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c
index adc5effdded8900b2dfb68459883d399ebd91ac8..7d9ec6eebd1ce452013d2895a551671c59e98f0c 100644
--- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c
@@ -1,6 +1,7 @@
 /* Testing the disable of return address signing.  */
 /* { dg-do compile } */
 /* { dg-options &

Re: [1/5][AArch64] Return address protection on AArch64

2017-01-20 Thread Jiong Wang



On 20/01/17 11:15, Jiong Wang wrote:



On 20/01/17 03:39, Andrew Pinski wrote:
On Fri, Jan 6, 2017 at 3:47 AM, Jiong Wang <jiong.w...@foss.arm.com> 
wrote:

On 11/11/16 18:22, Jiong Wang wrote:

As described in the cover letter, this patch implements return address
signing for AArch64; it's controlled by the new option:

-msign-return-address=[none | non-leaf | all]

"none" means don't do return address signing at all on any function.
"non-leaf"
means only sign non-leaf function.  "all" means sign all functions.
Return
address signing is currently disabled on ILP32.  I haven't tested it.

The instructions added in the architecture are of 2 kinds.

* In the NOP instruction space, which allows binaries to run without any
traps on older versions of the architecture.  This doesn't give any
additional protection on older hardware but allows for the same binary to
be used on earlier versions of the architecture and newer versions of the
architecture.

* New instructions that are only valid for v8.3 and will trap if used on
earlier versions of the architecture.

By default, once return address signing is enabled, it will only generate
instructions in the NOP space.

While if -march=armv8.3-a is specified, GCC will try to use the most
efficient pointer authentication instructions it can.

The architecture has 2 user-invisible system keys for signing and creating
signed addresses as part of these instructions.  For some use cases, the
user might want to use a different key for different functions.  The new
option "-msign-return-address-key=key_name" lets GCC select the key used
for return address signing.  Permissible values are "a_key" for the A key
and "b_key" for the B key, and this option is supported by the function
target attribute, and LTO will hopefully just work.



gcc/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

  * config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New enum.
  (aarch64_function_type): New enum.
  * config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg): New
  declaration.
  * config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return
  address before it's pushed onto stack.
  (aarch64_expand_epilogue): Authenticate return address fetched from
  stack.
  (aarch64_output_sign_auth_reg): New function.
  (aarch64_override_options): Sanity check for ILP32 and ISA level.
  (aarch64_attributes): New function attributes for
  "sign-return-address", "pauth-key".
  * config/aarch64/aarch64.md (UNSPEC_AUTH_REG, UNSPEC_AUTH_REG1716,
  UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN,
  UNSPEC_STRIP_X30_SIGN): New unspecs.
  ("*do_return"): Generate combined instructions according to key index.
  ("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716",
  "strip_reg_sign", "strip_lr_sign"): New.
  * config/aarch64/aarch64.opt (msign-return-address, mpauth-key): New.
  * config/aarch64/predicates.md (aarch64_const0_const1): New predicate.
  * doc/extend.texi (AArch64 Function Attributes): Documents
  "sign-return-address=", "pauth-key".
  * doc/invoke.texi (AArch64 Options): Documents
  "-msign-return-address=", "-pauth-key".

gcc/testsuite/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

  * gcc.target/aarch64/return_address_sign_1.c: New testcase.
  * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase.


Update the patchset according to the new DWARF proposal described at

   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg03010.html

One of the patches in this patch set breaks the ILP32 build for
aarch64-elf and most likely also aarch64-linux-gnu.

/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c:
In function ‘uw_init_context_1’:
/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c:1567:6:
internal compiler error: in emit_move_insn, at expr.c:3698
ra = MD_POST_EXTRACT_ROOT_ADDR (ra);
0x8270cf emit_move_insn(rtx_def*, rtx_def*)
/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/expr.c:3697
0x80867b force_reg(machine_mode, rtx_def*)

Must be the Pmode issue under ILP32; I am testing a fix (I don't have a
full ILP32 environment, so can only test simply by forcing the libgcc build
with -mabi=ilp32).


Here is the patch.

For the XPACLRI builtin, which drops the signature in a pointer, the
prototype is "void *foo (void *)".
For the PAC/AUT builtins, which sign or authenticate a pointer, the
prototype is "void *foo (void *, uint64)".

This patch adjusted those modes to make sure they strictly follow the C
prototypes.

Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-20 Thread Jiong Wang

On 20/01/17 10:30, Christophe Lyon wrote:

error: 'DWARF_REGNUM_AARCH64_RA_STATE' undeclared (first use in this
function)
   fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
^


Hi Christophe, could you please confirm your svn revision?

I did do a bootstrap and regression run on both x86 and aarch64 before
committing this patch.  I had forgotten to "svn add" one header file, but
added it later.


The failures started with r244673, and are still present with r244687.
When did you add the missing file?


It was r244674, https://gcc.gnu.org/ml/gcc-cvs/2017-01/msg00689.html, so it
should have been included in your code.  The failure looks strange to me
then; I will svn up and re-start a fresh bootstrap on AArch64.


The file is present in my git clone.
I'm not bootstrapping on AArch64, I'm building a cross-compiler on x86_64,
but it shouldn't matter.


Hi Christophe,

  Thanks, I reproduced this in a cross linux environment.  The reason is that
the header file is not included because of the inhibit_libc guard, while the
unwinder header file should always be included.

   I will commit the attached patch as obvious, once I have finished a fresh
bootstrap, cross elf, and cross linux.

   Thanks.

libgcc/

2017-01-20  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/linux-unwind.h: Always include aarch64-unwind.h.


diff --git a/libgcc/config/aarch64/linux-unwind.h b/libgcc/config/aarch64/linux-unwind.h
index a8fa1d5..70e5a8a 100644
--- a/libgcc/config/aarch64/linux-unwind.h
+++ b/libgcc/config/aarch64/linux-unwind.h
@@ -20,11 +20,13 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
<http://www.gnu.org/licenses/>.  */
 
+/* Always include AArch64 unwinder header file.  */
+#include "config/aarch64/aarch64-unwind.h"
+
 #ifndef inhibit_libc
 
 #include <signal.h>
 #include <sys/ucontext.h>
-#include "config/aarch64/aarch64-unwind.h"
 
 
 /* Since insns are always stored LE, on a BE system the opcodes will


Re: [1/5][AArch64] Return address protection on AArch64

2017-01-20 Thread Jiong Wang



On 20/01/17 03:39, Andrew Pinski wrote:

On Fri, Jan 6, 2017 at 3:47 AM, Jiong Wang <jiong.w...@foss.arm.com> wrote:

On 11/11/16 18:22, Jiong Wang wrote:

As described in the cover letter, this patch implements return address
signing for AArch64; it's controlled by the new option:

-msign-return-address=[none | non-leaf | all]

"none" means don't do return address signing at all on any function.
"non-leaf"
means only sign non-leaf function.  "all" means sign all functions.
Return
address signing is currently disabled on ILP32.  I haven't tested it.

The instructions added in the architecture are of 2 kinds.

* In the NOP instruction space, which allows binaries to run without any
traps on older versions of the architecture.  This doesn't give any
additional protection on older hardware but allows for the same binary to be
used on earlier versions of the architecture and newer versions of the
architecture.

* New instructions that are only valid for v8.3 and will trap if used on
earlier versions of the architecture.

By default, once return address signing is enabled, it will only generate
instructions in the NOP space.

While if -march=armv8.3-a is specified, GCC will try to use the most
efficient pointer authentication instructions it can.

The architecture has 2 user-invisible system keys for signing and creating
signed addresses as part of these instructions.  For some use cases, the
user might want to use a different key for different functions.  The new
option "-msign-return-address-key=key_name" lets GCC select the key used
for return address signing.  Permissible values are "a_key" for the A key
and "b_key" for the B key, and this option is supported by the function
target attribute, and LTO will hopefully just work.



gcc/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

  * config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New
enum.
  (aarch64_function_type): New enum.
  * config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg):
New
  declaration.
  * config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return
  address before it's pushed onto stack.
  (aarch64_expand_epilogue): Authenticate return address fetched
from
  stack.
  (aarch64_output_sign_auth_reg): New function.
  (aarch64_override_options): Sanity check for ILP32 and ISA level.
  (aarch64_attributes): New function attributes for
"sign-return-address",
  "pauth-key".
  * config/aarch64/aarch64.md (UNSPEC_AUTH_REG,
UNSPEC_AUTH_REG1716,
  UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN,
  UNSPEC_STRIP_X30_SIGN): New unspecs.
  ("*do_return"): Generate combined instructions according to key
index.
  ("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716",
  "strip_reg_sign", "strip_lr_sign"): New.
  * config/aarch64/aarch64.opt (msign-return-address, mpauth-key):
New.
  * config/aarch64/predicates.md (aarch64_const0_const1): New
predicate.
  * doc/extend.texi (AArch64 Function Attributes): Documents
  "sign-return-address=", "pauth-key".
  * doc/invoke.texi (AArch64 Options): Documents
"-msign-return-address=",
  "-pauth-key".

gcc/testsuite/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

  * gcc.target/aarch64/return_address_sign_1.c: New testcase.
  * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase.


Update the patchset according to new DWARF proposal described at

   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg03010.html

One of these patches of this patch set break ILP32 building for
aarch64-elf and most likely also aarch64-linux-gnu.

/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c:
In function ‘uw_init_context_1’:
/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c:1567:6:
internal compiler error: in emit_move_insn, at expr.c:3698
ra = MD_POST_EXTRACT_ROOT_ADDR (ra);
0x8270cf emit_move_insn(rtx_def*, rtx_def*)
/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/expr.c:3697
0x80867b force_reg(machine_mode, rtx_def*)
Must be the Pmode issue under ILP32; I am testing a fix (I don't have a
full ILP32 environment, so can only test simply by forcing the libgcc build
with -mabi=ilp32).




Thanks,
Andrew





While A key support for return address signing using DW_CFA_GNU_window_save
only needs simple modifications to the code and associated DWARF generation,
B key support is more complex: it needs multiple-CIE support in GCC and
Binutils, so currently we fall back to a DWARF value expression, which fully
works although it requires longer encodings. Value expre

Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-20 Thread Jiong Wang



On 20/01/17 10:11, Christophe Lyon wrote:



/tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c:
In function 'execute_cfa_program':

/tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c:1193:17:
error: 'DWARF_REGNUM_AARCH64_RA_STATE' undeclared (first use in this
function)
  fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
   ^


Hi Christophe, could you please confirm your svn revision?

I did do a bootstrap and regression run on both x86 and aarch64 before
committing this patch.  I had forgotten to "svn add" one header file, but
added it later.


The failures started with r244673, and are still present with r244687.
When did you add the missing file?


It was r244674, https://gcc.gnu.org/ml/gcc-cvs/2017-01/msg00689.html, so it
should have been included in your code.  The failure looks strange to me
then; I will svn up and re-start a fresh bootstrap on AArch64.





Thanks.


Christophe






Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-20 Thread Jiong Wang



On 20/01/17 08:41, Christophe Lyon wrote:

Hi Jiong,

On 19 January 2017 at 15:46, Jiong Wang <jiong.w...@foss.arm.com> wrote:

Thanks for the review.

On 19/01/17 14:18, Richard Earnshaw (lists) wrote:




diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -136,6 +136,8 @@ struct _Unwind_Context
  #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
/* Context which has version/args_size/by_value fields.  */
  #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
+  /* Bit reserved on AArch64, return address has been signed with A key.
*/
+#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)


Why is this here?   It appears to only be used within the
AArch64-specific header file.


I was putting it here so that when we allocate the next general purpose bit,
we know clearly that bit 3 is allocated to AArch64 already, and the new
general bit needs to go to the next one.  This avoids bit collisions.


...

+/* Frob exception handler's address kept in TARGET before installing
into
+   CURRENT context.  */
+
+static void *
+uw_frob_return_addr (struct _Unwind_Context *current,
+ struct _Unwind_Context *target)
+{
+  void *ret_addr = __builtin_frob_return_addr (target->ra);
+#ifdef MD_POST_FROB_EH_HANDLER_ADDR
+  ret_addr = MD_POST_FROB_EH_HANDLER_ADDR (current, target, ret_addr);
+#endif
+  return ret_addr;
+}
+


I think this function should be marked inline.  The optimizers would
probably inline it anyway, but it seems wrong for us to rely on that.


Thanks, fixed.

Does the updated patch look OK to you now?

libgcc/

2017-01-19  Jiong Wang  <jiong.w...@arm.com>


 * config/aarch64/aarch64-unwind.h: New file.
 (DWARF_REGNUM_AARCH64_RA_STATE): Define.
 (MD_POST_EXTRACT_ROOT_ADDR): Define.
 (MD_POST_EXTRACT_FRAME_ADDR): Define.
 (MD_POST_FROB_EH_HANDLER_ADDR): Define.
 (MD_FROB_UPDATE_CONTEXT): Define.
 (aarch64_post_extract_frame_addr): New function.
 (aarch64_post_frob_eh_handler_addr): New function.
 (aarch64_frob_update_context): New function.
 * config/aarch64/linux-unwind.h: Include aarch64-unwind.h
 * config.host (aarch64*-*-elf, aarch64*-*-rtems*,
aarch64*-*-freebsd*):
 Initialize md_unwind_header to include aarch64-unwind.h.
 * unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT".
 (execute_cfa_program): Multiplex DW_CFA_GNU_window_save for
__aarch64__.
 (uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR.
 (uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR.
 (uw_frob_return_addr): New function.
 (_Unwind_DebugHook): Use uw_frob_return_addr.


Since you committed this (r244673), GCC fails to build for AArch64:
/tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c:
In function 'execute_cfa_program':
/tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c:1193:17:
error: 'DWARF_REGNUM_AARCH64_RA_STATE' undeclared (first use in this
function)
 fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
  ^


Hi Christophe, could you please confirm your svn revision?

I did do a bootstrap and regression run on both x86 and aarch64 before
committing this patch.  I had forgotten to "svn add" one header file, but
added it later.


Thanks.


Christophe




Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-19 Thread Jiong Wang

Thanks for the review.

On 19/01/17 14:18, Richard Earnshaw (lists) wrote:





diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -136,6 +136,8 @@ struct _Unwind_Context
 #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
   /* Context which has version/args_size/by_value fields.  */
 #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
+  /* Bit reserved on AArch64, return address has been signed with A key.  */
+#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)


Why is this here?   It appears to only be used within the
AArch64-specific header file.


I was putting it here so that when we allocate the next general purpose bit,
we know clearly that bit 3 is allocated to AArch64 already, and the new
general bit needs to go to the next one.  This avoids bit collisions.
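
Concretely, each mask claims one bit from the most significant end of the
flags word; a quick worked example for a 64-bit _Unwind_Word:

/* (~(_Unwind_Word) 0 >> 1) + 1 == 0x8000000000000000  bit 63: SIGNAL_FRAME_BIT
   (~(_Unwind_Word) 0 >> 2) + 1 == 0x4000000000000000  bit 62: EXTENDED_CONTEXT_BIT
   (~(_Unwind_Word) 0 >> 3) + 1 == 0x2000000000000000  bit 61: RA_A_SIGNED_BIT

   So the next general-purpose bit would use ">> 4", i.e. bit 60.  */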




...

+/* Frob exception handler's address kept in TARGET before installing into
+   CURRENT context.  */
+
+static void *
+uw_frob_return_addr (struct _Unwind_Context *current,
+ struct _Unwind_Context *target)
+{
+  void *ret_addr = __builtin_frob_return_addr (target->ra);
+#ifdef MD_POST_FROB_EH_HANDLER_ADDR
+  ret_addr = MD_POST_FROB_EH_HANDLER_ADDR (current, target, ret_addr);
+#endif
+  return ret_addr;
+}
+


I think this function should be marked inline.  The optimizers would
probably inline it anyway, but it seems wrong for us to rely on that.


Thanks, fixed.

Does the updated patch look OK to you now?

libgcc/

2017-01-19  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-unwind.h: New file.
(DWARF_REGNUM_AARCH64_RA_STATE): Define.
(MD_POST_EXTRACT_ROOT_ADDR): Define.
(MD_POST_EXTRACT_FRAME_ADDR): Define.
(MD_POST_FROB_EH_HANDLER_ADDR): Define.
(MD_FROB_UPDATE_CONTEXT): Define.
(aarch64_post_extract_frame_addr): New function.
(aarch64_post_frob_eh_handler_addr): New function.
(aarch64_frob_update_context): New function.
* config/aarch64/linux-unwind.h: Include aarch64-unwind.h
* config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*):
Initialize md_unwind_header to include aarch64-unwind.h.
* unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT".
(execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__.
(uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR.
(uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR.
(uw_frob_return_addr): New function.
(_Unwind_DebugHook): Use uw_frob_return_addr.

diff --git a/libgcc/config.host b/libgcc/config.host
index 6f2e458e74e776a6b7a310919558bcca76389232..540bfa9635802adabb36a2d1b7cf3416462c59f3 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -331,11 +331,13 @@ aarch64*-*-elf | aarch64*-*-rtems*)
 	extra_parts="$extra_parts crtfastmath.o"
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+	md_unwind_header=aarch64/aarch64-unwind.h
 	;;
 aarch64*-*-freebsd*)
 	extra_parts="$extra_parts crtfastmath.o"
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+	md_unwind_header=aarch64/aarch64-unwind.h
 	;;
 aarch64*-*-linux*)
 	extra_parts="$extra_parts crtfastmath.o"
diff --git a/libgcc/config/aarch64/aarch64-unwind.h b/libgcc/config/aarch64/aarch64-unwind.h
new file mode 100644
index 0000000000000000000000000000000000000000..a43d965b358f3e830b85fc42c7bceacf7d41a671
--- /dev/null
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -0,0 +1,87 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef AARCH64_UNWIND_H
+#define AARCH64_UNWIND_H
+
+#define DWARF_REGNUM_AARCH64_RA_STATE 34
+
+#define MD_POS

[AArch64] Accelerate -fstack-protector through pointer authentication extension

2017-01-18 Thread Jiong Wang

NOTE, this approach however requires a DWARF change, as the original LR is
signed; the binary needs a new libgcc to make sure C++ EH works correctly.
Given this acceleration already needs the user to specify
-mstack-protector-dialect=pauth, which means the target platform should
largely have a new libgcc installed anyway, otherwise you can't utilize the
new pointer authentication features.

gcc/
2016-11-11  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-opts.h (aarch64_stack_protector_type): New
enum.
(aarch64_layout_frame): Swap callees and locals when
-mstack-protector-dialect=pauth specified.
(aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN instead
of AARCH64_ENABLE_RETURN_ADDRESS_SIGN.
(aarch64_expand_epilogue): Likewise.
* config/aarch64/aarch64.md (*do_return): Likewise.
(aarch64_override_options): Sanity check for ILP32 and TARGET_PAUTH.
* config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION, AARCH64_PAUTH_SSP,
AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines.
* config/aarch64/aarch64.opt (-mstack-protector-dialect=): New option.
* doc/invoke.texi (AArch64 Options): Documents
-mstack-protector-dialect=.

 
Patch updated to migrate to TARGET_STACK_PROTECT_RUNTIME_ENABLED_P.


aarch64 cross check OK with the following options enabled on all testcases.
  
  -fstack-protector-all -mstack-protector-pauth


OK for trunk?

gcc/

2017-01-18  Jiong Wang  <jiong.w...@arm.com>
   
* config/aarch64/aarch64-protos.h
(aarch64_pauth_stack_protector_enabled): New declaration.
* config/aarch64/aarch64.c (aarch64_layout_frame): Swap callee-save area
and locals area when aarch64_pauth_stack_protector_enabled returns true.
(aarch64_stack_protect_runtime_enabled): New function.
(aarch64_pauth_stack_protector_enabled): New function.
(aarch64_return_address_signing_enabled): Enabled by
aarch64_pauth_stack_protector_enabled.
(aarch64_override_options): Sanity check for -mstack-protector-pauth.
(TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define.
* config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise.
* config/aarch64/aarch64.opt (-mstack-protector-pauth): New option.
* doc/invoke.texi (AArch64 Options): Documents -mstack-protector-pauth.

gcc/testsuite/
* gcc.target/aarch64/stack_protector_1.c: New test.
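
To visualize the layout change (my sketch, not taken from the patch; high
addresses at the top):

/*     default frame                 -mstack-protector-pauth
   +--------------------+         +--------------------+
   | incoming args      |         | incoming args      |
   +--------------------+         +--------------------+
   | locals             |         | callee-saves       |
   +--------------------+         | (incl. signed LR)  |
   | callee-saves       |         +--------------------+
   | (incl. signed LR)  |         | locals             |
   +--------------------+         +--------------------+
   | outgoing args      |         | outgoing args      |
   +--------------------+         +--------------------+

   An overflowing local buffer writes towards higher addresses, so in the
   swapped layout it clobbers the signed LR; the epilogue's authentication
   then faults on return instead of jumping to a corrupted address.  */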

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 632dd4768d82c340ae4e9b4a93206743756c06e7..a3ad623eef498d00b52d24bf02a5748fad576c3d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -383,6 +383,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
 void aarch64_init_expanders (void);
 void aarch64_init_simd_builtins (void);
 void aarch64_emit_call_insn (rtx);
+bool aarch64_pauth_stack_protector_enabled (void);
 void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 3718ad1b3bf27c6bdb9e74831fd660e617cccbde..dd742d37ab6fc6fb5085e1c6b5d86d5ce1ce5f8a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -958,4 +958,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 extern tree aarch64_fp16_type_node;
 extern tree aarch64_fp16_ptr_type_node;
 
+#ifndef TARGET_LIBC_PROVIDES_SSP
+#define LINK_SSP_SPEC "%{!mstack-protector-pauth:\
+			 %{fstack-protector|fstack-protector-all\
+			   |fstack-protector-strong|fstack-protector-explicit:\
+			   -lssp_nonshared -lssp}}"
+#endif
+
 #endif /* GCC_AARCH64_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6451b08191cf1a44aed502930da8603111f6e8ca..461f7b59584af9315accaecc0256abc9a2df4350 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2884,8 +2884,28 @@ aarch64_layout_frame (void)
   else if (cfun->machine->frame.wb_candidate1 != INVALID_REGNUM)
 max_push_offset = 256;
 
-  if (cfun->machine->frame.frame_size < max_push_offset
-  && crtl->outgoing_args_size == 0)
+  /* Swap the callee-save and local variables areas so that the callee-save
+ area, which includes the return address register X30/LR, is positioned
+ above the local variables and any local buffer overflow will overwrite
+ the return address.  */
+  if (aarch64_pauth_stack_protector_enabled ())
+{
+  if (varargs_and_saved_regs_size < max_push_offset)
+	/* stp reg1, reg2, [sp, -varargs_and_saved_regs_size]!.  */
+	cfun->machine->frame.callee_adjust = varargs_and_saved_regs_size;
+  else
+	/* sub sp, sp, varargs_and_saved_regs_size.  */
+	cfun->

[Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)

2017-01-18 Thread Jiong Wang

On 12/01/17 18:10, Jiong Wang wrote:

On 06/01/17 11:47, Jiong Wang wrote:

This is the update on libgcc unwinder support according to the new DWARF proposal.

As Joseph commented, duplication of unwind-dw2.c is not encouraged in libgcc,
but from this patch you can see there are a few places we need to modify for
AArch64 in unwind-aarch64.c, so is the file duplication approach acceptable?


libgcc/

2017-01-06  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/unwind-aarch64.c (DWARF_REGNUM_AARCH64_RA_STATE,
RA_A_SIGNED_BIT): New macros.
(execute_cfa_program): Multiplex DW_CFA_GNU_window_save on AArch64.
(uw_frame_state_for): Clear bit[0] of DWARF_REGNUM_AARCH64_RA_STATE.
(uw_update_context): Authenticate return address according to
DWARF_REGNUM_AARCH64_RA_STATE.
(uw_init_context_1): Strip signature of seed address.
(uw_install_context): Re-authenticate EH handler's address.


Ping~

For comparison, I have also attached the patch using the target macros.

Four new target macros are introduced:

  MD_POST_EXTRACT_ROOT_ADDR
  MD_POST_EXTRACT_FRAME_ADDR
  MD_POST_FROB_EH_HANDLER_ADDR
  MD_POST_INIT_CONTEXT

MD_POST_EXTRACT_ROOT_ADDR is to do target-private post-processing on the
addresses inside the _Unwind* functions themselves; these serve as the root
addresses from which unwinding starts.  MD_POST_EXTRACT_FRAME_ADDR is to do
target-private post-processing on the address inside the real user program
which throws the exception.

MD_POST_FROB_EH_HANDLER_ADDR is to do a target-private frob on the EH
handler's address before we install it into the current context.

MD_POST_INIT_CONTEXT is to do target-private initialization on the context
structure after the common initialization.

One "__aarch64__" macro check is needed to multiplex DW_CFA_GNU_window_save.


Ping ~

Could global reviewers or libgcc maintainers please review the generic part
of the change?

One small change is that I removed MD_POST_INIT_CONTEXT, as I found there is
MD_FROB_UPDATE_CONTEXT which serves the same purpose.  I still need to define

   MD_POST_EXTRACT_ROOT_ADDR
   MD_POST_EXTRACT_FRAME_ADDR
   MD_POST_FROB_EH_HANDLER_ADDR

and I still need one "__aarch64__" check to multiplex DW_CFA_GNU_window_save
(a sketch of the intended shape follows).
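
For reference, these might end up in config/aarch64/aarch64-unwind.h in
roughly the following shape (a sketch only; the attached patch contains the
real definitions, and the helper names follow the ChangeLog below):

  /* Strip the signature from an address serving as an unwinding root.  */
  #define MD_POST_EXTRACT_ROOT_ADDR(addr) __builtin_aarch64_xpaclri (addr)

  /* Authenticate the return address extracted from a user frame when the
     RA state column says it is signed.  */
  #define MD_POST_EXTRACT_FRAME_ADDR(context, fs, addr) \
    aarch64_post_extract_frame_addr (context, fs, addr)

  /* Re-sign the EH handler's address before installing it.  */
  #define MD_POST_FROB_EH_HANDLER_ADDR(current, target, addr) \
    aarch64_post_frob_eh_handler_addr (current, target, addr)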

Thanks.

libgcc/ChangeLog:

2017-01-18  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-unwind.h: New file.
(DWARF_REGNUM_AARCH64_RA_STATE): Define.
(MD_POST_EXTRACT_ROOT_ADDR): Define.
(MD_POST_EXTRACT_FRAME_ADDR): Define.
(MD_POST_FROB_EH_HANDLER_ADDR): Define.
(MD_FROB_UPDATE_CONTEXT): Define.
(aarch64_post_extract_frame_addr): New function.
(aarch64_post_frob_eh_handler_addr): New function.
(aarch64_frob_update_context): New function.
* config/aarch64/linux-unwind.h: Include aarch64-unwind.h
* config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*):
Initialize md_unwind_header to include aarch64-unwind.h.
* unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT".
(execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__.
(uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR.
(uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR.
(uw_frob_return_addr): New function.
(_Unwind_DebugHook): Use uw_frob_return_addr.

diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -136,6 +136,8 @@ struct _Unwind_Context
 #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
   /* Context which has version/args_size/by_value fields.  */
 #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
+  /* Bit reserved on AArch64, return address has been signed with A key.  */
+#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)
   _Unwind_Word flags;
   /* 0 for now, can be increased when further fields are added to
  struct _Unwind_Context.  */
@@ -1185,6 +1187,11 @@ execute_cfa_program (const unsigned char *insn_ptr,
 	  break;
 
 	case DW_CFA_GNU_window_save:
+#ifdef __aarch64__
+	  /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
+	 return address signing status.  */
+	  fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
+#else
 	  /* ??? Hardcoded for SPARC register window configuration.  */
 	  if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32)
 	for (reg = 16; reg < 32; ++reg)
@@ -1192,6 +1199,7 @@ execute_cfa_program (const unsigned char *insn_ptr,
 		fs->regs.reg[reg].how = REG_SAVED_OFFSET;
 		fs->regs.reg[reg].loc.offset = (reg - 16) * sizeof (void *);
 	  }
+#endif
 	  break;
 
 	case DW_CFA_GNU_args_size:
@@ -1513,10 +1521,15 @@ uw_update_context (struct _Unwind_Context *context, _Unwind_FrameState *fs)
stack frame.  */
 context->ra = 0;
   else
-/* Compute the re

Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook

2017-01-17 Thread Jiong Wang



On 17/01/17 13:57, Richard Earnshaw (lists) wrote:

On 16/01/17 14:29, Jiong Wang wrote:



I can see the reason for doing this if you want to separate the
interpretation of the GCC CFA reg-note from the final DWARF CFA operation.
My understanding is that all reg-notes defined in gcc/reg-notes.def should
have a general meaning, even CFA_WINDOW_SAVE.  For those which are
architecture specific we might need a mechanism to define them in the
backend only.

The general reg-notes in gcc/reg-notes.def do not always have a corresponding
standard DWARF CFA operation, CFA_WINDOW_SAVE being one example; therefore,
if we want to achieve what you described, I think we also need to define a
new target hook which maps a GCC CFA reg-note onto an architecture DWARF CFA
operation.

Regards,
Jiong



Here is the patch.


Hmm, I really wasn't expecting any more than something like the
following in dwarf2cfi.c:

@@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn)
 handled_one = true;
 break;

+  case REG_CFA_TOGGLE_RA_MANGLE:
case REG_CFA_WINDOW_SAVE:
+   /* We overload both of these operations onto the same DWARF
opcode.  */
 dwarf2out_frame_debug_cfa_window_save ();
 handled_one = true;
 break;

This keeps the two reg notes separate within the compiler, but emits the
same dwarf operation during final output.  This avoids the need for new
hooks or anything more complicated.


This was my initial thought, and the patch would be very small, as you've
demonstrated.  I later moved to this more complex patch as I think it's
better to treat the notes in reg-notes.def as consistently having generic
meaning, mapping them to standard DWARF CFA operations where those exist and
otherwise mapping them to target-private DWARF CFAs through this new hook.
This gives other targets a chance to map, for example, REG_CFA_TOGGLE_RA_MANGLE
to their own architecture DWARF number.

The introduction of the new hook looks very low risk at this stage; the only
painful thing is that the header files need to be reorganized, as we need to
use some DWARF and reg-note types in targhooks.c.

Anyway, if the new hook patch is too heavy, I have attached the simplified
version, which simply defines the new REG_CFA_TOGGLE_RA_MANGLE and maps it to
the same code as REG_CFA_WINDOW_SAVE.


gcc/

2017-01-17  Jiong Wang  <jiong.w...@arm.com>

* reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note.
* combine-stack-adj.c (no_unhandled_cfa): Handle
REG_CFA_TOGGLE_RA_MANGLE.
* dwarf2cfi.c
(dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE.
* config/aarch64/aarch64.c (aarch64_expand_prologue): Generates DWARF
info for return address signing.
(aarch64_expand_epilogue): Likewise.

diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index 20cd59ad08329e9f4f834bfc01d6f9ccc4485283..9ec14a3e44363f35f6419c38233ce5eebddd3458 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -208,6 +208,7 @@ no_unhandled_cfa (rtx_insn *insn)
   case REG_CFA_SET_VDRAP:
   case REG_CFA_WINDOW_SAVE:
   case REG_CFA_FLUSH_QUEUE:
+  case REG_CFA_TOGGLE_RA_MANGLE:
 	return false;
   }
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3bcad76b68b6ea7c9d75d150d79c45fb74d6bf0d..6451b08191cf1a44aed502930da8603111f6e8ca 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3553,7 +3553,11 @@ aarch64_expand_prologue (void)
 
   /* Sign return address for functions.  */
   if (aarch64_return_address_signing_enabled ())
-emit_insn (gen_pacisp ());
+{
+  insn = emit_insn (gen_pacisp ());
+  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
 
   if (flag_stack_usage_info)
 current_function_static_stack_size = frame_size;
@@ -3707,7 +3711,11 @@ aarch64_expand_epilogue (bool for_sibcall)
 */
   if (aarch64_return_address_signing_enabled ()
   && (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return))
-emit_insn (gen_autisp ());
+{
+  insn = emit_insn (gen_autisp ());
+  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
 
   /* Stack adjustment for exception handler.  */
   if (crtl->calls_eh_return)
diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 2748e2fa48e4794181496b26df9b51b7e51e7b84..2a527c9fecab091dccb417492e5dbb2ade244be2 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn)
 	handled_one = true;
 	break;
 
+  case REG_CFA_TOGGLE_RA_MANGLE:
   case REG_CFA_WINDOW_SAVE:
+	/* We overload both of these operations onto the same DWARF opcode.  */
 	dwarf2out_frame_debug_cfa_window_save ();
 	handled_one = true;
 	break;
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index ead4a9f58e8621288ee765e029c673640fdf38f4..175da119b6a534b04bd154f2c69dd087afd474ea 100644
--- a/gcc/reg

Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook

2017-01-16 Thread Jiong Wang

On 13/01/17 18:02, Jiong Wang wrote:

On 13/01/17 16:09, Richard Earnshaw (lists) wrote:

On 06/01/17 11:47, Jiong Wang wrote:


This patch is an update on DWARF generation for return address signing.

According to the new proposal, we simply need to generate a
REG_CFA_WINDOW_SAVE annotation.

gcc/

2017-01-06  Jiong Wang  <jiong.w...@arm.com>

 * config/aarch64/aarch64.c (aarch64_expand_prologue): Generate
dwarf
 annotation (REG_CFA_WINDOW_SAVE) for return address signing.
 (aarch64_expand_epilogue): Likewise.



I don't think we should be overloading REG_CFA_WINDOW_SAVE internally in
the compiler -- it's one thing to do it in the dwarf output tables, but
quite another to be doing it elsewhere in the compiler.

Instead we should create a new reg note kind and use that, but in the
final dwarf output then emit the overloaded opcode.


I can see the reason for doing this if you want to separate the
interpretation of the GCC CFA reg-note from the final DWARF CFA operation.
My understanding is that all reg-notes defined in gcc/reg-notes.def should
have a general meaning, even CFA_WINDOW_SAVE.  For those which are
architecture specific we might need a mechanism to define them in the
backend only.

The general reg-notes in gcc/reg-notes.def do not always have a corresponding
standard DWARF CFA operation, CFA_WINDOW_SAVE being one example; therefore,
if we want to achieve what you described, I think we also need to define a
new target hook which maps a GCC CFA reg-note onto an architecture DWARF CFA
operation.

Regards,
Jiong



Here is the patch.

Introduced one new target hook, TARGET_DWARF_MAP_REGNOTE_TO_CFA.  The purpose
is to allow GCC to map the DWARF CFA reg-notes in reg-notes.def, which look
to me to have generic meaning, onto target-private DWARF CFIs when there is
no standard DWARF CFI mapping.

One new GCC reg-note, REG_CFA_TOGGLE_RA_MANGLE, is introduced as well;
currently it's only used by AArch64 to implement return address signing and
is mapped to AArch64's target-private DWARF CFI.
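
For concreteness, the AArch64 implementation of the hook would be roughly of
the following shape (a sketch only; the attached patch defines the real
aarch64_map_regnote_to_cfa in aarch64.c):

  static enum dwarf_call_frame_info
  aarch64_map_regnote_to_cfa (enum reg_note kind)
  {
    /* REG_CFA_TOGGLE_RA_MANGLE has no standard DWARF CFA equivalent, so
       map it onto the vendor CFA multiplexed on DW_CFA_GNU_window_save's
       number (0x2d).  */
    if (kind == REG_CFA_TOGGLE_RA_MANGLE)
      return DW_CFA_GNU_window_save;

    return default_dwarf_map_regnote_to_cfa (kind);
  }

  #define TARGET_DWARF_MAP_REGNOTE_TO_CFA aarch64_map_regnote_to_cfa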

Does this approach and implementation look OK?

I can come up with separate patches to define this hook on Sparc for
CFA_WINDOW_SAVE, and to remove the redundant inclusion of dwarf2.h, although
there is an "#ifdef" guard in the header file.

The default hook implementation, "default_dwarf_map_regnote_to_cfa" in
targhooks.c, uses the types "enum reg_note" and "enum dwarf_call_frame_info",
which are not included in coretypes.h, so this patch has several changes in
header files.  I have done an x86 bootstrap to make sure there is no build
breakage.  I'd appreciate any better ideas on how to handle these type
definitions.

Thanks.

gcc/ChangeLog:

2017-01-16  Jiong Wang  <jiong.w...@arm.com>

* target.def (dwarf_map_regnote_to_cfa): New hook.
* targhooks.c (default_dwarf_map_regnote_to_cfa): Default implementation
for TARGET_DWARF_MAP_REGNOTE_TO_CFA.
* targhooks.h (default_dwarf_map_regnote_to_cfa): New declaration.
* rtl.h (enum reg_note): Move enum reg_note to...
* coretypes.h: ... here.
(dwarf2.h): New include file.
* reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note.
* combine-stack-adj.c (no_unhandled_cfa): Handle
REG_CFA_TOGGLE_RA_MANGLE.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_toggle_ra_mangle): New
function.
(dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Documents TARGET_DWARF_MAP_REGNOTE_TO_CFA.
* config/aarch64/aarch64.c (aarch64_map_regnote_to_cfa): Implements
TARGET_DWARF_MAP_REGNOTE_TO_CFA.
(aarch64_expand_prologue): Generate DWARF info for return address
signing.
(aarch64_expand_epilogue): Likewise.
(TARGET_DWARF_MAP_REGNOTE_TO_CFA): Define.

diff --git a/gcc/target.def b/gcc/target.def
index 0443390..6aaa9e6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3995,6 +3995,14 @@ the CFI label attached to the insn, @var{pattern} is the pattern of\n\
 the insn and @var{index} is @code{UNSPEC_INDEX} or @code{UNSPECV_INDEX}.",
  void, (const char *label, rtx pattern, int index), NULL)
 
+/* This target hook allows the backend to map GCC DWARF CFA reg-note to
+   architecture specific DWARF call frame instruction.  */
+DEFHOOK
+(dwarf_map_regnote_to_cfa,
+ "Maps the incoming GCC DWARF CFA reg-note to architecture specific DWARF call\
+ frame instruction.",
+ enum dwarf_call_frame_info, (enum reg_note), default_dwarf_map_regnote_to_cfa)
+
 /* ??? Documenting this hook requires a GFDL license grant.  */
 DEFHOOK_UNDOC
 (stdarg_optimize_hook,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 2f2abd3..df07911 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1711,6 +1711,17 @@ default_dwarf_frame_reg_mode (int regno)
   return save_mode;
 }
 
+/* Default version of TARGET_DWARF_MAP_REGNOTE_TO_CFA.  Map GCC reg-note
+   REGNOTE to its corresponding standard DWARF call frame instruction.  */
+

Re: [2/5][AArch64] Generate dwarf information for -msign-return-address

2017-01-13 Thread Jiong Wang

On 13/01/17 16:09, Richard Earnshaw (lists) wrote:

On 06/01/17 11:47, Jiong Wang wrote:


This patch is an update on DWARF generation for return address signing.

According to the new proposal, we simply need to generate a
REG_CFA_WINDOW_SAVE annotation.

gcc/

2017-01-06  Jiong Wang  <jiong.w...@arm.com>

 * config/aarch64/aarch64.c (aarch64_expand_prologue): Generate
dwarf
 annotation (REG_CFA_WINDOW_SAVE) for return address signing.
 (aarch64_expand_epilogue): Likewise.



I don't think we should be overloading REG_CFA_WINDOW_SAVE internally in
the compiler -- it's one thing to do it in the dwarf output tables, but
quite another to be doing it elsewhere in the compiler.

Instead we should create a new reg note kind and use that, but in the
final dwarf output then emit the overloaded opcode.


I can see the reason for doing this if you want to separate the
interpretation of the GCC CFA reg-note from the final DWARF CFA operation.
My understanding is that all reg-notes defined in gcc/reg-notes.def should
have a general meaning, even CFA_WINDOW_SAVE.  For those which are
architecture specific we might need a mechanism to define them in the
backend only.

The general reg-notes in gcc/reg-notes.def do not always have a corresponding
standard DWARF CFA operation, CFA_WINDOW_SAVE being one example; therefore,
if we want to achieve what you described, I think we also need to define a
new target hook which maps a GCC CFA reg-note onto an architecture DWARF CFA
operation.

Regards,
Jiong




Re: [1/5][AArch64] Return address protection on AArch64

2017-01-13 Thread Jiong Wang

On 13/01/17 16:04, James Greenhalgh wrote:

On Fri, Jan 06, 2017 at 11:47:07AM +, Jiong Wang wrote:

On 11/11/16 18:22, Jiong Wang wrote:
gcc/
2017-01-06  Jiong Wang  <jiong.w...@arm.com>

 * config/aarch64/aarch64-opts.h (aarch64_function_type): New enum.
 * config/aarch64/aarch64-protos.h
 (aarch64_return_address_signing_enabled): New declaration.
 * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled):
 New function.
 (aarch64_expand_prologue): Sign return address before it's pushed onto
 stack.
 (aarch64_expand_epilogue): Authenticate return address fetched from
 stack.
 (aarch64_override_options): Sanity check for ILP32 and ISA level.
 (aarch64_attributes): New function attributes for 
"sign-return-address".
 * config/aarch64/aarch64.md (UNSPEC_AUTI1716, UNSPEC_AUTISP,
 UNSPEC_PACI1716, UNSPEC_PACISP, UNSPEC_XPACLRI): New unspecs.
 ("*do_return"): Generate combined instructions according to key index.
 ("sp", "<pauth_mnem_prefix1716", "xpaclri"): New.
 * config/aarch64/iterators.md (PAUTH_LR_SP, PAUTH_17_16): New integer
 iterators.
 (pauth_mnem_prefix, pauth_hint_num_a): New integer attributes.
 * config/aarch64/aarch64.opt (msign-return-address=): New.
 * doc/extend.texi (AArch64 Function Attributes): Documents
 "sign-return-address=".
 * doc/invoke.texi (AArch64 Options): Documents 
"-msign-return-address=".

gcc/testsuite/
2017-01-06  Jiong Wang  <jiong.w...@arm.com>

 * gcc.target/aarch64/return_address_sign_1.c: New testcase.
 * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase.

I have a few comments on this patch


All fixed.  New patch attached.

gcc/
2017-01-13  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-opts.h (aarch64_function_type): New enum.
* config/aarch64/aarch64-protos.h
(aarch64_return_address_signing_enabled): New declaration.
* config/aarch64/aarch64.c (aarch64_return_address_signing_enabled):
New function.
(aarch64_expand_prologue): Sign return address before it's pushed onto
stack.
(aarch64_expand_epilogue): Authenticate return address fetched from
stack.
(aarch64_override_options): Sanity check for ILP32 and ISA level.
(aarch64_attributes): New function attributes for "sign-return-address".
* config/aarch64/aarch64.md (UNSPEC_AUTI1716, UNSPEC_AUTISP,
UNSPEC_PACI1716, UNSPEC_PACISP, UNSPEC_XPACLRI): New unspecs.
("*do_return"): Generate combined instructions according to key index.
("sp", "<pauth_mnem_prefix1716", "xpaclri"): New.
* config/aarch64/iterators.md (PAUTH_LR_SP, PAUTH_17_16): New integer
iterators.
(pauth_mnem_prefix, pauth_hint_num_a): New integer attributes.
* config/aarch64/aarch64.opt (msign-return-address=): New.
* doc/extend.texi (AArch64 Function Attributes): Documents
    "sign-return-address=".
* doc/invoke.texi (AArch64 Options): Documents "-msign-return-address=".

gcc/testsuite/
2017-01-13  Jiong Wang  <jiong.w...@arm.com>

* gcc.target/aarch64/return_address_sign_1.c: New testcase for no
combined instructions.
* gcc.target/aarch64/return_address_sign_2.c: New testcase for combined
instructions.
* gcc.target/aarch64/return_address_sign_3.c: New testcase for disable
of pointer authentication.

diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 9f37b9b..ba5d052 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -71,4 +71,14 @@ enum aarch64_code_model {
   AARCH64_CMODEL_LARGE
 };
 
+/* Function types -msign-return-address should sign.  */
+enum aarch64_function_type {
+  /* Don't sign any function.  */
+  AARCH64_FUNCTION_NONE,
+  /* Non-leaf functions.  */
+  AARCH64_FUNCTION_NON_LEAF,
+  /* All functions.  */
+  AARCH64_FUNCTION_ALL
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd7..632dd47 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -386,6 +386,7 @@ void aarch64_emit_call_insn (rtx);
 void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
+bool aarch64_return_address_signing_enabled (void);
 void aarch64_save_restore_target_globals (tree);
 
 /* Initialize builtins for SIMD intrinsics.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9dd75b0..3bcad76 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3117,6 +3

[Ping~]Re: [5/5][AArch64, libgcc] Runtime support for AArch64 return address signing (also attached target macros version)

2017-01-12 Thread Jiong Wang

On 06/01/17 11:47, Jiong Wang wrote:
This is the update on libgcc unwinder support according to the new DWARF
proposal.


As Joseph commented, duplication of unwind-dw2.c is not encouraged in libgcc,
but from this patch you can see there are a few places we need to modify for
AArch64 in unwind-aarch64.c, so is the file duplication approach acceptable?



libgcc/

2017-01-06  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/unwind-aarch64.c (DWARF_REGNUM_AARCH64_RA_STATE,
RA_A_SIGNED_BIT): New macros.
(execute_cfa_program): Multiplex DW_CFA_GNU_window_save on 
AArch64.
(uw_frame_state_for): Clear bit[0] of 
DWARF_REGNUM_AARCH64_RA_STATE.

(uw_update_context): Authenticate return address according to
DWARF_REGNUM_AARCH64_RA_STATE.
(uw_init_context_1): Strip signature of seed address.
(uw_install_context): Re-authenticate EH handler's address.


Ping~

For comparison, I have also attached the patch using the target macros.

Four new target macros are introduced:

  MD_POST_EXTRACT_ROOT_ADDR
  MD_POST_EXTRACT_FRAME_ADDR
  MD_POST_FROB_EH_HANDLER_ADDR
  MD_POST_INIT_CONTEXT

MD_POST_EXTRACT_ROOT_ADDR is to do target-private post-processing on the
addresses inside the _Unwind* functions themselves; these serve as the root
addresses from which unwinding starts.  MD_POST_EXTRACT_FRAME_ADDR is to do
target-private post-processing on the address inside the real user program
which throws the exception.

MD_POST_FROB_EH_HANDLER_ADDR is to do a target-private frob on the EH
handler's address before we install it into the current context.

MD_POST_INIT_CONTEXT is to do target-private initialization on the context
structure after the common initialization.

One "__aarch64__" macro check is needed to multiplex DW_CFA_GNU_window_save.

libgcc/ChangeLog:

2017-01-11  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-unwind.h: New file.
(DWARF_REGNUM_AARCH64_RA_STATE): Define.
(MD_POST_EXTRACT_ROOT_ADDR): Define.
(MD_POST_EXTRACT_FRAME_ADDR): Define.
(MD_POST_FROB_EH_HANDLER_ADDR): Define.
(MD_POST_INIT_CONTEXT): Define.
* config/aarch64/linux-unwind.h: Include aarch64-unwind.h
* config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*):
Initialize md_unwind_header to include aarch64-unwind.h.
* unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT".
(execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__.
(uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR.
(uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR and
MD_POST_INIT_CONTEXT.
(uw_frob_return_addr): New function.
(_Unwind_DebugHook): Use uw_frob_return_addr.

diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 8085a42ace15d53f4cb0c6681717012d906a6d47..35010a4065bb83f706701cb05392193f0ffa1f11 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -136,6 +136,8 @@ struct _Unwind_Context
 #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
   /* Context which has version/args_size/by_value fields.  */
 #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
+  /* Bit reserved on AArch64, return address has been signed with A key.  */
+#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)
   _Unwind_Word flags;
   /* 0 for now, can be increased when further fields are added to
  struct _Unwind_Context.  */
@@ -1185,6 +1187,11 @@ execute_cfa_program (const unsigned char *insn_ptr,
 	  break;
 
 	case DW_CFA_GNU_window_save:
+#ifdef __aarch64__
+	  /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
+	 return address signing status.  */
+	  fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
+#else
 	  /* ??? Hardcoded for SPARC register window configuration.  */
 	  if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32)
 	for (reg = 16; reg < 32; ++reg)
@@ -1192,6 +1199,7 @@ execute_cfa_program (const unsigned char *insn_ptr,
 		fs->regs.reg[reg].how = REG_SAVED_OFFSET;
 		fs->regs.reg[reg].loc.offset = (reg - 16) * sizeof (void *);
 	  }
+#endif
 	  break;
 
 	case DW_CFA_GNU_args_size:
@@ -1513,10 +1521,15 @@ uw_update_context (struct _Unwind_Context *context, _Unwind_FrameState *fs)
stack frame.  */
 context->ra = 0;
   else
-/* Compute the return address now, since the return address column
-   can change from frame to frame.  */
-context->ra = __builtin_extract_return_addr
-  (_Unwind_GetPtr (context, fs->retaddr_column));
+{
+  /* Compute the return address now, since the return address column
+	 can change from frame to frame.  */
+  context->ra = __builtin_extract_return_addr
+	(_Unwind_GetPtr (context, fs->retaddr_column));
+#ifdef MD_POST_EXTRACT_FRAME_ADDR
+  context->ra = MD_POST_EXTRACT_FRAME_ADDR (context, fs, context->ra);
+#endif
+}
 }
 
 static void
@@ -1550,

[4/5][AArch64, libgcc] Let AArch64 use customized unwinder file

2017-01-06 Thread Jiong Wang

On 11/11/16 18:22, Jiong Wang wrote:

We need customized EH unwinder support for the AArch64 DWARF operations
introduced earlier in this patchset; these changes mostly need to be made in
the generic file unwind-dw2.c.

There are two ways of introducing this AArch64 support:
   * Introduce a few target macros so we can customize functions like
     uw_init_context, uw_install_context etc.
   * Use a target-private unwind-dw2 implementation, i.e. duplicate the
     generic unwind-dw2.c into the target config directory and use it
     instead of the generic one.  This is what IA64 and CR16 currently use.

I am not sure which approach is the convention in libgcc.  Ian, any comments
on this?
Thanks.

This patch is the start of approach 2; it includes the necessary Makefile
support and the copy of the original unwind-dw2.c.

A follow-up patch will implement the AArch64-specific parts, so that change
will be very clear.

OK for trunk?

libgcc/
2016-11-08  Jiong Wang<jiong.w...@arm.com>

 * config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-linux*):
 Include new AArch64 EH makefile.
 * config/aarch64/t-eh-aarch64: New EH makefile.
 * config/aarch64/unwind-aarch64.c: New EH unwinder implementation,
 copied from unwind-dw2.c.


Ping ~
No change to this patch for the new DWARF proposal.



[3/5][AArch64] New builtins required by libgcc unwinder

2017-01-06 Thread Jiong Wang

On 11/11/16 18:22, Jiong Wang wrote:

This patch implements a few new ARMv8.3-A builtins for the pointer signing
and authentication instructions.

Currently, these builtins are supposed to be used by the libgcc EH unwinder
only.  They are not a public interface for external users.

OK to install?

gcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

 * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): New entries
 for AARCH64_PAUTH_BUILTIN_PACI1716, AARCH64_PAUTH_BUILTIN_AUTIA1716,
 AARCH64_PAUTH_BUILTIN_AUTIB1716, AARCH64_PAUTH_BUILTIN_XPACLRI.
 (aarch64_init_v8_3_builtins): New.
 (aarch64_init_builtins): Call aarch64_init_v8_3_builtins.
 (aarch64_expand_builtin): Expand new builtins.



This patch is an update on the builtins support.  All these builtins are to
be used internally by libgcc only, so the update keeps only those that are
used.
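
For illustration, the unwinder is expected to use them roughly as follows (a
sketch only; the builtin signatures match the patch below, where the second
operand of the 1716 builtins is the 64-bit salt):

  /* Return the usable return address from RA: either authenticate it with
     the A key (IA1716 form: address in x17, salt in x16) or just strip
     the signature (XPACLRI).  */
  static void *
  unwind_get_return_addr (void *ra, unsigned long cfa, int authenticate)
  {
    if (authenticate)
      return __builtin_aarch64_autia1716 (ra, cfa);
    return __builtin_aarch64_xpaclri (ra);
  }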

OK for trunk?

gcc/

2017-01-06  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): New entries
for AARCH64_PAUTH_BUILTIN_XPACLRI, AARCH64_PAUTH_BUILTIN_PACIA1716,
AARCH64_PAUTH_BUILTIN_AUTIA1716.
(aarch64_init_pauth_hint_builtins): New.
(aarch64_init_builtins): Call aarch64_init_pauth_hint_builtins.
(aarch64_expand_builtin): Expand new builtins.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 69fb756f0fbdc016f35ce1d08f2aaf092a034704..9ae9d9afc9c141235d7eee037d5571b9f35edc31 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -376,6 +376,10 @@ enum aarch64_builtins
   AARCH64_CRC32_BUILTIN_BASE,
   AARCH64_CRC32_BUILTINS
   AARCH64_CRC32_BUILTIN_MAX,
+  /* ARMv8.3-A Pointer Authentication Builtins.  */
+  AARCH64_PAUTH_BUILTIN_AUTIA1716,
+  AARCH64_PAUTH_BUILTIN_PACIA1716,
+  AARCH64_PAUTH_BUILTIN_XPACLRI,
   AARCH64_BUILTIN_MAX
 };
 
@@ -923,6 +927,33 @@ aarch64_init_fp16_types (void)
   aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node);
 }
 
+/* Pointer authentication builtins that will become NOP on legacy platform.
+   Currently, these builtins are for internal use only (libgcc EH unwinder).  */
+
+void
+aarch64_init_pauth_hint_builtins (void)
+{
+  /* Pointer Authentication builtins.  */
+  tree ftype_pointer_auth
+= build_function_type_list (ptr_type_node, ptr_type_node,
+unsigned_intDI_type_node, NULL_TREE);
+  tree ftype_pointer_strip
+= build_function_type_list (ptr_type_node, ptr_type_node, NULL_TREE);
+
+  aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_AUTIA1716]
+= add_builtin_function ("__builtin_aarch64_autia1716", ftype_pointer_auth,
+			AARCH64_PAUTH_BUILTIN_AUTIA1716, BUILT_IN_MD, NULL,
+			NULL_TREE);
+  aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_PACIA1716]
+= add_builtin_function ("__builtin_aarch64_pacia1716", ftype_pointer_auth,
+			AARCH64_PAUTH_BUILTIN_PACIA1716, BUILT_IN_MD, NULL,
+			NULL_TREE);
+  aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_XPACLRI]
+= add_builtin_function ("__builtin_aarch64_xpaclri", ftype_pointer_strip,
+			AARCH64_PAUTH_BUILTIN_XPACLRI, BUILT_IN_MD, NULL,
+			NULL_TREE);
+}
+
 void
 aarch64_init_builtins (void)
 {
@@ -951,6 +982,10 @@ aarch64_init_builtins (void)
 
   aarch64_init_crc32_builtins ();
   aarch64_init_builtin_rsqrt ();
+
+/* Initialize pointer authentication builtins which are backed by instructions
+   in NOP encoding space.  */
+  aarch64_init_pauth_hint_builtins ();
 }
 
 tree
@@ -1293,6 +1328,43 @@ aarch64_expand_builtin (tree exp,
 	}
   emit_insn (pat);
   return target;
+case AARCH64_PAUTH_BUILTIN_AUTIA1716:
+case AARCH64_PAUTH_BUILTIN_PACIA1716:
+case AARCH64_PAUTH_BUILTIN_XPACLRI:
+  arg0 = CALL_EXPR_ARG (exp, 0);
+  op0 = force_reg (Pmode, expand_normal (arg0));
+
+  if (!target)
+	target = gen_reg_rtx (Pmode);
+  else
+	target = force_reg (Pmode, target);
+
+  emit_move_insn (target, op0);
+
+  if (fcode == AARCH64_PAUTH_BUILTIN_XPACLRI)
+	{
+	  rtx lr = gen_rtx_REG (Pmode, R30_REGNUM);
+	  icode = CODE_FOR_xpaclri;
+	  emit_move_insn (lr, op0);
+	  emit_insn (GEN_FCN (icode) ());
+	  emit_move_insn (target, lr);
+	}
+  else
+	{
+	  tree arg1 = CALL_EXPR_ARG (exp, 1);
+	  rtx op1 = force_reg (Pmode, expand_normal (arg1));
+	  icode = (fcode == AARCH64_PAUTH_BUILTIN_PACIA1716
+		   ? CODE_FOR_paci1716 : CODE_FOR_auti1716);
+
+	  rtx x16_reg = gen_rtx_REG (Pmode, R16_REGNUM);
+	  rtx x17_reg = gen_rtx_REG (Pmode, R17_REGNUM);
+	  emit_move_insn (x17_reg, op0);
+	  emit_move_insn (x16_reg, op1);
+	  emit_insn (GEN_FCN (icode) ());
+	  emit_move_insn (target, x17_reg);
+	}
+
+  return target;
 }
 
   if (fcode >= AARCH64_SIMD_BUILTIN_BASE && fcode <= AARCH64_SIMD_BUILTIN_MAX)


[5/5][AArch64, libgcc] Runtime support for AArch64 DWARF operations

2017-01-06 Thread Jiong Wang

On 11/11/16 18:22, Jiong Wang wrote:

This patch adds AArch64-specific runtime EH unwinding support for
DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref.

Their semantics are described in the specification in patch [1/9].

The support includes:
   * Parsing these DWARF operations and performing unwinding actions
     according to their semantics.

   * Handling eh_return's multiple return paths.
     A function calling __builtin_eh_return (e.g. _Unwind_RaiseException)
     has multiple return paths: one for the normal exit and one for
     installing the EH handler.  If _Unwind_RaiseException itself is return
     address signed, there will always be a return address authentication
     before it returns.  However, if the return path taken is the one that
     installs the EH handler, whose address has already been authenticated
     during unwinding, then we need to re-sign that address so that, when
     execution continues at _Unwind_RaiseException's epilogue, the
     authentication still works correctly (see the sketch after this list).
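
For illustration, the re-signing step might look roughly like this (a sketch
under assumed names; the actual patch below is authoritative):

  static inline void *
  uw_resign_eh_handler_addr (void *handler, struct _Unwind_Context *target)
  {
    /* The handler address was authenticated during unwinding; give it a
       fresh signature, salted with the target frame's CFA, so that the
       authentication in _Unwind_RaiseException's epilogue succeeds.  */
    return __builtin_aarch64_pacia1716 (handler,
                                        (unsigned long) target->cfa);
  }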


OK for trunk?

libgcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

 * config/aarch64/unwind-aarch64.c (RA_SIGN_BIT): New flag to indicate
 that a frame's return address is signed.
 (execute_stack_op): Handle DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp,
 DW_OP_AARCH64_paciasp_deref.
 (uw_init_context): Call aarch64_uw_init_context_1.
 (uw_init_context_1): Rename to aarch64_uw_init_context_1.  Strip
 signature for seed address.
 (uw_install_context): Re-sign handler's address so it works correctly
 with caller's context.
 (uw_install_context_1): by_value[LR] can be true, after return address
 signing LR will come from DWARF value expression rule which is a
 by_value true rule.



This is the update on libgcc unwinder support according to the new DWARF proposal.

As Joseph commented, duplication of unwind-dw2.c is not encouraged in libgcc,
but from this patch you can see there are a few places we need to modify for
AArch64 in unwind-aarch64.c, so is the file duplication approach acceptable?


libgcc/

2017-01-06  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/unwind-aarch64.c (DWARF_REGNUM_AARCH64_RA_STATE,
RA_A_SIGNED_BIT): New macros.
(execute_cfa_program): Multiplex DW_CFA_GNU_window_save on AArch64.
(uw_frame_state_for): Clear bit[0] of DWARF_REGNUM_AARCH64_RA_STATE.
(uw_update_context): Authenticate return address according to
DWARF_REGNUM_AARCH64_RA_STATE.
(uw_init_context_1): Strip signature of seed address.
(uw_install_context): Re-authenticate EH handler's address.

diff --git a/libgcc/config/aarch64/unwind-aarch64.c b/libgcc/config/aarch64/unwind-aarch64.c
index 1fb6026d123f8e7fc676f5e95e8e66caccf3d6ff..11e3c9f724c9bc5796103a0d973bfe769d23b6e7 100644
--- a/libgcc/config/aarch64/unwind-aarch64.c
+++ b/libgcc/config/aarch64/unwind-aarch64.c
@@ -37,6 +37,9 @@
 #include "gthr.h"
 #include "unwind-dw2.h"
 
+/* This is a copy of libgcc/unwind-dw2.c with AArch64 return address signing
+   support.  */
+
 #ifdef HAVE_SYS_SDT_H
#include <sys/sdt.h>
 #endif
@@ -55,6 +58,8 @@
 #define PRE_GCC3_DWARF_FRAME_REGISTERS __LIBGCC_DWARF_FRAME_REGISTERS__
 #endif
 
+#define DWARF_REGNUM_AARCH64_RA_STATE 32
+
 /* ??? For the public function interfaces, we tend to gcc_assert that the
column numbers are in range.  For the dwarf2 unwind info this does happen,
although so far in a case that doesn't actually matter.
@@ -136,6 +141,8 @@ struct _Unwind_Context
 #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
   /* Context which has version/args_size/by_value fields.  */
 #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
+  /* Return address has been signed with A key.  */
+#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)
   _Unwind_Word flags;
   /* 0 for now, can be increased when further fields are added to
  struct _Unwind_Context.  */
@@ -1185,13 +1192,9 @@ execute_cfa_program (const unsigned char *insn_ptr,
 	  break;
 
 	case DW_CFA_GNU_window_save:
-	  /* ??? Hardcoded for SPARC register window configuration.  */
-	  if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32)
-	for (reg = 16; reg < 32; ++reg)
-	  {
-		fs->regs.reg[reg].how = REG_SAVED_OFFSET;
-		fs->regs.reg[reg].loc.offset = (reg - 16) * sizeof (void *);
-	  }
+	  /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
+	 return address signing status.  */
+	  fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1;
 	  break;
 
 	case DW_CFA_GNU_args_size:
@@ -1263,6 +1266,8 @@ uw_frame_state_for (struct _Unwind_Context *context, _Unwind_FrameState *fs)
   /* First decode all the insns in the CIE.  */
   end = (const unsigned char *) next_fde ((const struct dwarf_fde *) cie);
   execute_cfa_program (insn, end, context, fs);
+

[2/5][AArch64] Generate dwarf information for -msign-return-address

2017-01-06 Thread Jiong Wang

On 11/11/16 18:22, Jiong Wang wrote:

This patch generates the DWARF description for pointer authentication.  DWARF
value expressions are used to describe the authentication action.

Please see the cover letter and the AArch64 DWARF specification for the
semantics of the AArch64 DWARF operations.

When the authentication key index is the A key, we use the compact DWARF
description, which largely reduces the DWARF frame size; otherwise we fall
back to the general operators.



Example
===

 int
 cal (int a, int b, int c)
 {
   return a + dec (b) + c;
 }

Compact DWARF description
(-march=armv8.3-a -msign-return-address)
===

   DW_CFA_advance_loc: 4 to 0004
   DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp)
   DW_CFA_advance_loc: 4 to 0008
   DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp_deref: -24)

General DWARF description
===
(-march=armv8.3-a -msign-return-address -mpauth-key=b_key)

   DW_CFA_advance_loc: 4 to 0004
   DW_CFA_val_expression: r30 (x30) (DW_OP_breg30 (x30): 0; 
DW_OP_AARCH64_pauth: 18)
   DW_CFA_advance_loc: 4 to 0008
   DW_CFA_val_expression: r30 (x30) (DW_OP_dup; DW_OP_const1s: -24; DW_OP_plus; 
DW_OP_deref; DW_OP_AARCH64_pauth: 18)

From Linux kernel testing, -msign-return-address introduces a +24%
.debug_frame size increase when signing all functions with the compact
description, and about a +45% .debug_frame size increase with the general
description.


gcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

 * config/aarch64/aarch64.h (aarch64_pauth_action_type): New enum.
 * config/aarch64/aarch64.c (aarch64_attach_ra_auth_dwarf_note): New 
function.
 (aarch64_attach_ra_auth_dwarf_general): New function.
 (aarch64_attach_ra_auth_dwarf_shortcut): New function.
 (aarch64_save_callee_saves): Generate dwarf information if LR is 
signed.
 (aarch64_expand_prologue): Likewise.
 (aarch64_expand_epilogue): Likewise.


This patch is an update on DWARF generation for return address signing.

According to the new proposal, we simply need to generate a
REG_CFA_WINDOW_SAVE annotation.

gcc/

2017-01-06  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64.c (aarch64_expand_prologue): Generate dwarf
annotation (REG_CFA_WINDOW_SAVE) for return address signing.
(aarch64_expand_epilogue): Likewise.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 002895a167ce0deb45a5c1726527651af18bb4df..20ed79e5690f45ec121ef516245c686cc0cc82b5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3553,7 +3553,11 @@ aarch64_expand_prologue (void)
 
   /* Sign return address for functions.  */
   if (aarch64_return_address_signing_enabled ())
-emit_insn (gen_pacisp ());
+{
+  insn = emit_insn (gen_pacisp ());
+  add_reg_note (insn, REG_CFA_WINDOW_SAVE, const0_rtx);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
 
   if (flag_stack_usage_info)
 current_function_static_stack_size = frame_size;
@@ -3698,7 +3702,11 @@ aarch64_expand_epilogue (bool for_sibcall)
  want to use the CFA of the function which calls eh_return.  */
   if (aarch64_return_address_signing_enabled ()
   && (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return))
-emit_insn (gen_autisp ());
+{
+  insn = emit_insn (gen_autisp ());
+  add_reg_note (insn, REG_CFA_WINDOW_SAVE, const0_rtx);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
 
   /* Stack adjustment for exception handler.  */
   if (crtl->calls_eh_return)


[1/5][AArch64] Return address protection on AArch64

2017-01-06 Thread Jiong Wang

On 11/11/16 18:22, Jiong Wang wrote:

As described in the cover letter, this patch implements return address signing
for AArch64; it's controlled by the new option:

   -msign-return-address=[none | non-leaf | all]

"none" means don't do return address signing at all on any function.
"non-leaf" means only sign non-leaf functions.  "all" means sign all
functions.  Return address signing is currently disabled on ILP32; I haven't
tested it there.

The instructions added in the architecture are of two kinds:

* Instructions in the NOP instruction space, which allow binaries to run
without any traps on older versions of the architecture.  These don't give
any additional protection on older hardware but allow the same binary to be
used on both earlier and newer versions of the architecture.

* New instructions that are only valid for v8.3 and will trap if used on
earlier versions of the architecture.

By default, once return address signing is enabled, GCC will only generate
instructions from the NOP space.

If -march=armv8.3-a is specified, GCC will try to use the most efficient
pointer authentication instructions it can.

The architecture has two user-invisible system keys for signing and creating
signed addresses as part of these instructions.  For some use cases, the user
might want to use a different key for different functions.  The new option
"-msign-return-address-key=key_name" lets GCC select the key used for return
address signing.  Permissible values are "a_key" for the A key and "b_key"
for the B key; this option is supported by a function target attribute, and
LTO will hopefully just work.
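
As a concrete illustration of the signing scopes (a sketch only; the attached
testcases are authoritative):

  extern int bar (int);

  /* Leaf: gets no signing code under -msign-return-address=non-leaf,
     since LR is never saved on the stack.  */
  int leaf (int x) { return x + 1; }

  /* Non-leaf: the prologue signs LR before saving it and the epilogue
     authenticates it; also signed under =all.  */
  int non_leaf (int x) { return bar (x) + 1; }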



gcc/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

 * config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New enum.
 (aarch64_function_type): New enum.
 * config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg): New
 declaration.
 * config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return
 address before it's pushed onto stack.
 (aarch64_expand_epilogue): Authenticate return address fetched from
 stack.
 (aarch64_output_sign_auth_reg): New function.
 (aarch64_override_options): Sanity check for ILP32 and ISA level.
 (aarch64_attributes): New function attributes for 
"sign-return-address",
 "pauth-key".
 * config/aarch64/aarch64.md (UNSPEC_AUTH_REG, UNSPEC_AUTH_REG1716,
 UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN,
 UNSPEC_STRIP_X30_SIGN): New unspecs.
 ("*do_return"): Generate combined instructions according to key index.
 ("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716",
 "strip_reg_sign", "strip_lr_sign"): New.
 * config/aarch64/aarch64.opt (msign-return-address, mpauth-key): New.
 * config/aarch64/predicates.md (aarch64_const0_const1): New predicate.
 * doc/extend.texi (AArch64 Function Attributes): Documents
 "sign-return-address=", "pauth-key".
 * doc/invoke.texi (AArch64 Options): Documents 
"-msign-return-address=",
 "-pauth-key".

gcc/testsuite/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

 * gcc.target/aarch64/return_address_sign_1.c: New testcase.
 * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase.


Update the patchset according to the new DWARF proposal described at

  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg03010.html

While A key support for return address signing using DW_CFA_GNU_window_save
only needs simple modifications to the code and the associated DWARF
generation, B key support is more complex: it needs multiple-CIE support in
GCC and Binutils, so currently we fall back to DWARF value expressions, which
fully work although they require longer encodings.  Value expressions also
require a few changes to the AArch64 prologue and epilogue hooks, so code
review of that part will not be easy.

Therefore I have removed all B key support code from the initial support
patch set, and will organize it into a separate follow-up patchset so that we
can do the A key code review first.

This patch is an update on the return address signing code generation.

gcc/
2017-01-06  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-opts.h (aarch64_function_type): New enum.
* config/aarch64/aarch64-protos.h
(aarch64_return_address_signing_enabled): New declaration.
* config/aarch64/aarch64.c (aarch64_return_address_signing_enabled):
New function.
(aarch64_expand_prologue): Sign return address before it's pushed onto
stack.
(aarch64_expand_epilogue): Authenticate return address fetched from
stack.
(aarch64_override_options): Sanity check for ILP32 and ISA level.
(aarch64_attributes): New function a

[Ping~][AArch64] Add commandline support for -march=armv8.3-a

2017-01-06 Thread Jiong Wang

On 11/11/16 18:22, Jiong Wang wrote:

This patch adds command-line support for ARMv8.3-A through a new architecture
value:

   -march=armv8.3-a

ARMv8.3-A implies all default features of ARMv8.2-A and additionally includes
the new pointer authentication extension.


gcc/
2016-11-08  Jiong Wang<jiong.w...@arm.com>

 * config/aarch64/aarch64-arches.def: New entry for "armv8.3-a".
 * config/aarch64/aarch64.h (AARCH64_FL_PAUTH, AARCH64_FL_V8_3,
 AARCH64_FL_FOR_ARCH8_3, AARCH64_ISA_PAUTH, AARCH64_ISA_V8_3,
 TARGET_PAUTH, TARGET_ARMV8_3): New.
 * doc/invoke.texi (AArch64 Options): Document "armv8.3-a".


Ping ~

As the pointer authentication extension is defined to be a mandatory
extension of ARMv8.3-A and is not optional, I adjusted the patch slightly.

This also lets GCC treat the pointer authentication extension in a way
consistent with Binutils.

OK for trunk?

gcc/
2017-01-06  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-arches.def: New entry for "armv8.3-a".
* config/aarch64/aarch64.h (AARCH64_FL_V8_3, AARCH64_FL_FOR_ARCH8_3,
AARCH64_ISA_V8_3, TARGET_ARMV8_3): New.
* doc/invoke.texi (AArch64 Options): Document "armv8.3-a".


diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 830a7cf545532c050847a8c915d21bef12152388..ce6f73b3e5853b3d40e07545b9581298c768edca 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -33,5 +33,6 @@
 AARCH64_ARCH("armv8-a",	  generic,	 8A,	8,  AARCH64_FL_FOR_ARCH8)
 AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
 AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
+AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 584ff5c43afcd1a7918019b09165371bb88bfda1..51916c95a736ade697a823f15d483336651ac99a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -138,6 +138,8 @@ extern unsigned aarch64_architecture_version;
 /* ARMv8.2-A architecture extensions.  */
 #define AARCH64_FL_V8_2	  (1 << 8)  /* Has ARMv8.2-A features.  */
 #define AARCH64_FL_F16	  (1 << 9)  /* Has ARMv8.2-A FP16 extensions.  */
+/* ARMv8.3-A architecture extensions.  */
+#define AARCH64_FL_V8_3	  (1 << 10)  /* Has ARMv8.3-A features.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -151,6 +153,8 @@ extern unsigned aarch64_architecture_version;
   (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1)
 #define AARCH64_FL_FOR_ARCH8_2			\
   (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2)
+#define AARCH64_FL_FOR_ARCH8_3			\
+  (AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3)
 
 /* Macros to test ISA flags.  */
 
@@ -162,6 +166,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_RDMA	   (aarch64_isa_flags & AARCH64_FL_V8_1)
 #define AARCH64_ISA_V8_2	   (aarch64_isa_flags & AARCH64_FL_V8_2)
 #define AARCH64_ISA_F16		   (aarch64_isa_flags & AARCH64_FL_F16)
+#define AARCH64_ISA_V8_3	   (aarch64_isa_flags & AARCH64_FL_V8_3)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -176,6 +181,9 @@ extern unsigned aarch64_architecture_version;
 #define TARGET_FP_F16INST (TARGET_FLOAT && AARCH64_ISA_F16)
 #define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16)
 
+/* ARMv8.3-A features.  */
+#define TARGET_ARMV8_3	(AARCH64_ISA_V8_3)
+
 /* Make sure this is always defined so we don't have to check for ifdefs
but rather use normal ifs.  */
 #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8e2f46617b8e44ccf16941c31029ab5625322867..791718831d7089c44dfadb137f5e93caa9cd05f0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13874,7 +13874,10 @@ more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
 
 The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @var{native}.
+
+The value @samp{armv8.3-a} implies @samp{armv8.2-a} and enables compiler
+support for the ARMv8.3-A architecture extensions.
 
 The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
 support for the ARMv8.2-A architecture extensions.


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2017-01-03 Thread Jiong Wang



On 28/12/16 19:54, Cary Coutant wrote:

   OK on this proposal and to install this patch to gcc trunk?

Hi GDB, Binutils maintainer:

   OK on this proposal and install this patch to binutils-gdb master?

include/
2016-11-29   Richard Earnshaw  <rearn...@arm.com>
  Jiong Wang  <jiong.w...@arm.com>

 * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea.

This is OK, but:

+/* AARCH64 extensions.
+   DW_OP_AARCH64_operation takes one mandatory unsigned LEB128 operand.
+   Bits[6:0] of this operand form the action code; all other bits are
+   initialized to 0 unless explicitly documented otherwise for an action.
+   Please refer to the AArch64 DWARF ABI documentation for details.  */

Is it possible to include a stable URL that points to the ABI document?

Hi Cary,

  Thanks for the review.

  Currently there is no URL for these AArch64 DWARF ABI updates.  I will
update the comment as soon as a stable URL is available.


Regards,
Jiong


[Ping^3][1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-12-28 Thread Jiong Wang

Jiong Wang writes:

> Jiong Wang writes:
>
>> Jiong Wang writes:
>>
>>> On 16/11/16 14:02, Jakub Jelinek wrote:
>>>> On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:
>>>>> On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote:
>>>>>>   The two operations DW_OP_AARCH64_paciasp and 
>>>>>> DW_OP_AARCH64_paciasp_deref were
>>>>>> designed as shortcut operations when LR is signed with A key and using
>>>>>> function's CFA as salt.  This is the default behaviour of return address
>>>>>> signing so is expected to be used for most of the time.  
>>>>>> DW_OP_AARCH64_pauth
>>>>>> is designed as a generic operation that allow describing pointer signing 
>>>>>> on
>>>>>> any value using any salt and key in case we can't use the shortcut 
>>>>>> operations
>>>>>> we can use this.
>>>>>
>>>>> I admit to not fully understand the salting/keying involved. But given
>>>>> that the DW_OP space is really tiny, so we would like to not eat up too
>>>>> many of them for new opcodes. And given that introducing any new DW_OPs
>>>>> using for CFI unwinding will break any unwinder anyway causing us to
>>>>> update them all for this new feature. Have you thought about using a new
>>>>> CIE augmentation string character for describing that the return
>>>>> address/link register used by a function/frame is salted/keyed?
>>>>>
>>>>> This seems a good description of CIE records and augmentation
>>>>> characters:http://www.airs.com/blog/archives/460
>>>>>
>>>>> It obviously also involves updating all unwinders to understand the new
>>>>> augmentation character (and possible arguments). But it might be more
>>>>> generic and saves us from using up too many DW_OPs.
>>>>
>>>> From what I understood, the return address is not always scrambled, so
>>>> it doesn't apply to the whole function, just to most of it (except for
>>>> an insn in the prologue and some in the epilogue).  So I think one op is
>>>> needed.  But can't it be just a toggable flag whether the return address
>>>> is scrambled + some arguments to it?
>>>> Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default
>>>> way of scrambling starts here (if not already active) or any kind of
>>>> scrambling ends here (if already active), and
>>>> DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you 
>>>> need
>>>> to represent details of the less common variants with details what to do.
>>>> Then you'd just hook through some MD_* macro in the unwinder the
>>>> descrambling operation if the scrambling is active at the insns you unwind
>>>> on.
>>>>
>>>>   Jakub
>>>
>>> Hi Mark, Jakub:
>>>
>>>Thanks very much for the suggestions.
>>>
>>> I have done some experiments on your ideas and think it's good to
>>> combine them.  The use of DW_CFA instead of DW_OP can avoid building all
>>> information from scratch at each unwind location, while we can indicate
>>> the signing key index through the new AArch64 CIE augmentation 'B'.
>>> This new approach reduces the unwind table size overhead from ~25% to
>>> ~5% when return address signing is enabled, and it also largely
>>> simplifies the DWARF generation code for return address signing.
>>>
>>> As one new DWARF call frame instruction is needed for AArch64, I want
>>> to reuse DW_CFA_GNU_window_save to save space.  It is in the vendor
>>> extension space and used for Sparc only, so I think it makes sense to
>>> reuse it for AArch64.  On AArch64, DW_CFA_GNU_window_save toggles the
>>> return address signing status, which is kept in a new boolean-typed
>>> column in the DWARF table, so DW_CFA_GNU_window_save takes no argument
>>> on AArch64, the same as on Sparc; this makes no difference to the
>>> existing encoding and length-calculation code.
>>>
>>> Meanwhile, one new DWARF expression operation number is still needed
>>> for AArch64; it's useful for describing complex pointer signing
>>> scenarios and will be used to multiplex some further extensions on
>>> AArch64.
>>>
>>>OK on this proposal and to install this patch to gcc trunk?
>>>
>>> Hi GDB, Binutils maintainer:
>>>
>>>OK on this proposal and install this patch to binutils-gdb master?
>>>
>>> include/
>>> 2016-11-29   Richard Earnshaw  <rearn...@arm.com>
>>>   Jiong Wang  <jiong.w...@arm.com>
>>>
>>>  * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea.
>>
>> Ping~
> Ping^2

Ping^3

Can DWARF maintainers or global reviewers have a look at this?

Thanks very much.

-- 
Regards,
Jiong


[Ping^2][1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-12-19 Thread Jiong Wang

Jiong Wang writes:

> Jiong Wang writes:
>
>> On 16/11/16 14:02, Jakub Jelinek wrote:
>>> On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:
>>>> On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote:
>>>>>   The two operations DW_OP_AARCH64_paciasp and 
>>>>> DW_OP_AARCH64_paciasp_deref were
>>>>> designed as shortcut operations when LR is signed with A key and using
>>>>> function's CFA as salt.  This is the default behaviour of return address
>>>>> signing so is expected to be used for most of the time.  
>>>>> DW_OP_AARCH64_pauth
>>>>> is designed as a generic operation that allow describing pointer signing 
>>>>> on
>>>>> any value using any salt and key in case we can't use the shortcut 
>>>>> operations
>>>>> we can use this.
>>>>
>>>> I admit to not fully understand the salting/keying involved. But given
>>>> that the DW_OP space is really tiny, so we would like to not eat up too
>>>> many of them for new opcodes. And given that introducing any new DW_OPs
>>>> using for CFI unwinding will break any unwinder anyway causing us to
>>>> update them all for this new feature. Have you thought about using a new
>>>> CIE augmentation string character for describing that the return
>>>> address/link register used by a function/frame is salted/keyed?
>>>>
>>>> This seems a good description of CIE records and augmentation
>>>> characters:http://www.airs.com/blog/archives/460
>>>>
>>>> It obviously also involves updating all unwinders to understand the new
>>>> augmentation character (and possible arguments). But it might be more
>>>> generic and saves us from using up too many DW_OPs.
>>>
>>> From what I understood, the return address is not always scrambled, so
>>> it doesn't apply to the whole function, just to most of it (except for
>>> an insn in the prologue and some in the epilogue).  So I think one op is
>>> needed.  But can't it be just a toggable flag whether the return address
>>> is scrambled + some arguments to it?
>>> Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default
>>> way of scrambling starts here (if not already active) or any kind of
>>> scrambling ends here (if already active), and
>>> DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need
>>> to represent details of the less common variants with details what to do.
>>> Then you'd just hook through some MD_* macro in the unwinder the
>>> descrambling operation if the scrambling is active at the insns you unwind
>>> on.
>>>
>>>   Jakub
>>
>> Hi Mark, Jakub:
>>
>>Thanks very much for the suggestions.
>>
>>    I have done some experiments on your ideas and think it's good to
>>    combine them.  The use of DW_CFA instead of DW_OP avoids building all
>>    information from scratch at each unwind location, while we can indicate
>>    the signing key index through the new AArch64 CIE augmentation 'B'.  This
>>    new approach reduces the unwind table size overhead from ~25% to ~5% when
>>    return address signing is enabled, and it also largely simplifies the
>>    DWARF generation code for return address signing.
>>
>>    As one new DWARF call frame instruction is needed for AArch64, I want to
>>    reuse DW_CFA_GNU_window_save to save space.  It is in the vendor extension
>>    space and used only for Sparc; I think it makes sense to reuse it for
>>    AArch64.  On AArch64, DW_CFA_GNU_window_save toggles the return address
>>    signing status, which is kept in a new boolean-typed column in the DWARF
>>    table, so DW_CFA_GNU_window_save takes no argument on AArch64, the same
>>    as on Sparc; this makes no difference to the existing encoding and
>>    length-calculation code.
>>
>>    Meanwhile, one new DWARF expression operation number is still needed for
>>    AArch64; it's useful for describing complex pointer-signing scenarios
>>    and will be used to multiplex further extensions on AArch64.
>>
>>    OK for this proposal, and to install this patch on gcc trunk?
>>
>> Hi GDB and Binutils maintainers:
>>
>>    OK for this proposal, and to install this patch on binutils-gdb master?
>>
>> include/
>> 2016-11-29   Richard Earnshaw  <rearn...@arm.com>
>>   Jiong Wang  <jiong.w...@arm.com>
>>
>>  * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea.
>
> Ping~

Ping^2

-- 
Regards,
Jiong


Re: [Ping~][1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-12-12 Thread Jiong Wang

Jiong Wang writes:

> On 16/11/16 14:02, Jakub Jelinek wrote:
>> On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:
>>> On Wed, 2016-11-16 at 10:00 +0000, Jiong Wang wrote:
>>>>   The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref
>>>> were designed as shortcut operations for when LR is signed with the A key
>>>> and using the function's CFA as salt.  This is the default behaviour of
>>>> return address signing, so it is expected to be used most of the time.
>>>> DW_OP_AARCH64_pauth is designed as a generic operation that allows
>>>> describing pointer signing on any value using any salt and key, for the
>>>> cases where the shortcut operations cannot be used.
>>>
>>> I admit to not fully understanding the salting/keying involved. But given
>>> that the DW_OP space is really tiny, we would like not to eat up too
>>> many of them for new opcodes. And given that introducing any new DW_OPs
>>> used for CFI unwinding will break any unwinder anyway, causing us to
>>> update them all for this new feature, have you thought about using a new
>>> CIE augmentation string character for describing that the return
>>> address/link register used by a function/frame is salted/keyed?
>>>
>>> This seems a good description of CIE records and augmentation
>>> characters: http://www.airs.com/blog/archives/460
>>>
>>> It obviously also involves updating all unwinders to understand the new
>>> augmentation character (and possible arguments). But it might be more
>>> generic and saves us from using up too many DW_OPs.
>>
>> From what I understood, the return address is not always scrambled, so
>> it doesn't apply to the whole function, just to most of it (except for
>> an insn in the prologue and some in the epilogue).  So I think one op is
>> needed.  But can't it be just a togglable flag for whether the return address
>> is scrambled + some arguments to it?
>> Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default
>> way of scrambling starts here (if not already active) or any kind of
>> scrambling ends here (if already active), and
>> DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need
>> to represent details of the less common variants with details what to do.
>> Then you'd just hook through some MD_* macro in the unwinder the
>> descrambling operation if the scrambling is active at the insns you unwind
>> on.
>>
>>   Jakub
>
> Hi Mark, Jakub:
>
>Thanks very much for the suggestions.
>
>    I have done some experiments on your ideas and think it's good to
>    combine them.  The use of DW_CFA instead of DW_OP avoids building all
>    information from scratch at each unwind location, while we can indicate
>    the signing key index through the new AArch64 CIE augmentation 'B'.  This
>    new approach reduces the unwind table size overhead from ~25% to ~5% when
>    return address signing is enabled, and it also largely simplifies the
>    DWARF generation code for return address signing.
>
>    As one new DWARF call frame instruction is needed for AArch64, I want to
>    reuse DW_CFA_GNU_window_save to save space.  It is in the vendor extension
>    space and used only for Sparc; I think it makes sense to reuse it for
>    AArch64.  On AArch64, DW_CFA_GNU_window_save toggles the return address
>    signing status, which is kept in a new boolean-typed column in the DWARF
>    table, so DW_CFA_GNU_window_save takes no argument on AArch64, the same as
>    on Sparc; this makes no difference to the existing encoding and
>    length-calculation code.
>
>    Meanwhile, one new DWARF expression operation number is still needed for
>    AArch64; it's useful for describing complex pointer-signing scenarios
>    and will be used to multiplex further extensions on AArch64.
>
>    OK for this proposal, and to install this patch on gcc trunk?
>
> Hi GDB and Binutils maintainers:
>
>    OK for this proposal, and to install this patch on binutils-gdb master?
>
> include/
> 2016-11-29   Richard Earnshaw  <rearn...@arm.com>
>   Jiong Wang  <jiong.w...@arm.com>
>
>  * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea.

Ping~

Thanks.

-- 
Regards,
Jiong


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-12-01 Thread Jiong Wang

On 01/12/16 10:42, Richard Earnshaw (lists) wrote:

On 30/11/16 21:43, Cary Coutant wrote:

How about if instead of special DW_OP codes, you instead define a new
virtual register that contains the mangled return address? If the rule
for that virtual register is anything other than DW_CFA_undefined,
you'd expect to find the mangled return address using that rule;
otherwise, you would use the rule for LR instead and expect an
unmangled return address. The earlier example would become (picking an
arbitrary value of 120 for the new virtual register number):

 .cfi_startproc
0x0  paciasp (this instruction signs return address register LR/X30)
 .cfi_val 120, DW_OP_reg30
0x4  stp x29, x30, [sp, -32]!
 .cfi_offset 120, -16
 .cfi_offset 29, -32
 .cfi_def_cfa_offset 32
0x8  add x29, sp, 0

Just a suggestion...

What about signing other registers?  And what if the value is then
copied to another register?  Don't you end up with every possible
register (including the FP/SIMD registers) needing a shadow copy?


  
  Another issue: compared with the DW_CFA approach, this virtual register
  approach is less efficient in unwind table size and more complex to
  implement.

  .cfi_register takes two ULEB128 register numbers, so it needs 3 bytes rather
  than DW_CFA's 1 byte.  For example, the .debug_frame section size increase
  for the Linux kernel will be ~14%, compared with the DW_CFA approach's ~5%.

  In this implementation, the prologue would normally be:

 .cfi_startproc
0x0  paciasp (this instruction signs return address register LR/X30)
 .cfi_val 120, DW_OP_reg30  <-A
0x4  stp x29, x30, [sp, -32]!
 .cfi_offset 120, -16   <-B
 .cfi_offset 29, -32
 .cfi_def_cfa_offset 32

The epilogue would normally be:
...
ldp x29, x30, [sp], 32
  .cfi_val 120, DW_OP_reg30  <- C
  .cfi_restore 29
  .cfi_def_cfa 31, 0

autiasp (this instruction unsigns LR/X30)
  .cfi_restore 30

   For the virtual register approach, GCC needs to track DWARF generation for
   LR/X30 in every place (A/B/C, and maybe some other rare LR-copy places), and
   rewrite LR to the new virtual register accordingly.  This seems easy, but my
   experience shows GCC won't do any DWARF auto-deduction if one explicit
   DWARF CFI note is attached to an insn (handled_one will be true in
   dwarf2out_frame_debug).  So for instructions like stp/ldp, we then need to
   generate all three DWARF CFI notes explicitly by hand.
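
   For illustration, attaching those notes by hand would look roughly like
   this (a sketch only: the REG_CFA_* note kinds are GCC's dwarf2cfi
   interface, but the mem_for_* rtxes and the virtual register handling are
   hypothetical):

     /* Explicit CFI notes on the stp insn; auto-deduction stops once one
        note is attached (handled_one in dwarf2out_frame_debug).  */
     add_reg_note (insn, REG_CFA_OFFSET,
                   gen_rtx_SET (mem_for_vreg120,
                                gen_rtx_REG (DImode, LR_REGNUM)));
     add_reg_note (insn, REG_CFA_OFFSET,
                   gen_rtx_SET (mem_for_x29, hard_frame_pointer_rtx));
     add_reg_note (insn, REG_CFA_DEF_CFA,
                   plus_constant (Pmode, stack_pointer_rtx, 32));
     RTX_FRAME_RELATED_P (insn) = 1;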

   While for the DW_CFA approach, they will be:

 .cfi_startproc
0x0  paciasp (this instruction signs return address register LR/X30)
 .cfi_cfa_window_save
0x4  stp x29, x30, [sp, -32]! \
 .cfi_offset 30, -16          |
 .cfi_offset 29, -32          |
 .cfi_def_cfa_offset 32       |  all DWARF generation between sign
...                           |  and unsign (paciasp/autiasp) is
ldp x29, x30, [sp], 16        |  the same as before
  .cfi_restore 30             |
  .cfi_restore 29             |
  .cfi_def_cfa 31, 0          |
                              /
autiasp (this instruction unsigns LR/X30)
  .cfi_cfa_window_save

   The DWARF generation implementation in the backend is very simple; nothing
   needs to be updated between the sign and unsign instructions.

 As for the impact on the unwinder, the virtual register approach needs to
 change the implementation of the "save value" rule, which is quite generic
 code.  A target hook might be needed for AArch64 for the case when the
 destination register is the special virtual register; that seems a little
 bit hacky to me.
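
 To make that concrete, the generic unwinder code would need something along
 these lines (a sketch only; the MD_POST_PROCESS_SAVED_VAL hook name and its
 shape are hypothetical):

   /* In uw_update_context_1-style code: "save value" rules produce the
      register value directly, so a target hook would have to post-process
      it when the destination is the mangled-LR virtual register.  */
   case REG_SAVED_VAL_OFFSET:
     val = (_Unwind_Word) cfa + fs->regs.reg[i].loc.offset;
#ifdef MD_POST_PROCESS_SAVED_VAL        /* hypothetical target hook */
     val = MD_POST_PROCESS_SAVED_VAL (context, i, val);
#endif
     _Unwind_SetGRValue (context, i, val);
     break;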


-cary


On Wed, Nov 16, 2016 at 6:02 AM, Jakub Jelinek <ja...@redhat.com> wrote:

On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:

On Wed, 2016-11-16 at 10:00 +0000, Jiong Wang wrote:

   The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref
were designed as shortcut operations for when LR is signed with the A key and
using the function's CFA as salt.  This is the default behaviour of return
address signing, so it is expected to be used most of the time.
DW_OP_AARCH64_pauth is designed as a generic operation that allows describing
pointer signing on any value using any salt and key, for the cases where the
shortcut operations cannot be used.

I admit to not fully understanding the salting/keying involved. But given
that the DW_OP space is really tiny, we would like not to eat up too
many of them for new opcodes. And given that introducing any new DW_OPs
used for CFI unwinding will break any unwinder anyway, causing us to
update them all for this new feature, have you thought about using a new
CIE augmentation string character for describing that the return
address/link register used by a function/frame is salted/keyed?

This seems a good description of CIE records and augmentation
characters: http://www.airs.com/blog/archives/460

Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-30 Thread Jiong Wang

On 16/11/16 14:02, Jakub Jelinek wrote:

On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:

On Wed, 2016-11-16 at 10:00 +0000, Jiong Wang wrote:

  The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref were
designed as shortcut operations for when LR is signed with the A key and using
the function's CFA as salt.  This is the default behaviour of return address
signing, so it is expected to be used most of the time.  DW_OP_AARCH64_pauth
is designed as a generic operation that allows describing pointer signing on
any value using any salt and key, for the cases where the shortcut operations
cannot be used.


I admit to not fully understanding the salting/keying involved. But given
that the DW_OP space is really tiny, we would like not to eat up too
many of them for new opcodes. And given that introducing any new DW_OPs
used for CFI unwinding will break any unwinder anyway, causing us to
update them all for this new feature, have you thought about using a new
CIE augmentation string character for describing that the return
address/link register used by a function/frame is salted/keyed?

This seems a good description of CIE records and augmentation
characters: http://www.airs.com/blog/archives/460

It obviously also involves updating all unwinders to understand the new
augmentation character (and possible arguments). But it might be more
generic and saves us from using up too many DW_OPs.


From what I understood, the return address is not always scrambled, so
it doesn't apply to the whole function, just to most of it (except for
an insn in the prologue and some in the epilogue).  So I think one op is
needed.  But can't it be just a togglable flag for whether the return address
is scrambled + some arguments to it?
Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default
way of scrambling starts here (if not already active) or any kind of
scrambling ends here (if already active), and
DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need
to represent details of the less common variants with details what to do.
Then you'd just hook through some MD_* macro in the unwinder the
descrambling operation if the scrambling is active at the insns you unwind
on.

  Jakub


Hi Mark, Jakub:

  Thanks very much for the suggestions.

  I have done some experiments on your ideas and think it's good to
  combine them.  The use of DW_CFA instead of DW_OP avoids building all
  information from scratch at each unwind location, while we can indicate
  the signing key index through the new AArch64 CIE augmentation 'B'.  This
  new approach reduces the unwind table size overhead from ~25% to ~5% when
  return address signing is enabled, and it also largely simplifies the DWARF
  generation code for return address signing.

  As one new DWARF call frame instruction is needed for AArch64, I want to
  reuse DW_CFA_GNU_window_save to save space.  It is in the vendor extension
  space and used only for Sparc; I think it makes sense to reuse it for
  AArch64.  On AArch64, DW_CFA_GNU_window_save toggles the return address
  signing status, which is kept in a new boolean-typed column in the DWARF
  table, so DW_CFA_GNU_window_save takes no argument on AArch64, the same as
  on Sparc; this makes no difference to the existing encoding and
  length-calculation code.
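
  To make the toggle semantics concrete, the unwinder-side handling reduces
  to something like this (a minimal sketch; the name of the boolean column is
  illustrative, not from the patch):

    /* In an execute_cfa_program-style CFA interpreter: on AArch64 this
       opcode just flips the new boolean column, so no operand decoding or
       length calculation is affected.  */
    case DW_CFA_GNU_window_save:
      fs->regs.ra_signed_p = !fs->regs.ra_signed_p;  /* illustrative field */
      break;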

  Meanwhile, one new DWARF expression operation number is still needed for
  AArch64; it's useful for describing complex pointer-signing scenarios
  and will be used to multiplex further extensions on AArch64.

  OK for this proposal, and to install this patch on gcc trunk?

Hi GDB and Binutils maintainers:

  OK for this proposal, and to install this patch on binutils-gdb master?

include/
2016-11-29   Richard Earnshaw  <rearn...@arm.com>
     Jiong Wang  <jiong.w...@arm.com>

* dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea.


diff --git a/include/dwarf2.def b/include/dwarf2.def
index bb916ca238221151cf49359c25fd92643c7e60af..f3892a20da1fe13ddb419e5d7eda07f2c8d8b0c6 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -684,6 +684,12 @@ DW_OP (DW_OP_HP_unmod_range, 0xe5)
 DW_OP (DW_OP_HP_tls, 0xe6)
 /* PGI (STMicroelectronics) extensions.  */
 DW_OP (DW_OP_PGI_omp_thread_num, 0xf8)
+/* AARCH64 extensions.
+   DW_OP_AARCH64_operation takes one mandatory unsigned LEB128 operand.
+   Bits[6:0] of this operand is the action code, all others bits are initialized
+   to 0 except explicitly documented for one action.  Please refer AArch64 DWARF
+   ABI documentation for details.  */
+DW_OP (DW_OP_AARCH64_operation, 0xea)
 DW_END_OP
 
 DW_FIRST_ATE (DW_ATE_void, 0x0)
@@ -765,7 +771,8 @@ DW_CFA (DW_CFA_hi_user, 0x3f)
 
 /* SGI/MIPS specific.  */
 DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d)
-/* GNU extensions.  */
+/* GNU extensions.
+   NOTE: DW_CFA_GNU_window_save is multiplexed on Sparc and AArch64.  */
 DW_CFA (DW_CFA_GNU_window_save, 0x2d)
 DW_CFA (DW_CFA_GNU_args_size, 0x2e)
DW_CFA (DW_CFA_GNU_negative_offset_extended, 0x2f)
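
For reference, an unwinder would decode the new multiplexed operation along
these lines (a sketch: the bits[6:0] dispatch follows the comment in the
patch, while the action shown for code 0 is only a placeholder):

  case DW_OP_AARCH64_operation:
    {
      _uleb128_t operand;
      op_ptr = read_uleb128 (op_ptr, &operand);
      switch (operand & 0x7f)   /* bits [6:0] hold the action code */
        {
        case 0:                 /* placeholder: authenticate LR, CFA as salt */
          result = (_Unwind_Word) __builtin_aarch64_autia1716
            ((void *) _Unwind_GetGR (context, LR_REGNUM), initial);
          break;
        default:                /* actions per the AArch64 DWARF ABI document */
          abort ();
        }
      break;
    }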

Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE

2016-11-24 Thread Jiong Wang

gcc/
2016-11-11  Jiong Wang  <jiong.w...@arm.com>
* function.c (expand_function_end): Guard stack_protect_epilogue
with
ENABLE_DEFAULT_SSP_RUNTIME.
* cfgexpand.c (pass_expand::execute): Likewise guard for
stack_protect_prologue.
* defaults.h (ENABLE_DEFAULT_SSP_RUNTIME): New macro.  Default
set to 1.
* doc/tm.texi.in (Misc): Documents ENABLE_DEFAULT_SSP_RUNTIME.
* doc/tm.texi: Regenerate.


Like Joseph, I think this should be a hook rather than a new target macro.  I 
do think it's closer to the right track though (separation of access to the 
guard from the rest of the SSP runtime bits).


Hi Joseph, Jeff:

  Thanks for the review.

  I was planning to update the patch after resolving the pending DWARF issue
(https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01156.html), but as this patch
itself is quite independent, is it OK to commit the attached patch?  x86-64
bootstrap and regression tests are OK.


gcc/
2016-11-24  Jiong Wang  <jiong.w...@arm.com>
* target.def (stack_protect_runtime_enabled_p): New.
* function.c (expand_function_end): Guard stack_protect_epilogue with
targetm.stack_protect_runtime_enabled_p.
* cfgexpand.c (pass_expand::execute): Likewise.
* calls.c (expand_call): Likewise.
* doc/tm.texi.in (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Add it.
* doc/tm.texi: Regenerate.

diff --git a/gcc/calls.c b/gcc/calls.c
index c916e07..21385ce 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3083,7 +3083,9 @@ expand_call (tree exp, rtx target, int ignore)
   if (pass && (flags & ECF_MALLOC))
 	start_sequence ();
 
-  if (pass == 0 && crtl->stack_protect_guard)
+  if (pass == 0
+	  && crtl->stack_protect_guard
+	  && targetm.stack_protect_runtime_enabled_p ())
 	stack_protect_epilogue ();
 
   adjusted_args_size = args_size;
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 7ffb558..9c5a892 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6334,7 +6334,7 @@ pass_expand::execute (function *fun)
 
   /* Initialize the stack_protect_guard field.  This must happen after the
  call to __main (if any) so that the external decl is initialized.  */
-  if (crtl->stack_protect_guard)
+  if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ())
 stack_protect_prologue ();
 
   expand_phi_nodes ();
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 84bba07..c4f4ec3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4949,6 +4949,10 @@ The default version of this hook invokes a function called
 normally defined in @file{libgcc2.c}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_STACK_PROTECT_RUNTIME_ENABLED_P (void)
+Returns true if the target wants GCC's default stack protect runtime support, otherwise return false.  The default implementation always returns true.
+@end deftypefn
+
 @deftypefn {Common Target Hook} bool TARGET_SUPPORTS_SPLIT_STACK (bool @var{report}, struct gcc_options *@var{opts})
 Whether this target supports splitting the stack when the options described in @var{opts} have been passed.  This is called after options have been parsed, so the target may reject splitting the stack in some configurations.  The default version of this hook returns false.  If @var{report} is true, this function may issue a warning or error; if @var{report} is false, it must simply return a value
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9afd5daa..9202bfe6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3825,6 +3825,8 @@ generic code.
 
 @hook TARGET_STACK_PROTECT_FAIL
 
+@hook TARGET_STACK_PROTECT_RUNTIME_ENABLED_P
+
 @hook TARGET_SUPPORTS_SPLIT_STACK
 
 @node Miscellaneous Register Hooks
diff --git a/gcc/function.c b/gcc/function.c
index 0b1d168..871f5a0 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5627,7 +5627,7 @@ expand_function_end (void)
 emit_insn (gen_blockage ());
 
   /* If stack protection is enabled for this function, check the guard.  */
-  if (crtl->stack_protect_guard)
+  if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ())
 stack_protect_epilogue ();
 
   /* If we had calls to alloca, and this machine needs
diff --git a/gcc/target.def b/gcc/target.def
index c24b4cf..a63b850 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4039,6 +4039,15 @@ normally defined in @file{libgcc2.c}.",
  tree, (void),
  default_external_stack_protect_fail)
 
+/* This target hook allows the operating system to disable the default stack
+   protector runtime support.  */
+DEFHOOK
+(stack_protect_runtime_enabled_p,
+ "Returns true if the target wants GCC's default stack protect runtime support,\
+ otherwise return false.  The default implementation always returns true.",
+ bool, (void),
+ hook_bool_void_true)
+
 DEFHOOK
 (can_use_doloop_p,
  "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-16 Thread Jiong Wang

On 15/11/16 19:25, Richard Earnshaw (lists) wrote:

On 15/11/16 16:48, Jiong Wang wrote:

On 15/11/16 16:18, Jakub Jelinek wrote:

I know nothing about the aarch64 return address signing, would all 3 or say
2 usually appear together without any separate pc advance, or are they all
going to appear frequently and at different pcs?


  I think it's the latter: DW_OP_AARCH64_paciasp and
DW_OP_AARCH64_paciasp_deref are going to appear frequently and at
different PCs.  For example, in the following function prologue, there are
three instructions at 0x0, 0x4, 0x8.

  After the first instruction at 0x0, LR/X30 will be mangled.  The "paciasp"
instruction always mangles the LR register using SP as salt and writes the
value back into LR.  We then generate DW_OP_AARCH64_paciasp to notify any
unwinder that the original LR is mangled in this way, so it can unwind the
original value properly.

  After the second instruction at 0x4, the mangled value of LR/X30 will be
pushed onto the stack.  Unlike the usual .cfi_offset, the unwind rule for
LR/X30 becomes: first fetch the mangled value from stack offset -16, then do
whatever is needed to restore the original value from the mangled value.
This is represented by (DW_OP_AARCH64_paciasp_deref, offset).

.cfi_startproc
   0x0  paciasp (this instruction signs return address register LR/X30)
.cfi_val_expression 30, DW_OP_AARCH64_paciasp
   0x4  stp x29, x30, [sp, -32]!
.cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16
.cfi_offset 29, -32
.cfi_def_cfa_offset 32
   0x8  add x29, sp, 0



Now I'm confused.

I was thinking that we needed one opcode for the sign operation in the
prologue and one for the unsign/validate operation in the epilogue (to
support non-call exceptions).


  IMO, non-call exceptions are fine; it looks to me like they don't need any
extra description, since for non-call exceptions (exceptions thrown from a
signal handler) the key point is how to unwind across the signal frame.  For
the libgcc EH unwinder, when normal unwinding fails, it falls back to an
architecture unwinding hook which restores some information from the signal
frame, which sits just on top of the signal handler's frame.

  I can see the AArch64 implementation will set up the return address column
with logic like the following, where "sc->pc" is initialized by the kernel
and is not signed, so further unwinding should succeed.

fs->regs.reg[__LIBGCC_DWARF_ALT_FRAME_RETURN_COLUMN__].how =
  REG_SAVED_VAL_OFFSET;
fs->regs.reg[__LIBGCC_DWARF_ALT_FRAME_RETURN_COLUMN__].loc.offset =
  (_Unwind_Ptr) (sc->pc) - new_cfa;


But why do we need a separate opcode to say
that a previously signed value has now been pushed on the stack?  Surely
that's just a normal store operation that can be tracked through the
unwinding state machine.


  I was thinking the same thing, but found it doesn't work.  My understanding
of frame unwinding as described in the DWARF specification is that there are
two steps.  The first step is to calculate the register restore rules: the
unwinder scans register rules from the function start to the unwinding PC,
and a rule is *overridden* by the next one for the same register; there is
*no inheritance*.  The second step is then to evaluate the final rules
collected at the unwinding PC; according to each rule, either fetch the value
from the stack or evaluate it on the DWARF expression stack, etc.
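
 A minimal sketch of that two-step model (all helper names are illustrative):

   /* Step 1: scan rules from the FDE start up to the unwinding PC; a later
      rule for a register simply overwrites the earlier one, no merging.  */
   for (cfi = fde_first_cfi (fde); cfi_pc (cfi) <= pc; cfi = cfi_next (cfi))
     fs->regs.reg[cfi_regno (cfi)] = cfi_rule (cfi);

   /* Step 2: only now evaluate the final rule set collected at PC, e.g.
      fetch a saved register from the stack or run its value expression.  */
   lr = evaluate_rule (&fs->regs.reg[LR_REGNUM], context);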

 I had also tried to modify ".cfi_val_expression" at offset 0x4 in the above
example into ".cfi_offset 30, -24"; the libgcc EH unwinder just doesn't work.



I was expecting the third opcode to be needed for the special operations
that are not frequently used by the compiler.


 The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref were
designed as shortcut operations for when LR is signed with the A key and using
the function's CFA as salt.  This is the default behaviour of return address
signing, so it is expected to be used most of the time.  DW_OP_AARCH64_pauth
is designed as a generic operation that allows describing pointer signing on
any value using any salt and key, for the cases where the shortcut operations
cannot be used.



Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Jiong Wang

On 15/11/16 16:18, Jakub Jelinek wrote:

On Tue, Nov 15, 2016 at 04:00:40PM +0000, Jiong Wang wrote:

   Takes one signed LEB128 offset and retrieves 8-byte contents from the address
   calculated by CFA plus this offset, the contents then authenticated as per A
   key for instruction pointer using current CFA as salt. The result is pushed
   onto the stack.

I'd like to point out that especially the vendor range of DW_OP_* is
an extremely scarce resource: we have only a couple of unused values, so taking
3 out of the remaining unused 12 for a single architecture is IMHO too much.
Can't you use just a single opcode and encode which of the 3 operations it is
in say the low 2 bits of a LEB 128 operand?
We'll likely need to do RSN some multiplexing even for the generic GNU
opcodes if we need just a few further ones (say 0xff as an extension,
followed by uleb128 containing the opcode - 0xff).
In the non-vendor area we still have 54 values left, so there is more space
for future expansion.

   Separate DWARF operations are introduced, instead of combining all of them
into one, mostly because these operations are going to be used for most
functions once return address signing is enabled, and they are used for
describing frame unwinding, so they will go into the unwind tables of C++
programs or C programs compiled with -fexceptions; the impact on unwind table
size is significant.  So I was trying to lower the unwind table size overhead
as much as I could.

   IMHO, three numbers actually is not that much for one architecture in the
DWARF operation vendor extension space, as vendors can overlap with each
other.  The only painful thing, from my understanding, is that there are
platform vendors, for example "GNU" and "LLVM" etc., which architecture
vendors can't overlap with.

For DW_OP_*, there aren't two vendor ranges like e.g. in ELF, there is just
one range, so ideally the opcodes would be unique everywhere, if not, there
is just a single GNU vendor, there is no separate range for Aarch64, that
can overlap with range for x86_64, and powerpc, etc.

Perhaps we could declare that certain opcode subrange for the GNU vendor is
architecture specific and document that the meaning of opcodes in that range
and count/encoding of their arguments depends on the architecture, but then
we should document how to figure out the architecture too (e.g. for ELF
base it on the containing EM_*).  All the tools that look at DWARF (readelf,
objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to agree on 
that
though.

I know nothing about the aarch64 return address signing, would all 3 or say
2 usually appear together without any separate pc advance, or are they all
going to appear frequently and at different pcs?


 I think it's the latter: DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref
are going to appear frequently and at different PCs.  For example, in the
following function prologue, there are three instructions at 0x0, 0x4, 0x8.

  After the first instruction at 0x0, LR/X30 will be mangled.  The "paciasp"
instruction always mangles the LR register using SP as salt and writes the
value back into LR.  We then generate DW_OP_AARCH64_paciasp to notify any
unwinder that the original LR is mangled in this way, so it can unwind the
original value properly.

  After the second instruction at 0x4, the mangled value of LR/X30 will be
pushed onto the stack.  Unlike the usual .cfi_offset, the unwind rule for
LR/X30 becomes: first fetch the mangled value from stack offset -16, then do
whatever is needed to restore the original value from the mangled value.
This is represented by (DW_OP_AARCH64_paciasp_deref, offset).

.cfi_startproc
   0x0  paciasp (this instruction signs return address register LR/X30)
.cfi_val_expression 30, DW_OP_AARCH64_paciasp
   0x4  stp x29, x30, [sp, -32]!
.cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16
.cfi_offset 29, -32
.cfi_def_cfa_offset 32
   0x8  add x29, sp, 0



  Perhaps if there is just 1
opcode that has all the info encoded just in one bigger uleb128 or something
similar...

Jakub





Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Jiong Wang

On 11/11/16 19:38, Jakub Jelinek wrote:


On Fri, Nov 11, 2016 at 06:21:48PM +0000, Jiong Wang wrote:

This patch introduces three AARCH64 private DWARF operations in vendor extension
space.

DW_OP_AARCH64_pauth 0xea
===
   Takes one unsigned LEB 128 Pointer Authentication Description. Bits [3:0] of
   the description contain the Authentication Action Code. All unused bits are
   initialized to 0. The operation then proceeds according to the value of the
   action code as described in the Action Code Table.

DW_OP_AARCH64_paciasp 0xeb
===
   Authenticates the contents in X30/LR register as per A key for instruction
   pointer using current CFA as salt. The result is pushed onto the stack.

DW_OP_AARCH64_paciasp_deref 0xec
===
   Takes one signed LEB128 offset and retrieves 8-byte contents from the address
   calculated by CFA plus this offset, the contents then authenticated as per A
   key for instruction pointer using current CFA as salt. The result is pushed
   onto the stack.

I'd like to point out that especially the vendor range of DW_OP_* is
an extremely scarce resource: we have only a couple of unused values, so taking
3 out of the remaining unused 12 for a single architecture is IMHO too much.
Can't you use just a single opcode and encode which of the 3 operations it is
in say the low 2 bits of a LEB 128 operand?
We'll likely need to do RSN some multiplexing even for the generic GNU
opcodes if we need just a few further ones (say 0xff as an extension,
followed by uleb128 containing the opcode - 0xff).
In the non-vendor area we still have 54 values left, so there is more space
for future expansion.

Jakub


   
  Separate DWARF operations are introduced, instead of combining all of them
into one, mostly because these operations are going to be used for most
functions once return address signing is enabled, and they are used for
describing frame unwinding, so they will go into the unwind tables of C++
programs or C programs compiled with -fexceptions; the impact on unwind table
size is significant.  So I was trying to lower the unwind table size overhead
as much as I could.

  IMHO, three numbers actually is not that much for one architecture in the
DWARF operation vendor extension space, as vendors can overlap with each
other.  The only painful thing, from my understanding, is that there are
platform vendors, for example "GNU" and "LLVM" etc., which architecture
vendors can't overlap with.

  In include/dwarf2.def, I see DW_OP_GNU* has reserved 13 numbers, DW_OP_HP*
has reserved 7, and DW_OP_PGI has reserved 1.

  So as an alternative approach, can these AArch64 extensions overlap with and
reuse the numbers reserved for DW_OP_HP*, for example 0xe4, 0xe5, 0xe6?  I am
even thinking the GNU toolchain could treat the 8 numbers reserved by the
existing DW_OP_HP* and DW_OP_SGI* as an architecture vendor area and allow
multiplexing on them for different architectures.  This may offer more
flexibility for architecture vendors.
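
  If such sharing were adopted, DWARF consumers would have to key the
semantics off the object's architecture, roughly like this (a sketch of the
EM_*-based dispatch Jakub mentioned; the decode_* helpers are hypothetical):

  /* The same vendor opcode byte means different things depending on the
     ELF machine of the containing file.  */
  switch (ehdr->e_machine)
    {
    case EM_AARCH64: return decode_aarch64_vendor_op (op, op_ptr);
    case EM_SPARC:   return decode_sparc_vendor_op (op, op_ptr);
    default:         return NULL;      /* semantics unknown here */
    }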

  Under the current code base, my search shows the overlap should be safe
inside GCC/GDB, and we would only need a minor disassembler tweak in Binutils.

  Thanks.

Regards,
Jiong



Re: [7/9][AArch64, libgcc] Let AArch64 use customized unwinder file

2016-11-14 Thread Jiong Wang

On 11/11/16 22:12, Joseph Myers wrote:

On Fri, 11 Nov 2016, Jiong Wang wrote:


There are two ways of introducing this AArch64 support:
   * Introducing a few target macros so we can customize functions like
     uw_init_context, uw_install_context etc.
   * Using a target-private unwind-dw2 implementation, i.e. duplicating the
     generic unwind-dw2.c into the target config directory and using it
     instead of the generic one.  This is currently what IA64 and CR16 do.

I am not sure which approach is the convention in libgcc.  Ian, any comments
on this?

Although as you note duplication has been used before, I think it should
be strongly discouraged; duplicated files are unlikely to be kept up to
date with relevant changes to the main file.


Hi Joseph,


  The changes AArch64 needs on top of the generic unwind-dw2.c are at:

 https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01167.html

  If I don't duplicate unwind-dw2.c, then I need to guard those changes with
something like __aarch64__ or introduce several target macros.  It looks to me
like only the hunk that supports the AArch64 DWARF operations is worth a
target macro, something like MD_DW_OP_HANDLER; the other changes are quite
scattered, for example the field extension of "struct _Unwind_Context" and the
relaxed assertion in uw_install_context_1.
  Any comments on this?

  Thanks.

Regards,
Jiong




Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE

2016-11-11 Thread Jiong Wang

On 24/10/16 16:22, Jeff Law wrote:


Asserting couldn't hurt.  I'd much rather have the compiler issue an error, ICE 
or somesuch than silently not generate a call to the stack protector fail 
routine.


Hi Jeff,

  I have just sent out the other patch, which accelerates -fstack-protector on
  AArch64; more background information is at:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01168.html

  Previously, I was emptying three target insns/hooks and relying on GCC to
  optimize all the remaining SSP runtime stuff out.  I am thinking it's better
  and safer for GCC to allow a backend to disable the default SSP runtime
  cleanly, so the backend doesn't need to rely on the optimization level, and
  libssp is not needed at any optimization level.
 
  In this new patch, I introduced a new target macro so that a backend can
  disable GCC's default SSP runtime generation.
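
  A backend that supplies its own scheme would then opt out in its target
  header, e.g. (a one-line sketch; the real AArch64 definition appears in
  patch 9/9 of this series):

    /* In <target>.h: suppress the default stack-protector runtime.  */
    #define ENABLE_DEFAULT_SSP_RUNTIME 0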

  How does this look to you?

  Thanks.

gcc/
2016-11-11  Jiong Wang  <jiong.w...@arm.com>
* function.c (expand_function_end): Guard stack_protect_epilogue with
ENABLE_DEFAULT_SSP_RUNTIME.
* cfgexpand.c (pass_expand::execute): Likewise guard for
stack_protect_prologue.
* defaults.h (ENABLE_DEFAULT_SSP_RUNTIME): New macro.  Default set to 1.
* doc/tm.texi.in (Misc): Documents ENABLE_DEFAULT_SSP_RUNTIME.
* doc/tm.texi: Regenerate.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 130a16b1d7d06c4ec9e31439037ffcbcbd0e085f..99f055d2db622f7acd393a223b3968be12b6235f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6343,7 +6343,7 @@ pass_expand::execute (function *fun)
 
   /* Initialize the stack_protect_guard field.  This must happen after the
  call to __main (if any) so that the external decl is initialized.  */
-  if (crtl->stack_protect_guard)
+  if (crtl->stack_protect_guard && ENABLE_DEFAULT_SSP_RUNTIME)
 stack_protect_prologue ();
 
   expand_phi_nodes ();
diff --git a/gcc/defaults.h b/gcc/defaults.h
index af8fe916be49e745c842d992a5af372c46ec2fe3..ec5e52c9761e3e5aee5274c54628157d0bde1808 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1404,6 +1404,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # define DEFAULT_FLAG_SSP 0
 #endif
 
+/* Supply a default definition of ENABLE_DEFAULT_SSP_RUNTIME.  GCC use this to
+   decide whether stack_protect_prologue and stack_protect_epilogue based
+   runtime support should be generated.  */
+
+#ifndef ENABLE_DEFAULT_SSP_RUNTIME
+#define ENABLE_DEFAULT_SSP_RUNTIME 1
+#endif
+
 /* Provide default values for the macros controlling stack checking.  */
 
 /* The default is neither full builtin stack checking...  */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 586626062435f3718cfae84c6aab3024d08d79d7..64d20bc493470221286b6248354f0d6122405cb6 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10487,6 +10487,14 @@ The default implementation does nothing.
 @c prevent bad page break with this line
 Here are several miscellaneous parameters.
 
+@defmac ENABLE_DEFAULT_SSP_RUNTIME
+Define this boolean macro to indicate whether or not your architecture
+uses GCC's default stack protector runtime.  If this macro is set to true,
+stack_protect_prologue and stack_protect_epilogue based runtime support will be
+generated, otherwise GCC assumes your architecture generates private runtime
+support.  This macro is set to true by default.
+@end defmac
+
 @defmac HAS_LONG_COND_BRANCH
 Define this boolean macro to indicate whether or not your architecture
 has conditional branches that can span all of memory.  It is used in
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index da133a4b7010533d85d5bb9a850b91e8a80ce1ca..729c76fa182076828a5819ab391b4f61fb80a771 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7499,6 +7499,14 @@ c_register_addr_space ("__ea", ADDR_SPACE_EA);
 @c prevent bad page break with this line
 Here are several miscellaneous parameters.
 
+@defmac ENABLE_DEFAULT_SSP_RUNTIME
+Define this boolean macro to indicate whether or not your architecture
+uses GCC's default stack protector runtime.  If this macro is set to true,
+stack_protect_prologue and stack_protect_epilogue based runtime support will be
+generated, otherwise GCC assumes your architecture generates private runtime
+support.  This macro is set to true by default.
+@end defmac
+
 @defmac HAS_LONG_COND_BRANCH
 Define this boolean macro to indicate whether or not your architecture
 has conditional branches that can span all of memory.  It is used in
diff --git a/gcc/function.c b/gcc/function.c
index 53bad8736e9ef251347d23d40bc0ab767a979bc7..9dce8929590f6cb06155a540e33960c2cf0e3b16 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5624,7 +5624,7 @@ expand_function_end (void)
 emit_insn (gen_blockage ());
 
   /* If stack protection is enabled for this function, check the guard.  */
-  if (crtl->stack_protect_guard)
+  if (crtl->stack_protect_guard && ENABLE_DEFAULT_SSP_RUNTIME)

[9/9][RFC][AArch64] Accelerate -fstack-protector through pointer authentication extension

2016-11-11 Thread Jiong Wang

This patch accelerates GCC's existing -fstack-protector using the ARMv8.3-A
pointer authentication instructions.

Given AArch64 currently has the following stack layout:

  | caller's LR
  | 
  |
  | canary  <- sentinel for -fstack-protector
  | locals (buffer located here)
  |
  |
  | other callees
  |
  | callee's LR  <- sentinel for -msign-return-address
  |
  |

we can swap the locals and callees areas:

| ...
| vararg
|
| other callee
|
| LR
|
| locals (buffer located here)

We then sign LR and make it serve as the canary value.  There are several
benefits to this approach:

  *  It's effectively -msign-return-address plus swapping the locals and
     callees areas.
  *  It requires nearly no modifications to the prologue and epilogue,
     avoiding making them more complex.
  *  No other runtime support is needed; libssp is not required.

The runtime overhead before and after this patch will be:

  o canary insert

GCC's default SSP runtime loads from the global variable "__stack_chk_guard",
initialized in libssp:

  adrpx19, _GLOBAL_OFFSET_TABLE_
  ldr x19, [x19, #:gotpage_lo15:__stack_chk_guard]
  ldr x2, [x19]
  str x2, [x29, 56]

this patch accelerates this into:

  sign lr

  o canary check

GCC's default SSP runtime reloads the canary from the stack, compares it with
the original value, and branches to the abort function:

  ldr x2, [x29, 56]
  ldr x1, [x19]
  eor x1, x2, x1
  cbnzx1, .L5
  ...
  ret
  .L5:
  bl  __stack_chk_fail

accelerated into:

  aut lr + ret, or retaa

If the canary value (the signed LR) fails authentication, the return to an
invalid address will cause an exception.

NOTE: this approach however requires the DWARF change, as the original LR is
signed; the binary needs a new libgcc to make sure C++ EH works correctly.
Given that this acceleration already needs the user to specify
-mstack-protector-dialect=pauth, the target platform should largely have the
new libgcc installed; otherwise you can't utilize the new pointer
authentication features anyway.

gcc/
2016-11-11  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-opts.h (aarch64_stack_protector_type): New
enum.
(aarch64_layout_frame): Swap callees and locals when
-mstack-protector-dialect=pauth specified.
(aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN instead
of AARCH64_ENABLE_RETURN_ADDRESS_SIGN.
(aarch64_expand_epilogue): Likewise.
* config/aarch64/aarch64.md (*do_return): Likewise.
(aarch64_override_options): Sanity check for ILP32 and TARGET_PAUTH.
* config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION, AARCH64_PAUTH_SSP,
AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines.
* config/aarch64/aarch64.opt (-mstack-protector-dialect=): New option.
* doc/invoke.texi (AArch64 Options): Documents
-mstack-protector-dialect=.

diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 41c14b38a6188d399eb04baca2896e033c03ff1b..ff464ea5675146d62f0b676fe776f882fc1b8d80 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -99,4 +99,10 @@ enum aarch64_function_type {
   AARCH64_FUNCTION_ALL
 };
 
+/* GCC standard stack protector (Canary insertion based) types for AArch64.  */
+enum aarch64_stack_protector_type {
+  STACK_PROTECTOR_TRAD,
+  STACK_PROTECTOR_PAUTH
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 907e8bdf5b4961b3107dcd5a481de28335e4be89..73ef2677a11450fe21f765011317bd3367ef0d94 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -982,4 +982,25 @@ enum aarch64_pauth_action_type
   AARCH64_PAUTH_AUTH
 };
 
+/* Pointer authentication accelerated -fstack-protector.  */
+#define AARCH64_PAUTH_SSP_OPTION \
+  (TARGET_PAUTH && aarch64_stack_protector_dialect == STACK_PROTECTOR_PAUTH)
+
+#define AARCH64_PAUTH_SSP \
+  (crtl->stack_protect_guard && AARCH64_PAUTH_SSP_OPTION)
+
+#define AARCH64_PAUTH_SSP_OR_RA_SIGN \
+  (AARCH64_PAUTH_SSP || AARCH64_ENABLE_RETURN_ADDRESS_SIGN)
+
+#ifndef TARGET_LIBC_PROVIDES_SSP
+#define LINK_SSP_SPEC "%{!mstack-protector-dialect=pauth:\
+			 %{fstack-protector|fstack-protector-all\
+			   |fstack-protector-strong|fstack-protector-explicit:\
+			   -lssp_nonshared -lssp}}"
+#endif
+
+/* Don't use GCC default SSP runtime if pointer authentication acceleration
+   enabled.  */
+#define ENABLE_DEFAULT_SSP_RUNTIME  !(AARCH64_PAUTH_SSP_OPTION)
+
 #endif /* GCC_AARCH64_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cae177dca511fdb909ef82c972d3bbdebab215e2..c469baf92268ff894f5cf0ea9f5dbd4180714b98 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2993,6 +2993,15 @@ aarch64_layout_frame

[8/9][AArch64, libgcc] Runtime support for AArch64 DWARF operations

2016-11-11 Thread Jiong Wang

This patch adds AArch64-specific runtime EH unwinding support for
DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref.

Their semantics are described in the specification in patch [1/9].

The support includes:
  * Parsing these DWARF operations and performing the unwinding actions
    according to their semantics.

  * Handling eh_return's multiple return paths.
    A function calling __builtin_eh_return (_Unwind_RaiseException etc.) will
    have multiple return paths: one for the normal exit, the other for
    installing the EH handler.  If _Unwind_RaiseException itself is return
    address signed, there will always be a return address authentication
    before the return.  However, if the return path in _Unwind_RaiseException
    comes from installing an EH handler whose address has already been
    authenticated during unwinding, then we need to re-sign that address, so
    that when the execution flow continues at _Unwind_RaiseException's
    epilogue, the authentication still works correctly.
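
In code, the re-sign step amounts to something like this in
uw_install_context (a rough sketch using the builtin from patch 6/9; the flag
test and variable names simplify the actual change):

  /* If unwinding authenticated the return address of a signing frame,
     re-sign the handler address with the target frame's CFA as salt, so
     the signer's own epilogue authentication still succeeds.  */
  if (current->flags & RA_SIGNED_BIT)
    handler = (_Unwind_Word)
      __builtin_aarch64_paci1716 ((void *) handler,
                                  (_Unwind_Word) target->cfa);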


OK for trunk?

libgcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

* config/aarch64/unwind-aarch64.c (RA_SIGNED_BIT): New flag to indicate
that a frame's return address is signed.
(execute_stack_op): Handle DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp,
DW_OP_AARCH64_paciasp_deref.
(uw_init_context): Call aarch64_uw_init_context_1.
(uw_init_context_1): Rename to aarch64_uw_init_context_1.  Strip
signature for seed address.
(uw_install_context): Re-sign handler's address so it works correctly
with caller's context.
(uw_install_context_1): by_value[LR] can be true, after return address
signing LR will come from DWARF value expression rule which is a
by_value true rule.



diff --git a/libgcc/config/aarch64/unwind-aarch64.c b/libgcc/config/aarch64/unwind-aarch64.c
index 1fb6026d123f8e7fc676f5e95e8e66caccf3d6ff..f6441a56960dbd4b754f8fc17d581402389a4812 100644
--- a/libgcc/config/aarch64/unwind-aarch64.c
+++ b/libgcc/config/aarch64/unwind-aarch64.c
@@ -37,6 +37,10 @@
 #include "gthr.h"
 #include "unwind-dw2.h"
 
+/* This AArch64 implementation is exactly the same as libgcc/unwind-dw2.c,
+   except we have a customized uw_init_context_1 to handle pointer
+   authentication.  */
+
 #ifdef HAVE_SYS_SDT_H
 #include <sys/sdt.h>
 #endif
@@ -67,7 +71,7 @@
waste.  However, some runtime libraries supplied with ICC do contain such
an unorthodox transition, as well as the unwind info to match.  This loss
of register restoration doesn't matter in practice, because the exception
-   is caught in the native unix abi, where all of the xmm registers are 
+   is caught in the native unix abi, where all of the xmm registers are
call clobbered.
 
Ideally, we'd record some bit to notice when we're failing to restore some
@@ -136,6 +140,8 @@ struct _Unwind_Context
 #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1)
   /* Context which has version/args_size/by_value fields.  */
 #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
+  /* Return address has been signed.  */
+#define RA_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)
   _Unwind_Word flags;
   /* 0 for now, can be increased when further fields are added to
  struct _Unwind_Context.  */
@@ -908,6 +914,89 @@ execute_stack_op (const unsigned char *op_ptr, const unsigned char *op_end,
 	case DW_OP_nop:
 	  goto no_push;
 
+	case DW_OP_AARCH64_paciasp:
+	  {
+	_Unwind_Word lr_value = _Unwind_GetGR (context, LR_REGNUM);
+	/* Note: initial is guaranteed to be CFA by DWARF specification.  */
+	result
+	  = (_Unwind_Word) __builtin_aarch64_autia1716 ((void *) lr_value,
+			initial);
+	context->flags |= RA_SIGNED_BIT;
+	break;
+	  }
+
+	case DW_OP_AARCH64_paciasp_deref:
+	  {
+	_sleb128_t offset;
+	op_ptr = read_sleb128 (op_ptr, &offset);
+	result = (_Unwind_Word) read_pointer ((void *) initial + offset);
+	result
+	  = (_Unwind_Word) __builtin_aarch64_autia1716 ((void *) result,
+			initial);
+	context->flags |= RA_SIGNED_BIT;
+	break;
+	  }
+
+	case DW_OP_AARCH64_pauth:
+	  {
+	_uleb128_t auth_descriptor;
+	op_ptr = read_uleb128 (op_ptr, &auth_descriptor);
+	enum aarch64_pauth_action_type action_code =
+	  (enum aarch64_pauth_action_type) (auth_descriptor & 0xf);
+	context->flags |= RA_SIGNED_BIT;
+
+	/* Different action may take different number of operands.
+	   AARCH64_PAUTH_DROP* takes one operand while AARCH64_PAUTH_AUTH
+	   takes two and both of them produce one result.  */
+	switch (action_code)
+	  {
+	  case AARCH64_PAUTH_DROP_I:
+		{
+		  /* Fetch the value to drop signature.  */
+		  stack_elt -= 1;
+		  result = stack[stack_elt];
+		  result
+		= (_Unwind_Word)
+		__builtin_aarch64_xpaclri ((void *) result);
+		  break;
+		}
+	  case AARCH64_PAUTH_AUTH:
+		{
+		  enum aarch64_pauth_k

[7/9][AArch64, libgcc] Let AArch64 use customized unwinder file

2016-11-11 Thread Jiong Wang

We need customized EH unwinder support for the AArch64 DWARF operations
introduced earlier in this patchset; these changes mostly need to be made in
the generic file unwind-dw2.c.

There are two ways of introducing this AArch64 support:
  * Introducing a few target macros so we can customize functions like
    uw_init_context, uw_install_context etc.
  * Using a target-private unwind-dw2 implementation, i.e. duplicating the
    generic unwind-dw2.c into the target config directory and using it instead
    of the generic one.  This is currently what IA64 and CR16 do.

I am not sure which approach is the convention in libgcc.  Ian, any comments
on this?
Thanks.

This patch is the start of approach 2; it includes the necessary Makefile
support and a copy of the original unwind-dw2.c.

A follow-up patch will implement the AArch64-specific parts, so that change
will be very clear.

OK for trunk?

libgcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

* config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-linux*):
Include new AArch64 EH makefile.
* config/aarch64/t-eh-aarch64: New EH makefile.
* config/aarch64/unwind-aarch64.c: New EH unwinder implementation,
copied from unwind-dw2.c.

diff --git a/libgcc/config.host b/libgcc/config.host
index 002f650be9a7cd6f69ce3d51639a735ca7eba564..2bf90818c03e71bd3a601b607b98ac6b78fe763a 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -330,12 +330,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
 	extra_parts="$extra_parts crtfastmath.o"
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+	tmake_file="${tmake_file} ${cpu_type}/t-eh-aarch64"
 	;;
 aarch64*-*-linux*)
 	extra_parts="$extra_parts crtfastmath.o"
 	md_unwind_header=aarch64/linux-unwind.h
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+	tmake_file="${tmake_file} ${cpu_type}/t-eh-aarch64"
 	;;
 alpha*-*-linux*)
 	tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux"
diff --git a/libgcc/config/aarch64/t-eh-aarch64 b/libgcc/config/aarch64/t-eh-aarch64
new file mode 100644
index ..2ccc02d409ff850ec9db355a4d06efd125b4f68d
--- /dev/null
+++ b/libgcc/config/aarch64/t-eh-aarch64
@@ -0,0 +1,3 @@
+# Use customized EH unwinder implementation.
+LIB2ADDEH = $(srcdir)/config/aarch64/unwind-aarch64.c $(srcdir)/unwind-dw2-fde-dip.c \
+  $(srcdir)/unwind-sjlj.c $(srcdir)/unwind-c.c
diff --git a/libgcc/config/aarch64/unwind-aarch64.c b/libgcc/config/aarch64/unwind-aarch64.c
new file mode 100644
index ..1fb6026d123f8e7fc676f5e95e8e66caccf3d6ff
--- /dev/null
+++ b/libgcc/config/aarch64/unwind-aarch64.c
@@ -0,0 +1,1715 @@
+/* DWARF2 exception handling and frame unwind runtime interface routines.
+   Copyright (C) 1997-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "tconfig.h"
+#include "tsystem.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "libgcc_tm.h"
+#include "dwarf2.h"
+#include "unwind.h"
+#ifdef __USING_SJLJ_EXCEPTIONS__
+# define NO_SIZE_OF_ENCODED_VALUE
+#endif
+#include "unwind-pe.h"
+#include "unwind-dw2-fde.h"
+#include "gthr.h"
+#include "unwind-dw2.h"
+
+#ifdef HAVE_SYS_SDT_H
+#include <sys/sdt.h>
+#endif
+
+#ifndef __USING_SJLJ_EXCEPTIONS__
+
+#ifndef __LIBGCC_STACK_GROWS_DOWNWARD__
+#define __LIBGCC_STACK_GROWS_DOWNWARD__ 0
+#else
+#undef __LIBGCC_STACK_GROWS_DOWNWARD__
+#define __LIBGCC_STACK_GROWS_DOWNWARD__ 1
+#endif
+
+/* Dwarf frame registers used for pre gcc 3.0 compiled glibc.  */
+#ifndef PRE_GCC3_DWARF_FRAME_REGISTERS
+#define PRE_GCC3_DWARF_FRAME_REGISTERS __LIBGCC_DWARF_FRAME_REGISTERS__
+#endif
+
+/* ??? For the public function interfaces, we tend to gcc_assert that the
+   column numbers 

[6/9][AArch64] Add builtins support for pac/aut/xpac

2016-11-11 Thread Jiong Wang

This patch implements a few new ARMv8.3-A builtins for the pointer signing and
authentication instructions.

Currently, these builtins are supposed to be used by the libgcc EH unwinder
only.  They are not a public interface for external users.
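
As a reference, a hypothetical call sequence as the unwinder might issue it
(an untested sketch based on the function types registered below: the
sign/authenticate builtins take a pointer plus a 64-bit salt, while
__builtin_aarch64_xpaclri takes only the pointer):

  void *ra = ...;                  /* signed return address from the frame  */
  unsigned long long salt = ...;   /* normally the frame's CFA  */

  /* Either authenticate the pointer with the A key...  */
  ra = __builtin_aarch64_autia1716 (ra, salt);
  /* ...or just strip the signature without authenticating:
     ra = __builtin_aarch64_xpaclri (ra);  */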

OK to install?

gcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): New entries
for AARCH64_PAUTH_BUILTIN_PACI1716, AARCH64_PAUTH_BUILTIN_AUTIA1716,
AARCH64_PAUTH_BUILTIN_AUTIB1716, AARCH64_PAUTH_BUILTIN_XPACLRI.
(aarch64_init_pauth_hint_builtins): New.
(aarch64_init_builtins): Call aarch64_init_pauth_hint_builtins.
(aarch64_expand_builtin): Expand new builtins.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 9136910cd324a391de929ea9d1a13419dbcfb8bc..20679a5d3f6138f4c55b84f3aff5dfd0341e6787 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -353,6 +353,11 @@ enum aarch64_builtins
   AARCH64_CRC32_BUILTIN_BASE,
   AARCH64_CRC32_BUILTINS
   AARCH64_CRC32_BUILTIN_MAX,
+  /* ARMv8.3-A Pointer Authentication Builtins.  */
+  AARCH64_PAUTH_BUILTIN_AUTIA1716,
+  AARCH64_PAUTH_BUILTIN_AUTIB1716,
+  AARCH64_PAUTH_BUILTIN_XPACLRI,
+  AARCH64_PAUTH_BUILTIN_PACI1716,
   AARCH64_BUILTIN_MAX
 };
 
@@ -900,6 +905,37 @@ aarch64_init_fp16_types (void)
   aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node);
 }
 
+/* Pointer authentication builtins that will become NOPs on legacy platforms.
+   Currently, these builtins are for internal use only (libgcc EH unwinder).  */
+
+void
+aarch64_init_pauth_hint_builtins (void)
+{
+  /* Pointer Authentication builtins.  */
+  tree ftype_pointer_auth
+= build_function_type_list (ptr_type_node, ptr_type_node,
+unsigned_intDI_type_node, NULL_TREE);
+  tree ftype_pointer_strip
+= build_function_type_list (ptr_type_node, ptr_type_node, NULL_TREE);
+
+  aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_AUTIA1716]
+= add_builtin_function ("__builtin_aarch64_autia1716", ftype_pointer_auth,
+			AARCH64_PAUTH_BUILTIN_AUTIA1716, BUILT_IN_MD, NULL,
+			NULL_TREE);
+  aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_AUTIB1716]
+= add_builtin_function ("__builtin_aarch64_autib1716", ftype_pointer_auth,
+			AARCH64_PAUTH_BUILTIN_AUTIB1716, BUILT_IN_MD, NULL,
+			NULL_TREE);
+  aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_XPACLRI]
+= add_builtin_function ("__builtin_aarch64_xpaclri", ftype_pointer_strip,
+			AARCH64_PAUTH_BUILTIN_XPACLRI, BUILT_IN_MD, NULL,
+			NULL_TREE);
+  aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_PACI1716]
+= add_builtin_function ("__builtin_aarch64_paci1716", ftype_pointer_auth,
+			AARCH64_PAUTH_BUILTIN_PACI1716, BUILT_IN_MD, NULL,
+			NULL_TREE);
+}
+
 void
 aarch64_init_builtins (void)
 {
@@ -928,6 +964,10 @@ aarch64_init_builtins (void)
 
   aarch64_init_crc32_builtins ();
   aarch64_init_builtin_rsqrt ();
+
+  /* Initialize pointer authentication builtins which are backed by
+     instructions in the NOP encoding space.  */
+  aarch64_init_pauth_hint_builtins ();
 }
 
 tree
@@ -1270,6 +1310,76 @@ aarch64_expand_builtin (tree exp,
 	}
   emit_insn (pat);
   return target;
+case AARCH64_PAUTH_BUILTIN_AUTIA1716:
+case AARCH64_PAUTH_BUILTIN_AUTIB1716:
+case AARCH64_PAUTH_BUILTIN_PACI1716:
+case AARCH64_PAUTH_BUILTIN_XPACLRI:
+  arg0 = CALL_EXPR_ARG (exp, 0);
+  op0 = force_reg (Pmode, expand_normal (arg0));
+
+  if (!target)
+	target = gen_reg_rtx (Pmode);
+  else
+	target = force_reg (Pmode, target);
+
+  emit_move_insn (target, op0);
+
+  if (fcode == AARCH64_PAUTH_BUILTIN_XPACLRI)
+	{
+	  rtx lr_reg = gen_rtx_REG (Pmode, R30_REGNUM);
+	  icode = CODE_FOR_strip_lr_sign;
+	  emit_move_insn (lr_reg, op0);
+	  emit_insn (GEN_FCN (icode) (const0_rtx));
+	  emit_move_insn (target, lr_reg);
+	}
+  else
+	{
+	  tree arg1 = CALL_EXPR_ARG (exp, 1);
+	  rtx op1 = expand_normal (arg1);
+	  bool sign_op_p = (fcode == AARCH64_PAUTH_BUILTIN_PACI1716);
+
+	  bool x1716_op_p = (fcode == AARCH64_PAUTH_BUILTIN_AUTIA1716
+			 || fcode == AARCH64_PAUTH_BUILTIN_AUTIB1716
+			 || fcode == AARCH64_PAUTH_BUILTIN_PACI1716);
+
+	  bool a_key_p = (fcode == AARCH64_PAUTH_BUILTIN_AUTIA1716
+			  || (aarch64_pauth_key == AARCH64_PAUTH_IKEY_A
+			  && fcode == AARCH64_PAUTH_BUILTIN_PACI1716));
+	  HOST_WIDE_INT key_index =
+	a_key_p ? AARCH64_PAUTH_IKEY_A : AARCH64_PAUTH_IKEY_B;
+
+	  if (sign_op_p)
+	{
+	  if (x1716_op_p)
+		icode = CODE_FOR_sign_reg1716;
+	  else
+		icode = CODE_FOR_sign_reg;
+	}
+	  else
+	{
+	  if (x1716_op_p)
+		icode = CODE_FOR_auth_reg1716;
+	  else
+		icode = CODE_FOR_auth_reg;
+	}
+
+	  op1 = force_reg (Pmode, op1);
+
+	  if (x1716_op_p)
+	{
+	  rtx x16_reg = gen_rtx_REG (Pmode, R16_REGNUM);
+	  rtx x17_reg = gen_rtx_REG (Pm

[5/9][AArch64] Generate dwarf information for -msign-return-address

2016-11-11 Thread Jiong Wang

This patch generates the DWARF description for pointer authentication.  A DWARF
value expression is used to describe the authentication action.

Please see the cover letter and the AArch64 DWARF specification for the
semantics of the AArch64 DWARF operations.

When the authentication key index is the A key, we use the compact DWARF
description, which can largely save DWARF frame size; otherwise we fall back
to the general operators.



Example
===

extern int dec (int);

int
cal (int a, int b, int c)
{
  return a + dec (b) + c;
}

Compact DWARF description
===
(-march=armv8.3-a -msign-return-address)

  DW_CFA_advance_loc: 4 to 0004
  DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp)
  DW_CFA_advance_loc: 4 to 0008
  DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp_deref: -24)

General DWARF description
===
(-march=armv8.3-a -msign-return-address -mpauth-key=b_key)

  DW_CFA_advance_loc: 4 to 0004
  DW_CFA_val_expression: r30 (x30) (DW_OP_breg30 (x30): 0; DW_OP_AARCH64_pauth: 18)
  DW_CFA_advance_loc: 4 to 0008
  DW_CFA_val_expression: r30 (x30) (DW_OP_dup; DW_OP_const1s: -24; DW_OP_plus; DW_OP_deref; DW_OP_AARCH64_pauth: 18)
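
(Reading the second row: the implicit bottom-of-stack entry is the CFA, per
the notes in this patch, so the expression evaluates as:

  stack: [CFA]
  DW_OP_dup               -> [CFA, CFA]
  DW_OP_const1s: -24      -> [CFA, CFA, -24]
  DW_OP_plus              -> [CFA, CFA-24]
  DW_OP_deref             -> [CFA, mem[CFA-24]]    ; the signed LR save slot
  DW_OP_AARCH64_pauth: 18 -> [auth(mem[CFA-24], key index 1, salt CFA)]

Descriptor 18 == (1 << 4) | 2, i.e. the "do authentication" action with the
B key for the instruction pointer.)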

From Linux kernel testing, -msign-return-address introduces a +24%
.debug_frame size increase when signing all functions and using the compact
description, and about a +45% .debug_frame size increase when using the
general description.


gcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

* config/aarch64/aarch64.h (aarch64_pauth_action_type): New enum.
* config/aarch64/aarch64.c (aarch64_attach_ra_auth_dwarf_note): New 
function.
(aarch64_attach_ra_auth_dwarf_general): New function.
(aarch64_attach_ra_auth_dwarf_shortcut): New function.
(aarch64_save_callee_saves): Generate dwarf information if LR is signed.
(aarch64_expand_prologue): Likewise.
(aarch64_expand_epilogue): Likewise.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 4bfadb512915d5dc606f7fc06f027868d6be7613..907e8bdf5b4961b3107dcd5a481de28335e4be89 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -970,4 +970,16 @@ extern tree aarch64_fp16_ptr_type_node;
 	 || (aarch64_ra_sign_scope == AARCH64_FUNCTION_NON_LEAF \
 		 && cfun->machine->frame.reg_offset[LR_REGNUM] >= 0))
 
+/* AArch64 pointer authentication action types.  See AArch64 DWARF ABI for
+   details.  */
+enum aarch64_pauth_action_type
+{
+  /* Drop the authentication signature for instruction pointer.  */
+  AARCH64_PAUTH_DROP_I,
+  /* Likewise for data pointer.  */
+  AARCH64_PAUTH_DROP_D,
+  /* Do authentication.  */
+  AARCH64_PAUTH_AUTH
+};
+
 #endif /* GCC_AARCH64_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b3d9a2a3f51ee240d00beb4cc65f99b089a3215e..cae177dca511fdb909ef82c972d3bbdebab215e2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2717,6 +2717,104 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
+/* Generate return address signing DWARF annotation using general DWARF
+   operator.  DWARF frame size will be bigger than using shortcut DWARF
+   operator.  See aarch64_attach_ra_auth_dwarf for parameter meanings.  */
+
+static rtx
+aarch64_attach_ra_auth_dwarf_general (rtx notes, HOST_WIDE_INT offset)
+{
+  /* The authentication descriptor.  */
+  HOST_WIDE_INT desc_const = (AARCH64_PAUTH_AUTH | (aarch64_pauth_key << 4));
+
+  /* DW_OP_AARCH64_pauth takes one uleb128 operand which is the authentication
+     descriptor.  The low 4 bits of the descriptor are the authentication
+     action code; all other bits are reserved and initialized to zero, except
+     that when the action code is AARCH64_PAUTH_AUTH, bits [7:4] hold the
+     authentication key index.  */
+  rtx auth_op
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (2, GEN_INT (desc_const), const0_rtx),
+		  DW_OP_AARCH64_pauth);
+
+  rtx par;
+  if (offset == 0)
+{
+  /* Step 1: Push LR onto stack.
+		 NOTE: the bottom of DWARF expression stack is always CFA.
+	 Step 2: Issue AArch64 authentication operation.  */
+par = gen_rtx_PARALLEL (DImode,
+			gen_rtvec (2, gen_rtx_REG (Pmode, LR_REGNUM),
+   auth_op));
+}
+  else
+{
+  rtx dup_cfa
+	= gen_rtx_UNSPEC (Pmode, gen_rtvec (2, const0_rtx, const0_rtx),
+			  DW_OP_dup);
+
+  rtx deref_op
+	= gen_rtx_UNSPEC (Pmode, gen_rtvec (2, const0_rtx, const0_rtx),
+			  DW_OP_deref);
+
+  rtx raw_plus
+	= gen_rtx_UNSPEC (Pmode, gen_rtvec (2, const0_rtx, const0_rtx),
+			  DW_OP_plus);
+  /* Step 1: Push the authentication key on to dwarf expression stack.
+	 Step 2: Push the stack address of where return address saved, followed
+	 by a memory de-reference operation.
+	 Step 3: Push the authentication descriptor.
+	 Step 4: Issue AArch64 authentication operation.  */
+  par = gen_rtx_PARALLEL (DImode,
+			  gen_rtvec (5, dup_cfa, GEN_

[4/9][AArch64] Return address protection on AArch64

2016-11-11 Thread Jiong Wang

As described in the cover letter, this patch implements return address signing
for AArch64; it's controlled by the new option:

  -msign-return-address=[none | non-leaf | all]

"none" means don't do return address signing at all on any function.  "non-leaf"
means only sign non-leaf function.  "all" means sign all functions.  Return
address signing is currently disabled on ILP32.  I haven't tested it.

The instructions added in the architecture are of 2 kinds.

* In the NOP instruction space, which allows binaries to run without any traps
on older versions of the architecture.  This doesn't give any additional
protection on older hardware but allows the same binary to be used on both
earlier and newer versions of the architecture.

* New instructions that are only valid for v8.3 and will trap if used on earlier
versions of the architecture.

By default, once return address signing is enabled, it will only generate
instructions in the NOP space.

If -march=armv8.3-a is specified, GCC will try to use the most efficient
pointer authentication instructions it can.

The architecture has 2 user-invisible system keys for signing and creating
signed addresses as part of these instructions.  For some use cases, the user
might want to use different keys for different functions.  The new option
"-mpauth-key=key_name" lets GCC select the key used for return address
signing.  Permissible values are "a_key" for the A key and "b_key" for the B
key; this option is supported as a function target attribute, and LTO will
hopefully just work.
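
For example, a hypothetical per-function override (sketch only; the attribute
spelling follows the extend.texi documentation added by this patch, and the
function names are invented):

  extern int handle (int);

  __attribute__ ((target ("sign-return-address=all")))
  int sensitive_fn (int x)
  {
    return handle (x);
  }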



gcc/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

* config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New enum.
(aarch64_function_type): New enum.
* config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg): New
declaration.
* config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return
address before it's pushed onto stack.
(aarch64_expand_epilogue): Authenticate return address fetched from
stack.
(aarch64_output_sign_auth_reg): New function.
(aarch64_override_options): Sanity check for ILP32 and ISA level.
(aarch64_attributes): New function attributes for "sign-return-address",
"pauth-key".
* config/aarch64/aarch64.md (UNSPEC_AUTH_REG, UNSPEC_AUTH_REG1716,
UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN,
UNSPEC_STRIP_X30_SIGN): New unspecs.
("*do_return"): Generate combined instructions according to key index.
("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716",
"strip_reg_sign", "strip_lr_sign"): New.
* config/aarch64/aarch64.opt (msign-return-address, mpauth-key): New.
* config/aarch64/predicates.md (aarch64_const0_const1): New predicate.
* doc/extend.texi (AArch64 Function Attributes): Documents
"sign-return-address=", "pauth-key".
* doc/invoke.texi (AArch64 Options): Documents "-msign-return-address=",
"-pauth-key".

gcc/testsuite/
2016-11-09  Jiong Wang<jiong.w...@arm.com>

* gcc.target/aarch64/return_address_sign_1.c: New testcase.
* gcc.target/aarch64/return_address_sign_scope_1.c: New testcase.



diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index c550a74..41c14b3 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -73,4 +73,30 @@ enum aarch64_code_model {
   AARCH64_CMODEL_LARGE
 };
 
+/* AArch64 pointer authentication key indexes.  "key_array" in
+   aarch64_output_sign_auth_reg depends on the order of this enum.  */
+enum aarch64_pauth_key_index
+{
+  /* A key for instruction pointer.  */
+  AARCH64_PAUTH_IKEY_A = 0,
+  /* B key for instruction pointer.  */
+  AARCH64_PAUTH_IKEY_B,
+  /* A key for data pointer.  */
+  AARCH64_PAUTH_DKEY_A,
+  /* B key for data pointer.  */
+  AARCH64_PAUTH_DKEY_B,
+  /* A key for general pointer.  */
+  AARCH64_PAUTH_GKEY_A
+};
+
+/* Function types -msign-return-address should sign.  */
+enum aarch64_function_type {
+  /* Don't sign any function.  */
+  AARCH64_FUNCTION_NONE,
+  /* Non-leaf functions.  */
+  AARCH64_FUNCTION_NON_LEAF,
+  /* All functions.  */
+  AARCH64_FUNCTION_ALL
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 3cdd69b..fa6d16b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -329,6 +329,7 @@ rtx aarch64_reverse_mask (enum machine_mode);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, machine_mode);
 char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned);
+const char *aarch64_output_

[3/9][AArch64] Add commandline support for -march=armv8.3-a

2016-11-11 Thread Jiong Wang

This patch add command line support for ARMv8.3-A through new architecture:

  -march=armv8.3-a

ARMv8.3-A implies all default features of ARMv8.2-A and additionally includes
the new pointer authentication extension.


gcc/
2016-11-08  Jiong Wang<jiong.w...@arm.com>

* config/aarch64/aarch64-arches.def: New entry for "armv8.3-a".
* config/aarch64/aarch64.h (AARCH64_FL_PAUTH, AARCH64_FL_V8_3,
AARCH64_FL_FOR_ARCH8_3, AARCH64_ISA_PAUTH, AARCH64_ISA_V8_3,
TARGET_PAUTH, TARGET_ARMV8_3): New.
* doc/invoke.texi (AArch64 Options): Document "armv8.3-a".

diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 7dcf140411f6eb95504d9b92df9dadce50529a28..0a33f799e66b4ec6e016845eb333f24aaf63383e 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -33,4 +33,5 @@
 AARCH64_ARCH("armv8-a",	  generic,	 8A,	8,  AARCH64_FL_FOR_ARCH8)
 AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
 AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
+AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 19caf9f2979e30671720823829464300b5349273..70efbe9b5f97bd38d61ad66e38608f7ac5bdfb38 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -138,6 +138,10 @@ extern unsigned aarch64_architecture_version;
 /* ARMv8.2-A architecture extensions.  */
 #define AARCH64_FL_V8_2	  (1 << 8)  /* Has ARMv8.2-A features.  */
 #define AARCH64_FL_F16	  (1 << 9)  /* Has ARMv8.2-A FP16 extensions.  */
+/* ARMv8.3-A architecture extensions.  */
+#define AARCH64_FL_PAUTH  (1 << 10)  /* Has Pointer Authentication
+	Extensions.  */
+#define AARCH64_FL_V8_3	  (1 << 11)  /* Has ARMv8.3-A features.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -151,6 +155,8 @@ extern unsigned aarch64_architecture_version;
   (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1)
 #define AARCH64_FL_FOR_ARCH8_2			\
   (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2)
+#define AARCH64_FL_FOR_ARCH8_3			\
+  (AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3 | AARCH64_FL_PAUTH)
 
 /* Macros to test ISA flags.  */
 
@@ -162,6 +168,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_RDMA	   (aarch64_isa_flags & AARCH64_FL_V8_1)
 #define AARCH64_ISA_V8_2	   (aarch64_isa_flags & AARCH64_FL_V8_2)
 #define AARCH64_ISA_F16		   (aarch64_isa_flags & AARCH64_FL_F16)
+#define AARCH64_ISA_PAUTH	   (aarch64_isa_flags & AARCH64_FL_PAUTH)
+#define AARCH64_ISA_V8_3	   (aarch64_isa_flags & AARCH64_FL_V8_3)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -176,6 +184,12 @@ extern unsigned aarch64_architecture_version;
 #define TARGET_FP_F16INST (TARGET_FLOAT && AARCH64_ISA_F16)
 #define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16)
 
+/* Pointer Authentication extension.  */
+#define TARGET_PAUTH (AARCH64_ISA_PAUTH)
+
+/* ARMv8.3-A extension.  */
+#define TARGET_ARMV8_3 (AARCH64_ISA_V8_3)
+
 /* Make sure this is always defined so we don't have to check for ifdefs
but rather use normal ifs.  */
 #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 87da1f1c12b718fa63c9b89fdd8f85fbc6b54cb0..18ab6d9f20eca7fa29317e10678f1e46f64039bd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13257,7 +13257,10 @@ more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
 
 The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @var{native}.
+
+The value @samp{armv8.3-a} implies @samp{armv8.2-a} and enables compiler
+support for the ARMv8.3-A architecture extensions.
 
 The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
 support for the ARMv8.2-A architecture extensions.



[2/9] Encoding support for AArch64 DWARF operations

2016-11-11 Thread Jiong Wang

The encoding for the newly added AARCH64 DWARF operations.
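
For example (a worked sketch using the provisional opcode values from patch
1/9 and the operand forms added below), the operations from the dumps in
patch 5/9 would encode as:

  DW_OP_AARCH64_pauth with descriptor 18:       0xea 0x12  (opcode, uleb128)
  DW_OP_AARCH64_paciasp_deref with offset -24:  0xec 0x68  (opcode, sleb128)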

I think the DWARF specification actually allows vendor private operations to
overlap with each other, as operations from different vendors can't co-exist.
So in theory we should introduce a target hook to handle target private
operations.

But in GCC/binutils/LLVM scope, I only see one overlap, between
DW_OP_GNU_push_tls_address and DW_OP_HP_unknown, and DW_OP_HP_unknown seems
unused.

So I added the support in GCC generic code directly instead of introducing
a target hook.

Is this OK to install?


gcc/
2016-11-11  Jiong Wang<jiong.w...@arm.com>

* dwarf2out.c (size_of_loc_descr): Increase set for
DW_OP_AARCH64_pauth and DW_OP_AARCH64_paciasp_deref.
(output_loc_operands): Generate encoding for DW_OP_AARCH64_pauth
and DW_OP_AARCH64_paciasp_deref.
(output_loc_operands_raw): Likewise.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4a5c602f535fa49a45ae96f356f63c955dc527c6..fd159abe3c402cc8dedb0422e7b2680aabd28f93 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1698,6 +1698,12 @@ size_of_loc_descr (dw_loc_descr_ref loc)
 case DW_OP_GNU_parameter_ref:
   size += 4;
   break;
+case DW_OP_AARCH64_pauth:
+  size += size_of_uleb128 (loc->dw_loc_oprnd1.v.val_unsigned);
+  break;
+case DW_OP_AARCH64_paciasp_deref:
+  size += size_of_sleb128 (loc->dw_loc_oprnd1.v.val_int);
+  break;
 default:
   break;
 }
@@ -2177,6 +2183,13 @@ output_loc_operands (dw_loc_descr_ref loc, int for_eh_or_skip)
   }
   break;
 
+case DW_OP_AARCH64_pauth:
+  dw2_asm_output_data_uleb128 (val1->v.val_unsigned, NULL);
+  break;
+case DW_OP_AARCH64_paciasp_deref:
+  dw2_asm_output_data_sleb128 (val1->v.val_int, NULL);
+  break;
+
 default:
   /* Other codes have no operands.  */
   break;
@@ -2365,6 +2378,15 @@ output_loc_operands_raw (dw_loc_descr_ref loc)
   gcc_unreachable ();
   break;
 
+case DW_OP_AARCH64_pauth:
+  fputc (',', asm_out_file);
+  dw2_asm_output_data_uleb128_raw (val1->v.val_unsigned);
+  break;
+case DW_OP_AARCH64_paciasp_deref:
+  fputc (',', asm_out_file);
+  dw2_asm_output_data_sleb128_raw (val1->v.val_int);
+  break;
+
 default:
   /* Other codes have no operands.  */
   break;



[1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-11 Thread Jiong Wang

This patch introduces three AARCH64 private DWARF operations in the vendor
extension space.

DW_OP_AARCH64_pauth 0xea
===
  Takes one unsigned LEB128 Pointer Authentication Description.  Bits [3:0] of
  the description contain the Authentication Action Code. All unused bits are
  initialized to 0. The operation then proceeds according to the value of the
  action code as described in the Action Code Table.

DW_OP_AARCH64_paciasp 0xeb
===
  Authenticates the contents of the X30/LR register as per the A key for the
  instruction pointer, using the current CFA as salt.  The result is pushed
  onto the stack.

DW_OP_AARCH64_paciasp_deref 0xec
===
  Takes one signed LEB128 offset and retrieves the 8-byte contents from the
  address calculated by CFA plus this offset; the contents are then
  authenticated as per the A key for the instruction pointer, using the current
  CFA as salt.  The result is pushed onto the stack.

Action Code Table
=================
Action Code | Note
------------|---------------------------------------------------------------
0           | Pops a single 8-byte operand from the stack representing a
            | signed instruction pointer, "drops" the authentication
            | signature and pushes the value onto the stack.
------------|---------------------------------------------------------------
1           | Pops a single 8-byte operand from the stack representing a
            | signed data pointer, "drops" the authentication signature
            | and pushes the value onto the stack.
------------|---------------------------------------------------------------
2           | Bits [7:4] of the Pointer Authentication Description contain
            | an Authentication Key Index.  The operation then pops the top
            | two stack entries.  The first is an 8-byte value to be
            | authenticated.  The second is an 8-byte salt.  The first value
            | is then authenticated as per the Authentication Key Index
            | using the salt.  The result is pushed onto the stack.

Authentication Key Index
========================
0           | A key for instruction pointer.
------------|--------------------------------
1           | B key for instruction pointer.
------------|--------------------------------
2           | A key for data pointer.
------------|--------------------------------
3           | B key for data pointer.
------------|--------------------------------
4           | A key for general pointer.
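
As a sketch (not part of the patch), the descriptor operand of
DW_OP_AARCH64_pauth is therefore formed as:

  /* Action code in bits [3:0]; key index in bits [7:4], meaningful only
     for action code 2 ("do authentication").  */
  unsigned int
  make_pauth_descriptor (unsigned int action, unsigned int key_index)
  {
    return (key_index << 4) | action;
  }

e.g. authenticating with the B key for the instruction pointer (key index 1)
gives (1 << 4) | 2 == 18, the operand seen in the "DW_OP_AARCH64_pauth: 18"
readelf dumps elsewhere in this series.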

DW_OP_AARCH64_pauth is designed to offer a general description for all scenarios.

DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref are two shortcut
operations for return address signing.  They offer more compact debug frame
encoding.

For the DWARF operation vendor extension space between DW_OP_lo_user and
DW_OP_hi_user, I think a vendor is free to reserve any number, and numbers for
one vendor can overlap with another's, as operations for different vendors are
not supposed to co-exist.

One exception is that the GNU toolchain has reserved some numbers inside this
space (DW_OP_GNU*), so a vendor's numbers need to avoid overlapping with them.

These three numbers are not used in LLVM's implementation.

NOTE: the assigned values are provisional; we may need to change them if they
are found to conflict with other toolchains.

Please review, Thanks.


include/
2016-11-09  Richard Earnshaw<richard.earns...@arm.com>
Jiong Wang<jiong.w...@arm.com>

* dwarf2.def (DW_OP_AARCH64_pauth): Reserve the number 0xea.
(DW_OP_AARCH64_paciasp): Reserve the number 0xeb.
(DW_OP_AARCH64_paciasp_deref): Reserve the number 0xec.

diff --git a/include/dwarf2.def b/include/dwarf2.def
index 5241fe8615e0e3b288fee80c08a67723686ef411..8eaa90c3b4748ecfc025a6c2dd6afcd5fd80be28 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -631,6 +631,16 @@ DW_OP (DW_OP_HP_unmod_range, 0xe5)
 DW_OP (DW_OP_HP_tls, 0xe6)
 /* PGI (STMicroelectronics) extensions.  */
 DW_OP (DW_OP_PGI_omp_thread_num, 0xf8)
+/* ARM extension for pointer authentication
+   DW_OP_AARCH64_pauth: takes one uleb128 operand which is authentication
+   descriptor.  Perform actions indicated by the descriptor.
+   DW_OP_AARCH64_paciasp: no operand.  Authenticate value in X30/LR using A key
+   and CFA as salt.
+   DW_OP_AARCH64_paciasp_deref: takes one sleb128 operand as offset.
+   Authenticate value in [CFA + offset] using A key and salt is CFA.  */
+DW_OP (DW_OP_AARCH64_pauth, 0xea)
+DW_OP (DW_OP_AARCH64_paciasp, 0xeb)
+DW_OP (DW_OP_AARCH64_paciasp_deref, 0xec)
 DW_END_OP
 
 DW_FIRST_ATE (DW_ATE_void, 0x0)



[0/9] Support ARMv8.3-A Pointer Authentication Extension

2016-11-11 Thread Jiong Wang

As introduced at

  
https://community.arm.com/groups/processors/blog/2016/10/27/armv8-a-architecture-2016-additions

ARMv8.3-A includes a new hardware feature called "Pointer Authentication".
This new extension supports some new instructions which can sign and
authenticate pointer values.

One use of this feature is to implement Return-Oriented-Programming
protections.  For example, we can sign the return address register at function
start, then authenticate it before return.  If the content has been modified
unexpectedly, an exception will be raised, preventing redirection of the
program's execution flow.

This type of prevention, however, requires the original content of the return
address to be signed, so unwinders (C++ EH unwinder, GDB unwinder, etc.) can no
longer backtrace correctly without understanding how to restore the original
value of the return address.

Therefore we need to describe any return address or frame related register
mangling through DWARF information.

This patchset includes the implementation of such return address signing
protection and the AArch64 DWARF operation extensions.

Below is a comparison of code size overhead between standard gcc
-fstack-protector-strong and -msign-return-address on AArch64.

                     linux kernel   openssl                Protection Scope
                                    (libcrypto + libssl)
  --------------------------------------------------------------------------
  ssp-strong (gcc)      +2.93%        +2.98%      Overflow protect on risky
                                                  functions
  --------------------------------------------------------------------------
  sign LR               +1.82%        +2.18%      LR protect on all functions


Please review this patchset.
Thanks.

Jiong Wang (9):
  [RFC] Reserve three DW_OP number in vendor extension space
  Encoding support for AArch64 DWARF operations
  Add commandline support for -march=armv8.3-a
  Return address protection support on AArch64
  Generate dwarf information for -msign-return-address
  Add builtins support for pac/aut/xpac
  libgcc, let AArch64 use customized unwinder file
  libgcc, runtime support for AArch64 DWARF operations
  [RFC] Accelerate -fstack-protector through pointer authentication

 gcc/config/aarch64/aarch64-arches.def  |1 +
 gcc/config/aarch64/aarch64-builtins.c  |  110 ++
 gcc/config/aarch64/aarch64-opts.h  |   32 +
 gcc/config/aarch64/aarch64-protos.h|1 +
 gcc/config/aarch64/aarch64.c   |  296 +++-
 gcc/config/aarch64/aarch64.h   |   54 +
 gcc/config/aarch64/aarch64.md  |  128 +-
 gcc/config/aarch64/aarch64.opt |   45 +
 gcc/config/aarch64/predicates.md   |4 +
 gcc/defaults.h |8 +
 gcc/doc/extend.texi|   12 +
 gcc/doc/invoke.texi|   24 +-
 gcc/doc/tm.texi|8 +
 gcc/doc/tm.texi.in |8 +
 gcc.target/aarch64/return_address_sign_1.c |   57 +
 gcc.target/aarch64/return_address_sign_scope_1.c   |   57 +
 include/dwarf2.def |   10 +
 libgcc/config.host |2 +
 libgcc/config/aarch64/t-eh-aarch64 |3 +
 libgcc/config/aarch64/unwind-aarch64.c | 1820 



Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-07 Thread Jiong Wang



On 07/11/16 17:04, Bernd Schmidt wrote:

On 11/03/2016 03:00 PM, Eric Botcazou wrote:

FWIW here's a more complete version of my patch which I'm currently
testing. Let me know if you think it's at least a good enough
intermediate step to be installed.


It is, thanks.


Testing showed the same issue as Jiong found, so I've committed it 
with that extra tweak.


Thanks very much!  I have closed PR middle-end/78016

Regards,
Jiong


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Jiong Wang

On 03/11/16 13:01, Bernd Schmidt wrote:


Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  (revision 241233)
+++ gcc/emit-rtl.c  (working copy)
@@ -6169,17 +6169,18 @@ emit_copy_of_insn_after (rtx_insn *insn,
   which may be duplicated by the basic block reordering code.  */
RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn);
  
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */

+  rtx *ptail = &REG_NOTES (new_rtx);
+  gcc_assert (*ptail == NULL_RTX);
+


Looks like new_rtx may contain its own REG_NOTES when reached here, which
triggered the ICE; I guess mark_jump_label may generate REG_LABEL_OPERAND as
the comment says.

After replacing the gcc_assert with the following loop, this patch passed
bootstrap on both AArch64 and X86-64, and regression is OK on gcc and g++.

+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);

Regards,
Jiong



Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Jiong Wang

On 03/11/16 12:06, Eric Botcazou wrote:

What's your decision on this?

I think that we ought to standardize on a single order for note copying in the
RTL middle-end and the best way to enforce it is to have a single primitive in
rtlanal.c, with an optional filtering.  Bernd's patch is a step in the right
direction, but doesn't enforce the single order.  Maybe something based on a
macro calling duplicate_reg_note, but not clear whether it's really better.


Thanks for the feedback,  I'll try to work through this.

Regards,
Jiong


Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-11-02 Thread Jiong Wang

On 02/11/16 13:42, Jakub Jelinek wrote:

On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote:

-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note.  
*/


Too long line.


Hmm, it shows 80 columns under my editor.  I guess '+' is counted in?




+/* RTL sequences inside PARALLEL are raw expression representation.
+
+   mem_loc_descriptor can be used to build generic DWARF expressions for
+   DW_CFA_expression and DW_CFA_val_expression where the expression may can
+   not be represented using normal RTL sequences.  In this case, group all
+   expression operations (DW_OP_*) inside a PARALLEL.  For those DW_OP 
which
+   doesn't have RTL mapping, wrap it using UNSPEC.  The logic for parsing
+   PARALLEL sequences is:
+
+   foreach elem inside PARALLEL
+ if (elem is UNSPEC)
+   dw_op =  XINT (elem, 1) (DWARF operation is kept as UNSPEC number)
+   oprnd1 = XVECEXP (elem, 0, 0)
+   oprnd2 = XVECEXP (elem, 0, 1)
+ else
+   call mem_loc_descriptor  */


Not sure if it is a good idea to document in weirdly formatted
pseudo-language what the code actually does a few lines below.  IMHO either
express it in words, or don't express it at all.


OK, fixed.  I replaced these comments with some brief words.




+   exp_result =
+ new_loc_descr ((enum dwarf_location_atom) dw_op, oprnd1,
+oprnd2);


Wrong formatting, = should be on the next line.


+ }
+   else
+ exp_result =
+   mem_loc_descriptor (elem, mode, mem_mode,
+   VAR_INIT_STATUS_INITIALIZED);


Likewise.


Both fixed. Patch updated, please review.

Thanks.

gcc/
2016-11-02  Jiong Wang  <jiong.w...@arm.com>

* reg-notes.def (CFA_VAL_EXPRESSION): New entry.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function.
(dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION.
(output_cfa_loc): Support DW_CFA_val_expression.
(output_cfa_loc_raw): Likewise.
(output_cfi): Likewise.
(output_cfi_directive): Likewise.
* dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression.
(dw_cfi_oprnd2_desc): Likewise.
(mem_loc_descriptor): Recognize new pattern generated for value
expression.

commit 36de0173c17efcc30c56ef10304377e71313e8bc
Author: Jiong Wang <jiong.w...@arm.com>
Date:   Wed Oct 19 15:42:04 2016 +0100

dwarf val expression

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 6491d5a..b8c88fb 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set)
   reg_save (sregno, dregno, 0);
 }
 
-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note.  */
 
 static void
 dwarf2out_frame_debug_cfa_expression (rtx set)
@@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
   update_row_reg_save (cur_row, regno, cfi);
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_val_expression (rtx set)
+{
+  rtx dest = SET_DEST (set);
+  gcc_assert (REG_P (dest));
+
+  rtx span = targetm.dwarf_register_span (dest);
+  gcc_assert (!span);
+
+  rtx src = SET_SRC (set);
+  dw_cfi_ref cfi = new_cfi ();
+  cfi->dw_cfi_opc = DW_CFA_val_expression;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest);
+  cfi->dw_cfi_oprnd2.dw_cfi_loc
+= mem_loc_descriptor (src, GET_MODE (src),
+			  GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED);
+  add_cfi (cfi);
+  update_row_reg_save (cur_row, dwf_regno (dest), cfi);
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
 
 static void
@@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn)
 	break;
 
   case REG_CFA_EXPRESSION:
+  case REG_CFA_VAL_EXPRESSION:
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_expression (n);
+
+	if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION)
+	  dwarf2out_frame_debug_cfa_expression (n);
+	else
+	  dwarf2out_frame_debug_cfa_val_expression (n);
+
 	handled_one = true;
 	break;
 
@@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh);
@@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_C

Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-11-02 Thread Jiong Wang

On 01/11/16 16:48, Jason Merrill wrote:
> It seems to me that a CFA_*expression note would never use target
> UNSPEC codes, and a DWARF UNSPEC would never appear outside of such a
> note, so we don't need to worry about conflicts.

Indeed.

The DWARF UNSPEC is deeper inside the DW_CFA_*expression note.  My worry about
conflicts makes no sense.

I updated the patch to put the DWARF operation into the UNSPEC number field.

x86-64 bootstrap OK,  no regression on gcc/g++.

Please review.

Thanks.

gcc/
2016-11-02  Jiong Wang  <jiong.w...@arm.com>

* reg-notes.def (CFA_VAL_EXPRESSION): New entry.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function.
(dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION.
(output_cfa_loc): Support DW_CFA_val_expression.
(output_cfa_loc_raw): Likewise.
(output_cfi): Likewise.
(output_cfi_directive): Likewise.
* dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression.
(dw_cfi_oprnd2_desc): Likewise.
(mem_loc_descriptor): Recognize new pattern generated for value
expression.
diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 6491d5aaf4c4a21241cc718bfff1016f6d149951..b8c88fbae1df80a2664a414d8ae016a5343bf435 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set)
   reg_save (sregno, dregno, 0);
 }
 
-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note.  */
 
 static void
 dwarf2out_frame_debug_cfa_expression (rtx set)
@@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
   update_row_reg_save (cur_row, regno, cfi);
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_val_expression (rtx set)
+{
+  rtx dest = SET_DEST (set);
+  gcc_assert (REG_P (dest));
+
+  rtx span = targetm.dwarf_register_span (dest);
+  gcc_assert (!span);
+
+  rtx src = SET_SRC (set);
+  dw_cfi_ref cfi = new_cfi ();
+  cfi->dw_cfi_opc = DW_CFA_val_expression;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest);
+  cfi->dw_cfi_oprnd2.dw_cfi_loc
+= mem_loc_descriptor (src, GET_MODE (src),
+			  GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED);
+  add_cfi (cfi);
+  update_row_reg_save (cur_row, dwf_regno (dest), cfi);
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
 
 static void
@@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn)
 	break;
 
   case REG_CFA_EXPRESSION:
+  case REG_CFA_VAL_EXPRESSION:
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_expression (n);
+
+	if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION)
+	  dwarf2out_frame_debug_cfa_expression (n);
+	else
+	  dwarf2out_frame_debug_cfa_val_expression (n);
+
 	handled_one = true;
 	break;
 
@@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh);
@@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
@@ -3188,6 +3219,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref fde, int for_eh)
 
 	case DW_CFA_def_cfa_expression:
 	case DW_CFA_expression:
+	case DW_CFA_val_expression:
 	  output_cfa_loc (cfi, for_eh);
 	  break;
 
@@ -3302,16 +3334,13 @@ output_cfi_directive (FILE *f, dw_cfi_ref cfi)
   break;
 
 case DW_CFA_def_cfa_expression:
-  if (f != asm_out_file)
-	{
-	  fprintf (f, "\t.cfi_def_cfa_expression ...\n");
-	  break;
-	}
-  /* FALLTHRU */
 case DW_CFA_expression:
+case DW_CFA_val_expression:
   if (f != asm_out_file)
 	{
-	  fprintf (f, "\t.cfi_cfa_expression ...\n");
+	  fprintf (f, "\t.cfi_%scfa_%sexpression ...\n",
+		   cfi->dw_cfi_opc == DW_CFA_def_cfa_expression ? "def_" : "",
+		   cfi->dw_cfi_opc == DW_CFA_val_expression ? "val_" : "");
 	  break;
 	}
   fprintf (f, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4a3df339df2c6a6816ac8b8dbdb2466a7492c592..7dac70d7392f2c457ffd3f677e07decb1ba488a1 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -518,6 +518,7 @@ dw_cfi_oprnd1_desc (enum dwarf_call_frame_info cfi)
 case DW_CFA_def_cfa_register:
 case DW_CFA_register:
 case DW

Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-11-01 Thread Jiong Wang



On 01/11/16 16:48, Jason Merrill wrote:

On Tue, Nov 1, 2016 at 11:59 AM, Jiong Wang <jiong.w...@foss.arm.com> wrote:

On 01/11/16 15:24, Jason Merrill wrote:

On Tue, Nov 1, 2016 at 11:12 AM, Jiong Wang <jiong.w...@foss.arm.com> wrote:

On 31/10/16 19:50, Jason Merrill wrote:

On 10/21/2016 04:30 AM, Jiong Wang wrote:

All DW_OP_* of the expression are grouped together inside the PARALLEL,
and those operations which don't have RTL mapping are wrapped by
UNSPEC.  The parsing algorithm is simply something like:

foreach elem inside PARALLEL
  if (UNSPEC)
{
  dw_op_code = INTVAL (XVECEXP (elem, 0, 0));
  oprnd1 = INTVAL (XVECEXP (elem, 0, 1));
  oprnd2 = INTVAL (XVECEXP (elem, 0, 2));
}
  else
call standard RTL parser.

Any comments on the approach?

If you're going to use UNSPEC, why not put the DWARF operator in the
second operand?

Thanks for the review, but I still don't understand your meaning.

Do you mean I should simply put the DWARF operator at XVECEXP
(UNSPEC_RTX, 0, 2) instead of at XVECEXP (UNSPEC_RTX, 0, 0)

No, at XINT (UNSPEC_RTX, 1).  The documentation of UNSPEC says,

/* A machine-specific operation.
 1st operand is a vector of operands being used by the operation so
that any needed reloads can be done.
 2nd operand is a unique value saying which of a number of
machine-specific operations is to be performed.

Aha, understood now, thanks for the clarification.

You mean we simply reuse the UNSPEC number field, so the RTX will be

   (UNSPEC
     [(reg) (reg)]
     DW_OP_XXX)

Yeah, I have tried to do that, but later gave up.  One reason I remember is:
suppose we want to push two values on the stack, and the second value is an
address which we want a follow-up DW_OP_deref to operate on.  Then the value
expression will be

(set (reg A)
     (parallel
       [(reg A)
        (UNSPEC
          [DW_OP_deref, const0_rtx, const0_rtx]
          UNSPEC_PRIVATE_DW)
        (UNSPEC
          [DW_OP_XXX (const0_rtx) (const0_rtx)]
          UNSPEC_PRIVATE_DW)]))

And there might be some other expressions that need complex RAW encoding,

Why can't you do this putting the OP in the number field of both UNSPECs?


I was demoing the RTX based on my current approach, and simply want to say we
only need to define one unspec number (UNSPEC_PRIVATE_DW), while if we put the
OP in the number field of both UNSPECs, we need two unspec numbers, and we
might need more for other similar expressions.

If we don't need to worry about the conflicts, then your suggestion is
definitely better.  I will do more tests on this.

Besides this issue, do you think the PARALLEL + UNSPEC based approach to
represent raw DWARF expressions is acceptable?


Thanks.

Regards,
Jiong




so it seems to me that if we want to offer the user the most general way to do
this, then it's better to encode the DWARF operation inside the UNSPEC operand
vector, as reusing the UNSPEC number means you need to make sure there is no
overlap with other backend UNSPEC enumeration numbers.

It seems to me that a CFA_*expression note would never use target
UNSPEC codes, and a DWARF UNSPEC would never appear outside of such a
note, so we don't need to worry about conflicts.

Jason




Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-11-01 Thread Jiong Wang

On 01/11/16 15:24, Jason Merrill wrote:

On Tue, Nov 1, 2016 at 11:12 AM, Jiong Wang <jiong.w...@foss.arm.com> wrote:

On 31/10/16 19:50, Jason Merrill wrote:

On 10/21/2016 04:30 AM, Jiong Wang wrote:

All DW_OP_* of the expression are grouped together inside the PARALLEL,
and those operations which don't have RTL mapping are wrapped by
UNSPEC.  The parsing algorithm is simply something like:

   foreach elem inside PARALLEL
 if (UNSPEC)
   {
 dw_op_code = INTVAL (XVECEXP (elem, 0, 0));
 oprnd1 = INTVAL (XVECEXP (elem, 0, 1));
 oprnd2 = INTVAL (XVECEXP (elem, 0, 2));
   }
 else
   call standard RTL parser.

Any comments on the approach?


If you're going to use UNSPEC, why not put the DWARF operator in the
second operand?

   Thanks for the review, but I still don't understand your meaning.

   Do you mean I should simply put the DWARF operator at XVECEXP (UNSPEC_RTX,
0, 2) instead of at XVECEXP (UNSPEC_RTX, 0, 0)

No, at XINT (UNSPEC_RTX, 1).  The documentation of UNSPEC says,

/* A machine-specific operation.
1st operand is a vector of operands being used by the operation so
that
  any needed reloads can be done.
2nd operand is a unique value saying which of a number of
machine-specific
  operations is to be performed.


Aha, understood now, thanks for the clarification.

You mean we simply reuse the UNSPEC number field, so the RTX will be

  (UNSPEC
    [(reg) (reg)]
    DW_OP_XXX)

Yeah, I have tried to do that, but later gave up.  One reason I remember is:
suppose we want to push two values on the stack, and the second value is an
address which we want a follow-up DW_OP_deref to operate on.  Then the value
expression will be

   (set (reg A)
        (parallel
          [(reg A)
           (UNSPEC
             [DW_OP_deref, const0_rtx, const0_rtx]
             UNSPEC_PRIVATE_DW)
           (UNSPEC
             [DW_OP_XXX (const0_rtx) (const0_rtx)]
             UNSPEC_PRIVATE_DW)]))

And there might be some other expressions that need complex RAW encoding, so
it seems to me that if we want to offer the user the most general way to do
this, then it's better to encode the DWARF operation inside the UNSPEC operand
vector, as reusing the UNSPEC number means you need to make sure there is no
overlap with other backend UNSPEC enumeration numbers.

Does this explanation make sense to you?

Thanks.


Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-11-01 Thread Jiong Wang

On 31/10/16 19:50, Jason Merrill wrote:

On 10/21/2016 04:30 AM, Jiong Wang wrote:

All DW_OP_* of the expression are grouped together inside the PARALLEL,
and those operations which don't have RTL mapping are wrapped by
UNSPEC.  The parsing algorithm is simply something like:

  foreach elem inside PARALLEL
if (UNSPEC)
  {
dw_op_code = INTVAL (XVECEXP (elem, 0, 0));
oprnd1 = INTVAL (XVECEXP (elem, 0, 1));
oprnd2 = INTVAL (XVECEXP (elem, 0, 2));
  }
else
  call standard RTL parser.

Any comments on the approach?


If you're going to use UNSPEC, why not put the DWARF operator in the 
second operand?


Hi Jason,

  Thanks for the review, but I still don't understand your meaning.

  Do you mean I should simply put the DWARF operator at XVECEXP (UNSPEC_RTX, 0, 2)
  instead of at XVECEXP (UNSPEC_RTX, 0, 0), and the new parsing algorithm will
  be the following?

  foreach elem inside PARALLEL
if (UNSPEC)
  {
oprnd1 = INTVAL (XVECEXP (elem, 0, 0));
oprnd2 = INTVAL (XVECEXP (elem, 0, 1));
dw_op_code = INTVAL (XVECEXP (elem, 0, 2));
  }
else
  call standard RTL parser.

  I actually don't see the benefit of this change; could you please give more
  comments on this?

  For this patch, suppose the unwinding rule for register A is popping two
  values from the dwarf evaluation stack, doing some complex processing based
  on the two values, then pushing the result back onto the stack.

  We can generate the dwarf value expression description like:

   (set (reg A)
        (parallel
          [(reg A) (reg B)
           (UNSPEC
             [(const_int DW_OP_XXX) (const0_rtx) (const0_rtx)]
             UNSPEC_NUM)]))

   then the readelf dump will be something like
   ===
   DW_CFA_val_expression: A (DW_OP_bregB: 0; DW_OP_bregC: 0; DW_OP_XXX)

We can't generate such a description based on the current GCC dwarf code, right?




Re: [PATCH][AArch64] Add function comments to some prologue/epilogue helpers

2016-11-01 Thread Jiong Wang

On 31/10/16 12:10, Kyrill Tkachov wrote:

Ping.

Thanks,
Kyrill

On 24/10/16 12:30, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00839.html

Thanks,
Kyrill

On 12/10/16 11:23, Kyrill Tkachov wrote:

Hi all,

I'm looking at the aarch64 prologue and epilogue generation code and 
I noticed many of the helper
functions don't have function comments so it makes it harder than it 
has to to understand what's going on.
This patch adds function comments to some of them. I hope I 
understood the functions correctly.


Is this ok for trunk?

Thanks,
Kyrill

2016-10-12  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_register_saved_on_entry): Add
function comment.
(aarch64_next_callee_save): Likewise.
(aarch64_pushwb_single_reg): Likewise.
(aarch64_gen_storewb_pair): Likewise.
(aarch64_push_regs): Likewise.
(aarch64_gen_loadwb_pair): Likewise.
(aarch64_pop_regs): Likewise.
(aarch64_gen_store_pair): Likewise.
(aarch64_gen_load_pair): Likewise.
(aarch64_save_callee_saves): Likewise.
(aarch64_restore_callee_saves): Likewise.


I "contributed" some of these functions without comments...
The newly added comments look good to me, though I can't approve.

Thanks for fixing these.

Regards,
Jiong


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-10-31 Thread Jiong Wang

On 21/10/16 13:30, Bernd Schmidt wrote:



On 10/21/2016 02:04 PM, Jiong Wang wrote:

+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = &REG_NOTES (new_rtx);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);


I was thinking along the lines of something like this (untested, 
emit-rtl.c part omitted). Eric can choose whether he likes either of 
these or wants something else.


Hi Eric,

  What's your decision on this?

  Thanks.

Regards,
Jiong




Bernd

Index: gcc/rtl.h
===
--- gcc/rtl.h(revision 241233)
+++ gcc/rtl.h(working copy)
@@ -3008,6 +3008,7 @@ extern rtx alloc_reg_note (enum reg_note
 extern void add_reg_note (rtx, enum reg_note, rtx);
 extern void add_int_reg_note (rtx, enum reg_note, int);
 extern void add_shallow_copy_of_reg_note (rtx_insn *, rtx);
+extern rtx duplicate_reg_note (rtx_insn *, rtx);
 extern void remove_note (rtx, const_rtx);
 extern void remove_reg_equal_equiv_notes (rtx_insn *);
 extern void remove_reg_equal_equiv_notes_for_regno (unsigned int);
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c(revision 241233)
+++ gcc/rtlanal.c(working copy)
@@ -2304,6 +2304,21 @@ add_shallow_copy_of_reg_note (rtx_insn *
 add_reg_note (insn, REG_NOTE_KIND (note), XEXP (note, 0));
 }

+/* Duplicate NOTE and return the copy.  */
+rtx
+duplicate_reg_note (rtx note)
+{
+  rtx n;
+  reg_note_kind kind = REG_NOTE_KIND (note);
+
+  if (GET_CODE (note) == INT_LIST)
+    return gen_rtx_INT_LIST ((machine_mode) kind, XINT (note, 0), NULL_RTX);
+  else if (GET_CODE (note) == EXPR_LIST)
+    return alloc_reg_note (kind, copy_insn_1 (XEXP (note, 0)), NULL_RTX);
+  else
+    return alloc_reg_note (kind, XEXP (note, 0), NULL_RTX);
+}
+
 /* Remove register note NOTE from the REG_NOTES of INSN.  */

 void
Index: gcc/sel-sched-ir.c
===
--- gcc/sel-sched-ir.c(revision 241233)
+++ gcc/sel-sched-ir.c(working copy)
@@ -5762,6 +5762,11 @@ create_copy_of_insn_rtx (rtx insn_rtx)
   res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)),
   NULL_RTX);

+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = &REG_NOTES (new_rtx);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);
+
   /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND
  since mark_jump_label will make them.  REG_LABEL_TARGETs are 
created

  there too, but are supposed to be sticky, so we copy them. */
@@ -5770,11 +5775,8 @@ create_copy_of_insn_rtx (rtx insn_rtx)
 && REG_NOTE_KIND (link) != REG_EQUAL
 && REG_NOTE_KIND (link) != REG_EQUIV)
   {
-if (GET_CODE (link) == EXPR_LIST)
-  add_reg_note (res, REG_NOTE_KIND (link),
-copy_insn_1 (XEXP (link, 0)));
-else
-  add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0));
+	*ptail = duplicate_reg_note (link);
+	ptail = &XEXP (*ptail, 1);
   }

   return res;




[Ping][gcc] Enable DW_CFA_val_expression support in dwarf module

2016-10-31 Thread Jiong Wang

On 21/10/16 09:30, Jiong Wang wrote:

Currently, GCC only supports DW_CFA_expression in the dwarf module; this patch
extends the support to DW_CFA_val_expression, which shares mostly the same
code with DW_CFA_expression.

Meanwhile the existing dwarf expression parser only allows expressions
which can be represented using GCC RTL.  If an operation doesn't have
a corresponding GCC RTL operator, then there is no way to attach that
information in a reg-note.

This patch extends the current dwarf expression support to an unlimited
number of operations by using PARALLEL, and unlimited types of operations
by using UNSPEC.

All DW_OP_* of the expression are grouped together inside the PARALLEL,
and those operations which don't have RTL mapping are wrapped by
UNSPEC.  The parsing algorithm is simply something like:

  foreach elem inside PARALLEL
if (UNSPEC)
  {
dw_op_code = INTVAL (XVECEXP (elem, 0, 0));
oprnd1 = INTVAL (XVECEXP (elem, 0, 1));
oprnd2 = INTVAL (XVECEXP (elem, 0, 2));
  }
else
  call standard RTL parser.

Any comments on the approach?


Ping ~



Thanks.

gcc/
2016-10-20  Jiong Wang  <jiong.w...@arm.com>

* reg-notes.def (CFA_VAL_EXPRESSION): New entry.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function.
(dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION.
(output_cfa_loc): Support DW_CFA_val_expression.
(output_cfa_loc_raw): Likewise.
(output_cfi): Likewise.
(output_cfi_directive): Likewise.
* dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression.
(dw_cfi_oprnd2_desc): Likewise.
(mem_loc_descriptor): Recognize new pattern generated for value
expression.





Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE

2016-10-24 Thread Jiong Wang

On 24/10/16 16:22, Jeff Law wrote:

On 10/20/2016 01:46 PM, Jiong Wang wrote:

2016-10-20 19:50 GMT+01:00 Jeff Law <l...@redhat.com>:


On 10/20/2016 09:28 AM, Jiong Wang wrote:


The current code supposes targetm.stack_protect_fail always generates
something.  But in case one target starts to generate NULL_TREE, there
will be an ICE.  This patch adds a simple sanity check to only call
expand if it's not NULL_TREE.


OK for trunk?

gcc/
2016-10-20  Jiong Wang  <jiong.w...@arm.com>

* function.c (stack_protect_epilogue): Only expand
targetm.stack_protect_fail if it's not NULL_TREE.


Is there some reason we don't want to issue an error here and stop 
compilation?  I'm not at all comfortable silently ignoring failure 
to generate stack protector code.


jeff



Hi Jeff,

  That's because I am doing some work where I will borrow the
stack-protector's analysis infrastructure, but those stack-protector
standard rtl insns just need to be expanded into nothing; for example
stack_protect_set/test just need to be expanded into NOTE_INSN_DELETED.
The same goes for targetm.stack_protect_fail (), which I want to simply
return NULL_TREE.  But it's not an error.
Right.  But your change could mask backend problems.  Specifically if 
their expander for stack_protect_fail did fail and returned NULL_TREE.


That would cause it to silently ignore stack protector failures, which 
seems inadvisable.


Is there another way you can re-use the analysis code without 
resorting to something like this?


In my case, I only want the canary variable, which is
"crtl->stack_protect_guard"; I don't want the current runtime
support which GCC will always generate once crtl->stack_protect_guard is
initialized.


I was thinking of letting stack_protect_fail generate a tree that
expand_call will expand into NULL_RTX unconditionally under any
optimization level, but it seems impossible.  I'd really appreciate any
ideas on this.







  This does seem to affect other targets (x86, rs6000) if NULL_TREE should
never be returned for them.  Currently I can see all of them use
either default_external_stack_protect_fail or
default_hidden_stack_protect_fail, both of which are "return
build_call_expr (..", so should I also assert the return value of
build_call_expr?
Asserting couldn't hurt.  I'd much rather have the compiler issue an 
error, ICE or somesuch than silently not generate a call to the stack 
protector fail routine.
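
(For concreteness, the check under discussion amounts to something like the
following in stack_protect_epilogue; this is a paraphrase of the patch
description, as the patch body itself is not quoted in this thread:

  tree fail_check = targetm.stack_protect_fail ();
  if (fail_check != NULL_TREE)
    expand_call (fail_check, NULL_RTX, false);

with the asserting variant replacing the if with a gcc_assert on the
return value.)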




Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-10-21 Thread Jiong Wang

On 21/10/16 11:13, Bernd Schmidt wrote:

On 10/21/2016 09:43 AM, Eric Botcazou wrote:
I disagree: there are currently n ways of copying NOTEs in the RTL 
middle-end,
with different properties each time.  We need only one primitive in 
rtlanal.c.


I feel the fact that they have different properties means we shouldn't 
try to unify them: we'll just end up with a long list of boolean 
parameters, with no way of quickly telling what a given function call 
is doing. A copy loop is short enough that it can be implemented 
in-place and people can quickly tell what is going on by looking at it.


Maybe the inner if statement could be a small helper function 
(append_copy_of_reg_note).



Bernd


Hi Bernd, Eric,

  How does the attached patch look to you?  x86_64 bootstrap & regression OK.

  I borrowed Bernd's code to write the tail pointer directly.


2016-10-21  Bernd Schmidt  <bschm...@redhat.com>
    Jiong Wang  <jiong.w...@arm.com>
  
gcc/


PR middle-end/78016
* emit-rtl.c (emit_copy_of_insn_after): Copy REG_NOTES in order instead
of in reverse order.
* sel-sched-ir.c (create_copy_of_insn_rtx): Likewise.


diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 2d6d1eb6c1311871f15dbed13d7c084ed3845a86..4d849ca6e64273bedc5bf8b9a62a5cc5d4606129 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6168,17 +6168,31 @@ emit_copy_of_insn_after (rtx_insn *insn, rtx_insn *after)
  which may be duplicated by the basic block reordering code.  */
   RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn);
 
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = &REG_NOTES (new_rtx);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);
+
   /* Copy all REG_NOTES except REG_LABEL_OPERAND since mark_jump_label
  will make them.  REG_LABEL_TARGETs are created there too, but are
  supposed to be sticky, so we copy them.  */
   for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
 if (REG_NOTE_KIND (link) != REG_LABEL_OPERAND)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (new_rtx, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
+	rtx new_node;
+
+	if (GET_CODE (link) == INT_LIST)
+	  new_node = gen_rtx_INT_LIST ((machine_mode) REG_NOTE_KIND (link),
+				       XINT (link, 0), NULL_RTX);
 	else
-	  add_shallow_copy_of_reg_note (new_rtx, link);
+	  new_node = alloc_reg_note (REG_NOTE_KIND (link),
+				     (GET_CODE (link) == EXPR_LIST
+				      ? copy_insn_1 (XEXP (link, 0))
+				      : XEXP (link, 0)),
+				     NULL_RTX);
+
+	*ptail = new_node;
+	ptail = &XEXP (new_node, 1);
   }
 
   INSN_CODE (new_rtx) = INSN_CODE (insn);
diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 210b1e4edfb359a161cda4826704005ae9ab5a24..324ae8cf05209757a3a3f3dee97c9274876c7ed7 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -5761,6 +5761,11 @@ create_copy_of_insn_rtx (rtx insn_rtx)
   res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)),
   NULL_RTX);
 
+  /* Locate the end of existing REG_NOTES in RES.  */
+  rtx *ptail = &REG_NOTES (res);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);
+
   /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND
  since mark_jump_label will make them.  REG_LABEL_TARGETs are created
  there too, but are supposed to be sticky, so we copy them.  */
@@ -5769,11 +5774,12 @@ create_copy_of_insn_rtx (rtx insn_rtx)
 	&& REG_NOTE_KIND (link) != REG_EQUAL
 	&& REG_NOTE_KIND (link) != REG_EQUIV)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (res, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
-	else
-	  add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0));
+	rtx new_node = alloc_reg_note (REG_NOTE_KIND (link),
+				       (GET_CODE (link) == EXPR_LIST
+					? copy_insn_1 (XEXP (link, 0))
+					: XEXP (link, 0)), NULL_RTX);
+	*ptail = new_node;
+	ptail = &XEXP (new_node, 1);
   }
 
   return res;


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-10-21 Thread Jiong Wang

On 21/10/16 08:43, Eric Botcazou wrote:

That's also overcomplicated.

Yes, I agree that's too heavy.


rtx *ptail = &REG_NOTES (to_insn);
while (*ptail != NULL_RTX)
  ptail = &XEXP (*ptail, 1);


Thanks very much Bernd, yes, this is better.  And by manipulating the
pointer directly, those new bidirectional functions become unnecessary.




gives you a pointer to the end which you can then use to append,
unconditionally. As mentioned above, I think it would be simpler to keep
this logic in the caller functions and avoid introducing
append_insn_reg_notes.

I disagree: there are currently n ways of copying NOTEs in the RTL middle-end,
with different properties each time.  We need only one primitive in rtlanal.c.
That's my view too: the duplicated code in emit-rtl.c and sel-sched-ir.c
really can be shared, and appending all REG_NOTES from one insn to
another seems to qualify as one primitive in rtlanal.c.


I will come up with a much lighter patch.

Thanks.




[gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-10-21 Thread Jiong Wang

Currently, GCC only supports DW_OP_EXPRESSION in the dwarf module; this
patch extends the support to DW_OP_VAL_EXPRESSION, which shares mostly
the same code with DW_OP_EXPRESSION.

Meanwhile, the existing dwarf expression parser only allows expressions
which can be represented using GCC RTL.  If one operation doesn't have
a corresponding GCC RTL operator, then there is no way to attach that
information to a reg-note.

This patch extends the current dwarf expression support to an unlimited
number of operations by using PARALLEL, and an unlimited set of
operation types by using UNSPEC.

All DW_OP_* operations of the expression are grouped together inside the
PARALLEL, and those operations which don't have an RTL mapping are
wrapped in UNSPEC.  The parsing algorithm is simply something like:

  foreach elem inside PARALLEL
    if (UNSPEC)
      {
        dw_op_code = INTVAL (XVECEXP (elem, 0, 0));
        oprnd1 = INTVAL (XVECEXP (elem, 0, 1));
        oprnd2 = INTVAL (XVECEXP (elem, 0, 2));
      }
    else
      call standard RTL parser.
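
As a concrete illustration of the producer side (a sketch only:
UNSPEC_DW_OP, REGNO and OFFSET are hypothetical names, not part of this
patch), one element of the PARALLEL wrapping an operation without an
RTL equivalent could be built as:

  /* Wrap DW_OP_bregx and its two operands into one UNSPEC element.  */
  rtx elem = gen_rtx_UNSPEC (Pmode,
                             gen_rtvec (3, GEN_INT (DW_OP_bregx),
                                        GEN_INT (REGNO),
                                        GEN_INT (OFFSET)),
                             UNSPEC_DW_OP);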

Any comments on the approach?

Thanks.

gcc/
2016-10-20  Jiong Wang  <jiong.w...@arm.com>

* reg-notes.def (CFA_VAL_EXPRESSION): New entry.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function.
(dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION.
(output_cfa_loc): Support DW_CFA_val_expression.
(output_cfa_loc_raw): Likewise.
(output_cfi): Likewise.
(output_cfi_directive): Likewise.
* dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression.
(dw_cfi_oprnd2_desc): Likewise.
(mem_loc_descriptor): Recognize new pattern generated for value
expression.

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 6491d5aaf4c4a21241cc718bfff1016f6d149951..b8c88fbae1df80a2664a414d8ae016a5343bf435 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set)
   reg_save (sregno, dregno, 0);
 }
 
-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note.  */
 
 static void
 dwarf2out_frame_debug_cfa_expression (rtx set)
@@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
   update_row_reg_save (cur_row, regno, cfi);
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_val_expression (rtx set)
+{
+  rtx dest = SET_DEST (set);
+  gcc_assert (REG_P (dest));
+
+  rtx span = targetm.dwarf_register_span (dest);
+  gcc_assert (!span);
+
+  rtx src = SET_SRC (set);
+  dw_cfi_ref cfi = new_cfi ();
+  cfi->dw_cfi_opc = DW_CFA_val_expression;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest);
+  cfi->dw_cfi_oprnd2.dw_cfi_loc
+    = mem_loc_descriptor (src, GET_MODE (src),
+			  GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED);
+  add_cfi (cfi);
+  update_row_reg_save (cur_row, dwf_regno (dest), cfi);
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
 
 static void
@@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn)
 	break;
 
   case REG_CFA_EXPRESSION:
+  case REG_CFA_VAL_EXPRESSION:
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_expression (n);
+
+	if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION)
+	  dwarf2out_frame_debug_cfa_expression (n);
+	else
+	  dwarf2out_frame_debug_cfa_val_expression (n);
+
 	handled_one = true;
 	break;
 
@@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh);
@@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
@@ -3188,6 +3219,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref fde, int for_eh)
 
 	case DW_CFA_def_cfa_expression:
 	case DW_CFA_expression:
+	case DW_CFA_val_expression:
 	  output_cfa_loc (cfi, for_eh);
 	  break;
 
@@ -3302,16 +3334,13 @@ output_cfi_directive (FILE *f, dw_cfi_ref cfi)
   break;
 
 case DW_CFA_def_cfa_expression:
-  if (f != asm_out_file)
-	{
-	  fprintf (f, "\t.cfi_def_cfa_expression ...\n");
-	  break;
-	}
-  /* FALLTHRU */
 case DW_CFA_expression:
+case DW_CFA_val_expression:
   if (f != asm_out_file)
 	{
-	  fprintf (f, "\t.cfi_cfa_expression ...\n");
+	  fprintf (f, "\t.cfi_%scfa_%sexpression ...\n",
+		   cfi->dw_c

Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE

2016-10-20 Thread Jiong Wang
2016-10-20 19:50 GMT+01:00 Jeff Law <l...@redhat.com>:
>
> On 10/20/2016 09:28 AM, Jiong Wang wrote:
>>
>> The current code supposes targetm.stack_protect_fail always generates
>> something.  But if one target starts to generate NULL_TREE, there will
>> be an ICE.  This patch adds a simple sanity check to only call expand
>> if it's not NULL_TREE.
>>
>> OK for trunk?
>>
>> gcc/
>> 2016-10-20  Jiong Wang  <jiong.w...@arm.com>
>>
>> * function.c (stack_protect_epilogue): Only expands
>> targetm.stack_protect_fail if it's not NULL_TREE.
>
> Is there some reason we don't want to issue an error here and stop 
> compilation?  I'm not at all comfortable silently ignoring failure to 
> generate stack protector code.
>
> jeff


Hi Jeff,

  That's because I am doing some work where I will borrow
stack-protector's analysis infrastructure but for those
stack-protector standard rtl insns, they just need to be expanded into
nothing; for example, stack_protect_set/test just need to be expanded
into NOTE_INSN_DELETED.  The same goes for targetm.stack_protect_fail (),
which I want to have simply return NULL_TREE.  But it's not an error.

  This does seem to affect other targets (x86, rs6000) if NULL_TREE
should never be returned for them.  Currently I can see all of them use
either default_external_stack_protect_fail or
default_hidden_stack_protect_fail, both of which are "return
build_call_expr (..", so should I also assert the return value of
build_call_expr?

  Thanks.


[Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE

2016-10-20 Thread Jiong Wang

The current code supposes targetm.stack_protect_fail always generates
something.  But if one target starts to generate NULL_TREE, there will be
an ICE.  This patch adds a simple sanity check to only call expand if
it's not NULL_TREE.

OK for trunk?

gcc/
2016-10-20  Jiong Wang  <jiong.w...@arm.com>

* function.c (stack_protect_epilogue): Only expands
targetm.stack_protect_fail if it's not NULL_TREE.

diff --git a/gcc/function.c b/gcc/function.c
index cdd2721cdf904be6457d090fe20345d3dee0b4dd..304c32ed2b1ace06139786680f30502d8483a8ed 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5077,7 +5077,9 @@ stack_protect_epilogue (void)
   if (JUMP_P (tmp))
 predict_insn_def (tmp, PRED_NORETURN, TAKEN);
 
-  expand_call (targetm.stack_protect_fail (), NULL_RTX, /*ignore=*/true);
+  tree fail_check = targetm.stack_protect_fail ();
+  if (fail_check != NULL_TREE)
+    expand_call (fail_check, NULL_RTX, /*ignore=*/true);
   free_temp_slots ();
   emit_label (label);
 }


[Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-10-20 Thread Jiong Wang

As discussed on PR middle-end/78016, here is the patch.

This patch makes EXPR_LIST/INSN_LIST/INT_LIST insertion bi-directional: the new
node can be inserted at either the start or the end of the given list.

The existing alloc_EXPR_LIST and alloc_INSN_LIST become wrappers of the new
bi-directional functions; there is no functional change in them, and their
callers are *not affected*.

This patch then factors out the REG_NOTES copy code in emit-rtl.c and
sel-sched-ir.c into a function append_insn_reg_notes in rtlanal.c, which uses
the new bi-directional interfaces to make sure the order of REG_NOTES is not
changed during insn copy.  Redundant code in emit-rtl.c and sel-sched-ir.c is
also deleted.

x86_64/aarch64 bootstrap OK. c/c++ regression OK.

OK for trunk?

gcc/
2016-10-20  Jiong Wang  <jiong.w...@arm.com>

PR middle-end/78016
* lists.c (alloc_INSN_LIST_bidirection): New function.  The function
body is cloned from alloc_INSN_LIST with minor changes to make it
support bi-directional insertion.
(alloc_EXPR_LIST_bidirection): Likewise.
(alloc_INT_LIST_bidirection): New function.  Alloc INT_LIST node, and
support bi-directional insertion into given list.
(alloc_INSN_LIST): Call alloc_INSN_LIST_bidirection.
(alloc_EXPR_LIST): Call alloc_EXPR_LIST_bidirection.
* rtl.h (append_insn_reg_notes): New declaration.
(alloc_INSN_LIST_bidirection): New declaration.
(alloc_EXPR_LIST_bidirection): New declaration.
(alloc_INT_LIST_bidirection): New declaration.
* rtlanal.c (alloc_reg_note_bidirection): New static function.  Function
body is cloned from alloc_reg_note with minor changes to make it support
bi-directional insertion.
(alloc_reg_note): Call alloc_reg_note_bidirection.
(append_insn_reg_notes): New function.
* emit-rtl.c (emit_copy_of_insn_after): Use append_insn_reg_notes.
* sel-sched-ir.c (create_copy_of_insn_rtx): Likewise.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 2d6d1eb..87eb1e3 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6125,7 +6125,6 @@ rtx_insn *
 emit_copy_of_insn_after (rtx_insn *insn, rtx_insn *after)
 {
   rtx_insn *new_rtx;
-  rtx link;
 
   switch (GET_CODE (insn))
 {
@@ -6171,15 +6170,7 @@ emit_copy_of_insn_after (rtx_insn *insn, rtx_insn *after)
   /* Copy all REG_NOTES except REG_LABEL_OPERAND since mark_jump_label
  will make them.  REG_LABEL_TARGETs are created there too, but are
  supposed to be sticky, so we copy them.  */
-  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
-if (REG_NOTE_KIND (link) != REG_LABEL_OPERAND)
-  {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (new_rtx, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
-	else
-	  add_shallow_copy_of_reg_note (new_rtx, link);
-  }
+  append_insn_reg_notes (new_rtx, insn, true, false);
 
   INSN_CODE (new_rtx) = INSN_CODE (insn);
   return new_rtx;
diff --git a/gcc/lists.c b/gcc/lists.c
index 96b4bc7..cd30b7c 100644
--- a/gcc/lists.c
+++ b/gcc/lists.c
@@ -98,11 +98,14 @@ remove_list_elem (rtx elem, rtx *listp)
 
 /* This call is used in place of a gen_rtx_INSN_LIST. If there is a cached
node available, we'll use it, otherwise a call to gen_rtx_INSN_LIST
-   is made.  */
+   is made.  The new node will be appended at the end of LIST if APPEND_P is
+   TRUE, otherwise list is appended to the new node.  */
+
 rtx_insn_list *
-alloc_INSN_LIST (rtx val, rtx next)
+alloc_INSN_LIST_bidirection (rtx val, rtx list, bool append_p)
 {
   rtx_insn_list *r;
+  rtx next = append_p ? NULL_RTX : list;
 
   if (unused_insn_list)
 {
@@ -117,16 +120,33 @@ alloc_INSN_LIST (rtx val, rtx next)
   else
 r = gen_rtx_INSN_LIST (VOIDmode, val, next);
 
+  if (append_p)
+{
+  gcc_assert (list != NULL_RTX);
+  XEXP (list, 1) = r;
+}
+
   return r;
 }
 
+/* Allocate new INSN_LIST node for VAL, append NEXT to it.  */
+
+rtx_insn_list *
+alloc_INSN_LIST (rtx val, rtx next)
+{
+  return alloc_INSN_LIST_bidirection (val, next, false);
+}
+
 /* This call is used in place of a gen_rtx_EXPR_LIST. If there is a cached
node available, we'll use it, otherwise a call to gen_rtx_EXPR_LIST
-   is made.  */
+   is made.  The new node will be appended at the end of LIST if APPEND_P is
+   TRUE, otherwise list is appended to the new node.  */
+
 rtx_expr_list *
-alloc_EXPR_LIST (int kind, rtx val, rtx next)
+alloc_EXPR_LIST_bidirection (int kind, rtx val, rtx list, bool append_p)
 {
   rtx_expr_list *r;
+  rtx next = append_p ? NULL_RTX : list;
 
   if (unused_expr_list)
 {
@@ -139,9 +159,23 @@ alloc_EXPR_LIST (int kind, rtx val, rtx next)
   else
 r = gen_rtx_EXPR_LIST ((machine_mode) kind, val, next);
 
+  if (append_p)
+{
+  gcc_assert (list != NULL_RTX);
+  XEXP (list, 1) = r;
+}
+
   return r;
 }
 
+/* Allocate new EXPR_LIST node for KIND and VAL, append NEXT to it.  */
+
+rtx_exp

Re: [PATCH v2] aarch64: Add split-stack initial support

2016-10-14 Thread Jiong Wang

Hi Adhemerval,

On 06/10/16 22:54, Adhemerval Zanella wrote:

+  bool split_stack_arg_pointer_used = split_stack_arg_pointer_used_p ();
  
if (flag_stack_usage_info)

  current_function_static_stack_size = frame_size;
@@ -3220,6 +3264,10 @@ aarch64_expand_prologue (void)
aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
  }
  
+  /* Save split-stack argument pointer before stack adjustment.  */

+  if (split_stack_arg_pointer_used)
+emit_move_insn (gen_rtx_REG (Pmode, R10_REGNUM), stack_pointer_rtx);
+
aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -initial_adjust, true);
  
if (callee_adjust != 0)

@@ -3243,6 +3291,30 @@ aarch64_expand_prologue (void)
 callee_adjust != 0 || frame_pointer_needed);
aarch64_add_constant (Pmode, SP_REGNUM, IP1_REGNUM, -final_adjust,
!frame_pointer_needed);
+
+  if (split_stack_arg_pointer_used_p ())


Redundant call?  You can use "split_stack_arg_pointer_used" here.

+
+  /* Always emit two insns to calculate the requested stack, so the linker
+ can edit them when adjusting size for calling non-split-stack code.  */
+  ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size), true,
+ Pmode);
+  gcc_assert (ninsn == 1 || ninsn == 2);
+  if (ninsn == 1)
+emit_insn (gen_nop ());

If you expect the nop to be kept together with the other instructions,
note I am still seeing the nop scheduled away from the addition:

mov x10, -4144
add x10, sp, x10
nop


+
+#define BACKOFF	0x2000


The BACKOFF value is 0x2000 here while in morestack-c.c it is 0x1000, is 
this deliberate?

+
+   # Calculate requested stack size.
+   sub x12, sp, x10
+   # Save parameters
+   stp x29, x30, [sp, -MORESTACK_FRAMESIZE]!
+   .cfi_def_cfa_offset MORESTACK_FRAMESIZE
+   .cfi_offset 29, -MORESTACK_FRAMESIZE
+   .cfi_offset 30, -MORESTACK_FRAMESIZE+8
+   add x29, sp, 0
+   .cfi_def_cfa_register 29
+   # Adjust the requested stack size for the frame pointer save.
+   add x12, x12, 16
+   stp x0, x1, [sp, 16]
+   stp x2, x3, [sp, 32]
+   add x12, x12, BACKOFF
+   stp x4, x5, [sp, 48]
+   stp x6, x7, [sp, 64]
+   stp x28, x12, [sp, 80]
+
+   # Setup on x28 the function initial frame pointer.  Its value will
+   # be copied to the function argument pointer.
+   add x28, sp, MORESTACK_FRAMESIZE + 16
+
+   # void __morestack_block_signals (void)
+   bl  __morestack_block_signals
+
+   # void *__generic_morestack (size_t *pframe_size,
+   #void *old_stack,
+   #size_t param_size)
+   # pframe_size: is the size of the required stack frame (the function
+   #  amount of space remaining on the allocated stack).

s/pframe_size: is the size/pframe_size: points at the size/


+
+   # Set up for a call to the target function.
+   ldr x30, [x28, STACKFRAME_BASE + 8]
+   ldp x0, x1, [x28, STACKFRAME_BASE + 16]
+   ldp x2, x3, [x28, STACKFRAME_BASE + 32]
+   ldp x4, x5, [x28, STACKFRAME_BASE + 48]
+   ldp x6, x7, [x28, STACKFRAME_BASE + 64]
+   add x9, x30, 8
+   cmp x30, x9

We can remove this "cmp" by using "adds x9, x30, 8"?
I am thinking "adds" will set the "c" bit in the condition flags to
zero, then the bcs check in the function prologue will fail, thus the
argument pointer initialization will always be executed if the
execution flow is from __morestack.

bcs .L8
mov x10, x28


+   blr x9
+
+   stp x0, x1, [x28, STACKFRAME_BASE + 16]
+   stp x2, x3, [x28, STACKFRAME_BASE + 32]
+   stp x4, x5, [x28, STACKFRAME_BASE + 48]
+   stp x6, x7, [x28, STACKFRAME_BASE + 64]
+



Re: [AArch64][0/14] ARMv8.2-A FP16 extension support

2016-10-05 Thread Jiong Wang

On 27/09/16 17:03, Jiong Wang wrote:
>
> Now as ARM patches have gone in around r240427, I have done a quick
> confirmation on the status of these four pending testsuite patches:
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00337.html
>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00338.html
>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00339.html
>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00340.html
>
> The result is they apply cleanly on gcc trunk, and there is no
> regression in the AArch64 native regression test.  Testcases enabled
> without the FP16 requirement all passed.
>
> I will give a final run on an ARM native board and an AArch64 emulation
> environment with ARMv8.2-A FP16 enabled.  (I have done this before,
> just in case something changed during these days.)
>
> OK for trunk if there is no regression?
>
> Thanks

Finished the final tests on the emulator with FP16 enabled.

  * No regression on AArch64; all new testcases passed.
  * No regression on AArch32; some of the new testcases came out
    UNRESOLVED because they should be skipped on AArch32.  That is
    fixed by the attached trivial patch, which I will merge into the
    4th patch (no effect on the changelog).

OK to commit these patches?


diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
index f8c8c79..0bebec7 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include <arm_fp16.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
index 23c11a4..68ce599 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include <arm_fp16.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
index ae4c8b5..1b5a09b 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include <arm_fp16.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
index 56a6533..766c783 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include <arm_fp16.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
index fb54e96..8f5c14b 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include <arm_fp16.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
index 57c765c..ccfecf4 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include <arm_fp16.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
index f9a5bbe..161c7a0 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-skip-if &

Re: [AArch64][0/14] ARMv8.2-A FP16 extension support

2016-09-27 Thread Jiong Wang

On 25/07/16 12:26, James Greenhalgh wrote:

On Thu, Jul 07, 2016 at 05:12:48PM +0100, Jiong Wang wrote:

Hello,

As a follow up of

https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01240.html,

This patch set adds ARMv8.2-A FP16 scalar and vector intrinsics support;
the gcc middle-end will also be made aware of some standard operations
so that some instructions can be auto-generated.

According to ACLE, the ARMv8.2-A FP16 intrinsics for AArch64 are a
superset of the intrinsics for AArch32, so all those intrinsic-related
testcases, particularly those under the directory advsimd-intrinsics,
are also applicable to AArch64.  This patch set has only included those
testcases that are exclusive to AArch64.

Jiong Wang (14)
   ARMv8.2-A FP16 data processing intrinsics
   ARMv8.2-A FP16 one operand vector intrinsics
   ARMv8.2-A FP16 two operands vector intrinsics
   ARMv8.2-A FP16 three operands vector intrinsics
   ARMv8.2-A FP16 lane vector intrinsics
   ARMv8.2-A FP16 reduction vector intrinsics
   ARMv8.2-A FP16 one operand scalar intrinsics
   ARMv8.2-A FP16 two operands scalar intrinsics
   ARMv8.2-A FP16 three operands scalar intrinsics
   ARMv8.2-A FP16 lane scalar intrinsics

At this point, I've OKed the first 10 patches in the series, these represent
the functional changes to the compiler. I'm leaving the testsuite patches
for now, as they depend on testsuite changes that have yet to be approved
for the ARM port.

To save you from having to continue to rebase the functional parts of this
patch while you wait for review of the ARM changes, I would be OK with you
committing them now, on the understanding that you'll continue to check
the testsuite in the time between now and the testsuite changes are approved,
and that you'll fix any issues that you find.


   ARMv8.2-A FP16 testsuite selector
   ARMv8.2-A testsuite for new data movement intrinsics
   ARMv8.2-A testsuite for new vector intrinsics
   ARMv8.2-A testsuite for new scalar intrinsics
  
I've taken a brief look through these testsuite changes and they look OK

to me. I'll revisit them properly once I've seen the ARM patches go in.


Now as ARM patches have gone in around r240427, I have done a quick
confirmation on the status of these four pending testsuite patches:

  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00337.html
  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00338.html
  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00339.html
  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00340.html

The result is they apply cleanly on gcc trunk, and there is no
regression in the AArch64 native regression test.  Testcases enabled
without the FP16 requirement all passed.

I will give a final run on an ARM native board and an AArch64 emulation
environment with ARMv8.2-A FP16 enabled.  (I have done this before,
just in case something changed during these days.)

OK for trunk if there is no regression?

Thanks



[COMMITTED, aarch64] Delete one redundant word in target-supports.exp comment

2016-09-27 Thread Jiong Wang

This patch deletes one redundant word in target-supports.exp function comment
for "check_effective_target_arm_v8_2a_fp16_scalar_hw".

   s/instructions floating point instructions/floating point instructions/

The comment is re-indented.  No other changes.

Committed as obvious as r240551.

gcc/testsuite/
2016-09-27  Jiong Wang  <jiong.w...@arm.com>

 * lib/target-supports.exp
 (check_effective_target_arm_v8_2a_fp16_scalar_hw): Delete redundant
 word in function comment.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3d11e28..50723de 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4015,9 +4015,8 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
 } [add_options_for_arm_v8_1a_neon ""]]
 }
 
-# Return 1 if the target supports executing instructions floating point
-# instructions from ARMv8.2 with the FP16 extension, 0 otherwise.  The
-# test is valid for ARM.
+# Return 1 if the target supports executing floating point instructions from
+# ARMv8.2 with the FP16 extension, 0 otherwise.  The test is valid for ARM.
 
 proc check_effective_target_arm_v8_2a_fp16_scalar_hw { } {
 if { ![check_effective_target_arm_v8_2a_fp16_scalar_ok] } {


Re: [PATCH][AArch64 - v2] Simplify eh_return implementation

2016-08-26 Thread Jiong Wang

Wilco Dijkstra writes:

> Ping
>
> I noticed it would still be a good idea to add an extra barrier in the epilog 
> as the
> scheduler doesn't appear to handle aliases of frame accesses properly.
>
> This patch simplifies the handling of the EH return value.  We force the use 
> of the
> frame pointer so the return location is always at FP + 8.  This means we can 
> emit
> a simple volatile access in EH_RETURN_HANDLER_RTX without needing md
> patterns, splitters and frame offset calculations.  The new implementation 
> also
> fixes various bugs in aarch64_final_eh_return_addr, which does not work with
> -fomit-frame-pointer, alloca or outgoing arguments.

The -fomit-frame-pointer case is really broken in aarch64_final_eh_return_addr:

-  return gen_frame_mem (DImode,
-   plus_constant (Pmode,
-  stack_pointer_rtx,
-  fp_offset
-  + cfun->machine->frame.saved_regs_size
-  - 2 * UNITS_PER_WORD));

the saved_regs_size includes both the general and vector register saving
areas, while LR should be saved on top of the general register area.
Meanwhile, saved_regs_size contains the alignment amount.

Given that the EH unwind code will invoke __builtin_unwind_init, which
pushes all callee-saved registers, both general and vector, the current
function will always get a wrong offset.

I think the correct offset when -fomit-frame-pointer should be:

  "cfun->machine->frame.reg_offset[LR_REGNUM]"

I have done a quick check on _Unwind_RaiseException, which is the only
code affected by this change.  Without a frame pointer, the exception
handler's address is installed in a different, thus wrong, stack slot:

...
str x30, [sp, 112]
...
str x19, [sp, 176]

The approach used in this patch looks good to me.

> 2016-08-10  Wilco Dijkstra  
> gcc/
> * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.
> * config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove.
> (EH_RETURN_HANDLER_RTX): New define.
> * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
> Force frame pointer in EH return functions.
> (aarch64_expand_epilogue): Add barrier for eh_return.
> (aarch64_final_eh_return_addr): Remove.
> (aarch64_eh_return_handler_rtx): New function.
> * config/aarch64/aarch64-protos.h (aarch64_final_eh_return_addr):
> Remove.
> (aarch64_eh_return_handler_rtx): New prototype.

-- 
Regards,
Jiong


Re: [PATCH] aarch64: Add split-stack initial support

2016-08-23 Thread Jiong Wang

Adhemerval Zanella writes:

> On 08/08/2016 07:58, Jiong Wang wrote:
>> 
>> Adhemerval Zanella writes:
>> 
>
> Below it the last iteration patch, however I now seeing some similar issue
> s390 hit when building libgo:
>
> ../../../gcc-git/libgo/go/syscall/socket_linux.go:90:1: error: flow control 
> insn inside a basic block
> (jump_insn 90 89 91 14 (set (pc)
> (if_then_else (geu (reg:CC 66 cc)
> (const_int 0 [0]))
> (label_ref 92)
> (pc))) ../../../gcc-git/libgo/go/syscall/socket_linux.go:90 -1
>  (nil)
>  -> 92)
> ../../../gcc-git/libgo/go/syscall/socket_linux.go:90:1: internal compiler 
> error: in rtl_verify_bb_insns, at cfgrtl.c:2658
> 0xac35af _fatal_insn(char const*, rtx_def const*, char const*, int, char 
> const*)
>
> It shows only with -O2, which I think is due to how the block is
> reorganized internally, regarding the pseudo-return instruction
> inserted by split-stack.
> I am still debugging the issue and how to proper fix it, so if you have any
> advice I open to suggestions.
> ...
> ...
> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> +{
> +  rtx mem, ssvalue, compare, jump, temp;
> +  rtx requested = gen_reg_rtx (Pmode);
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = 0x10;
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +
> +  /* And compare it with frame pointer plus required stack.  */
> +  if (CONST_INT_P (size))
> +	emit_insn (gen_add3_insn (requested, stack_pointer_rtx,
> +				  GEN_INT (-INTVAL (size))));

If the constant size doesn't fit into an add instruction, this statement
will generate NULL, and then the following comparison is wrong, I guess.

I am not sure if this is the reason for the ICE you mentioned above.
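
Something like the following guard is what I mean (a sketch;
aarch64_uimm12_shift is the existing immediate-fit check in aarch64.c,
and whether it is the right predicate here needs confirming):

  if (CONST_INT_P (size) && aarch64_uimm12_shift (INTVAL (size)))
    emit_insn (gen_add3_insn (requested, stack_pointer_rtx,
                              GEN_INT (-INTVAL (size))));
  else
    {
      size = force_reg (Pmode, size);
      emit_move_insn (requested,
                      gen_rtx_MINUS (Pmode, stack_pointer_rtx, size));
    }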

Meanwhile, for the nop scheduling issue, I do see the following
instruction sequence generated; the "add" is scheduled before the nop:

mov x10, -4160
add x10, sp, x10
nop

I currently don't have a good idea on how to tie the "nop" with the
"mov"; for TLS relaxation, which requires a similar instruction tie, we
simply use a single RTL pattern to output multiple instructions at the
final assembly output stage.

> +  else
> +{
> +  size = force_reg (Pmode, size);
> +  emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx,
> + size));
> +}
> +
> +  /* Jump to __morestack call if current __private_ss is not suffice.  */
> +  compare = aarch64_gen_compare_reg (LT, requested, ssvalue);
> +  temp = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +gen_rtx_GEU (VOIDmode, compare, const0_rtx),
> +gen_rtx_LABEL_REF (VOIDmode, label),
> +pc_rtx);

-- 
Regards,
Jiong


Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-08 Thread Jiong Wang

Jiong Wang writes:

> Andrew Pinski writes:
>
>> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
>> <james.greenha...@arm.com> wrote:
>>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>>> >
>>>> > Andrew Pinski writes:
>>>> >
>>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>>> >>>
>>>> >>> James Greenhalgh writes:
>>>> >>>
>>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>>> >>>>> Current IRA still use both target macros in a few places.
>>>> >>>>>
>>>> >>>>> Tell IRA to use the order we defined rather than with it's own cost
>>>> >>>>> calculation. Allocate caller saved first, then callee saved.
>>>> >>>>>
>>>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
>>>> >>>>> pure caller saved when used in leaf function.
>>>> >>>>>
>>>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping 
>>>> >>>>> some
>>>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the 
>>>> >>>>> number
>>>> >>>>> is smaller.
>>>> >>>>>
>>>> >>>>> OK for trunk?
>>>> >>>>
>>>> >>>> OK, sorry for the delay.
>>>> >>>>
>>>> >>>> It might be mail client mangling, but please check that the trailing 
>>>> >>>> slashes
>>>> >>>> line up in the version that gets committed.
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> James
>>>> >>>>
>>>> >>>>> 2015-05-19  Jiong. Wang  <jiong.w...@arm.com>
>>>> >>>>>
>>>> >>>>> gcc/
>>>> >>>>>  PR 63521
>>>> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>>> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
>>>> >>>
>>>> >>> Patch reverted.
>>>> >>
>>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>>> >> missing an email or something.
>>>> >
>>>> > There are several execution regressions under gcc testsuite, although as
>>>> > far as I can see it's this patch exposed hidden bugs in those
>>>> > testcases, but there might be one other issue, so to be conservative, I
>>>> > temporarily reverted this patch.
>>>>
>>>> If you are talking about:
>>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>>> Etc.
>>>>
>>>> These test cases are too dependent on the original register allocation 
>>>> order
>>>> and really can be safely ignored. Really these three tests should be moved 
>>>> or
>>>> written in a more sane way.
>>>
>>> Yup, completely agreed - but the testcases do throw up something
>>> interesting. If we are allocating registers to hold 128-bit values, and
>>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>>> I think we probably see the same thing if the first thing we do in a
>>> function is a structure copy through a back-end expanded movmem, which
>>> will likely begin with a 128-bit LDP using x7, x8.
>>>
>>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>>> followed by x8, then we've potentially made a sub-optimal decision, our
>>> allocation order for 128-bit values is x7,x8,x5,x6 etc.
>>>
>>> My hunch is that we *might* get better code generation in this corner case
>>> out of some permutation of the allocation order for argument
>>> registers. I'm thinking something along the lines of
>>>
>>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>>
>>> I asked Jiong to take a look at that, and I agree with his decision to
>>> reduce the churn on trunk and just revert the patch until we've come to
>>> a c

Re: [PATCH] aarch64: Add split-stack initial support

2016-08-08 Thread Jiong Wang

Adhemerval Zanella writes:

>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index e56398a..2cf239f 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -3227,6 +3227,34 @@ aarch64_expand_prologue (void)
>>RTX_FRAME_RELATED_P (insn) = 1;
>>  }
>>  }
>> +
>> +  if (flag_split_stack && offset)
>> +{
>> +  /* Setup the argument pointer (x10) for -fsplit-stack code.  If
>> + __morestack was called, it will left the arg pointer to the
>> + old stack in x28.  Otherwise, the argument pointer is the top
>> + of current frame.  */
>> +  rtx x10 = gen_rtx_REG (Pmode, R10_REGNUM);
>> +  rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM);
>> +  rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
>> +  rtx x29 = gen_rtx_REG (Pmode, R29_REGNUM);
>> +  rtx not_more = gen_label_rtx ();
>> +  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
>> +  rtx jump;
>> +
>> +  emit_move_insn (x11, GEN_INT (hard_fp_offset));
>> +  emit_insn (gen_add3_insn (x10, x29, x11));
>> +  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
>> +   gen_rtx_GEU (VOIDmode, cc_reg,
>> +const0_rtx),
>> +   gen_rtx_LABEL_REF (VOIDmode, not_more),
>> +   pc_rtx);
>> +  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
>> +  JUMP_LABEL (jump) = not_more;
>> +  LABEL_NUSES (not_more) += 1;
>> +  emit_move_insn (x10, x28);
>> +  emit_label (not_more);
>> +}
>>  }

This part needs a rebase; there have been major changes in the AArch64
prologue code recently.

>>  
>>  /* Return TRUE if we can use a simple_return insn.
>> @@ -3303,6 +3331,7 @@ aarch64_expand_epilogue (bool for_sibcall)
>>offset = offset - fp_offset;
>>  }
>>  
>> +

Unnecessary new line.

>>if (offset > 0)
>>  {
>>unsigned reg1 = cfun->machine->frame.wb_candidate1;
>> @@ -9648,7 +9677,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
>> nextarg ATTRIBUTE_UNUSED)
>>/* Emit code to initialize STACK, which points to the next varargs stack
>>   argument.  CUM->AAPCS_STACK_SIZE gives the number of stack words used
>>   by named arguments.  STACK is 8-byte aligned.  */
>> -  t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx);
>> +  t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer);
>>if (cum->aapcs_stack_size > 0)
>>  t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size * 
>> UNITS_PER_WORD);
>>t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t);
>> @@ -14010,6 +14039,196 @@ aarch64_optab_supported_p (int op, machine_mode 
>> mode1, machine_mode,
>>  }
>>  }
>>  
>> +/* -fsplit-stack support.  */
>> +
>> +/* A SYMBOL_REF for __morestack.  */
>> +static GTY(()) rtx morestack_ref;
>> +
>> +/* Emit -fsplit-stack prologue, which goes before the regular function
>> +   prologue.  */
>> +void
>> +aarch64_expand_split_stack_prologue (void)
>> +{
>> +  HOST_WIDE_INT frame_size, args_size;
>> +  rtx_code_label *ok_label = NULL;
>> +  rtx mem, ssvalue, compare, jump, insn, call_fusage;
>> +  rtx reg11, reg30, temp;
>> +  rtx new_cfa, cfi_ops = NULL;
>> +  /* Offset from thread pointer to __private_ss.  */
>> +  int psso = 0x10;
>> +  int ninsn;
>> +
>> +  gcc_assert (flag_split_stack && reload_completed);
>> +
>> +  /* It limits total maximum stack allocation on 2G so its value can be
>> + materialized with two instruction at most (movn/movk).  It might be
>> + used by the linker to add some extra space for split calling non split
>> + stack functions.  */
>> +  frame_size = cfun->machine->frame.frame_size;
>> +  if (frame_size > ((HOST_WIDE_INT) 1 << 31))
>> +{
>> +  sorry ("Stack frame larger than 2G is not supported for 
>> -fsplit-stack");
>> +  return;
>> +}
>> +
>> +  if (morestack_ref == NULL_RTX)
>> +{
>> +  morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
>> +  SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
>> +   | SYMBOL_FLAG_FUNCTION);
>> +}
>> +
>> +  /* Load __private_ss from TCB.  */
>> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
>> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
>> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
>> +  emit_move_insn (ssvalue, mem);
>> +
>> +  temp = gen_rtx_REG (Pmode, R10_REGNUM);
>> +
>> +  /* Always emit two insns to calculate the requested stack, so the linker
>> + can edit them when adjusting size for calling non-split-stack code.  */
>> +  ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size), true,
>> +  Pmode);
>> +  gcc_assert (ninsn == 1 || ninsn == 2);
>> +  if (ninsn == 1)
>> +emit_insn (gen_nop ());

Will there be trouble for the linker if the following add is scheduled
before the nop?

>> diff --git 

Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-05 Thread Jiong Wang

Andrew Pinski writes:

> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
> <james.greenha...@arm.com> wrote:
>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>> >
>>> > Andrew Pinski writes:
>>> >
>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.w...@arm.com> wrote:
>>> >>>
>>> >>> James Greenhalgh writes:
>>> >>>
>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>> >>>>> Current IRA still use both target macros in a few places.
>>> >>>>>
>>> >>>>> Tell IRA to use the order we defined rather than with it's own cost
>>> >>>>> calculation. Allocate caller saved first, then callee saved.
>>> >>>>>
>>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
>>> >>>>> pure caller saved when used in leaf function.
>>> >>>>>
>>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping some
>>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the 
>>> >>>>> number
>>> >>>>> is smaller.
>>> >>>>>
>>> >>>>> OK for trunk?
>>> >>>>
>>> >>>> OK, sorry for the delay.
>>> >>>>
>>> >>>> It might be mail client mangling, but please check that the trailing 
>>> >>>> slashes
>>> >>>> line up in the version that gets committed.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> James
>>> >>>>
>>> >>>>> 2015-05-19  Jiong. Wang  <jiong.w...@arm.com>
>>> >>>>>
>>> >>>>> gcc/
>>> >>>>>  PR 63521
>>> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
>>> >>>
>>> >>> Patch reverted.
>>> >>
>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>> >> missing an email or something.
>>> >
>>> > There are several execution regressions under gcc testsuite, although as
>>> > far as I can see it's this patch exposed hidden bugs in those
>>> > testcases, but there might be one other issue, so to be conservative, I
>>> > temporarily reverted this patch.
>>>
>>> If you are talking about:
>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>> Etc.
>>>
>>> These test cases are too dependent on the original register allocation order
>>> and really can be safely ignored. Really these three tests should be moved 
>>> or
>>> written in a more sane way.
>>
>> Yup, completely agreed - but the testcases do throw up something
>> interesting. If we are allocating registers to hold 128-bit values, and
>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>> I think we probably see the same thing if the first thing we do in a
>> function is a structure copy through a back-end expanded movmem, which
>> will likely begin with a 128-bit LDP using x7, x8.
>>
>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>> followed by x8, then we've potentially made a sub-optimal decision, our
>> allocation order for 128-bit values is x7,x8,x5,x6 etc.
>>
>> My hunch is that we *might* get better code generation in this corner case
>> out of some permutation of the allocation order for argument
>> registers. I'm thinking something along the lines of
>>
>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>
>> I asked Jiong to take a look at that, and I agree with his decision to
>> reduce the churn on trunk and just revert the patch until we've come to
>> a conclusion based on some evidence - rather than just my hunch! I agree
>> that it would be harmless on trunk from a testing point of view, but I
>> think Jiong is right to revert the patch until we better understand the
>> code-generation implications.
>>
>> Of course, it might be that I am completely wrong! If you've already taken
>> a look at using a register allocation order like th

Re: [5.0 Backport][AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-29 Thread Jiong Wang

Jiong Wang writes:

> On 07/07/16 10:34, James Greenhalgh wrote:
>>
>> To make backporting easier, could you please write a very simple
>> standalone test that exposes this bug, and submit this patch with just
>> that simple test? I've already OKed the functional part of this patch, and
>> I'm happy to pre-approve a simple testcase.
>>
>> With that committed to trunk, this needs to go to all active release
>> branches please.
>
> Committed attached patch to trunk as r238166, fmax/fmin pattern were
> introduced by [1] which is available since gcc 6, so backported to
> gcc 6 branch as r238167.

Here is the gcc 5 backport patch; it's slightly different from the gcc 6
backport patch, as the fmin/fmax patterns were not introduced yet.

OK to backport?
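
For reference, a minimal example of the semantics being fixed (a sketch
mirroring the new vminmaxnm_1.c testcase; the actual test contents may
differ):

  #include <arm_neon.h>

  /* vmaxnm_f32 must compile to fmaxnm, which returns the numeric
     operand when the other is a NaN, unlike fmax which propagates
     the NaN.  */
  float32x2_t
  test_vmaxnm_f32 (float32x2_t a, float32x2_t b)
  {
    return vmaxnm_f32 (a, b);
  }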

gcc/
2016-07-29  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-simd-builtins.def (smax, smin): Don't
register float variants.
(fmax, fmin): New builtins for VDQF modes.
* config/aarch64/arm_neon.h (vmaxnm_f32): Use
__builtin_aarch64_fmaxv2sf.
(vmaxnmq_f32): Likewise.
(vmaxnmq_f64): Likewise.
(vminnm_f32): Likewise.
(vminnmq_f32): Likewise.
(vminnmq_f64): Likewise.
* config/aarch64/iterators.md (UNSPEC_FMAXNM, UNSPEC_FMINNM): New.
(FMAXMIN_UNS): Support UNSPEC_FMAXNM and UNSPEC_FMINNM.
    (maxmin_uns, maxmin_uns_op): Likewise. 

gcc/testsuite/
2016-07-29  Jiong Wang  <jiong.w...@arm.com>

* gcc.target/aarch64/simd/vminmaxnm_1.c: New.

-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index dd2bc47..446d826 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -240,15 +240,16 @@
   BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10)
   BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10)
 
-  /* Implemented by 3.
- smax variants map to fmaxnm,
- smax_nan variants map to fmax.  */
-  BUILTIN_VDQIF (BINOP, smax, 3)
-  BUILTIN_VDQIF (BINOP, smin, 3)
+  /* Implemented by 3.  */
+  BUILTIN_VDQ_BHSI (BINOP, smax, 3)
+  BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
+  /* Implemented by 3.  */
   BUILTIN_VDQF (BINOP, smax_nan, 3)
   BUILTIN_VDQF (BINOP, smin_nan, 3)
+  BUILTIN_VDQF (BINOP, fmax, 3)
+  BUILTIN_VDQF (BINOP, fmin, 3)
 
   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 4c15312..283000e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -17733,19 +17733,19 @@ vpminnms_f32 (float32x2_t a)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmaxnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_smaxv2sf (__a, __b);
+  return __builtin_aarch64_fmaxv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmaxnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_smaxv4sf (__a, __b);
+  return __builtin_aarch64_fmaxv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vmaxnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_smaxv2df (__a, __b);
+  return __builtin_aarch64_fmaxv2df (__a, __b);
 }
 
 /* vmaxv  */
@@ -17963,19 +17963,19 @@ vminq_u32 (uint32x4_t __a, uint32x4_t __b)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vminnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_sminv2sf (__a, __b);
+  return __builtin_aarch64_fminv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vminnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_sminv4sf (__a, __b);
+  return __builtin_aarch64_fminv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vminnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_sminv2df (__a, __b);
+  return __builtin_aarch64_fminv2df (__a, __b);
 }
 
 /* vminv  */
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 2efbfab..c7e1d0c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -186,9 +186,11 @@
 UNSPEC_ASHIFT_UNSIGNED	; Used in aarch64-simd.md.
 UNSPEC_ABS		; Used in aarch64-simd.md.
 UNSPEC_FMAX		; Used in aarch64-simd.md.
+UNSPEC_FMAXNM	; Used in aarch64-simd.md.
 UNSPEC_FMAXNMV	; Used in aarch64-simd.md.
 UNSPEC_FMAXV	; Used in aarch64-simd.md.
 UNSPEC_FMIN		; Used in aarch64-simd.md.
+UNSPEC_FMINNM	; Used in aarch64-simd.md.
 UNSPEC_FMINNMV	; Used in aarch64-simd.md.
 UNSPEC_FMINV	; Used in aarch64-simd.md.
 UNSPEC_FADDV	; Used in aarch64-

Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-25 Thread Jiong Wang

On 21/07/16 11:08, Richard Earnshaw (lists) wrote:

On 20/07/16 16:02, Jiong Wang wrote:

Richard,
   Thanks for the review.  Yes, I believe using aarch64_add_constant is
unconditionally safe here, because we have generated a stack tie to
clobber the whole of memory, thus preventing any instruction that
accesses the stack from being scheduled after that.

   The access-to-deallocated-stack issue was there before and was fixed by

   https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html.

  aarch64_add_constant itself generates the same instruction sequences
as the original code, except that in a few cases it will prefer:

   mov scratch_reg, #imm
   add sp, sp, scratch_reg

over:

   add sp, sp, #imm_part1
   add sp, sp, #imm_part2





OK, I've had another look at this and I'm happy that we don't
(currently) run into the problem I'm concerned about.

However, this new usage does impose a constraint on aarch64_add_constant
that will need to be respected in future, so please can you add the
following to the comment that precedes that function:

/* ...

This function is sometimes used to adjust the stack pointer, so we
must ensure that it can never cause transient stack deallocation
by writing an invalid value into REGNUM.  */



+  bool frame_related_p = (regnum == SP_REGNUM);

I think it would be better to make the frame-related decision be an
explicit parameter passed to the routine (don't forget SP is not always
the frame pointer).  Then the new uses would pass 'true' and the
existing uses 'false'.

R.


Thanks.  The attachment is the updated patch, which:

  * Adds the new comment above for aarch64_add_constant.
  * Adds one new parameter "frame_related_p" for aarch64_add_constant.

I thought about adding a new gcc assertion to sanity-check
frame_related_p against REGNUM, but haven't done that, as I found
dwarf2cfi.c is already doing it.

OK for trunk?

gcc/
2016-07-25  Jiong Wang  <jiong.w...@arm.com>

 * config/aarch64/aarch64.c (aarch64_add_constant): New
 parameter "frame_related_p".  Generate CFA annotation when
 it's necessary.
 (aarch64_expand_prologue): Use aarch64_add_constant.
 (aarch64_expand_epilogue): Likewise.
 (aarch64_output_mi_thunk): Pass "false" when calling
 aarch64_add_constant.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 41844a1..ca93f6e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1866,14 +1866,19 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 }
 
 /* Add DELTA onto REGNUM in MODE, using SCRATCHREG to held intermediate value if
-   it is necessary.  */
+   it is necessary.
+
+   This function is sometimes used to adjust the stack pointer, so we must
+   ensure that it can never cause transient stack deallocation by writing an
+   invalid value into REGNUM.  */
 
 static void
 aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
-		  HOST_WIDE_INT delta)
+		  HOST_WIDE_INT delta, bool frame_related_p)
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
+  rtx_insn *insn;
 
   /* Do nothing if mdelta is zero.  */
   if (!mdelta)
@@ -1882,7 +1887,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   /* We only need single instruction if the offset fit into add/sub.  */
   if (aarch64_uimm12_shift (mdelta))
 {
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
@@ -1895,15 +1901,23 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   HOST_WIDE_INT low_off = mdelta & 0xfff;
 
   low_off = delta < 0 ? -low_off : low_off;
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
   aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (frame_related_p)
+{
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  rtx adj = plus_constant (mode, this_rtx, delta);
+  add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+}
 }
 
 static bool
@@ -3038,36 +3052,7 @@ aarch64_expand_prologue (void)
   frame_size -= (offset + crtl->outgoing_args_size);
   fp_offset = 0;
 
-  if (frame_size >

Re: [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:17, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 two-operand scalar intrinsics.


The updated patch resolves the conflict with

   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html

The change is to let aarch64_emit_approx_div return false for HFmode.
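
For reference, the guard amounts to something like this at the top of
aarch64_emit_approx_div (a sketch; the committed hunk may differ in
detail):

  /* QUO is the quotient operand.  There is no HF reciprocal-estimate
     step, so decline HFmode (and vectors of HF) up front.  */
  machine_mode mode = GET_MODE (quo);
  if (GET_MODE_INNER (mode) == HFmode)
    return false;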

gcc/
2016-07-20  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64.md (hf3): 
New.
(hf3): Likewise.
(add3): Likewise.
(sub3): Likewise.
(mul3): Likewise.
(div3): Likewise.
(*div3): Likewise.
(3): Extend to HF.
* config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
false for HFmode.
* config/aarch64/aarch64-simd.md (aarch64_rsqrts): Likewise.
(fabd3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_fmulx): Likewise.
(aarch64_fac): Likewise.
(aarch64_frecps): Likewise.
(hfhi3): New.
(hihf3): Likewise.
* config/aarch64/iterators.md (VHSDF_SDF): Delete.
(VSDQ_HSDI): Support HI.
(fcvt_target, FCVT_TARGET): Likewise.
* config/aarch64/arm_fp16.h: (vaddh_f16): New.
(vsubh_f16): Likewise.
(vabdh_f16): Likewise.
(vcageh_f16): Likewise.
(vcagth_f16): Likewise.
(vcaleh_f16): Likewise.
(vcalth_f16): Likewise.
(vcleh_f16): Likewise.
(vclth_f16): Likewise.
(vcvth_n_f16_s16): Likewise.
(vcvth_n_f16_s32): Likewise.
(vcvth_n_f16_s64): Likewise.
(vcvth_n_f16_u16): Likewise.
(vcvth_n_f16_u32): Likewise.
(vcvth_n_f16_u64): Likewise.
(vcvth_n_s16_f16): Likewise.
(vcvth_n_s32_f16): Likewise.
(vcvth_n_s64_f16): Likewise.
(vcvth_n_u16_f16): Likewise.
(vcvth_n_u32_f16): Likewise.
(vcvth_n_u64_f16): Likewise.
(vdivh_f16): Likewise.
(vmaxh_f16): Likewise.
(vmaxnmh_f16): Likewise.
(vminh_f16): Likewise.
(vminnmh_f16): Likewise.
(vmulh_f16): Likewise.
(vmulxh_f16): Likewise.
(vrecpsh_f16): Likewise.
(vrsqrtsh_f16): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 6f50d8405d3ee8c4823037bb2022a4f2f08b72fe..31abc077859254e3696adacb3f8f2b9b2da0647f 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -393,13 +393,12 @@
   /* Implemented by
  aarch64_frecp.  */
   BUILTIN_GPF_F16 (UNOP, frecpe, 0)
-  BUILTIN_GPF (BINOP, frecps, 0)
   BUILTIN_GPF_F16 (UNOP, frecpx, 0)
 
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VHSDF (BINOP, frecps, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, frecps, 0)
 
   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
  only ever used for the int64x1_t intrinsic, there is no scalar version.  */
@@ -496,17 +495,23 @@
   /* Implemented by <*><*>3.  */
   BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
   BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3)
-  BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3)
+  BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3)
+  VAR1 (SHIFTIMM, scvtfsi, 3, hf)
+  VAR1 (SHIFTIMM, scvtfdi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf)
+  VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf)
+  BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3)
+  BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3)
 
   /* Implemented by aarch64_rsqrte.  */
   BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
 
   /* Implemented by aarch64_rsqrts.  */
-  BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)
+  BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0)
 
   /* Implemented by fabd3.  */
-  BUILTIN_VHSDF_SDF (BINOP, fabd, 3)
+  BUILTIN_VHSDF_HSDF (BINOP, fabd, 3)
 
   /* Implemented by aarch64_faddp.  */
   BUILTIN_VHSDF (BINOP, faddp, 0)
@@ -522,10 +527,10 @@
   BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
 
   /* Implemented by aarch64_fac.  */
-  BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
-  BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0)
+  BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0)
 
   /* Implemented by sqrt2.  */
   VAR1 (UNOP, sqrt, 2, hf)
@@ -543,3 +548,7 @@
   BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
   BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
+
+  /* Implemented by 3.  */
+  VAR1 (BINOP, fmax, 3, hf)
+  VAR1 (BINOP, fmin, 3, hf)
diff --git a/

Re: [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:17, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 one operand scalar intrinsics.

Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h.


The updated patch resolves the conflict with

   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html

The change is to let aarch64_emit_approx_sqrt return false for HFmode.
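
As a quick illustration (my example, not part of the patch; assumes
-march=armv8.2-a+fp16), the new one operand scalar intrinsics compose
as plain C calls:

#include <arm_fp16.h>
#include <stdint.h>

int32_t
round_root (float16_t x)
{
  float16_t r = vsqrth_f16 (x);   /* scalar half-precision fsqrt */
  return vcvtah_s32_f16 (r);      /* convert, rounding to nearest with
                                     ties away from zero */
}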

gcc/
2016-07-20  Jiong Wang<jiong.w...@arm.com>

* config.gcc (aarch64*-*-*): Install arm_fp16.h.
* config/aarch64/aarch64-builtins.c (hi_UP): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_frsqrte): Extend to HF 
mode.
(aarch64_frecp): Likewise.
(aarch64_cm): Likewise.
* config/aarch64/aarch64.md (2): Likewise.
(l2): Likewise.
(fix_trunc2): Likewise.
(sqrt2): Likewise.
(*sqrt2): Likewise.
(abs2): Likewise.
(hf2): New pattern for HF mode.
(hihf2): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
for HF mode.
* config/aarch64/arm_neon.h: Include arm_fp16.h.
* config/aarch64/iterators.md (GPF_F16): New.
(GPI_F16): Likewise.
(VHSDF_HSDF): Likewise.
(w1): Support HF mode.
(w2): Likewise.
(v): Likewise.
(s): Likewise.
(q): Likewise.
(Vmtype): Likewise.
(V_cmp_result): Likewise.
(fcvt_iesize): Likewise.
(FCVT_IESIZE): Likewise.
* config/aarch64/arm_fp16.h: New file.
(vabsh_f16): New.
(vceqzh_f16): Likewise.
(vcgezh_f16): Likewise.
(vcgtzh_f16): Likewise.
(vclezh_f16): Likewise.
(vcltzh_f16): Likewise.
(vcvth_f16_s16): Likewise.
(vcvth_f16_s32): Likewise.
(vcvth_f16_s64): Likewise.
(vcvth_f16_u16): Likewise.
(vcvth_f16_u32): Likewise.
(vcvth_f16_u64): Likewise.
(vcvth_s16_f16): Likewise.
(vcvth_s32_f16): Likewise.
(vcvth_s64_f16): Likewise.
(vcvth_u16_f16): Likewise.
(vcvth_u32_f16): Likewise.
(vcvth_u64_f16): Likewise.
(vcvtah_s16_f16): Likewise.
(vcvtah_s32_f16): Likewise.
(vcvtah_s64_f16): Likewise.
(vcvtah_u16_f16): Likewise.
(vcvtah_u32_f16): Likewise.
(vcvtah_u64_f16): Likewise.
(vcvtmh_s16_f16): Likewise.
(vcvtmh_s32_f16): Likewise.
(vcvtmh_s64_f16): Likewise.
(vcvtmh_u16_f16): Likewise.
(vcvtmh_u32_f16): Likewise.
(vcvtmh_u64_f16): Likewise.
(vcvtnh_s16_f16): Likewise.
(vcvtnh_s32_f16): Likewise.
(vcvtnh_s64_f16): Likewise.
(vcvtnh_u16_f16): Likewise.
(vcvtnh_u32_f16): Likewise.
(vcvtnh_u64_f16): Likewise.
(vcvtph_s16_f16): Likewise.
(vcvtph_s32_f16): Likewise.
(vcvtph_s64_f16): Likewise.
(vcvtph_u16_f16): Likewise.
(vcvtph_u32_f16): Likewise.
(vcvtph_u64_f16): Likewise.
(vnegh_f16): Likewise.
(vrecpeh_f16): Likewise.
(vrecpxh_f16): Likewise.
(vrndh_f16): Likewise.
(vrndah_f16): Likewise.
(vrndih_f16): Likewise.
(vrndmh_f16): Likewise.
(vrndnh_f16): Likewise.
(vrndph_f16): Likewise.
(vrndxh_f16): Likewise.
(vrsqrteh_f16): Likewise.
(vsqrth_f16): Likewise.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1f75f17877334c2bb61cd16b69539ec7514db8ae..8827dc830d374c2512be5713d6dd143913f53c7d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -300,7 +300,7 @@ m32c*-*-*)
 ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_neon.h arm_acle.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index af5fac5b29cf5373561d9bf9a69c401d2bec5cec..ca91d9108ead3eb83c21ee86d9e6ed44c8f4ad2d 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -62,6 +62,7 @@
 #define si_UPSImode
 #define sf_UPSFmode
 #define hi_UPHImode
+#define hf_UPHFmode
 #define qi_UPQImode
 #define UP(X) X##_UP
 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 363e131327d6be04dd94e664ef839e46f26940e4..6f50d8405d3ee8c4823037bb2022a4f2f08b72fe 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -274,6 +274,14 @@
   BUILTIN_VHSDF (UNOP, round, 2)
   BUILTIN_VHSDF_DF (UNOP, frintn, 2)
 
+  VAR1 (UNOP, btrunc, 2, hf)
+  VAR1 (UNOP, ceil, 2, hf)
+  VAR1 (UNOP, floor, 2, hf)
+  VAR1 (UNOP, frintn, 2, hf)
+  VAR1 (UNOP, nearbyint, 2, hf)
+  VAR1 (UNOP, rint, 2, hf)
+  VAR1 (UNOP, round, 2, hf)
+

Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:15, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 two operands vector intrinsics.


The updated patch resolves the conflict with

   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html

The change is to let aarch64_emit_approx_div return false for
V4HFmode and V8HFmode.
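
A small usage sketch of the two operands vector intrinsics (my example,
not from the patch; assumes -march=armv8.2-a+fp16):

#include <arm_neon.h>

float16x8_t
blend (float16x8_t a, float16x8_t b)
{
  float16x8_t d = vabdq_f16 (a, b);            /* |a - b|, lane-wise */
  return vmaxnmq_f16 (d, vmulq_f16 (a, b));    /* IEEE maxNum per lane */
}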

gcc/
2016-07-20  Jiong Wang<jiong.w...@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md
(aarch64_rsqrts): Extend to HF modes.
(fabd3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_p): Likewise.
(3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_faddp): Likewise.
(aarch64_fmulx): Likewise.
(aarch64_frecps): Likewise.
(*aarch64_fac): Rename to aarch64_fac.
(add3): Extend to HF modes.
(sub3): Likewise.
(mul3): Likewise.
(div3): Likewise.
(*div3): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_div): Return
false for V4HF and V8HF.
* config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
iterator.
* config/aarch64/arm_neon.h (vadd_f16): Likewise.
(vaddq_f16): Likewise.
(vabd_f16): Likewise.
(vabdq_f16): Likewise.
(vcage_f16): Likewise.
(vcageq_f16): Likewise.
(vcagt_f16): Likewise.
(vcagtq_f16): Likewise.
(vcale_f16): Likewise.
(vcaleq_f16): Likewise.
(vcalt_f16): Likewise.
(vcaltq_f16): Likewise.
(vceq_f16): Likewise.
(vceqq_f16): Likewise.
(vcge_f16): Likewise.
(vcgeq_f16): Likewise.
(vcgt_f16): Likewise.
(vcgtq_f16): Likewise.
(vcle_f16): Likewise.
(vcleq_f16): Likewise.
(vclt_f16): Likewise.
(vcltq_f16): Likewise.
(vcvt_n_f16_s16): Likewise.
(vcvtq_n_f16_s16): Likewise.
(vcvt_n_f16_u16): Likewise.
(vcvtq_n_f16_u16): Likewise.
(vcvt_n_s16_f16): Likewise.
(vcvtq_n_s16_f16): Likewise.
(vcvt_n_u16_f16): Likewise.
(vcvtq_n_u16_f16): Likewise.
(vdiv_f16): Likewise.
(vdivq_f16): Likewise.
(vdup_lane_f16): Likewise.
(vdup_laneq_f16): Likewise.
(vdupq_lane_f16): Likewise.
(vdupq_laneq_f16): Likewise.
(vdups_lane_f16): Likewise.
(vdups_laneq_f16): Likewise.
(vmax_f16): Likewise.
(vmaxq_f16): Likewise.
(vmaxnm_f16): Likewise.
(vmaxnmq_f16): Likewise.
(vmin_f16): Likewise.
(vminq_f16): Likewise.
(vminnm_f16): Likewise.
(vminnmq_f16): Likewise.
(vmul_f16): Likewise.
(vmulq_f16): Likewise.
(vmulx_f16): Likewise.
(vmulxq_f16): Likewise.
(vpadd_f16): Likewise.
(vpaddq_f16): Likewise.
(vpmax_f16): Likewise.
(vpmaxq_f16): Likewise.
(vpmaxnm_f16): Likewise.
(vpmaxnmq_f16): Likewise.
(vpmin_f16): Likewise.
(vpminq_f16): Likewise.
(vpminnm_f16): Likewise.
(vpminnmq_f16): Likewise.
(vrecps_f16): Likewise.
(vrecpsq_f16): Likewise.
(vrsqrts_f16): Likewise.
(vrsqrtsq_f16): Likewise.
(vsub_f16): Likewise.
(vsubq_f16): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 22c87be429ba1aac2bbe77f1119d16b6b8bd6e80..007dad60b6999158a1c9c1cf2a501a9f0712af54 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VALLF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -248,22 +248,22 @@
   BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
-  BUILTIN_VDQF (BINOP, smax_nan, 3)
-  BUILTIN_VDQF (BINOP, smin_nan, 3)
+  BUILTIN_VHSDF (BINOP, smax_nan, 3)
+  BUILTIN_VHSDF (BINOP, smin_nan, 3)
 
   /* Implemented by 3.  */
-  BUILTIN_VDQF (BINOP, fmax, 3)
-  BUILTIN_VDQF (BINOP, fmin, 3)
+  BUILTIN_VHSDF (BINOP, fmax, 3)
+  BUILTIN_VHSDF (BINOP, fmin, 3)
 
   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
   BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
-  BUILTIN_VDQF (BINOP, smaxp, 0)
-  BUILTIN_VDQF (BINOP, sminp, 0)
-  BUILTIN_VDQF (BINOP, smax_nanp, 0)
-  BUILTIN_VDQF (BINOP, smin_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smaxp, 0)
+  BUILTIN_VHSDF (BINOP, sminp, 0)
+  BUILTIN_VHSDF (BINOP, smax_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smin_nanp, 0)
 
   /* Implemented by 2.  */
   BUILTIN_VHSDF (UNOP, btrunc, 2)
@@ -383,7 +383,7 @@
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)
 
   BUILTIN_VHSDF (UNOP, fre

Re: [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics

2016-07-20 Thread Jiong Wang

On 07/07/16 17:14, Jiong Wang wrote:

This patch adds ARMv8.2-A FP16 one operand vector intrinsics.

We introduced new mode iterators to cover the HF modes; qualified
patterns which were using the old mode iterators are switched to the
new ones.

We can't simply extend an old iterator like VDQF to cover the HF modes,
because not all patterns using VDQF have the new FP16 support.  We
therefore introduced new, temporary iterators, and only apply the new
iterators to those patterns which do have FP16 support.


I noticed the patchset at

  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html

has some modifications to the standard names "div" and "sqrt", so there
are minor conflicts, as this patch touches "sqrt" as well.

This patch resolves the conflict; the change is to let
aarch64_emit_approx_sqrt simply return false for V4HFmode and V8HFmode.
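
For illustration (my example, not from the patch; assumes
-march=armv8.2-a+fp16), the new one operand vector intrinsics look like:

#include <arm_neon.h>

uint16x4_t
abs_positive_mask (float16x4_t x)
{
  float16x4_t a = vabs_f16 (x);   /* lane-wise |x| */
  return vcgtz_f16 (a);           /* all-ones lane mask where a > 0.0 */
}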

gcc/
2016-07-20  Jiong Wang<jiong.w...@arm.com>

* config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte): Extend to HF 
modes.
(neg2): Likewise.
(abs2): Likewise.
(2): Likewise.
(l2): Likewise.
(2): Likewise.
(2): Likewise.
(ftrunc2): Likewise.
(2): Likewise.
(sqrt2): Likewise.
(*sqrt2): Likewise.
(aarch64_frecpe): Likewise.
(aarch64_cm): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return
false for V4HF and V8HF.
* config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
(VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to 
HF modes.
(stype): New.
* config/aarch64/arm_neon.h (vdup_n_f16): New.
(vdupq_n_f16): Likewise.
(vld1_dup_f16): Use vdup_n_f16.
(vld1q_dup_f16): Use vdupq_n_f16.
(vabs_f16): New.
(vabsq_f16): Likewise.
(vceqz_f16): Likewise.
(vceqzq_f16): Likewise.
(vcgez_f16): Likewise.
(vcgezq_f16): Likewise.
(vcgtz_f16): Likewise.
(vcgtzq_f16): Likewise.
(vclez_f16): Likewise.
(vclezq_f16): Likewise.
(vcltz_f16): Likewise.
(vcltzq_f16): Likewise.
(vcvt_f16_s16): Likewise.
(vcvtq_f16_s16): Likewise.
(vcvt_f16_u16): Likewise.
(vcvtq_f16_u16): Likewise.
(vcvt_s16_f16): Likewise.
(vcvtq_s16_f16): Likewise.
(vcvt_u16_f16): Likewise.
(vcvtq_u16_f16): Likewise.
(vcvta_s16_f16): Likewise.
(vcvtaq_s16_f16): Likewise.
(vcvta_u16_f16): Likewise.
(vcvtaq_u16_f16): Likewise.
(vcvtm_s16_f16): Likewise.
(vcvtmq_s16_f16): Likewise.
(vcvtm_u16_f16): Likewise.
(vcvtmq_u16_f16): Likewise.
(vcvtn_s16_f16): Likewise.
(vcvtnq_s16_f16): Likewise.
(vcvtn_u16_f16): Likewise.
(vcvtnq_u16_f16): Likewise.
(vcvtp_s16_f16): Likewise.
(vcvtpq_s16_f16): Likewise.
(vcvtp_u16_f16): Likewise.
(vcvtpq_u16_f16): Likewise.
(vneg_f16): Likewise.
(vnegq_f16): Likewise.
(vrecpe_f16): Likewise.
(vrecpeq_f16): Likewise.
(vrnd_f16): Likewise.
(vrndq_f16): Likewise.
(vrnda_f16): Likewise.
(vrndaq_f16): Likewise.
(vrndi_f16): Likewise.
(vrndiq_f16): Likewise.
(vrndm_f16): Likewise.
(vrndmq_f16): Likewise.
(vrndn_f16): Likewise.
(vrndnq_f16): Likewise.
(vrndp_f16): Likewise.
(vrndpq_f16): Likewise.
(vrndx_f16): Likewise.
(vrndxq_f16): Likewise.
(vrsqrte_f16): Likewise.
(vrsqrteq_f16): Likewise.
(vsqrt_f16): Likewise.
(vsqrtq_f16): Likewise.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b90b2af5e9d2b5e7f48569ec1ebcb0ef16314ee..af5fac5b29cf5373561d9bf9a69c401d2bec5cec 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -139,6 +139,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned };
 #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index f1ad325f464f89c981cbdee8a8f6afafa938639a..22c87be429ba1aac2bbe77f1119d16b6b8bd6e80 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/conf

Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-20 Thread Jiong Wang

On 20/07/16 15:18, Richard Earnshaw (lists) wrote:

On 20/07/16 14:03, Jiong Wang wrote:

Those stack adjustment sequences inside aarch64_expand_prologue/epilogue
do exactly what aarch64_add_constant offers, except they also need to be
aware of dwarf generation.

This patch teaches the existing aarch64_add_constant about dwarf
generation; currently the SP register is supported.  Whenever SP is
updated there should be a CFA update; we therefore mark these
instructions as frame related, and if the update is too complex for GCC
to deduce the adjustment, we attach an explicit annotation.

Both dwarf frame info size and pro/epilogue scheduling are improved
after this patch, as aarch64_add_constant makes better use of the
scratch register.

OK for trunk?

gcc/
2016-07-20  Jiong Wang  <jiong.w...@arm.com>

 * config/aarch64/aarch64.c (aarch64_add_constant): Mark
 instruction as frame related when it is.  Generate CFA
 annotation when it's necessary.
 (aarch64_expand_prologue): Use aarch64_add_constant.
 (aarch64_expand_epilogue): Likewise.


Are you sure using aarch64_add_constant is unconditionally safe?  Stack
adjustments need to be done very carefully to ensure that we never
transiently deallocate part of the stack.


Richard,

  Thanks for the review.  Yes, I believe using aarch64_add_constant is
unconditionally safe here, because we have generated a stack tie to
clobber the whole of memory, which prevents any instruction that
accesses the stack from being scheduled after it.

  The access-to-deallocated-stack issue existed before and was fixed by

  https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html.

  aarch64_add_constant itself generates the same instruction sequences
as the original code, except that in a few cases it will prefer

  move scratch_reg, #imm
  add sp, sp, scratch_reg

over:

  add sp, sp, #imm_part1
  add sp, sp, #imm_part2






[AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant

2016-07-20 Thread Jiong Wang

Those stack adjustment sequences inside aarch64_expand_prologue/epilogue
do exactly what aarch64_add_constant offers, except they also need to be
aware of dwarf generation.

This patch teaches the existing aarch64_add_constant about dwarf
generation; currently the SP register is supported.  Whenever SP is
updated there should be a CFA update; we therefore mark these
instructions as frame related, and if the update is too complex for GCC
to deduce the adjustment, we attach an explicit annotation.

Both dwarf frame info size and pro/epilogue scheduling are improved
after this patch, as aarch64_add_constant makes better use of the
scratch register.
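
In code terms, the distilled shape of the annotation (taken from the
patch below; the scratch register case is the one dwarf2cfi cannot
deduce on its own):

  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
  RTX_FRAME_RELATED_P (insn) = 1;
  /* The adjustment went through a scratch register, so state the net
     CFA change explicitly.  */
  add_reg_note (insn, REG_CFA_ADJUST_CFA,
                gen_rtx_SET (this_rtx,
                             plus_constant (mode, this_rtx, delta)));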

OK for trunk?

gcc/
2016-07-20  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64.c (aarch64_add_constant): Mark
instruction as frame related when it is.  Generate CFA
annotation when it's necessary.
(aarch64_expand_prologue): Use aarch64_add_constant.
(aarch64_expand_epilogue): Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 41844a101247c939ecb31f8a8c17cf79759255aa..b38f3f1e8f85a5f3191d0c96080327dac7b2eaed 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1874,6 +1874,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
+  bool frame_related_p = (regnum == SP_REGNUM);
+  rtx_insn *insn;
 
   /* Do nothing if mdelta is zero.  */
   if (!mdelta)
@@ -1882,7 +1884,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   /* We only need single instruction if the offset fit into add/sub.  */
   if (aarch64_uimm12_shift (mdelta))
 {
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
@@ -1895,15 +1898,23 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
   HOST_WIDE_INT low_off = mdelta & 0xfff;
 
   low_off = delta < 0 ? -low_off : low_off;
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
-  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
   return;
 }
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
   aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (frame_related_p)
+{
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  rtx adj = plus_constant (mode, this_rtx, delta);
+  add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+}
 }
 
 static bool
@@ -3038,36 +3049,7 @@ aarch64_expand_prologue (void)
   frame_size -= (offset + crtl->outgoing_args_size);
   fp_offset = 0;
 
-  if (frame_size >= 0x100)
-	{
-	  rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
-	  emit_move_insn (op0, GEN_INT (-frame_size));
-	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0));
-
-	  add_reg_note (insn, REG_CFA_ADJUST_CFA,
-			gen_rtx_SET (stack_pointer_rtx,
- plus_constant (Pmode, stack_pointer_rtx,
-		-frame_size)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	}
-  else if (frame_size > 0)
-	{
-	  int hi_ofs = frame_size & 0xfff000;
-	  int lo_ofs = frame_size & 0x000fff;
-
-	  if (hi_ofs)
-	{
-	  insn = emit_insn (gen_add2_insn
-(stack_pointer_rtx, GEN_INT (-hi_ofs)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	}
-	  if (lo_ofs)
-	{
-	  insn = emit_insn (gen_add2_insn
-(stack_pointer_rtx, GEN_INT (-lo_ofs)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	}
-	}
+  aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -frame_size);
 }
   else
 frame_size = -1;
@@ -3287,31 +3269,7 @@ aarch64_expand_epilogue (bool for_sibcall)
   if (need_barrier_p)
 	emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
 
-  if (frame_size >= 0x100)
-	{
-	  rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
-	  emit_move_insn (op0, GEN_INT (frame_size));
-	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0));
-	}
-  else
-	{
-  int hi_ofs = frame_size & 0xfff000;
-  int lo_ofs = frame_size & 0x000fff;
-
-	  if (hi_ofs && lo_ofs)
-	{
-	  insn = emit_insn (gen_add2_insn
-(stack_pointer_rtx, GEN_INT (hi_ofs)));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	  frame_size = lo_ofs;
-	}
-	  insn = emit_insn (gen_add2_ins

[AArch64][2/3] Optimize aarch64_add_constant to generate better addition sequences

2016-07-20 Thread Jiong Wang

This patch optimizes the immediate addition sequences generated by
aarch64_add_constant.

The current addition sequences generated are:

  * If the immediate fits into the unsigned 12-bit range, generate a
    single add/sub.

  * Otherwise, if it fits into the unsigned 24-bit range, generate two
    add/sub instructions.

  * Otherwise, invoke the general constant build function.


This hasn't considered the situation where the immediate can't fit into
the unsigned 12-bit range but can fit into a single mov instruction, in
which case we can generate one move and one addition.  The move doesn't
touch the destination register, so that sequence is better than two
additions, both of which touch the destination register.


This patch thus optimizes the addition sequences into:

  * If the immediate fits into the unsigned 12-bit range, generate a
    single add/sub.

  * Otherwise, if it fits into the unsigned 24-bit range, generate two
    add/sub instructions.  But don't do this if it also fits into a
    single move instruction; in that case, move the immediate to the
    scratch register first, then generate one addition to add the
    scratch register to the destination register.

  * Otherwise, invoke the general constant build function.

The sketch below illustrates the resulting selection.
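
A minimal, self-contained sketch of the decision logic (standalone C of
mine, not GCC code; single_movz is a crude stand-in for
aarch64_move_imm, which also accepts bitmask and movn-form immediates):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True if VAL fits the add/sub immediate encoding: 12 bits, optionally
   shifted left by 12.  */
static bool
uimm12_shift (uint64_t val)
{
  return (val & 0xfffULL) == val || (val & (0xfffULL << 12)) == val;
}

/* Crude stand-in for aarch64_move_imm: single MOVZ forms only.  */
static bool
single_movz (uint64_t val)
{
  for (int s = 0; s < 64; s += 16)
    if ((val & (0xffffULL << s)) == val)
      return true;
  return false;
}

static const char *
classify (int64_t delta)
{
  uint64_t mdelta = delta < 0 ? -(uint64_t) delta : (uint64_t) delta;
  if (mdelta == 0)
    return "nothing";
  if (uimm12_shift (mdelta))
    return "single add/sub";
  if (mdelta < 0x1000000 && !single_movz (mdelta))
    return "two add/sub";
  return "mov to scratch + one add";
}

int
main (void)
{
  printf ("0x2000  -> %s\n", classify (0x2000));   /* single add/sub */
  printf ("0x12345 -> %s\n", classify (0x12345));  /* two add/sub */
  printf ("0xffff  -> %s\n", classify (0xffff));   /* mov + add */
  return 0;
}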



OK for trunk?

gcc/
2016-07-20  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64.c (aarch64_add_constant): Optimize
instruction sequences.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aeea3b3ebc514663043ac8d7cd13361f06f78502..41844a101247c939ecb31f8a8c17cf79759255aa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1865,6 +1865,47 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   aarch64_internal_mov_immediate (dest, imm, true, GET_MODE (dest));
 }
 
+/* Add DELTA onto REGNUM in MODE, using SCRATCHREG to hold the intermediate
+   value if necessary.  */
+
+static void
+aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
+		  HOST_WIDE_INT delta)
+{
+  HOST_WIDE_INT mdelta = abs_hwi (delta);
+  rtx this_rtx = gen_rtx_REG (mode, regnum);
+
+  /* Do nothing if mdelta is zero.  */
+  if (!mdelta)
+return;
+
+  /* We only need single instruction if the offset fit into add/sub.  */
+  if (aarch64_uimm12_shift (mdelta))
+{
+  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  return;
+}
+
+  /* We need two add/sub instructions, each one performing part of the
+ addition/subtraction, but don't do this if the addend can be loaded into
+ register by single instruction, in that case we prefer a move to scratch
+ register followed by an addition.  */
+  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
+{
+  HOST_WIDE_INT low_off = mdelta & 0xfff;
+
+  low_off = delta < 0 ? -low_off : low_off;
+  emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+  emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  return;
+}
+
+  /* Otherwise use generic function to handle all other situations.  */
+  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
+  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
+  emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+}
+
 static bool
 aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
  tree exp ATTRIBUTE_UNUSED)
@@ -3337,44 +3378,6 @@ aarch64_final_eh_return_addr (void)
    - 2 * UNITS_PER_WORD));
 }
 
-static void
-aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
-		  HOST_WIDE_INT delta)
-{
-  HOST_WIDE_INT mdelta = delta;
-  rtx this_rtx = gen_rtx_REG (mode, regnum);
-  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
-
-  if (mdelta < 0)
-mdelta = -mdelta;
-
-  if (mdelta >= 4096 * 4096)
-{
-  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
-}
-  else if (mdelta > 0)
-{
-  if (mdelta >= 4096)
-	{
-	  emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096)));
-	  rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12));
-	  if (delta < 0)
-	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_MINUS (mode, this_rtx, shift)));
-	  else
-	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_PLUS (mode, this_rtx, shift)));
-	}
-  if (mdelta % 4096 != 0)
-	{
-	  scratch_rtx = GEN_INT ((delta < 0 ? -1 : 1) * (mdelta % 4096));
-	  emit_insn (gen_rtx_SET (this_rtx,
-  gen_rtx_PLUS (mode, this_rtx, scratch_rtx)));
-	}
-}
-}
-
 /* Output code to add DELTA to the first argument, and then jump
to FUNCTION.  Used for C++ multiple inheritance.  */
 static void


[AArch64][1/3] Migrate aarch64_add_constant to new interface & kill aarch64_build_constant

2016-07-20 Thread Jiong Wang

Currently aarch64_add_constant uses aarch64_build_constant to move an
immediate into the scratch register.

It has considered the following situations:

  * the immediate fits a bitmask pattern, needing only a single
instruction.
  * the immediate fits a single movz/movn.
  * the immediate needs a single movz/movn plus multiple movk instructions.


Actually we have another constant building helper function,
"aarch64_internal_mov_immediate", which covers all these situations and
more.

This patch thus migrates aarch64_add_constant to
aarch64_internal_mov_immediate so that we can kill the old
aarch64_build_constant.
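
For concreteness, the classes of immediates
aarch64_internal_mov_immediate already handles, with the sequence it
picks for each (illustrative values of mine, not exhaustive):

  0x00000000ffff0000   /* bitmask immediate -> single MOV (ORR)  */
  0x0000000012340000   /* 16 bits at hw 1   -> single MOVZ       */
  0xffffffffffffedcb   /* inverted 16 bits  -> single MOVN       */
  0x0000123400005678   /* two 16-bit halves -> MOVZ + MOVK       */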

OK for trunk?

gcc/
2016-07-20  Jiong Wang  <jiong.w...@arm.com>

* config/aarch64/aarch64.c (aarch64_add_constant): New
parameter "mode".  Use aarch64_internal_mov_immediate
instead of aarch64_build_constant.
(aarch64_build_constant): Delete.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 512ef10d158d2eaa1384d28c43b9a8f90387099d..aeea3b3ebc514663043ac8d7cd13361f06f78502 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3337,98 +3337,20 @@ aarch64_final_eh_return_addr (void)
    - 2 * UNITS_PER_WORD));
 }
 
-/* Possibly output code to build up a constant in a register.  For
-   the benefit of the costs infrastructure, returns the number of
-   instructions which would be emitted.  GENERATE inhibits or
-   enables code generation.  */
-
-static int
-aarch64_build_constant (int regnum, HOST_WIDE_INT val, bool generate)
-{
-  int insns = 0;
-
-  if (aarch64_bitmask_imm (val, DImode))
-{
-  if (generate)
-	emit_move_insn (gen_rtx_REG (Pmode, regnum), GEN_INT (val));
-  insns = 1;
-}
-  else
-{
-  int i;
-  int ncount = 0;
-  int zcount = 0;
-  HOST_WIDE_INT valp = val >> 16;
-  HOST_WIDE_INT valm;
-  HOST_WIDE_INT tval;
-
-  for (i = 16; i < 64; i += 16)
-	{
-	  valm = (valp & 0x);
-
-	  if (valm != 0)
-	++ zcount;
-
-	  if (valm != 0x)
-	++ ncount;
-
-	  valp >>= 16;
-	}
-
-  /* zcount contains the number of additional MOVK instructions
-	 required if the constant is built up with an initial MOVZ instruction,
-	 while ncount is the number of MOVK instructions required if starting
-	 with a MOVN instruction.  Choose the sequence that yields the fewest
-	 number of instructions, preferring MOVZ instructions when they are both
-	 the same.  */
-  if (ncount < zcount)
-	{
-	  if (generate)
-	emit_move_insn (gen_rtx_REG (Pmode, regnum),
-			GEN_INT (val | ~(HOST_WIDE_INT) 0x));
-	  tval = 0x;
-	  insns++;
-	}
-  else
-	{
-	  if (generate)
-	emit_move_insn (gen_rtx_REG (Pmode, regnum),
-			GEN_INT (val & 0x));
-	  tval = 0;
-	  insns++;
-	}
-
-  val >>= 16;
-
-  for (i = 16; i < 64; i += 16)
-	{
-	  if ((val & 0x) != tval)
-	{
-	  if (generate)
-		emit_insn (gen_insv_immdi (gen_rtx_REG (Pmode, regnum),
-	   GEN_INT (i),
-	   GEN_INT (val & 0x)));
-	  insns++;
-	}
-	  val >>= 16;
-	}
-}
-  return insns;
-}
-
 static void
-aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta)
+aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
+		  HOST_WIDE_INT delta)
 {
   HOST_WIDE_INT mdelta = delta;
-  rtx this_rtx = gen_rtx_REG (Pmode, regnum);
-  rtx scratch_rtx = gen_rtx_REG (Pmode, scratchreg);
+  rtx this_rtx = gen_rtx_REG (mode, regnum);
+  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
 
   if (mdelta < 0)
 mdelta = -mdelta;
 
   if (mdelta >= 4096 * 4096)
 {
-  (void) aarch64_build_constant (scratchreg, delta, true);
+  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
   emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
 }
   else if (mdelta > 0)
@@ -3436,19 +3358,19 @@ aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta)
   if (mdelta >= 4096)
 	{
 	  emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096)));
-	  rtx shift = gen_rtx_ASHIFT (Pmode, scratch_rtx, GEN_INT (12));
+	  rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12));
 	  if (delta < 0)
 	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_MINUS (Pmode, this_rtx, shift)));
+gen_rtx_MINUS (mode, this_rtx, shift)));
 	  else
 	emit_insn (gen_rtx_SET (this_rtx,
-gen_rtx_PLUS (Pmode, this_rtx, shift)));
+gen_rtx_PLUS (mode, this_rtx, shift)));
 	}
   if (mdelta % 4096 != 0)
 	{
 	  scratch_rtx = GEN_INT ((delta < 0 ? -1 : 1) * (mdelta % 4096));
 	  emit_insn (gen_rtx_SET (this_rtx,
-  gen_rtx_PLUS (Pmode, this_rtx, scratch_rtx)));
+  gen_rtx_PLUS (mode, this_rtx, scratch_rtx)));
 	}
 }
 }
@@ -3473,7 +3395,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
   emit_note (NOTE_INSN_PROLOG

[COMMITTED][AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-08 Thread Jiong Wang

On 07/07/16 10:34, James Greenhalgh wrote:


To make backporting easier, could you please write a very simple
standalone test that exposes this bug, and submit this patch with just
that simple test? I've already OKed the functional part of this patch, and
I'm happy to pre-approve a simple testcase.

With that committed to trunk, this needs to go to all active release
branches please.


Committed the attached patch to trunk as r238166.  The fmax/fmin
patterns were introduced by [1], which has been available since GCC 6,
so it was also backported to the gcc-6 branch as r238167.

--
[1] https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02654.html
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 3e4740c..f1ad325 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -244,13 +244,17 @@
   /* Implemented by 3.
  smax variants map to fmaxnm,
  smax_nan variants map to fmax.  */
-  BUILTIN_VDQIF (BINOP, smax, 3)
-  BUILTIN_VDQIF (BINOP, smin, 3)
+  BUILTIN_VDQ_BHSI (BINOP, smax, 3)
+  BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
   BUILTIN_VDQF (BINOP, smax_nan, 3)
   BUILTIN_VDQF (BINOP, smin_nan, 3)
 
+  /* Implemented by 3.  */
+  BUILTIN_VDQF (BINOP, fmax, 3)
+  BUILTIN_VDQF (BINOP, fmin, 3)
+
   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ed24b59..b0ab1d3 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -17588,19 +17588,19 @@ vpminnms_f32 (float32x2_t a)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmaxnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_smaxv2sf (__a, __b);
+  return __builtin_aarch64_fmaxv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmaxnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_smaxv4sf (__a, __b);
+  return __builtin_aarch64_fmaxv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vmaxnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_smaxv2df (__a, __b);
+  return __builtin_aarch64_fmaxv2df (__a, __b);
 }
 
 /* vmaxv  */
@@ -17818,19 +17818,19 @@ vminq_u32 (uint32x4_t __a, uint32x4_t __b)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vminnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_sminv2sf (__a, __b);
+  return __builtin_aarch64_fminv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vminnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_sminv4sf (__a, __b);
+  return __builtin_aarch64_fminv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vminnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_sminv2df (__a, __b);
+  return __builtin_aarch64_fminv2df (__a, __b);
 }
 
 /* vminv  */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
new file mode 100644
index 000..8333f03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
@@ -0,0 +1,82 @@
+/* Test the `v[min|max]nm{q}_f*' AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+extern void abort ();
+
+#define CHECK(T, N, R, E) \
+  {\
+int i = 0;\
+for (; i < N; i++)\
+  if (* (T *) &R[i] != * (T *) &E[i])\
+	abort ();\
+  }
+
+int
+main (int argc, char **argv)
+{
+  float32x2_t f32x2_input1 = vdup_n_f32 (-1.0);
+  float32x2_t f32x2_input2 = vdup_n_f32 (0.0);
+  float32x2_t f32x2_exp_minnm  = vdup_n_f32 (-1.0);
+  float32x2_t f32x2_exp_maxnm  = vdup_n_f32 (0.0);
+  float32x2_t f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
+  float32x2_t f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
+
+  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
+  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+
+  f32x2_input1 = vdup_n_f32 (__builtin_nanf (""));
+  f32x2_input2 = vdup_n_f32 (1.0);
+  f32x2_exp_minnm  = vdup_n_f32 (1.0);
+  f32x2_exp_maxnm  = vdup_n_f32 (1.0);
+  f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
+  f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
+
+  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
+  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+
+  float32x4_t f32x4_input1 = vdupq_n_f32 (-1024.0);
+  float32x4_t f32x4_input2 = vdupq_n_f32 (77.0);
+  float32x4_t f32x4_exp_minnm  = vdupq_n_f32 (-1024.0);
+  float32x4_t f32x4_exp_maxnm  = vdupq_n_f32 (77.0);
+  float32x4_t f32x4_ret_minnm  = vminnmq_f32 (f32x4_input1, f32x4_input2);
+  float32x4_t f32x4_ret_maxnm 

[AArch64][4/14] ARMv8.2-A FP16 three operands vector intrinsics

2016-07-07 Thread Jiong Wang

This patch adds ARMv8.2-A FP16 three operands vector intrinsics.

The three operands intrinsics only comprise fma and fms.
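
A short usage sketch (my example, not from the patch; assumes
-march=armv8.2-a+fp16):

#include <arm_neon.h>

float16x4_t
fused (float16x4_t acc, float16x4_t x, float16x4_t y)
{
  acc = vfma_f16 (acc, x, y);    /* acc + x * y, single rounding */
  return vfms_f16 (acc, x, y);   /* acc - x * y, single rounding */
}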

2016-07-07  Jiong Wang <jiong.w...@arm.com>

gcc/
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (fma4): Extend to HF modes.
(fnma4): Likewise.
* config/aarch64/arm_neon.h (vfma_f16): New.
(vfmaq_f16): Likewise.
(vfms_f16): Likewise.
(vfmsq_f16): Likewise.

From dc2121d586b759b864d9653e188a14d1f7296f25 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.w...@arm.com>
Date: Wed, 8 Jun 2016 10:21:25 +0100
Subject: [PATCH 04/14] [4/14] ARMv8.2 FP16 three operands vector intrinsics

---
 gcc/config/aarch64/aarch64-simd-builtins.def |  4 +++-
 gcc/config/aarch64/aarch64-simd.md   | 28 ++--
 gcc/config/aarch64/arm_neon.h| 26 ++
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index fe17298..6ff5063 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -405,7 +405,9 @@
   BUILTIN_VALL_F16 (STORE1, st1, 0)
 
   /* Implemented by fma4.  */
-  BUILTIN_VDQF (TERNOP, fma, 4)
+  BUILTIN_VHSDF (TERNOP, fma, 4)
+  /* Implemented by fnma4.  */
+  BUILTIN_VHSDF (TERNOP, fnma, 4)
 
   /* Implemented by aarch64_simd_bsl.  */
   BUILTIN_VDQQH (BSL_P, simd_bsl, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 0a80adb..576ad3c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1526,13 +1526,13 @@
 )
 
 (define_insn "fma4"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-   (fma:VDQF (match_operand:VDQF 1 "register_operand" "w")
-(match_operand:VDQF 2 "register_operand" "w")
-(match_operand:VDQF 3 "register_operand" "0")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+   (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+		  (match_operand:VHSDF 2 "register_operand" "w")
+		  (match_operand:VHSDF 3 "register_operand" "0")))]
   "TARGET_SIMD"
  "fmla\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_mla_")]
+  [(set_attr "type" "neon_fp_mla_")]
 )
 
 (define_insn "*aarch64_fma4_elt"
@@ -1599,15 +1599,15 @@
 )
 
 (define_insn "fnma4"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(fma:VDQF
-	  (match_operand:VDQF 1 "register_operand" "w")
-  (neg:VDQF
-	(match_operand:VDQF 2 "register_operand" "w"))
-	  (match_operand:VDQF 3 "register_operand" "0")))]
-  "TARGET_SIMD"
- "fmls\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_mla_")]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(fma:VHSDF
+	  (match_operand:VHSDF 1 "register_operand" "w")
+  (neg:VHSDF
+	(match_operand:VHSDF 2 "register_operand" "w"))
+	  (match_operand:VHSDF 3 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "fmls\\t%0., %1., %2."
+  [(set_attr "type" "neon_fp_mla_")]
 )
 
 (define_insn "*aarch64_fnma4_elt"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e78ff43..ad5b6fa 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26458,6 +26458,32 @@ vsubq_f16 (float16x8_t __a, float16x8_t __b)
   return __a - __b;
 }
 
+/* ARMv8.2-A FP16 three operands vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_aarch64_fmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_aarch64_fmav8hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_aarch64_fnmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_aarch64_fnmav8hf (__b, __c, __a);
+}
+
 #pragma GCC pop_options
 
 #undef __aarch64_vget_lane_any
-- 
2.5.0







[AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics

2016-07-07 Thread Jiong Wang

This patch adds ARMv8.2-A FP16 one operand scalar intrinsics.

Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h.
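
So either header works for the scalar forms; a sketch (my example, not
from the patch; assumes -march=armv8.2-a+fp16):

#include <arm_fp16.h>   /* standalone; arm_neon.h also includes it */

float16_t
neg_abs (float16_t x)
{
  return vnegh_f16 (vabsh_f16 (x));   /* -|x| */
}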

gcc/
2016-07-07  Jiong Wang <jiong.w...@arm.com>

* config.gcc (aarch64*-*-*): Install arm_fp16.h.
* config/aarch64/aarch64-builtins.c (hi_UP): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_frsqrte): Extend to
HF mode.
(aarch64_frecp): Likewise.
(aarch64_cm): Likewise.
* config/aarch64/aarch64.md (2): Likewise.
(l2): Likewise.
(fix_trunc2): Likewise.
(sqrt2): Likewise.
(abs2): Likewise.
(hf2): New pattern for HF mode.
(hihf2): Likewise.
* config/aarch64/arm_neon.h: Include arm_fp16.h.
* config/aarch64/iterators.md (GPF_F16): New.
(GPI_F16): Likewise.
(VHSDF_HSDF): Likewise.
(w1): Support HF mode.
(w2): Likewise.
(v): Likewise.
(s): Likewise.
(q): Likewise.
(Vmtype): Likewise.
(V_cmp_result): Likewise.
(fcvt_iesize): Likewise.
(FCVT_IESIZE): Likewise.
* config/aarch64/arm_fp16.h: New file.
(vabsh_f16): New.
(vceqzh_f16): Likewise.
(vcgezh_f16): Likewise.
(vcgtzh_f16): Likewise.
(vclezh_f16): Likewise.
(vcltzh_f16): Likewise.
(vcvth_f16_s16): Likewise.
(vcvth_f16_s32): Likewise.
(vcvth_f16_s64): Likewise.
(vcvth_f16_u16): Likewise.
(vcvth_f16_u32): Likewise.
(vcvth_f16_u64): Likewise.
(vcvth_s16_f16): Likewise.
(vcvth_s32_f16): Likewise.
(vcvth_s64_f16): Likewise.
(vcvth_u16_f16): Likewise.
(vcvth_u32_f16): Likewise.
(vcvth_u64_f16): Likewise.
(vcvtah_s16_f16): Likewise.
(vcvtah_s32_f16): Likewise.
(vcvtah_s64_f16): Likewise.
(vcvtah_u16_f16): Likewise.
(vcvtah_u32_f16): Likewise.
(vcvtah_u64_f16): Likewise.
(vcvtmh_s16_f16): Likewise.
(vcvtmh_s32_f16): Likewise.
(vcvtmh_s64_f16): Likewise.
(vcvtmh_u16_f16): Likewise.
(vcvtmh_u32_f16): Likewise.
(vcvtmh_u64_f16): Likewise.
(vcvtnh_s16_f16): Likewise.
(vcvtnh_s32_f16): Likewise.
(vcvtnh_s64_f16): Likewise.
(vcvtnh_u16_f16): Likewise.
(vcvtnh_u32_f16): Likewise.
(vcvtnh_u64_f16): Likewise.
(vcvtph_s16_f16): Likewise.
(vcvtph_s32_f16): Likewise.
(vcvtph_s64_f16): Likewise.
(vcvtph_u16_f16): Likewise.
(vcvtph_u32_f16): Likewise.
(vcvtph_u64_f16): Likewise.
(vnegh_f16): Likewise.
(vrecpeh_f16): Likewise.
(vrecpxh_f16): Likewise.
(vrndh_f16): Likewise.
(vrndah_f16): Likewise.
(vrndih_f16): Likewise.
(vrndmh_f16): Likewise.
(vrndnh_f16): Likewise.
(vrndph_f16): Likewise.
(vrndxh_f16): Likewise.
(vrsqrteh_f16): Likewise.
(vsqrth_f16): Likewise.
From f5f32c0867397594ae4e914acc69bc30d9b15ce9 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.w...@arm.com>
Date: Wed, 8 Jun 2016 10:31:40 +0100
Subject: [PATCH 07/14] [7/14] ARMv8.2 FP16 one operand scalar intrinsics

---
 gcc/config.gcc   |   2 +-
 gcc/config/aarch64/aarch64-builtins.c|   1 +
 gcc/config/aarch64/aarch64-simd-builtins.def |  54 +++-
 gcc/config/aarch64/aarch64-simd.md   |  42 ++-
 gcc/config/aarch64/aarch64.md|  52 ++--
 gcc/config/aarch64/arm_fp16.h| 365 +++
 gcc/config/aarch64/arm_neon.h|   2 +
 gcc/config/aarch64/iterators.md  |  32 ++-
 8 files changed, 495 insertions(+), 55 deletions(-)
 create mode 100644 gcc/config/aarch64/arm_fp16.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e47535b..13fefee 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -307,7 +307,7 @@ m32c*-*-*)
 ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_neon.h arm_acle.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index af5fac5..ca91d91 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -62,6 +62,7 @@
 #define si_UPSImode
 #define sf_UPSFmode
 #define hi_UPHImode
+#define hf_UPHFmode
 #define qi_UPQImode
 #define UP(X) X##_UP
 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 64c5f86..6a74daa 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -274,6 +274,14 @@
   BUILTIN_VHSDF

[AArch64][5/14] ARMv8.2-A FP16 lane vector intrinsics

2016-07-07 Thread Jiong Wang

This patch adds ARMv8.2-A FP16 lane vector intrinsics.

Lane intrinsics are generally derivatives of multiply intrinsics,
including multiply-accumulate.  All the necessary backend support for
them is already there except for fmulx; the implementations are largely
a combination of existing multiply intrinsics with vdup intrinsics.
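
The shape of those combinations, as a sketch (my example, not the exact
arm_neon.h body; assumes -march=armv8.2-a+fp16):

#include <arm_neon.h>

float16x4_t
mla_lane2 (float16x4_t acc, float16x4_t x, float16x4_t v)
{
  /* vfma_lane_f16 (acc, x, v, 2) is, in effect, fma on a duplicated
     lane: */
  return vfma_f16 (acc, x, vdup_lane_f16 (v, 2));
}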

2016-07-07  Jiong Wang <jiong.w...@arm.com>

gcc/
* config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_to_64v2df): Rename
to "*aarch64_mulx_elt_from_dup".
(*aarch64_mul3_elt): Update schedule type.
(*aarch64_mul3_elt_from_dup): Likewise.
(*aarch64_fma4_elt_from_dup): Likewise.
(*aarch64_fnma4_elt_from_dup): Likewise.
* config/aarch64/iterators.md (VMUL): Support half precision float
modes.
(f, fp): Support HF modes.
* config/aarch64/arm_neon.h (vfma_lane_f16): New.
(vfmaq_lane_f16): Likewise.
(vfma_laneq_f16): Likewise.
(vfmaq_laneq_f16): Likewise.
(vfma_n_f16): Likewise.
(vfmaq_n_f16): Likewise.
(vfms_lane_f16): Likewise.
(vfmsq_lane_f16): Likewise.
(vfms_laneq_f16): Likewise.
(vfmsq_laneq_f16): Likewise.
(vfms_n_f16): Likewise.
(vfmsq_n_f16): Likewise.
(vmul_lane_f16): Likewise.
(vmulq_lane_f16): Likewise.
(vmul_laneq_f16): Likewise.
(vmulq_laneq_f16): Likewise.
(vmul_n_f16): Likewise.
(vmulq_n_f16): Likewise.
(vmulx_lane_f16): Likewise.
(vmulxq_lane_f16): Likewise.
(vmulx_laneq_f16): Likewise.
(vmulxq_laneq_f16): Likewise.


From 25ed161255c4f0155f3c69c1ee4ec0e071ed115c Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.w...@arm.com>
Date: Wed, 8 Jun 2016 10:22:38 +0100
Subject: [PATCH 05/14] [5/14] ARMv8.2 FP16 lane vector intrinsics

---
 gcc/config/aarch64/aarch64-simd.md |  28 ---
 gcc/config/aarch64/arm_neon.h  | 154 +
 gcc/config/aarch64/iterators.md|   7 +-
 3 files changed, 173 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 576ad3c..c0600df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -351,7 +351,7 @@
 operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 return "mul\\t%0., %3., %1.[%2]";
   }
-  [(set_attr "type" "neon_mul__scalar")]
+  [(set_attr "type" "neon_mul__scalar")]
 )
 
 (define_insn "*aarch64_mul3_elt_"
@@ -379,7 +379,7 @@
   (match_operand:VMUL 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "mul\t%0., %2., %1.[0]";
-  [(set_attr "type" "neon_mul__scalar")]
+  [(set_attr "type" "neon_mul__scalar")]
 )
 
 (define_insn "aarch64_rsqrte"
@@ -1579,7 +1579,7 @@
   (match_operand:VMUL 3 "register_operand" "0")))]
   "TARGET_SIMD"
   "fmla\t%0., %2., %1.[0]"
-  [(set_attr "type" "neon_mla__scalar")]
+  [(set_attr "type" "neon_mla__scalar")]
 )
 
 (define_insn "*aarch64_fma4_elt_to_64v2df"
@@ -1657,7 +1657,7 @@
   (match_operand:VMUL 3 "register_operand" "0")))]
   "TARGET_SIMD"
   "fmls\t%0., %2., %1.[0]"
-  [(set_attr "type" "neon_mla__scalar")]
+  [(set_attr "type" "neon_mla__scalar")]
 )
 
 (define_insn "*aarch64_fnma4_elt_to_64v2df"
@@ -3044,20 +3044,18 @@
   [(set_attr "type" "neon_fp_mul_")]
 )
 
-;; vmulxq_lane_f64
+;; vmulxq_lane
 
-(define_insn "*aarch64_mulx_elt_to_64v2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(unspec:V2DF
-	 [(match_operand:V2DF 1 "register_operand" "w")
-	  (vec_duplicate:V2DF
-	(match_operand:DF 2 "register_operand" "w"))]
+(define_insn "*aarch64_mulx_elt_from_dup"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF
+	 [(match_operand:VHSDF 1 "register_operand" "w")
+	  (vec_duplicate:VHSDF
+	(match_operand: 2 "register_operand" "w"))]
 	 UNSPEC_FMULX))]
   "TARGET_SIMD"
-  {
-return "fmulx\t%0.2d, %1.2d, %2.d[0]";
-  }
-  [(set_attr "type" "neon_fp_mul_d_scalar_q")]
+  "fmulx\t%0., %1., %2.[0]";
+  [(set_attr "type" "neon_mul__scalar")]
 )
 
 ;; vmulxs_lane_f32, vmulxs_laneq_f32
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ad5b6fa..b09a3a7 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -26484,6 +26484,160 @@ vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
   return __builtin_aarch64_fnmav8hf

[AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics

2016-07-07 Thread Jiong Wang

This patch adds ARMv8.2-A FP16 one operand vector intrinsics.

We introduced new mode iterators to cover the HF modes; qualified
patterns which were using the old mode iterators are switched to the
new ones.

We can't simply extend an old iterator like VDQF to cover the HF modes,
because not all patterns using VDQF have the new FP16 support.  We
therefore introduced new, temporary iterators, and only apply the new
iterators to those patterns which do have FP16 support.
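
A brief usage sketch (my example, not from the patch; assumes
-march=armv8.2-a+fp16):

#include <arm_neon.h>

int16x4_t
round_to_int (float16x4_t x)
{
  float16x4_t r = vrndn_f16 (x);   /* round to nearest, ties to even */
  return vcvt_s16_f16 (r);         /* exact conversion after rounding */
}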

gcc/
2016-07-07  Jiong Wang <jiong.w...@arm.com>

* config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte): Extend
to HF modes.
(neg2): Likewise.
(abs2): Likewise.
(2): Likewise.
(l2): Likewise.
(2): Likewise.
(2): Likewise.
(ftrunc2): Likewise.
(2): Likewise.
(sqrt2): Likewise.
(aarch64_frecpe): Likewise.
(aarch64_cm): Likewise.
* config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
(VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode
attribute to HF modes.
(stype): New.
* config/aarch64/arm_neon.h (vdup_n_f16): New.
(vdupq_n_f16): Likewise.
(vld1_dup_f16): Use vdup_n_f16.
(vld1q_dup_f16): Use vdupq_n_f16.
(vabs_f16): New.
(vabsq_f16): Likewise.
(vceqz_f16): Likewise.
(vceqzq_f16): Likewise.
(vcgez_f16): Likewise.
(vcgezq_f16): Likewise.
(vcgtz_f16): Likewise.
(vcgtzq_f16): Likewise.
(vclez_f16): Likewise.
(vclezq_f16): Likewise.
(vcltz_f16): Likewise.
(vcltzq_f16): Likewise.
(vcvt_f16_s16): Likewise.
(vcvtq_f16_s16): Likewise.
(vcvt_f16_u16): Likewise.
(vcvtq_f16_u16): Likewise.
(vcvt_s16_f16): Likewise.
(vcvtq_s16_f16): Likewise.
(vcvt_u16_f16): Likewise.
(vcvtq_u16_f16): Likewise.
(vcvta_s16_f16): Likewise.
(vcvtaq_s16_f16): Likewise.
(vcvta_u16_f16): Likewise.
(vcvtaq_u16_f16): Likewise.
(vcvtm_s16_f16): Likewise.
(vcvtmq_s16_f16): Likewise.
(vcvtm_u16_f16): Likewise.
(vcvtmq_u16_f16): Likewise.
(vcvtn_s16_f16): Likewise.
(vcvtnq_s16_f16): Likewise.
(vcvtn_u16_f16): Likewise.
(vcvtnq_u16_f16): Likewise.
(vcvtp_s16_f16): Likewise.
(vcvtpq_s16_f16): Likewise.
(vcvtp_u16_f16): Likewise.
(vcvtpq_u16_f16): Likewise.
(vneg_f16): Likewise.
(vnegq_f16): Likewise.
(vrecpe_f16): Likewise.
(vrecpeq_f16): Likewise.
(vrnd_f16): Likewise.
(vrndq_f16): Likewise.
(vrnda_f16): Likewise.
(vrndaq_f16): Likewise.
(vrndi_f16): Likewise.
(vrndiq_f16): Likewise.
(vrndm_f16): Likewise.
(vrndmq_f16): Likewise.
(vrndn_f16): Likewise.
(vrndnq_f16): Likewise.
(vrndp_f16): Likewise.
(vrndpq_f16): Likewise.
(vrndx_f16): Likewise.
(vrndxq_f16): Likewise.
(vrsqrte_f16): Likewise.
(vrsqrteq_f16): Likewise.
(vsqrt_f16): Likewise.
(vsqrtq_f16): Likewise.
From 3ab3e91e81aa1aa01894a07083e226779145ec88 Mon Sep 17 00:00:00 2001
From: Jiong Wang <jiong.w...@arm.com>
Date: Wed, 8 Jun 2016 09:30:16 +0100
Subject: [PATCH 02/14] [2/14] ARMv8.2 FP16 one operand vector intrinsics

---
 gcc/config/aarch64/aarch64-builtins.c|   4 +
 gcc/config/aarch64/aarch64-simd-builtins.def |  56 -
 gcc/config/aarch64/aarch64-simd.md   |  78 +++---
 gcc/config/aarch64/arm_neon.h| 361 ++-
 gcc/config/aarch64/iterators.md  |  37 ++-
 5 files changed, 478 insertions(+), 58 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b90b2a..af5fac5 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -139,6 +139,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned };
 #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index df0a7d8..3e48046 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -42,7 +42,7 @@
   BUILTIN_VDC (COMBINE, combine, 0)
   BU

[AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics

2016-07-07 Thread Jiong Wang

This patch adds ARMv8.2-A FP16 two operands vector intrinsics.

gcc/
2016-07-07  Jiong Wang <jiong.w...@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md
(aarch64_rsqrts): Extend to HF modes.
(fabd3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_p): Likewise.
(3): Likewise.
(3): Likewise.
(3): Likewise.
(aarch64_faddp): Likewise.
(aarch64_fmulx): Likewise.
(aarch64_frecps): Likewise.
(*aarch64_fac): Rename to aarch64_fac.
(add3): Extend to HF modes.
(sub3): Likewise.
(mul3): Likewise.
(div3): Likewise.
* config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
iterator.
* config/aarch64/arm_neon.h (vadd_f16): Likewise.
(vaddq_f16): Likewise.
(vabd_f16): Likewise.
(vabdq_f16): Likewise.
(vcage_f16): Likewise.
(vcageq_f16): Likewise.
(vcagt_f16): Likewise.
(vcagtq_f16): Likewise.
(vcale_f16): Likewise.
(vcaleq_f16): Likewise.
(vcalt_f16): Likewise.
(vcaltq_f16): Likewise.
(vceq_f16): Likewise.
(vceqq_f16): Likewise.
(vcge_f16): Likewise.
(vcgeq_f16): Likewise.
(vcgt_f16): Likewise.
(vcgtq_f16): Likewise.
(vcle_f16): Likewise.
(vcleq_f16): Likewise.
(vclt_f16): Likewise.
(vcltq_f16): Likewise.
(vcvt_n_f16_s16): Likewise.
(vcvtq_n_f16_s16): Likewise.
(vcvt_n_f16_u16): Likewise.
(vcvtq_n_f16_u16): Likewise.
(vcvt_n_s16_f16): Likewise.
(vcvtq_n_s16_f16): Likewise.
(vcvt_n_u16_f16): Likewise.
(vcvtq_n_u16_f16): Likewise.
(vdiv_f16): Likewise.
(vdivq_f16): Likewise.
(vdup_lane_f16): Likewise.
(vdup_laneq_f16): Likewise.
(vdupq_lane_f16): Likewise.
(vdupq_laneq_f16): Likewise.
(vdups_lane_f16): Likewise.
(vdups_laneq_f16): Likewise.
(vmax_f16): Likewise.
(vmaxq_f16): Likewise.
(vmaxnm_f16): Likewise.
(vmaxnmq_f16): Likewise.
(vmin_f16): Likewise.
(vminq_f16): Likewise.
(vminnm_f16): Likewise.
(vminnmq_f16): Likewise.
(vmul_f16): Likewise.
(vmulq_f16): Likewise.
(vmulx_f16): Likewise.
(vmulxq_f16): Likewise.
(vpadd_f16): Likewise.
(vpaddq_f16): Likewise.
(vpmax_f16): Likewise.
(vpmaxq_f16): Likewise.
(vpmaxnm_f16): Likewise.
(vpmaxnmq_f16): Likewise.
(vpmin_f16): Likewise.
(vpminq_f16): Likewise.
(vpminnm_f16): Likewise.
(vpminnmq_f16): Likewise.
(vrecps_f16): Likewise.
(vrecpsq_f16): Likewise.
(vrsqrts_f16): Likewise.
(vrsqrtsq_f16): Likewise.
(vsub_f16): Likewise.
(vsubq_f16): Likewise.

commit 5ed72d355491365b3af5883cdc5a4fdaf5cb545b
Author: Jiong Wang <jiong.w...@arm.com>
Date:   Wed Jun 8 10:10:28 2016 +0100

[3/14] ARMv8.2 FP16 two operands vector intrinsics

 gcc/config/aarch64/aarch64-simd-builtins.def |  40 +--
 gcc/config/aarch64/aarch64-simd.md   | 152 +--
 gcc/config/aarch64/arm_neon.h| 362 +++
 gcc/config/aarch64/iterators.md  |  10 +
 4 files changed, 473 insertions(+), 91 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 3e48046..fe17298 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
 
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VALLF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -248,22 +248,22 @@
   BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
-  BUILTIN_VDQF (BINOP, smax_nan, 3)
-  BUILTIN_VDQF (BINOP, smin_nan, 3)
+  BUILTIN_VHSDF (BINOP, smax_nan, 3)
+  BUILTIN_VHSDF (BINOP, smin_nan, 3)
 
   /* Implemented by 3.  */
-  BUILTIN_VDQF (BINOP, fmax, 3)
-  BUILTIN_VDQF (BINOP, fmin, 3)
+  BUILTIN_VHSDF (BINOP, fmax, 3)
+  BUILTIN_VHSDF (BINOP, fmin, 3)
 
   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
   BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
-  BUILTIN_VDQF (BINOP, smaxp, 0)
-  BUILTIN_VDQF (BINOP, sminp, 0)
-  BUILTIN_VDQF (BINOP, smax_nanp, 0)
-  BUILTIN_VDQF (BINOP, smin_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smaxp, 0)
+  BUILTIN_VHSDF (BINOP, sminp, 0
