Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-10 Thread Richard Henderson via Gcc-patches

On 8/10/23 02:50, Wilco Dijkstra wrote:

Hi Richard,


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


Answering my own question, N1 does not officially have FEAT_LSE2.


It doesn't indeed. However, most cores support atomic 128-bit load/store
(part of LSE2), so we can still use the LSE2 ifunc for those cores. Since there
isn't a feature bit for this in the CPU or HWCAP, I check the CPUID register.


That would be a really nice bit to add to HWCAP, then, to consolidate this knowledge in 
one place.  Certainly I would use it in QEMU as well.
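
For readers skimming the thread, here is a minimal sketch of how a predicate
like ifunc1 plugs into GNU ifunc selection.  This is not the actual libatomic
code; the load_16_* names are hypothetical, and on AArch64 glibc passes
AT_HWCAP as the resolver's first argument:

#include <stdbool.h>

extern bool ifunc1 (unsigned long hwcap);       /* predicate from the patch */
extern __int128 load_16_lse2 (__int128 *);      /* single LDP; wants LSE2 */
extern __int128 load_16_fallback (__int128 *);  /* lock-based fallback */

static __int128 (*
resolve_load_16 (unsigned long hwcap)) (__int128 *)
{
  return ifunc1 (hwcap) ? load_16_lse2 : load_16_fallback;
}

__int128 load_16 (__int128 *ptr)
  __attribute__ ((ifunc ("resolve_load_16")));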



r~



Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches

On 8/9/23 19:11, Richard Henderson wrote:

On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote:

+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr)    (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr)    (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+    return true;
+  if (!(hwcap & HWCAP_CPUID))
+    return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+    return true;
+
+  return false;
+}
+#endif


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


Answering my own question, N1 does not officially have FEAT_LSE2.


r~



Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches

On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote:

+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+    return true;
+  if (!(hwcap & HWCAP_CPUID))
+    return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+    return true;
+
+  return false;
+}
+#endif


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


r~


[PATCH] MAINTAINERS: Update my email address.

2022-04-19 Thread Richard Henderson via Gcc-patches
2022-04-19  Richard Henderson  

* MAINTAINERS: Update my email address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 30f81b3dd52..15973503722 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -53,7 +53,7 @@ aarch64 port  Richard Earnshaw

 aarch64 port   Richard Sandiford   
 aarch64 port   Marcus Shawcroft
 aarch64 port   Kyrylo Tkachov  
-alpha port Richard Henderson   
+alpha port Richard Henderson   
 amdgcn portJulian Brown
 amdgcn portAndrew Stubbs   
 arc port   Joern Rennecke  
-- 
2.34.1



Re: [PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags

2020-04-09 Thread Richard Henderson
On 4/9/20 2:52 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Apr 02, 2020 at 11:53:47AM -0700, Richard Henderson wrote:
>> The rtl description of signed/unsigned overflow from subtract
>> was fine, as far as it goes -- we have CC_Cmode and CC_Vmode
>> that indicate that only those particular bits are valid.
>>
>> However, it's not clear how to extend that description to
>> handle signed comparison, where N == V (GE) N != V (LT) are
>> the only valid bits.
>>
>> Using an UNSPEC means that we can unify all 3 usages without
>> fear that combine will try to infer anything from the rtl.
>> It also means we need far fewer variants when various inputs
>> have constants propagated in, and the rtl folds.
>>
>> Accept -1 for the second input by using ADCS.
> 
> If you use an unspec anyway, why do you need a separate UNSPEC_SBCS?
> It is just the same as UNSPEC_ADCS, with one of the inputs inverted?
> 
> Is there any reason to pretend borrows are different from carries?

Good point.  If we go this way, I'll make sure and merge them.
But I've also just sent v4 that does away with the unspecs and
uses the forms that Earnshaw used for config/arm.
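
(The equivalence in question, as a standalone C sanity check rather than
anything from the patch: subtract-with-borrow is add-with-carry of the
bitwise complement, with carry-in being the inverted borrow.)

#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint32_t x = 0x12345678, y = 0x9abcdef0;
  for (uint32_t borrow = 0; borrow <= 1; borrow++)
    {
      uint32_t sbc = x - y - borrow;          /* SBCS-style result */
      uint32_t adc = x + ~y + (1 - borrow);   /* ADCS with carry = !borrow */
      assert (sbc == adc);
    }
  return 0;
}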


r~


[PATCH v4 11/12] aarch64: Accept 0 as first argument to compares

2020-04-09 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) use <Wn|WSP> for the
first operand, cmp (shifted register) uses <Wn>.  So we can perform
cmp xzr, x0.

For ccmp, we only have <Wn> as an input.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fb1a39a3886..2b5a6eb510d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,r,r")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,r,r")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))))]
   ""
   "@
@@ -3902,14 +3902,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,r")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH v4 12/12] aarch64: Implement TImode comparisons

2020-04-09 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64-modes.def (CC_NV): New.
* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand
all of the comparisons for TImode, not just NE.
(aarch64_select_cc_mode): Recognize cmp_carryin.
(aarch64_get_condition_code_1): Handle CC_NVmode.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
(ccmp_iorne): New.
(cmp_carryin): New.
(*cmp_carryin): New.
(*cmp_carryin_z1): New.
(*cmp_carryin_z2): New.
(*cmp_carryin_m2, *ucmp_carryin_m2): New.
* config/aarch64/iterators.md (CC_EXTEND): New.
* config/aarch64/predicates.md (const_dword_umax): New.
---
 gcc/config/aarch64/aarch64.c | 164 ---
 gcc/config/aarch64/aarch64-modes.def |   1 +
 gcc/config/aarch64/aarch64.md| 113 ++
 gcc/config/aarch64/iterators.md  |   3 +
 gcc/config/aarch64/predicates.md |   9 ++
 5 files changed, 277 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 837ee6a5e37..6c825b341a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2731,32 +2731,143 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
+  rtx x_lo, x_hi, y_lo, y_hi, tmp;
+  struct expand_operand ops[2];
 
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+  x_lo = operand_subword (x, 0, 0, TImode);
+  x_hi = operand_subword (x, 1, 0, TImode);
 
-  rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
+  if (CONST_SCALAR_INT_P (y))
+   {
+ wide_int y_wide = rtx_mode_t (y, TImode);
 
-  rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than the cmp+ccmp below.  Beware
+of the compare-and-swap post-reload split and use ccmp.  */
+ if (y_wide == 0 && can_create_pseudo_p ())
+   {
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+   }
+ break;
+
+   case LE:
+   case GT:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  */
+ if (wi::cmps(y_wide, wi::max_value (TImode, SIGNED)) < 0)
+   {
+ y = immed_wide_int_const (wi::add (y_wide, 1), TImode);
+ code = (code == LE ? LT : GE);
+   }
+ break;
+
+   case LEU:
+   case GTU:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  */
+ if (wi::cmpu(y_wide, wi::max_value (TImode, UNSIGNED)) < 0)
+   {
+ y = immed_wide_int_const (wi::add (y_wide, 1), TImode);
+ code = (code == LEU ? LTU : GEU);
+   }
+ break;
+
+   default:
+ break;
+   }
+   }
+
+  y_lo = simplify_gen_subreg (DImode, y, TImode,
+ subreg_lowpart_offset (DImode, TImode));
+  y_hi = simplify_gen_subreg (DImode, y, TImode,
+ subreg_highpart_offset (DImode, TImode));
+
+  switch (code)
+   {
+   case LEU:
+   case GTU:
+   case LE:
+   case GT:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ break;
+
+   case LTU:
+   case GEU:
+   case LT:
+   case GE:
+ /* If the low word of y is 0, then this is simply a normal
+compare of the upper words.  */
+ if (y_lo == const0_rtx)
+   {
+ if (!aarch64_plus_operand (y_hi, DImode))
+   y_hi = force_reg (DImode, y_hi);
+ return aarch64_gen_compare_reg (code, x_hi, y_hi);
+   }
+ break;
+
+   default:
+ break;
+   }
+
+  /* Emit cmpdi, forcing operands into registers as required.  */
+  create_input_operand ([0], x_lo, 
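
To make the EQ/NE special case above concrete, this is the source-level
shape it targets; the assembly in the comment is indicative, not guaranteed:

/* a == 0 on a 128-bit value folds to an OR of the halves, roughly:
     orr x0, x0, x1
     cmp x0, #0          (or cbz/cbnz when used for a branch)  */
int eq0 (unsigned __int128 a) { return a == 0; }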

[PATCH v4 07/12] aarch64: Rename CC_ADCmode to CC_NOTCmode

2020-04-09 Thread Richard Henderson via Gcc-patches
We are about to use !C in more contexts than add-with-carry.
Choose a more generic name.
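
(The "!C" convention: on AArch64, as on Arm, subtraction sets C to the
inverse of borrow, so C is 1 after subs/cmp exactly when no borrow occurred.
A small illustration; the assembly in the comment is the typical selection,
not a guarantee:)

/* Unsigned x >= y: after cmp (a subs), C set means no borrow,
   so the "cs" (carry set) condition tests x >= y directly:
     cmp  x0, x1
     cset w0, cs  */
unsigned int geu (unsigned int x, unsigned int y) { return x >= y; }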

* config/aarch64/aarch64-modes.def (CC_NOTC): Rename CC_ADC.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Update.
(aarch64_get_condition_code_1): Likewise.
* config/aarch64/aarch64.md (addvti4): Likewise.
(add3_carryinC): Likewise.
(*add3_carryinC_zero): Likewise.
(*add3_carryinC): Likewise.
---
 gcc/config/aarch64/aarch64.c |  4 ++--
 gcc/config/aarch64/aarch64-modes.def |  5 +++--
 gcc/config/aarch64/aarch64.md| 14 +++---
 gcc/config/aarch64/predicates.md |  4 ++--
 4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cd4dc1ef6f9..c09b7bcb7f0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9530,7 +9530,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && code_x == PLUS
   && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
   && const_dword_umaxp1 (y, mode_x))
-return CC_ADCmode;
+return CC_NOTCmode;
 
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
@@ -9663,7 +9663,7 @@ aarch64_get_condition_code_1 (machine_mode mode, enum rtx_code comp_code)
}
   break;
 
-case E_CC_ADCmode:
+case E_CC_NOTCmode:
   switch (comp_code)
{
case GEU: return AARCH64_CS;
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index af972e8f72b..181b7b30dcd 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -29,7 +29,7 @@
CCmode is used for 'normal' compare (subtraction) operations.  For
ADC, the representation becomes more complex still, since we cannot
use the normal idiom of comparing the result to one of the input
-   operands; instead we use CC_ADCmode to represent this case.  */
+   operands; instead we use CC_NOTCmode to represent this case.  */
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
 CC_MODE (CC_SWP);
@@ -38,7 +38,8 @@ CC_MODE (CC_NZC);   /* Only N, Z and C bits of condition flags are valid.
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C); /* C represents unsigned overflow of a simple addition.  */
-CC_MODE (CC_ADC);   /* Unsigned overflow from an ADC (add with carry).  */
+CC_MODE (CC_NOTC);  /* !C represents unsigned overflow of subtraction,
+   as well as our representation of add-with-carry.  */
 CC_MODE (CC_V); /* Only V bit of condition flags is valid.  */
 
 /* Half-precision floating point for __fp16.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d51f6146c43..7d4a63f9a2a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2077,7 +2077,7 @@
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_carryinC);
-  aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
+  aarch64_gen_unlikely_cbranch (GEU, CC_NOTCmode, operands[3]);
   DONE;
 })
 
@@ -2580,7 +2580,7 @@
 (define_expand "add3_carryinC"
   [(parallel
  [(set (match_dup 3)
-  (compare:CC_ADC
+  (compare:CC_NOTC
 (plus:
   (plus:
 (match_dup 4)
@@ -2595,7 +2595,7 @@
 (match_dup 2)))])]
""
 {
-  operands[3] = gen_rtx_REG (CC_ADCmode, CC_REGNUM);
+  operands[3] = gen_rtx_REG (CC_NOTCmode, CC_REGNUM);
   rtx ccin = gen_rtx_REG (CC_Cmode, CC_REGNUM);
   operands[4] = gen_rtx_LTU (mode, ccin, const0_rtx);
   operands[5] = gen_rtx_LTU (mode, ccin, const0_rtx);
@@ -2605,8 +2605,8 @@
 })
 
 (define_insn "*add3_carryinC_zero"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (plus:
(match_operand: 2 "aarch64_carry_operation" "")
(zero_extend: (match_operand:GPI 1 "register_operand" "r")))
@@ -2620,8 +2620,8 @@
 )
 
 (define_insn "*add3_carryinC"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (plus:
(plus:
  (match_operand: 3 "aarch64_carry_operation" "")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 99c3bfbace4..e3572d2f60d 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -390,7 +390,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return GET_CODE (op) == LTU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CC_NOTCmode || ccmode == CCmode)
 return GET_CODE (op) == GEU;
   return false;
 })
@@ -408,7 +408,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return 

[PATCH v4 10/12] aarch64: Adjust result of aarch64_gen_compare_reg

2020-04-09 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.
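
In caller terms the change looks like this (a sketch distilled from the
hunks below, not code from the patch itself):

/* Callers previously did:
     rtx cc_reg = aarch64_gen_compare_reg (NE, x, y);
     rtx cmp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
   and can now simply do:  */
rtx cmp = aarch64_gen_compare_reg (NE, x, y); /* (ne (reg CC) (const_int 0)) */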

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d80afc36889..837ee6a5e37 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2726,7 +2726,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2757,7 +2757,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2783,7 +2783,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18980,7 +18980,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18991,7 +18992,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -19003,13 +19005,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -19084,10 +19086,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -19100,8 +19100,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, 

[PATCH v4 01/12] aarch64: Provide expander for sub3_compare1

2020-04-09 Thread Richard Henderson via Gcc-patches
In one place we open-code a special case of this pattern into the
more specific sub3_compare1_imm, and miss this special case
in other places.  Centralize that special case into an expander.
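
A user-visible consumer is the unsigned subtract-overflow builtin; with the
special case centralized, a constant operand is routed through the _imm form
rather than being forced into a register (illustrative only):

#include <stdbool.h>

/* Expected to expand through sub<mode>3_compare1 and test LTU, roughly
   "subs x0, x0, #5" plus a cset/branch, with no register holding the 5.  */
bool usub5_overflows (unsigned long x, unsigned long *out)
{
  return __builtin_sub_overflow (x, 5ul, out);
}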

* config/aarch64/aarch64.md (*sub3_compare1): Rename
from sub3_compare1.
(sub3_compare1): New expander.
(usubv4): Use aarch64_plus_operand for operand2.
* config/aarch64/aarch64.c (aarch64_expand_subvti): Remove
call to gen_subdi3_compare1_imm.
---
 gcc/config/aarch64/aarch64.c  | 11 ++-
 gcc/config/aarch64/aarch64.md | 24 +---
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4af562a81ea..ce306a10de6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20797,16 +20797,9 @@ aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1,
 }
   else
 {
-  if (aarch64_plus_immediate (low_in2, DImode))
-   emit_insn (gen_subdi3_compare1_imm (low_dest, low_in1, low_in2,
-   GEN_INT (-INTVAL (low_in2))));
-  else
-   {
- low_in2 = force_reg (DImode, low_in2);
- emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-   }
-  high_in2 = force_reg (DImode, high_in2);
+  emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
 
+  high_in2 = force_reg (DImode, high_in2);
   if (unsigned_p)
emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
   else
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..728c63bd8d6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2966,13 +2966,12 @@
 (define_expand "usubv4"
   [(match_operand:GPI 0 "register_operand")
(match_operand:GPI 1 "aarch64_reg_or_zero")
-   (match_operand:GPI 2 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_plus_operand")
(label_ref (match_operand 3 "" ""))]
   ""
 {
   emit_insn (gen_sub3_compare1 (operands[0], operands[1], operands[2]));
   aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
-
   DONE;
 })
 
@@ -3119,7 +3118,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub3_compare1"
+(define_insn "*sub3_compare1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3131,6 +3130,25 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub3_compare1"
+  [(parallel
+[(set (reg:CC CC_REGNUM)
+ (compare:CC
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_plus_operand")))
+ (set (match_operand:GPI 0 "register_operand")
+ (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (CONST_SCALAR_INT_P (operands[2]))
+{
+  emit_insn (gen_sub3_compare1_imm
+(operands[0], operands[1], operands[2],
+ GEN_INT (-INTVAL (operands[2]))));
+  DONE;
+}
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



[PATCH v4 09/12] aarch64: Use CC_NOTCmode for double-word subtract

2020-04-09 Thread Richard Henderson via Gcc-patches
We have been using CCmode, which is not correct for this case.
Mirror the same code from the arm target.

* config/aarch64/aarch64.c (aarch64_select_cc_mode):
Recognize usub*_carryinC patterns.
* config/aarch64/aarch64.md (usubvti4): Use CC_NOTC.
(usub3_carryinC): Likewise.
(*usub3_carryinC_z1): Likewise.
(*usub3_carryinC_z2): Likewise.
(*usub3_carryinC): Likewise.
---
 gcc/config/aarch64/aarch64.c  |  9 +
 gcc/config/aarch64/aarch64.md | 18 +-
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c09b7bcb7f0..d80afc36889 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9532,6 +9532,15 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && const_dword_umaxp1 (y, mode_x))
 return CC_NOTCmode;
 
+  /* A test for unsigned overflow from a subtract with borrow.  */
+  if ((mode_x == DImode || mode_x == TImode)
+  && (code == GEU || code == LTU)
+  && code_x == ZERO_EXTEND
+  && ((GET_CODE (y) == PLUS
+  && aarch64_borrow_operation (XEXP (y, 0), mode_x))
+ || aarch64_borrow_operation (y, mode_x)))
+return CC_NOTCmode;
+
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
   && (code == NE || code == EQ)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 7d4a63f9a2a..a0a872c6d94 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2954,7 +2954,7 @@
   CODE_FOR_subdi3_compare1,
   CODE_FOR_subdi3_compare1,
   CODE_FOR_usubdi3_carryinC);
-  aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
+  aarch64_gen_unlikely_cbranch (LTU, CC_NOTCmode, operands[3]);
   DONE;
 })
 
@@ -3367,8 +3367,8 @@
 
 (define_expand "usub3_carryinC"
   [(parallel
- [(set (reg:CC CC_REGNUM)
-  (compare:CC
+ [(set (reg:CC_NOTC CC_REGNUM)
+  (compare:CC_NOTC
 (zero_extend:
   (match_operand:GPI 1 "aarch64_reg_or_zero"))
 (plus:
@@ -3383,8 +3383,8 @@
 )
 
 (define_insn "*usub3_carryinC_z1"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (const_int 0)
  (plus:
(zero_extend:
@@ -3400,8 +3400,8 @@
 )
 
 (define_insn "*usub3_carryinC_z2"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (zero_extend:
(match_operand:GPI 1 "register_operand" "r"))
  (match_operand: 2 "aarch64_borrow_operation" "")))
@@ -3415,8 +3415,8 @@
 )
 
 (define_insn "*usub3_carryinC"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
+  [(set (reg:CC_NOTC CC_REGNUM)
+   (compare:CC_NOTC
  (zero_extend:
(match_operand:GPI 1 "register_operand" "r"))
  (plus:
-- 
2.20.1



[PATCH v4 06/12] aarch64: Introduce aarch64_expand_addsubti

2020-04-09 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all
addition and subtraction, modulo, signed or unsigned overflow.

Use expand_insn to put the operands into the proper form,
and do not force values into register if not required.

* config/aarch64/aarch64.c (aarch64_ti_split) New.
(aarch64_addti_scratch_regs): Remove.
(aarch64_subvti_scratch_regs): Remove.
(aarch64_expand_subvti): Remove.
(aarch64_expand_addsubti): New.
* config/aarch64/aarch64-protos.h: Update to match.
* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
(addvti4, uaddvti4): Likewise.
(subvti4, usubvti4): Likewise.
(subti3): Likewise; accept immediates for operand 2.
---
 gcc/config/aarch64/aarch64-protos.h |  10 +--
 gcc/config/aarch64/aarch64.c| 129 +---
 gcc/config/aarch64/aarch64.md   | 125 ++-
 3 files changed, 67 insertions(+), 197 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9e43adb7db0..787d67d62e0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-rtx *, rtx *,
-rtx *, rtx *,
-rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
- rtx *, rtx *,
- rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-   rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 36e9ebb468a..cd4dc1ef6f9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20706,110 +20706,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-   rtx *low_in1, rtx *low_in2,
-   rtx *high_dest, rtx *high_in1,
-   rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   

[PATCH v4 08/12] arm: Merge CC_ADC and CC_B to CC_NOTC

2020-04-09 Thread Richard Henderson via Gcc-patches
These CC_MODEs are identical, merge them into a more generic name.

* config/arm/arm-modes.def (CC_NOTC): New.
(CC_ADC, CC_B): Remove.
* config/arm/arm.c (arm_select_cc_mode): Update to match.
(arm_gen_dicompare_reg): Likewise.
(maybe_get_arm_condition_code): Likewise.
* config/arm/arm.md (uaddvdi4): Likewise.
(addsi3_cin_cout_reg, addsi3_cin_cout_imm): Likewise.
(*addsi3_cin_cout_reg_insn): Likewise.
(*addsi3_cin_cout_imm_insn): Likewise.
(addsi3_cin_cout_0, *addsi3_cin_cout_0_insn): Likewise.
(usubvsi3_borrow, usubvsi3_borrow_imm): Likewise.
---
 gcc/config/arm/arm.c | 30 +++---
 gcc/config/arm/arm-modes.def | 12 
 gcc/config/arm/arm.md| 36 ++--
 gcc/config/arm/iterators.md  |  2 +-
 gcc/config/arm/predicates.md |  4 ++--
 5 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c38776fdad7..145345c2278 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15669,7 +15669,7 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   && CONST_INT_P (y)
   && UINTVAL (y) == 0x8
   && (op == GEU || op == LTU))
-return CC_ADCmode;
+return CC_NOTCmode;
 
   if (GET_MODE (x) == DImode
   && (op == GE || op == LT)
@@ -15685,7 +15685,7 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   && ((GET_CODE (y) == PLUS
   && arm_borrow_operation (XEXP (y, 0), DImode))
  || arm_borrow_operation (y, DImode)))
-return CC_Bmode;
+return CC_NOTCmode;
 
   if (GET_MODE (x) == DImode
   && (op == EQ || op == NE)
@@ -15879,18 +15879,18 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
 
rtx_insn *insn;
if (y_hi == const0_rtx)
- insn = emit_insn (gen_cmpsi3_0_carryin_CC_Bout (scratch, x_hi,
- cmp1));
+ insn = emit_insn (gen_cmpsi3_0_carryin_CC_NOTCout
+   (scratch, x_hi, cmp1));
else if (CONST_INT_P (y_hi))
  {
/* Constant is viewed as unsigned when zero-extended.  */
y_hi = GEN_INT (UINTVAL (y_hi) & 0xffffffffULL);
-   insn = emit_insn (gen_cmpsi3_imm_carryin_CC_Bout (scratch, x_hi,
- y_hi, cmp1));
+   insn = emit_insn (gen_cmpsi3_imm_carryin_CC_NOTCout
+ (scratch, x_hi, y_hi, cmp1));
  }
else
- insn = emit_insn (gen_cmpsi3_carryin_CC_Bout (scratch, x_hi, y_hi,
-   cmp1));
+ insn = emit_insn (gen_cmpsi3_carryin_CC_NOTCout
+   (scratch, x_hi, y_hi, cmp1));
return SET_DEST (single_set (insn));
   }
 
@@ -15911,8 +15911,8 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
 arm_gen_compare_reg (LTU, y_lo, x_lo, scratch),
 const0_rtx);
y_hi = GEN_INT (0xffffffff & UINTVAL (y_hi));
-   rtx_insn *insn = emit_insn (gen_rscsi3_CC_Bout_scratch (scratch, y_hi,
-   x_hi, cmp1));
+   rtx_insn *insn = emit_insn (gen_rscsi3_CC_NOTCout_scratch
+   (scratch, y_hi, x_hi, cmp1));
return SET_DEST (single_set (insn));
   }
 
@@ -24511,7 +24511,7 @@ maybe_get_arm_condition_code (rtx comparison)
default: return ARM_NV;
}
 
-case E_CC_Bmode:
+case E_CC_NOTCmode:
   switch (comp_code)
{
case GEU: return ARM_CS;
@@ -24527,14 +24527,6 @@ maybe_get_arm_condition_code (rtx comparison)
default: return ARM_NV;
}
 
-case E_CC_ADCmode:
-  switch (comp_code)
-   {
-   case GEU: return ARM_CS;
-   case LTU: return ARM_CC;
-   default: return ARM_NV;
-   }
-
 case E_CCmode:
 case E_CC_RSBmode:
   switch (comp_code)
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 6e48223b63d..2495054e066 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -33,18 +33,15 @@ ADJUST_FLOAT_FORMAT (HF, ((arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
CC_Zmode should be used if only the Z flag is set correctly
CC_Cmode should be used if only the C flag is set correctly, after an
  addition.
+   CC_NOTCmode is the inverse of the C flag, after subtraction (borrow),
+ or for ADC where we cannot use the trick of comparing the sum
+ against one of the other operands.
CC_Nmode should be used if only the N (sign) flag is set correctly
CC_NVmode should be used if only the N and V bits are set correctly,
  (used for signed comparisons when the carry is propagated in).
CC_RSBmode should be used where the comparison is set by an 

[PATCH v4 05/12] aarch64: Improvements to aarch64_select_cc_mode from arm

2020-04-09 Thread Richard Henderson via Gcc-patches
The arm target has some improvements over aarch64 for
double-word arithmetic and comparisons.
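
One of the borrowed improvements in source terms: the carry-out of an
addition may be compared against either addend, and both orderings now
select CC_Cmode (a sketch; the exact RTL seen depends on canonicalization):

unsigned long add_with_carry_out (unsigned long x, unsigned long y,
                                  unsigned long *carry)
{
  unsigned long sum = x + y;
  *carry = sum < y;   /* sum < x is equivalent; previously only the
                         ordering matching XEXP (x, 0) was recognized */
  return sum;
}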

* config/aarch64/aarch64.c (aarch64_select_cc_mode): Check
for swapped operands to CC_Cmode; check for zero_extend to
CC_ADCmode; check for swapped operands to CC_Vmode.
---
 gcc/config/aarch64/aarch64.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f2c14818c79..36e9ebb468a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9521,21 +9521,25 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   if ((mode_x == DImode || mode_x == TImode)
   && (code == LTU || code == GEU)
   && code_x == PLUS
-  && rtx_equal_p (XEXP (x, 0), y))
+  && (rtx_equal_p (XEXP (x, 0), y) || rtx_equal_p (XEXP (x, 1), y)))
 return CC_Cmode;
 
   /* A test for unsigned overflow from an add with carry.  */
   if ((mode_x == DImode || mode_x == TImode)
   && (code == LTU || code == GEU)
   && code_x == PLUS
+  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
   && const_dword_umaxp1 (y, mode_x))
 return CC_ADCmode;
 
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
-  && code == NE
-  && code_x == PLUS
-  && GET_CODE (y) == SIGN_EXTEND)
+  && (code == NE || code == EQ)
+  && (code_x == PLUS || code_x == MINUS)
+  && (GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+  || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
+  && GET_CODE (y) == SIGN_EXTEND
+  && GET_CODE (XEXP (y, 0)) == GET_CODE (x))
 return CC_Vmode;
 
   /* For everything else, return CCmode.  */
-- 
2.20.1



[PATCH v4 03/12] aarch64: Add cset, csetm, cinc patterns for carry/borrow

2020-04-09 Thread Richard Henderson via Gcc-patches
Some implementations have a higher cost for the csel insn
(and its specializations) than they do for adc/sbc.
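
A source-level shape the new borrow pattern targets, for illustration (the
assembly in the comment is the intended selection per the patterns below,
subject to costs):

/* x < y produces a borrow; 0 - 0 - borrow is 0 or -1:
     cmp x0, x1
     sbc x0, xzr, xzr     (instead of csetm via csel)  */
long ltu_mask (unsigned long x, unsigned long y)
{
  return x < y ? -1L : 0L;
}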

* config/aarch64/aarch64.md (*cstore_carry): New.
(*cstoresi_carry_uxtw): New.
(*cstore_borrow): New.
(*cstoresi_borrow_uxtw): New.
(*csinc2_carry): New.
---
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c |  3 +-
 gcc/config/aarch64/aarch64.md | 51 ++-
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
index 49901e59c38..b6c21fee306 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
@@ -21,7 +21,8 @@ void f(char *out)
 
 /* { dg-final { scan-assembler "cset.*, ne" } } */
 /* { dg-final { scan-assembler "cset.*, eq" } } */
-/* { dg-final { scan-assembler "cset.*, cs" } } */
+/* { dg-final { scan-assembler-not "cset.*, cs" } } */
+/* { dg-final { scan-assembler "adc.*, .zr, .zr" } } */
 /* { dg-final { scan-assembler "cset.*, cc" } } */
 /* { dg-final { scan-assembler "cset.*, mi" } } */
 /* { dg-final { scan-assembler "cset.*, pl" } } */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e65f46f0f74..d266a1edd64 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4086,6 +4086,15 @@
   "
 )
 
+;; On some implementations (e.g. tx1) csel is more expensive than adc.
+(define_insn "*cstore_carry"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+   (match_operand:ALLI 1 "aarch64_carry_operation"))]
+  ""
+  "adc\\t%0, zr, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "aarch64_cstore"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
(match_operator:ALLI 1 "aarch64_comparison_operator_mode"
@@ -4130,7 +4139,16 @@
   [(set_attr "type" "csel")]
 )
 
-;; zero_extend version of the above
+;; zero_extend versions of the above
+
+(define_insn "*cstoresi_carry_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI (match_operand:SI 1 "aarch64_carry_operation")))]
+  ""
+  "adc\\t%w0, wzr, wzr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
@@ -4141,6 +4159,15 @@
   [(set_attr "type" "csel")]
 )
 
+;; On some implementations (e.g. tx1) csel is more expensive than sbc.
+(define_insn "*cstore_borrow"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+   (neg:ALLI (match_operand:ALLI 1 "aarch64_borrow_operation")))]
+  ""
+  "sbc\\t%0, zr, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "cstore_neg"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
(neg:ALLI (match_operator:ALLI 1 "aarch64_comparison_operator_mode"
@@ -4150,7 +4177,17 @@
   [(set_attr "type" "csel")]
 )
 
-;; zero_extend version of the above
+;; zero_extend versions of the above
+
+(define_insn "*cstoresi_borrow_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
+ (neg:SI (match_operand:SI 1 "aarch64_borrow_operation"))))]
+  ""
+  "sbc\\t%w0, wzr, wzr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*cstoresi_neg_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
@@ -4353,6 +4390,16 @@
   [(set_attr "type" "crc")]
 )
 
+;; On some implementations (e.g. tx1) csel is more expensive than adc.
+(define_insn "*csinc2_carry"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+   (plus:GPI (match_operand 2 "aarch64_carry_operation")
+  (match_operand:GPI 1 "register_operand" "r")))]
+  ""
+  "adc\\t%0, %1, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*csinc2_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
-- 
2.20.1



[PATCH v4 04/12] aarch64: Add const_dword_umaxp1

2020-04-09 Thread Richard Henderson via Gcc-patches
Rather than duplicating the rather verbose integral test,
pull it out to a predicate.
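
Concretely, for the 64-bit halves of a TImode operation the predicate
accepts exactly one value, 2**64, i.e. UINT64_MAX + 1 expressed in the
double-word mode.  A quick check of the arithmetic:

#include <assert.h>
#include <stdint.h>

int main (void)
{
  /* 1 << (GET_MODE_BITSIZE (TImode) / 2) == 1 << 64.  */
  unsigned __int128 umaxp1 = (unsigned __int128) 1 << 64;
  assert (umaxp1 == (unsigned __int128) UINT64_MAX + 1);
  return 0;
}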

* config/aarch64/predicates.md (const_dword_umaxp1): New.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Use it.
* config/aarch64/aarch64.md (add*add3_carryinC): Likewise.
(*add3_carryinC_zero): Likewise.
(add3_carryinC): Use mode for constant, not TImode.
---
 gcc/config/aarch64/aarch64.c |  5 +
 gcc/config/aarch64/aarch64.md| 16 +++-
 gcc/config/aarch64/predicates.md |  9 +
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ce306a10de6..f2c14818c79 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9528,10 +9528,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   if ((mode_x == DImode || mode_x == TImode)
   && (code == LTU || code == GEU)
   && code_x == PLUS
-  && CONST_SCALAR_INT_P (y)
-  && (rtx_mode_t (y, mode_x)
- == (wi::shwi (1, mode_x)
- << (GET_MODE_BITSIZE (mode_x).to_constant () / 2
+  && const_dword_umaxp1 (y, mode_x))
 return CC_ADCmode;
 
   /* A test for signed overflow.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d266a1edd64..6b21cc9c61b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2659,7 +2659,7 @@
   operands[5] = gen_rtx_LTU (mode, ccin, const0_rtx);
   operands[6] = immed_wide_int_const (wi::shwi (1, mode)
  << GET_MODE_BITSIZE (mode),
- TImode);
+ mode);
 })
 
 (define_insn "*add3_carryinC_zero"
@@ -2668,13 +2668,12 @@
  (plus:
(match_operand: 2 "aarch64_carry_operation" "")
(zero_extend: (match_operand:GPI 1 "register_operand" "r")))
- (match_operand 4 "const_scalar_int_operand" "")))
+ (match_operand: 4 "const_dword_umaxp1" "")))
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI (match_operand:GPI 3 "aarch64_carry_operation" "")
  (match_dup 1)))]
-  "rtx_mode_t (operands[4], mode)
-   == (wi::shwi (1, mode) << (unsigned) GET_MODE_BITSIZE (mode))"
-   "adcs\\t%0, %1, zr"
+  ""
+  "adcs\\t%0, %1, zr"
   [(set_attr "type" "adc_reg")]
 )
 
@@ -2686,15 +2685,14 @@
  (match_operand: 3 "aarch64_carry_operation" "")
  (zero_extend: (match_operand:GPI 1 "register_operand" "r")))
(zero_extend: (match_operand:GPI 2 "register_operand" "r")))
- (match_operand 5 "const_scalar_int_operand" "")))
+ (match_operand: 5 "const_dword_umaxp1" "")))
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (plus:GPI (match_operand:GPI 4 "aarch64_carry_operation" "")
(match_dup 1))
  (match_dup 2)))]
-  "rtx_mode_t (operands[5], mode)
-   == (wi::shwi (1, mode) << (unsigned) GET_MODE_BITSIZE (mode))"
-   "adcs\\t%0, %1, %2"
+  ""
+  "adcs\\t%0, %1, %2"
   [(set_attr "type" "adc_reg")]
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..99c3bfbace4 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -46,6 +46,15 @@
   return CONST_INT_P (op) && IN_RANGE (INTVAL (op), 1, 3);
 })
 
+;; True for 1 << (GET_MODE_BITSIZE (mode) / 2)
+;; I.e UINT_MAX + 1 for a given mode, in the double-word mode.
+(define_predicate "const_dword_umaxp1"
+  (match_code "const_int,const_wide_int")
+{
+  unsigned bits = GET_MODE_BITSIZE (mode).to_constant () / 2;
+  return rtx_mode_t (op, mode) == (wi::shwi (1, mode) << bits);
+})
+
 (define_predicate "subreg_lowpart_operator"
   (ior (match_code "truncate")
(and (match_code "subreg")
-- 
2.20.1



[PATCH v4 00/12] aarch64: Implement TImode comparisons

2020-04-09 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

In v4, I attempt to bring over as many patterns from config/arm
as are applicable.  It's not too far away from what I had from v2.

In the process of checking all of the combinations (below), I
discovered that we could probably have a better representation
for ccmp: one that the optimizers can actually do something with,
rather than the if_then_else+unspec combo that we have now.

A special case of that is in the last patch: ccmp_iorne.  I think
it should be possible to come up with some sort of logical combo
that would apply to all cases, but haven't put enough thought
into the problem.


r~


Richard Henderson (12):
  aarch64: Provide expander for sub3_compare1
  aarch64: Match add3_carryin expander and insn
  aarch64: Add cset, csetm, cinc patterns for carry/borrow
  aarch64: Add const_dword_umaxp1
  aarch64: Improvements to aarch64_select_cc_mode from arm
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Rename CC_ADCmode to CC_NOTCmode
  arm: Merge CC_ADC and CC_B to CC_NOTC
  aarch64: Use CC_NOTCmode for double-word subtract
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Accept 0 as first argument to compares
  aarch64: Implement TImode comparisons

 gcc/config/aarch64/aarch64-protos.h   |  10 +-
 gcc/config/aarch64/aarch64.c  | 356 -
 gcc/config/arm/arm.c  |  30 +-
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c |   3 +-
 gcc/config/aarch64/aarch64-modes.def  |   6 +-
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 473 +++---
 gcc/config/aarch64/iterators.md   |   3 +
 gcc/config/aarch64/predicates.md  |  22 +-
 gcc/config/arm/arm-modes.def  |  12 +-
 gcc/config/arm/arm.md |  36 +-
 gcc/config/arm/iterators.md   |   2 +-
 gcc/config/arm/predicates.md  |   4 +-
 14 files changed, 580 insertions(+), 400 deletions(-)

---

typedef signed long long s64;
typedef unsigned long long u64;
typedef __uint128_t u128;
typedef __int128_t s128;

#define i128(hi,lo) (((u128)(hi) << 64) | (u64)(lo))

int eq(u128 a, u128 b)  { return a == b; }
int ne(u128 a, u128 b)  { return a != b; }
int ltu(u128 a, u128 b) { return a < b; }
int geu(u128 a, u128 b) { return a >= b; }
int leu(u128 a, u128 b) { return a <= b; }
int gtu(u128 a, u128 b) { return a > b; }
int lt(s128 a, s128 b) { return a < b; }
int ge(s128 a, s128 b) { return a >= b; }
int le(s128 a, s128 b) { return a <= b; }
int gt(s128 a, s128 b) { return a > b; }

int eqS(u128 a, u64 b)  { return a == b; }
int neS(u128 a, u64 b)  { return a != b; }
int ltuS(u128 a, u64 b) { return a < b; }
int geuS(u128 a, u64 b) { return a >= b; }
int leuS(u128 a, u64 b) { return a <= b; }
int gtuS(u128 a, u64 b) { return a > b; }
int ltS(s128 a, s64 b) { return a < b; }
int geS(s128 a, s64 b) { return a >= b; }
int leS(s128 a, s64 b) { return a <= b; }
int gtS(s128 a, s64 b) { return a > b; }

int eqSH(u128 a, u64 b)  { return a == (u128)b << 64; }
int neSH(u128 a, u64 b)  { return a != (u128)b << 64; }
int ltuSH(u128 a, u64 b) { return a < (u128)b << 64; }
int geuSH(u128 a, u64 b) { return a >= (u128)b << 64; }
int leuSH(u128 a, u64 b) { return a <= (u128)b << 64; }
int gtuSH(u128 a, u64 b) { return a > (u128)b << 64; }
int ltSH(s128 a, s64 b) { return a < (s128)b << 64; }
int geSH(s128 a, s64 b) { return a >= (s128)b << 64; }
int leSH(s128 a, s64 b) { return a <= (s128)b << 64; }
int gtSH(s128 a, s64 b) { return a > (s128)b << 64; }

int eqFFHS(u128 a, u64 b)  { return a == i128(-1,b); }
int neFFHS(u128 a, u64 b)  { return a != i128(-1,b); }
int ltuFFHS(u128 a, u64 b) { return a < i128(-1,b); }
int geuFFHS(u128 a, u64 b) { return a >= i128(-1,b); }
int leuFFHS(u128 a, u64 b) { return a <= i128(-1,b); }
int gtuFFHS(u128 a, u64 b) { return a > i128(-1,b); }
int ltFFHS(s128 a, s64 b) { return a < (s128)i128(-1,b); }
int geFFHS(s128 a, s64 b) { return a >= (s128)i128(-1,b); }
int leFFHS(s128 a, s64 b) { return a <= (s128)i128(-1,b); }
int gtFFHS(s128 a, s64 b) { return a > (s128)i128(-1,b); }

int eq0(u128 a) { return a == 0; }
int ne0(u128 a) { return a != 0; }
int ltu0(u128 a) { return a < 0; }
int geu0(u128 a) { return a >= 0; }
int leu0(u128 a) { return a <= 0; }
int gtu0(u128 a) { return a > 0; }
int lt0(s128 a) { return a < 0; }
int ge0(s128 a) { return a >= 0; }
int le0(s128 a) { return a <= 0; }
int gt0(s128 a) { return a > 0; }

int eq1(u128 a) { return a == 1; }
int ne1(u128 a) { return a != 1; }
int ltu1(u128 a) { return a < 1; }
int geu1(u128 a) { return a >= 1; }
int leu1(u128 a) { return a <= 1; }
int gtu1(u128 a) { return a > 1; }

[PATCH v4 02/12] aarch64: Match add3_carryin expander and insn

2020-04-09 Thread Richard Henderson via Gcc-patches
The expander and insn predicates do not match,
which can lead to insn recognition errors.

* config/aarch64/aarch64.md (add3_carryin):
Use register_operand instead of aarch64_reg_or_zero.
---
 gcc/config/aarch64/aarch64.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 728c63bd8d6..e65f46f0f74 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2600,8 +2600,8 @@
(plus:GPI
  (plus:GPI
(ltu:GPI (reg:CC_C CC_REGNUM) (const_int 0))
-   (match_operand:GPI 1 "aarch64_reg_or_zero"))
- (match_operand:GPI 2 "aarch64_reg_or_zero")))]
+   (match_operand:GPI 1 "register_operand"))
+ (match_operand:GPI 2 "register_operand")))]
""
""
 )
-- 
2.20.1



Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson via Gcc-patches
On 4/7/20 4:58 PM, Segher Boessenkool wrote:
>> I wonder if it would be helpful to have
>>
>>   (uoverflow_plus x y carry)
>>   (soverflow_plus x y carry)
>>
>> etc.
> 
> Those have three operands, which is nasty to express.

How so?  It's a perfectly natural operation.

> On rs6000 we have the carry bit as a separate register (it really is
> only one bit, XER[CA], but in GCC we model it as a separate register).
> We handle it as a fixed register (there is only one, and saving and
> restoring it is relatively expensive, so this worked out the best).

As for most platforms, more or less.

> Still, in the patterns (for insns like "adde") that take both a carry
> input and have it as output, the expression for the carry output but
> already the one for the GPR output become so unwieldy that nothing
> can properly work with it.  So, in the end, I have all such insns that
> take a carry input just clobber their carry output.  This works great!

Sure, right up until the point when you want to actually *use* that carry
output.  Which is exactly what we're talking about here.

> Expressing the carry setting for insns that do not take a carry in is
> much easier.  You get somewhat different patterns for various
> immediate inputs, but that is all.

It's not horrible, but it's certainly verbose.  If we add a shorthand for that
common operation, so much the better.

I would not expect optimizers to take a collection of inputs and introduce this
rtx code, but only operate with it when the backend emits it.

>> This does have the advantage of avoiding the extensions, so that constants 
>> can
>> be retained in the original mode.
> 
> But it won't ever survive simplification; or, it will be in the way of
> simplification.

How so?

It's clear that

  (set (reg:CC_C flags)
   (uoverflow_plus:CC_C
 (reg:SI x)
 (const_int 0)
 (const_int 0)))

cannot overflow.  Thus this expression as a whole would, in combination with
the user of the CC_MODE, e.g.

  (set (reg:SI y) (ne:SI (reg:CC_C flags) (const_int 0)))

fold to

  (set (reg:SI y) (ne:SI (const_int 0) (const_int 0)))
to
  (set (reg:SI y) (const_int 0))

just like any other (compare) + (condition) pair.

I don't see why this new rtx code is any more difficult than ones that we have
already.

>> Though of course if we go this way, there will be incentive to add
>> overflow codes for all __builtin_*_overflow_p.
> 
> Yeah, eww.  And where will it stop?  What muladd insns should we have
> special RTL codes for, for the high part?

Well, we don't have overflow builtins for muladd yet.  Only plus, minus, and
mul.  Only x86 and s390x have insns to support overflow from mul without also
computing the highpart.

But add/sub-with-carry are *very* common operations.  As are add/sub-with-carry
with signed overflow into flags.  It would be nice to make that as simple as
possible across all targets.
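
(The common operation in source form, for reference: a double-word add
built from the overflow builtins, which targets want to become a plain
adds+adcs pair on aarch64, or addc+adde on rs6000.)

#include <stdbool.h>
#include <stdint.h>

void add128 (uint64_t a[2], const uint64_t b[2])
{
  uint64_t lo;
  bool carry = __builtin_add_overflow (a[0], b[0], &lo);
  a[0] = lo;
  a[1] = a[1] + b[1] + carry;   /* ideally a single adcs */
}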


r~


Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson
On 4/7/20 1:27 PM, Segher Boessenkool wrote:
> On Mon, Apr 06, 2020 at 12:19:42PM +0100, Richard Sandiford wrote:
>> The reason I'm not keen on using special modes for this case is that
>> they'd describe one way in which the result can be used rather than
>> describing what the instruction actually does.  The instruction really
>> does set all four flags to useful values.  The "problem" is that they're
>> not the values associated with a compare between two values, so representing
>> them that way will always lose information.
> 
> CC modes describe how the flags are set, not how the flags are used.
> You cannot easily describe the V bit setting with a compare (it needs
> a mode bigger than the register), is that what you mean?

I think that is a good deal of the effort.

I wonder if it would be helpful to have

  (uoverflow_plus x y carry)
  (soverflow_plus x y carry)

etc.

(define_insn "uaddsi3_cout"
  [(set (reg:CC_C CC_REGNUM)
(uoverflow_plus:CC_C
  (match_operand:SI 1 "register_operand")
  (match_operand:SI 2 "plus_operand")
  (const_int 0)))
(set (match_operand:SI 0 "register_operand")
 (plus:SI (match_dup 1) (match_dup 2)))]
  ...
)

(define_insn "uaddsi4_cin_cout"
  [(set (reg:CC_C CC_REGNUM)
(uoverflow_plus:CC_C
  (match_operand:SI 1 "register_operand")
  (match_operand:SI 2 "reg_or_zero_operand")
  (match_operand:SI 3 "carry_operand")))
(set (match_operand:SI 0 "register_operand")
 (plus:SI
   (plus:SI (match_dup 3) (match_dup 1))
   (match_dup 2)))]
  ...
)

(define_insn "usubsi4_cin_cout"
  [(set (reg:CC_C CC_REGNUM)
(uoverflow_plus:CC_C
  (match_operand:SI 1 "register_operand")
  (not:SI (match_operand:SI 2 "reg_or_zero_operand"))
  (match_operand:SI 3 "carry_operand")))
(set (match_operand:SI 0 "register_operand")
 (minus:SI
   (minus:SI (match_dup 1) (match_dup 2))
   (match_operand:SI 4 "borrow_operand")))]
  ...
)

This does have the advantage of avoiding the extensions, so that constants can
be retained in the original mode.

Though of course if we go this way, there will be incentive to add
overflow codes for all __builtin_*_overflow_p.


r~


Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-07 Thread Richard Henderson via Gcc-patches
On 4/7/20 9:32 AM, Richard Sandiford wrote:
> It's not really reversibility that I'm after (at least not for its
> own sake).
> 
> If we had a three-input compare_cc rtx_code that described a comparison
> involving a carry input, we'd certainly be using it here, because that's
> what the instruction does.  Given that we don't have the rtx_code, three
> obvious choices are:
> 
> (1) Add it.
> 
> (2) Continue to represent what the instruction does using an unspec.
> 
> (3) Don't try to represent the "three-input compare_cc" operation and
> instead describe a two-input comparison that only yields a valid
> result for a subset of tests.
> 
> (1) seems like the best technical solution but would probably be
> a lot of work.  I guess the reason I like (2) is that it stays
> closest to (1).

Indeed, the biggest problem that I'm having with copying the arm solution to
aarch64 is the special cases of the constants.

The first problem is that (any_extend:M1 (match_operand:M2)) is invalid rtl for
a constant, so you can't share the same define_insn to handle both register and
immediate input.

The second problem is how unpredictable the canonical rtl of an expression
can be after constant folding, which again requires more and more
define_insns.
Even the Arm target gets this wrong.  In particular,

> (define_insn "cmpsi3_carryin_out"
>   [(set (reg: CC_REGNUM)
> (compare:
>  (SE:DI (match_operand:SI 1 "s_register_operand" "0,r"))
>  (plus:DI (match_operand:DI 3 "arm_borrow_operation" "")
>   (SE:DI (match_operand:SI 2 "s_register_operand" "l,r")
>(clobber (match_scratch:SI 0 "=l,r"))]

is non-canonical according to combine.  It will only attempt the ordering

  (compare
(plus ...)
(sign_extend ...))

I have no idea why combine is attempting to reverse the sense of the comparison
here.  I can only presume it would also reverse the sense of the branch on
which the comparison is made, had the pattern matched.

This second problem is partially worked around by fwprop, in that it will try
to simply replace the operand without folding if that is recognizable.  Thus
cases like

  (compare (const_int 0) (plus ...))

can be produced from fwprop but not combine.  Which works well enough to not
bother with the CC_RSBmode that the arm target uses.

The third problem is the really quite complicated code that goes into
SELECT_CC_MODE.  This really should not be as difficult as it is, and is the
sort of thing for which we built recog.

Related to that is the insn costing, which also ought to use something akin to
recog.  We have all of the information there: if the insn is recognizable, the
type/length attributes can be used to provide a good value.


r~


Re: [PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson
On 4/2/20 11:53 AM, Richard Henderson via Gcc-patches wrote:
> This is attacking case 3 of PR 94174.
> 
> In v2, I unify the various subtract-with-borrow and add-with-carry
> patterns that also output flags with unspecs.  As suggested by
> Richard Sandiford during review of v1.  It does seem cleaner.

Hmph.  I miscounted -- this is actually v3.  :-P


r~


[PATCH v2 09/11] aarch64: Adjust result of aarch64_gen_compare_reg

2020-04-02 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.
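
A before/after sketch of a typical call site (hypothetical names),
following the pattern of the hunks below:

  /* Before: callers re-wrapped the returned register themselves.  */
  rtx cc_reg = aarch64_gen_compare_reg (NE, x, y);
  rtx t = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);

  /* After: the comparison comes back ready to use.  */
  rtx t = aarch64_gen_compare_reg (NE, x, y);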

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8e54506bc3e..93658338041 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, 
rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18487,7 +18487,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18498,7 +18499,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -18510,13 +18512,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18591,10 +18593,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18607,8 +18607,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, 

[PATCH v2 04/11] aarch64: Introduce aarch64_expand_addsubti

2020-04-02 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all
addition and subtraction, whether modulo or with signed or
unsigned overflow checking.

Use expand_insn to put the operands into the proper form,
and do not force values into registers if not required.
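
The expand_insn idiom being referred to, in minimal form (a sketch
with assumed operand names; see the new aarch64_expand_addsubti in
the hunk below for the real thing):

  struct expand_operand ops[3];
  create_output_operand (&ops[0], dest_lo, DImode);
  create_input_operand (&ops[1], src1_lo, DImode);
  create_input_operand (&ops[2], src2_lo, DImode);
  expand_insn ((enum insn_code) code_lo, 3, ops);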

* config/aarch64/aarch64.c (aarch64_ti_split) New.
(aarch64_addti_scratch_regs): Remove.
(aarch64_subvti_scratch_regs): Remove.
(aarch64_expand_subvti): Remove.
(aarch64_expand_addsubti): New.
* config/aarch64/aarch64-protos.h: Update to match.
* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
(addvti4, uaddvti4): Likewise.
(subvti4, usubvti4): Likewise.
(subti3): Likewise; accept immediates for operand 2.
---
 gcc/config/aarch64/aarch64-protos.h |  10 +--
 gcc/config/aarch64/aarch64.c| 129 +---
 gcc/config/aarch64/aarch64.md   | 125 ++-
 3 files changed, 67 insertions(+), 197 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d6d668ea920..787085b24d2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-rtx *, rtx *,
-rtx *, rtx *,
-rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
- rtx *, rtx *,
- rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-   rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7a13a8e8ec4..6263897c9a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20241,110 +20241,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, 
machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-   rtx *low_in1, rtx *low_in2,
-   rtx *high_dest, rtx *high_in1,
-   rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   

[PATCH v2 05/11] aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags

2020-04-02 Thread Richard Henderson via Gcc-patches
The rtl description of signed/unsigned overflow from subtract
was fine, as far as it goes -- we have CC_Cmode and CC_Vmode
that indicate that only those particular bits are valid.

However, it's not clear how to extend that description to
handle signed comparison, where N == V (GE) and N != V (LT) are
the only valid tests.

Using an UNSPEC means that we can unify all 3 usages without
fear that combine will try to infer anything from the rtl.
It also means we need far fewer variants when various inputs
have constants propagated in, and the rtl folds.

Accept -1 for the second input by using ADCS.
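
The -1 special case rests on an architectural identity (spelled out
here as a sanity check; C is AArch64's carry-is-not-borrow flag):

  SBC  d, a, b    computes  a + ~b + C    (value and NZCV alike)
  b = -1  =>  ~b = 0,  so  SBC  d, a, #-1  ==  ADC  d, a, zr

and since SBC has no immediate form anyway, ADCS with the zero
register is the natural encoding.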

* config/aarch64/aarch64.md (UNSPEC_SBCS): New.
(cmp3_carryin): New expander.
(sub3_carryin_cmp): New expander.
(*cmp3_carryin): New pattern.
(*cmp3_carryin_0): New pattern.
(*sub3_carryin_cmp): New pattern.
(*sub3_carryin_cmp_0): New pattern.
(subvti4, usubvti4, negvti3): Use subdi3_carryin_cmp.
(negvdi_carryinV): Remove.
(usub3_carryinC): Remove.
(*usub3_carryinC): Remove.
(*usub3_carryinC_z1): Remove.
(*usub3_carryinC_z2): Remove.
(sub3_carryinV): Remove.
(*sub3_carryinV): Remove.
(*sub3_carryinV_z2): Remove.
* config/aarch64/predicates.md (aarch64_reg_zero_minus1): New.
---
 gcc/config/aarch64/aarch64.md| 217 +--
 gcc/config/aarch64/predicates.md |   7 +
 2 files changed, 94 insertions(+), 130 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 532c114a42e..564dea390be 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -281,6 +281,7 @@
 UNSPEC_GEN_TAG_RND ; Generate a random 4-bit MTE tag.
 UNSPEC_TAG_SPACE   ; Translate address to MTE tag address space.
 UNSPEC_LD1RO
+UNSPEC_SBCS
 ])
 
 (define_c_enum "unspecv" [
@@ -2942,7 +2943,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_subvdi_insn,
   CODE_FOR_subdi3_compare1,
-  CODE_FOR_subdi3_carryinV);
+  CODE_FOR_subdi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2957,7 +2958,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_subdi3_compare1,
   CODE_FOR_subdi3_compare1,
-  CODE_FOR_usubdi3_carryinC);
+  CODE_FOR_subdi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
   DONE;
 })
@@ -2968,12 +2969,14 @@
(label_ref (match_operand 2 "" ""))]
   ""
   {
-emit_insn (gen_negdi_carryout (gen_lowpart (DImode, operands[0]),
-  gen_lowpart (DImode, operands[1])));
-emit_insn (gen_negvdi_carryinV (gen_highpart (DImode, operands[0]),
-   gen_highpart (DImode, operands[1])));
-aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
+rtx op0l = gen_lowpart (DImode, operands[0]);
+rtx op1l = gen_lowpart (DImode, operands[1]);
+rtx op0h = gen_highpart (DImode, operands[0]);
+rtx op1h = gen_highpart (DImode, operands[1]);
 
+emit_insn (gen_negdi_carryout (op0l, op1l));
+emit_insn (gen_subdi3_carryin_cmp (op0h, const0_rtx, op1h));
+aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
 DONE;
   }
 )
@@ -2989,23 +2992,6 @@
   [(set_attr "type" "alus_sreg")]
 )
 
-(define_insn "negvdi_carryinV"
-  [(set (reg:CC_V CC_REGNUM)
-   (compare:CC_V
-(neg:TI (plus:TI
- (ltu:TI (reg:CC CC_REGNUM) (const_int 0))
- (sign_extend:TI (match_operand:DI 1 "register_operand" "r"
-(sign_extend:TI
- (neg:DI (plus:DI (ltu:DI (reg:CC CC_REGNUM) (const_int 0))
-  (match_dup 1))
-   (set (match_operand:DI 0 "register_operand" "=r")
-   (neg:DI (plus:DI (ltu:DI (reg:CC CC_REGNUM) (const_int 0))
-(match_dup 1]
-  ""
-  "ngcs\\t%0, %1"
-  [(set_attr "type" "alus_sreg")]
-)
-
 (define_insn "*sub3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ (minus:GPI (match_operand:GPI 1 "register_operand" "rk")
@@ -3370,134 +3356,105 @@
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "usub3_carryinC"
+(define_expand "sub3_carryin_cmp"
   [(parallel
- [(set (reg:CC CC_REGNUM)
-  (compare:CC
-(zero_extend:
-  (match_operand:GPI 1 "aarch64_reg_or_zero"))
-(plus:
-  (zero_extend:
-(match_operand:GPI 2 "register_operand"))
-  (ltu: (reg:CC CC_REGNUM) (const_int 0)
-  (set (match_operand:GPI 0 "register_operand")
-  (minus:GPI
-(minus:GPI (match_dup 1) (match_dup 2))
-(ltu:GPI (reg:CC CC_REGNUM) (const_int 0])]
+[(set (match_dup 3)
+

[PATCH v2 07/11] aarch64: Remove CC_ADCmode

2020-04-02 Thread Richard Henderson via Gcc-patches
Now that we're using UNSPEC_ADCS instead of rtl, there's
no reason to distinguish CC_ADCmode from CC_Cmode.  Both
examine only the C bit.  Within uaddvti4, using CC_Cmode
is clearer, since it's the carry-out that's relevant.

* config/aarch64/aarch64-modes.def (CC_ADC): Remove.
* config/aarch64/aarch64.c (aarch64_select_cc_mode):
Do not look for unsigned overflow from add with carry.
* config/aarch64/aarch64.md (uaddvti4): Use CC_Cmode.
* config/aarch64/predicates.md (aarch64_carry_operation)
Remove check for CC_ADCmode.
(aarch64_borrow_operation): Likewise.
---
 gcc/config/aarch64/aarch64.c | 19 ---
 gcc/config/aarch64/aarch64-modes.def |  1 -
 gcc/config/aarch64/aarch64.md|  2 +-
 gcc/config/aarch64/predicates.md |  4 ++--
 4 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6263897c9a0..8e54506bc3e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9094,16 +9094,6 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && rtx_equal_p (XEXP (x, 0), y))
 return CC_Cmode;
 
-  /* A test for unsigned overflow from an add with carry.  */
-  if ((mode_x == DImode || mode_x == TImode)
-  && (code == LTU || code == GEU)
-  && code_x == PLUS
-  && CONST_SCALAR_INT_P (y)
-  && (rtx_mode_t (y, mode_x)
- == (wi::shwi (1, mode_x)
- << (GET_MODE_BITSIZE (mode_x).to_constant () / 2
-return CC_ADCmode;
-
   /* A test for signed overflow.  */
   if ((mode_x == DImode || mode_x == TImode)
   && code == NE
@@ -9232,15 +9222,6 @@ aarch64_get_condition_code_1 (machine_mode mode, enum 
rtx_code comp_code)
}
   break;
 
-case E_CC_ADCmode:
-  switch (comp_code)
-   {
-   case GEU: return AARCH64_CS;
-   case LTU: return AARCH64_CC;
-   default: return -1;
-   }
-  break;
-
 case E_CC_Vmode:
   switch (comp_code)
{
diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index af972e8f72b..32e4b6a35a9 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -38,7 +38,6 @@ CC_MODE (CC_NZC);   /* Only N, Z and C bits of condition 
flags are valid.
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C); /* C represents unsigned overflow of a simple addition.  */
-CC_MODE (CC_ADC);   /* Unsigned overflow from an ADC (add with carry).  */
 CC_MODE (CC_V); /* Only V bit of condition flags is valid.  */
 
 /* Half-precision floating point for __fp16.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99023494fa1..8d405b40173 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2079,7 +2079,7 @@
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_carryin_cmp);
-  aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
+  aarch64_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
   DONE;
 })
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 5f44ef7d672..42864cbf4dd 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -388,7 +388,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return GET_CODE (op) == LTU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CCmode)
 return GET_CODE (op) == GEU;
   return false;
 })
@@ -406,7 +406,7 @@
   machine_mode ccmode = GET_MODE (op0);
   if (ccmode == CC_Cmode)
 return GET_CODE (op) == GEU;
-  if (ccmode == CC_ADCmode || ccmode == CCmode)
+  if (ccmode == CCmode)
 return GET_CODE (op) == LTU;
   return false;
 })
-- 
2.20.1



[PATCH v2 11/11] aarch64: Implement absti2

2020-04-02 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64.md (absti2): New.
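
A hypothetical C source that should now go through this expander
(assuming the usual __int128-to-TImode lowering):

  __int128
  abs128 (__int128 x)
  {
    return x < 0 ? -x : x;   /* folds to ABS_EXPR -> absti2 */
  }
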
---
 gcc/config/aarch64/aarch64.md | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index cf716f815a1..4a30d4cca93 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3521,6 +3521,35 @@
   }
 )
 
+(define_expand "absti2"
+  [(match_operand:TI 0 "register_operand")
+   (match_operand:TI 1 "register_operand")]
+  ""
+  {
+rtx lo_op1 = gen_lowpart (DImode, operands[1]);
+rtx hi_op1 = gen_highpart (DImode, operands[1]);
+rtx lo_tmp = gen_reg_rtx (DImode);
+rtx hi_tmp = gen_reg_rtx (DImode);
+rtx x, cc;
+
+emit_insn (gen_negdi_carryout (lo_tmp, lo_op1));
+emit_insn (gen_subdi3_carryin_cmp (hi_tmp, const0_rtx, hi_op1));
+
+cc = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, lo_tmp, lo_op1);
+emit_insn (gen_rtx_SET (lo_tmp, x));
+
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, hi_tmp, hi_op1);
+emit_insn (gen_rtx_SET (hi_tmp, x));
+
+emit_move_insn (gen_lowpart (DImode, operands[0]), lo_tmp);
+emit_move_insn (gen_highpart (DImode, operands[0]), hi_tmp);
+DONE;
+  }
+)
+
 (define_insn "neg2"
   [(set (match_operand:GPI 0 "register_operand" "=r,w")
(neg:GPI (match_operand:GPI 1 "register_operand" "r,w")))]
-- 
2.20.1



[PATCH v2 08/11] aarch64: Accept -1 as second argument to add3_carryin

2020-04-02 Thread Richard Henderson via Gcc-patches
* config/aarch64/predicates.md (aarch64_reg_or_minus1): New.
* config/aarch64/aarch64.md (add3_carryin): Use it.
(*add3_carryin): Likewise.
(*addsi3_carryin_uxtw): Likewise.
---
 gcc/config/aarch64/aarch64.md| 26 +++---
 gcc/config/aarch64/predicates.md |  6 +-
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8d405b40173..c11c4366bf9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2545,7 +2545,7 @@
  (plus:GPI
(ltu:GPI (reg:CC_C CC_REGNUM) (const_int 0))
(match_operand:GPI 1 "aarch64_reg_or_zero"))
- (match_operand:GPI 2 "aarch64_reg_or_zero")))]
+ (match_operand:GPI 2 "aarch64_reg_zero_minus1")))]
""
""
 )
@@ -2555,28 +2555,32 @@
 ;; accept the zeros during initial expansion.
 
 (define_insn "*add3_carryin"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
(plus:GPI
  (plus:GPI
(match_operand:GPI 3 "aarch64_carry_operation" "")
-   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
- (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
-   ""
-   "adc\\t%0, %1, %2"
+   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ,rZ"))
+ (match_operand:GPI 2 "aarch64_reg_zero_minus1" "rZ,UsM")))]
+  ""
+  "@
+   adc\\t%0, %1, %2
+   sbc\\t%0, %1, zr"
   [(set_attr "type" "adc_reg")]
 )
 
 ;; zero_extend version of above
 (define_insn "*addsi3_carryin_uxtw"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI
  (plus:SI
(plus:SI
  (match_operand:SI 3 "aarch64_carry_operation" "")
- (match_operand:SI 1 "register_operand" "r"))
-   (match_operand:SI 2 "register_operand" "r"]
-   ""
-   "adc\\t%w0, %w1, %w2"
+ (match_operand:SI 1 "register_operand" "r,r"))
+   (match_operand:SI 2 "aarch64_reg_or_minus1" "r,UsM"]
+  ""
+  "@
+   adc\\t%w0, %w1, %w2
+   sbc\\t%w0, %w1, wzr"
   [(set_attr "type" "adc_reg")]
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 42864cbf4dd..2e7aa6389eb 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -68,13 +68,17 @@
(ior (match_operand 0 "register_operand")
(match_test "op == CONST0_RTX (GET_MODE (op))"
 
+(define_predicate "aarch64_reg_or_minus1"
+  (and (match_code "reg,subreg,const_int")
+   (ior (match_operand 0 "register_operand")
+   (match_test "op == CONSTM1_RTX (GET_MODE (op))"
+
 (define_predicate "aarch64_reg_zero_minus1"
   (and (match_code "reg,subreg,const_int")
(ior (match_operand 0 "register_operand")
(ior (match_test "op == CONST0_RTX (GET_MODE (op))")
 (match_test "op == CONSTM1_RTX (GET_MODE (op))")
 
-
 (define_predicate "aarch64_reg_or_fp_zero"
   (ior (match_operand 0 "register_operand")
(and (match_code "const_double")
-- 
2.20.1



[PATCH v2 06/11] aarch64: Use UNSPEC_ADCS for add-with-carry + output flags

2020-04-02 Thread Richard Henderson via Gcc-patches
As with UNSPEC_SBCS, we can unify the signed/unsigned overflow
paths by using an unspec.

Accept -1 for the second input by using SBCS.
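
The mirror of the SBCS identity noted under patch 05 (again an
architectural fact, not something in this diff):

  ADC  d, a, b    computes  a + b + C     (value and NZCV alike)
  SBC  d, a, zr   computes  a + ~0 + C  =  a + (-1) + C

so an rtl-level -1 addend is exactly SBC/SBCS with the zero register,
which matters because ADC has no immediate form.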

* config/aarch64/aarch64.md (UNSPEC_ADCS): New.
(addvti4, uaddvti4): Use adddi_carryin_cmp.
(add3_carryinC): Remove.
(*add3_carryinC_zero): Remove.
(*add3_carryinC): Remove.
(add3_carryinV): Remove.
(*add3_carryinV_zero): Remove.
(*add3_carryinV): Remove.
(add3_carryin_cmp): New expander.
(*add3_carryin_cmp): New pattern.
(*add3_carryin_cmp_0): New pattern.
(*cmn3_carryin): New pattern.
(*cmn3_carryin_0): New pattern.
---
 gcc/config/aarch64/aarch64.md | 206 +++---
 1 file changed, 89 insertions(+), 117 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 564dea390be..99023494fa1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -281,6 +281,7 @@
 UNSPEC_GEN_TAG_RND ; Generate a random 4-bit MTE tag.
 UNSPEC_TAG_SPACE   ; Translate address to MTE tag address space.
 UNSPEC_LD1RO
+UNSPEC_ADCS
 UNSPEC_SBCS
 ])
 
@@ -2062,7 +2063,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_adddi3_compareV,
   CODE_FOR_adddi3_compareC,
-  CODE_FOR_adddi3_carryinV);
+  CODE_FOR_adddi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
   DONE;
 })
@@ -2077,7 +2078,7 @@
   aarch64_expand_addsubti (operands[0], operands[1], operands[2],
   CODE_FOR_adddi3_compareC,
   CODE_FOR_adddi3_compareC,
-  CODE_FOR_adddi3_carryinC);
+  CODE_FOR_adddi3_carryin_cmp);
   aarch64_gen_unlikely_cbranch (GEU, CC_ADCmode, operands[3]);
   DONE;
 })
@@ -2579,133 +2580,104 @@
   [(set_attr "type" "adc_reg")]
 )
 
-(define_expand "add3_carryinC"
+(define_expand "add3_carryin_cmp"
   [(parallel
- [(set (match_dup 3)
-  (compare:CC_ADC
-(plus:
-  (plus:
-(match_dup 4)
-(zero_extend:
-  (match_operand:GPI 1 "register_operand")))
-  (zero_extend:
-(match_operand:GPI 2 "register_operand")))
-(match_dup 6)))
-  (set (match_operand:GPI 0 "register_operand")
-  (plus:GPI
-(plus:GPI (match_dup 5) (match_dup 1))
-(match_dup 2)))])]
+[(set (match_dup 3)
+ (unspec:CC
+   [(match_operand:GPI 1 "aarch64_reg_or_zero")
+(match_operand:GPI 2 "aarch64_reg_zero_minus1")
+(match_dup 4)]
+   UNSPEC_ADCS))
+ (set (match_operand:GPI 0 "register_operand")
+ (unspec:GPI
+   [(match_dup 1) (match_dup 2) (match_dup 4)]
+   UNSPEC_ADCS))])]
""
-{
-  operands[3] = gen_rtx_REG (CC_ADCmode, CC_REGNUM);
-  rtx ccin = gen_rtx_REG (CC_Cmode, CC_REGNUM);
-  operands[4] = gen_rtx_LTU (mode, ccin, const0_rtx);
-  operands[5] = gen_rtx_LTU (mode, ccin, const0_rtx);
-  operands[6] = immed_wide_int_const (wi::shwi (1, mode)
- << GET_MODE_BITSIZE (mode),
- TImode);
-})
+  {
+operands[3] = gen_rtx_REG (CCmode, CC_REGNUM);
+operands[4] = gen_rtx_GEU (mode, operands[3], const0_rtx);
+  }
+)
 
-(define_insn "*add3_carryinC_zero"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
- (plus:
-   (match_operand: 2 "aarch64_carry_operation" "")
-   (zero_extend: (match_operand:GPI 1 "register_operand" "r")))
- (match_operand 4 "const_scalar_int_operand" "")))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-   (plus:GPI (match_operand:GPI 3 "aarch64_carry_operation" "")
- (match_dup 1)))]
-  "rtx_mode_t (operands[4], mode)
-   == (wi::shwi (1, mode) << (unsigned) GET_MODE_BITSIZE (mode))"
-   "adcs\\t%0, %1, zr"
+(define_insn "*add3_carryin_cmp"
+  [(set (reg:CC CC_REGNUM)
+   (unspec:CC
+ [(match_operand:GPI 1 "aarch64_reg_or_zero" "%rZ,rZ")
+  (match_operand:GPI 2 "aarch64_reg_zero_minus1" "rZ,UsM")
+  (match_operand:GPI 3 "aarch64_carry_operation" "")]
+ UNSPEC_ADCS))
+   (set (match_operand:GPI 0 "register_operand" "=r,r")
+   (unspec:GPI
+ [(match_dup 1) (match_dup 2) (match_dup 3)]
+ UNSPEC_ADCS))]
+   ""
+   "@
+adcs\\t%0, %1, %2
+sbcs\\t%0, %1, zr"
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*add3_carryinC"
-  [(set (reg:CC_ADC CC_REGNUM)
-   (compare:CC_ADC
- (plus:
-   (plus:
- (match_operand: 3 "aarch64_carry_operation" "")
- (zero_extend: (match_operand:GPI 1 "register_operand" "r")))
-   (zero_extend: (match_operand:GPI 2 "register_operand" "r")))
- 

[PATCH v2 10/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless.
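
A few hedged examples of the shapes this covers (function names
invented; expected code per the comments in the hunk below):

  int eq0 (unsigned __int128 a)
  { return a == 0; }                  /* orr + cbz/cbnz, or cmp + ccmp */

  int ltu (unsigned __int128 a, unsigned __int128 b)
  { return a < b; }                   /* cmp + sbcs on the high halves */

  int lt0 (__int128 a)
  { return a < 0; }                   /* tst of the sign bit, or tbnz */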

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
the comparisons for TImode, not just NE.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
---
 gcc/config/aarch64/aarch64.c  | 122 ++
 gcc/config/aarch64/aarch64.md |  28 
 2 files changed, 136 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93658338041..89c9192266c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2333,32 +2333,126 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
-
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
   rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
   rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+  struct expand_operand ops[2];
+  rtx y_lo, y_hi, tmp;
+
+  if (CONST_INT_P (y))
+   {
+ HOST_WIDE_INT y_int = INTVAL (y);
+
+ y_lo = y;
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than cmp+ccmp below.  Beware of
+the compare-and-swap post-reload split and use cmp+ccmp.  */
+ if (y_int == 0 && can_create_pseudo_p ())
+   {
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+   }
+   break;
+
+   case LE:
+   case GT:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  The cstoreti and cbranchti
+operand predicates require aarch64_plus_operand, which
+means this increment cannot overflow.  */
+ y_lo = gen_int_mode (++y_int, DImode);
+ code = (code == LE ? LT : GE);
+ /* fall through */
+
+   case LT:
+   case GE:
+ /* Check only the sign bit using tst, or fold to tbz/tbnz.  */
+ if (y_int == 0)
+   {
+ cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+ tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+ tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+ emit_set_insn (cc_reg, tmp);
+ code = (code == LT ? NE : EQ);
+ goto done;
+   }
+ break;
+
+   default:
+ break;
+   }
+ y_hi = (y_int < 0 ? constm1_rtx : const0_rtx);
+   }
+  else
+   {
+ y_lo = operand_subword (y, 0, 0, TImode);
+ y_hi = operand_subword (y, 1, 0, TImode);
+   }
+
+  switch (code)
+   {
+   case LEU:
+   case GTU:
+   case LE:
+   case GT:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ break;
+
+   default:
+ break;
+   }
+
+  /* Emit cmpdi, forcing operands into registers as required. */
+  create_input_operand ([0], x_lo, DImode);
+  create_input_operand ([1], y_lo, DImode);
+  expand_insn (CODE_FOR_cmpdi, 2, ops);
+
+  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+ emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+  gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+  GEN_INT (AARCH64_EQ)));
+ break;
+
+   case LTU:
+   case GEU:
+   case LT:
+   case GE:
+ /* Compute (x - y), as double-word arithmetic.  */
+ create_input_operand ([0], x_hi, DImode);
+ create_input_operand ([1], y_hi, DImode);
+ expand_insn (CODE_FOR_cmpdi3_carryin, 2, ops);
+ break;
+
+   default:
+ gcc_unreachable ();
+   }
 }
   else
 {
-  cc_mode = SELECT_CC_MODE (code, x, y);
+  machine_mode cc_mode = SELECT_CC_MODE (code, x, y);
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   

[PATCH v2 01/11] aarch64: Accept 0 as first argument to compares

2020-04-02 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) use <Xn|SP>,
cmp (shifted register) uses <Xn>.  So we can perform cmp xzr, x0.

For ccmp, we only have <Xn> as an input.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..6fdab5f3402 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"]
   ""
   "@
@@ -3961,14 +3961,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,r")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH v2 00/11] aarch64: Implement TImode comparisons

2020-04-02 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

In v2, I unify the various subtract-with-borrow and add-with-carry
patterns that also output flags with unspecs.  As suggested by
Richard Sandiford during review of v1.  It does seem cleaner.


r~


Richard Henderson (11):
  aarch64: Accept 0 as first argument to compares
  aarch64: Accept zeros in add3_carryin
  aarch64: Provide expander for sub3_compare1
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Use UNSPEC_SBCS for subtract-with-borrow + output flags
  aarch64: Use UNSPEC_ADCS for add-with-carry + output flags
  aarch64: Remove CC_ADCmode
  aarch64: Accept -1 as second argument to add3_carryin
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Implement TImode comparisons
  aarch64: Implement absti2

 gcc/config/aarch64/aarch64-protos.h   |  10 +-
 gcc/config/aarch64/aarch64.c  | 303 +
 gcc/config/aarch64/aarch64-modes.def  |   1 -
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 762 ++
 gcc/config/aarch64/predicates.md  |  15 +-
 7 files changed, 527 insertions(+), 587 deletions(-)

-- 
2.20.1



[PATCH v2 02/11] aarch64: Accept zeros in add3_carryin

2020-04-02 Thread Richard Henderson via Gcc-patches
The expander and the insn pattern did not match, leading to
recognition failures in expand.

* config/aarch64/aarch64.md (*add3_carryin): Accept zeros.
---
 gcc/config/aarch64/aarch64.md | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6fdab5f3402..b242f2b1c73 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2606,16 +2606,17 @@
""
 )
 
-;; Note that add with carry with two zero inputs is matched by cset,
-;; and that add with carry with one zero input is matched by cinc.
+;; While add with carry with two zero inputs will be folded to cset,
+;; and add with carry with one zero input will be folded to cinc,
+;; accept the zeros during initial expansion.
 
 (define_insn "*add3_carryin"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (plus:GPI
(match_operand:GPI 3 "aarch64_carry_operation" "")
-   (match_operand:GPI 1 "register_operand" "r"))
- (match_operand:GPI 2 "register_operand" "r")))]
+   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
+ (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
""
"adc\\t%0, %1, %2"
   [(set_attr "type" "adc_reg")]
-- 
2.20.1



[PATCH v2 03/11] aarch64: Provide expander for sub3_compare1

2020-04-02 Thread Richard Henderson via Gcc-patches
In one place we open-code a special case of this pattern into the
more specific sub3_compare1_imm, and miss this special case
in other places.  Centralize that special case into an expander.
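
A hedged illustration of the callers this helps: any 128-bit subtract
whose second operand's low half is a plus-immediate, e.g.

  unsigned __int128
  sub5 (unsigned __int128 x)
  {
    return x - 5;   /* low half: subs with the immediate,
                       via sub3_compare1_imm */
  }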

* config/aarch64/aarch64.md (*sub3_compare1): Rename
from sub3_compare1.
(sub3_compare1): New expander.
* config/aarch64/aarch64.c (aarch64_expand_subvti): Remove
call to gen_subdi3_compare1_imm.
---
 gcc/config/aarch64/aarch64.c  | 11 ++-
 gcc/config/aarch64/aarch64.md | 22 +-
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..7a13a8e8ec4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20333,16 +20333,9 @@ aarch64_expand_subvti (rtx op0, rtx low_dest, rtx 
low_in1,
 }
   else
 {
-  if (aarch64_plus_immediate (low_in2, DImode))
-   emit_insn (gen_subdi3_compare1_imm (low_dest, low_in1, low_in2,
-   GEN_INT (-INTVAL (low_in2;
-  else
-   {
- low_in2 = force_reg (DImode, low_in2);
- emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-   }
-  high_in2 = force_reg (DImode, high_in2);
+  emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
 
+  high_in2 = force_reg (DImode, high_in2);
   if (unsigned_p)
emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
   else
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b242f2b1c73..d6389cc8148 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3120,7 +3120,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub3_compare1"
+(define_insn "*sub3_compare1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3132,6 +3132,26 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub3_compare1"
+  [(parallel
+[(set (reg:CC CC_REGNUM)
+ (compare:CC
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_reg_or_imm")))
+ (set (match_operand:GPI 0 "register_operand")
+ (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (aarch64_plus_immediate (operands[2], mode))
+{
+  emit_insn (gen_sub3_compare1_imm
+(operands[0], operands[1], operands[2],
+ GEN_INT (-INTVAL (operands[2];
+  DONE;
+}
+  operands[2] = force_reg (mode, operands[2]);
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-04-01 Thread Richard Henderson via Gcc-patches
On 4/1/20 9:28 AM, Richard Sandiford wrote:
> How important is it to describe the flags operation as a compare though?
> Could we instead use an unspec with three inputs, and keep it as :CC?
> That would still allow special-case matching for zero operands.

I'm not sure.

My guess is that the only interesting optimization for ADC/SBC is when
optimization determines that the low-part of op2 is zero, so that we can fold

  [(set (reg cc) (compare ...))
   (set (reg t0) (sub (reg a0) (reg b0))]

  [(set (reg cc) (compare ...))
   (set (reg t1) (sub (reg a1)
   (sub (reg b1)
 (geu (reg cc) (const 0)]

to

  [(set (reg t0) (reg a0)]

  [(set (reg cc) (compare ...))
   (set (reg t1) (sub (reg a1) (reg b1))]

which combine should be able to do by propagating zeros across the compare+geu.

Though I suppose it's still possible to handle this with unspecs and
define_split, so that

  [(set (reg cc)
        (unspec [(reg a1) (reg b1) (geu ...)]
                UNSPEC_SBCS))
   (set (reg t1) ...)]

when the geu folds to (const_int 0), we can split this to a plain sub.
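
For the record, a rough sketch of what such a split could look like
(operand order and predicates are assumptions, not tested):

  (define_split
    [(set (reg:CC CC_REGNUM)
          (unspec:CC [(match_operand:GPI 0 "register_operand")
                      (match_operand:GPI 1 "register_operand")
                      (const_int 0)]
                     UNSPEC_SBCS))
     (set (match_operand:GPI 2 "register_operand")
          (unspec:GPI [(match_dup 0) (match_dup 1) (const_int 0)]
                      UNSPEC_SBCS))]
    ""
    [(parallel
       [(set (reg:CC CC_REGNUM)
             (compare:CC (match_dup 0) (match_dup 1)))
        (set (match_dup 2)
             (minus:GPI (match_dup 0) (match_dup 1)))])])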

I'll see if I can make this work with a minimum of effort.


r~


Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-31 Thread Richard Henderson via Gcc-patches
On 3/31/20 11:34 AM, Richard Sandiford wrote:
>> +(define_insn "*cmp3_carryinC"
>> +  [(set (reg:CC CC_REGNUM)
>> +(compare:CC
>> +  (ANY_EXTEND:
>> +(match_operand:GPI 0 "register_operand" "r"))
>> +  (plus:
>> +(ANY_EXTEND:
>> +  (match_operand:GPI 1 "register_operand" "r"))
>> +(match_operand: 2 "aarch64_borrow_operation" ""]
>> +   ""
>> +   "sbcs\\tzr, %0, %1"
>> +  [(set_attr "type" "adc_reg")]
>> +)
> 
> I guess this feeds into your reply to Segher's comment for 7/9,
> but I think:
> 
>(compare:CC X Y)
> 
> is always supposed to be the NZCV flags result of X - Y, as computed in
> the mode of X and Y.  If so, it seems like the type of extension should
> matter.  E.g. the N flag ought to be set for:
> 
>   (compare:CC
> (sign_extend 0xf...)
> (plus (sign_extend 0x7...)
>   (ltu ...)))
> 
> but ought to be clear for:
> 
>   (compare:CC
> (zero_extend 0xf...)
> (plus (zero_extend 0x7...)
>   (ltu ...)))
> 
> If so, I guess this is a bug in the existing code...

The subject of CC modes is a sticky one.  It mostly depends on what combine
is able to do with the patterns.

For instance, in your example above, even for the signed case, the N bit cannot
be examined by itself, because that would only be valid for a comparison
against zero, like

(compare (plus (reg) (reg))
 (const_int 0))

For this particular bit of rtl, the only valid comparison is N == V, i.e. GE/LT.

If we add a new CC mode for this, what would you call it?  Probably not
CC_NVmode, because to me that implies you can use either N or V, but it doesn't
imply you must examine both.

If we add more CC modes, does that mean that we have to improve SELECT_CC_MODE
to match those patterns?  Or do we add new CC modes just so that combine's use
of SELECT_CC_MODE *cannot* match them?


r~


Re: [PATCH v2 1/9] aarch64: Accept 0 as first argument to compares

2020-03-31 Thread Richard Henderson via Gcc-patches
On 3/31/20 9:55 AM, Richard Sandiford wrote:
>>  (define_insn "cmp"
>>[(set (reg:CC CC_REGNUM)
>> -(compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
>> -(match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
>> +(compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
>> +(match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
>>""
>>"@
>> -   cmp\\t%0, %1
>> cmp\\t%0, %1
>> -   cmn\\t%0, #%n1"
>> -  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>> +   cmn\\t%0, #%n1
>> +   cmp\\t%0, %1"
>> +  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
>>  )
>>  
>>  (define_insn "fcmp"
> 
> ...does adding 'Z' to operand 1 enable any new combinations?

Not useful ones, on reflection, but given it's a valid combination, it's easier
to include it than not.

I can certainly remove that.

r~



Re: [PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-22 Thread Richard Henderson
On 3/22/20 2:55 PM, Segher Boessenkool wrote:
> Maybe this stuff would be simpler (and more obviously correct) if it
> was more explicit CC_REGNUM is a fixed register, and the code would use
> it directly everywhere?

Indeed the biggest issue I have in this patch is what CC_MODE to expose from
the high-half compare.

For unsigned inequality, only the C bit is valid.  For signed inequality, only
the N + V bits.  For equality, only the Z bit.

Which I am trying to expose with the multiple creations of CC_REGNUM, which are
used within the comparison, which are indeed sanity checked vs the comparison
code via %m/%M.

But the mode of the CC_REGNUM does not necessarily match up with the mode of
the comparison that generates it.  And we do not have a CC_NVmode, so I'm using
full CCmode for that.

This is the part of the patch that could use the most feedback.


r~


Re: [PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-22 Thread Richard Henderson via Gcc-patches
On 3/22/20 12:30 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Mar 20, 2020 at 07:42:25PM -0700, Richard Henderson via Gcc-patches 
> wrote:
>> Duplicate all usub_*_carryinC, but use xzr for the output when we
>> only require the flags output.  The signed versions use sign_extend
>> instead of zero_extend for combine's benefit.
> 
> You actually use ANY_EXTEND, which makes a lot more sense :-)
> 
> Did you see combine create a sign_extend, ever?  Or do those just come
> from combining other insns that already contain a sign_extend?

In the penultimate patch, for cmpti, I emit this sign_extend'ed pattern
manually, so that rtl actually gets the proper description of the comparison of
the high-half of the TImode variable.


r~


[PATCH v2 7/9] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-20 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6263897c9a0..9e7c26a8df2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, 
rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18506,7 +18506,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18517,7 +18518,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -18529,13 +18531,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18610,10 +18612,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18626,8 +18626,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, 

[PATCH v2 8/9] aarch64: Implement TImode comparisons

2020-03-20 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless.

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
the comparisons for TImode, not just NE.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
---
 gcc/config/aarch64/aarch64.c  | 130 ++
 gcc/config/aarch64/aarch64.md |  28 
 2 files changed, 144 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9e7c26a8df2..6ae0ea388ce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2333,32 +2333,134 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
-
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
   rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
   rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+  struct expand_operand ops[2];
+  rtx y_lo, y_hi, tmp;
+
+  if (CONST_INT_P (y))
+   {
+ HOST_WIDE_INT y_int = INTVAL (y);
+
+ y_lo = y;
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than cmp+ccmp below.  Beware of
+the compare-and-swap post-reload split and use cmp+ccmp.  */
+ if (y_int == 0 && can_create_pseudo_p ())
+   {
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+   }
+   break;
+
+   case LE:
+   case GT:
+ /* Add 1 to Y to convert to LT/GE, which avoids the swap and
+keeps the constant operand.  The cstoreti and cbranchti
+operand predicates require aarch64_plus_operand, which
+means this increment cannot overflow.  */
+ y_lo = gen_int_mode (++y_int, DImode);
+ code = (code == LE ? LT : GE);
+ /* fall through */
+
+   case LT:
+   case GE:
+ /* Check only the sign bit using tst, or fold to tbz/tbnz.  */
+ if (y_int == 0)
+   {
+ cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+ tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+ tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+ emit_set_insn (cc_reg, tmp);
+ code = (code == LT ? NE : EQ);
+ goto done;
+   }
+ break;
+
+   default:
+ break;
+   }
+ y_hi = (y_int < 0 ? constm1_rtx : const0_rtx);
+   }
+  else
+   {
+ y_lo = operand_subword (y, 0, 0, TImode);
+ y_hi = operand_subword (y, 1, 0, TImode);
+   }
+
+  switch (code)
+   {
+   case LEU:
+   case GTU:
+   case LE:
+   case GT:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ break;
+
+   default:
+ break;
+   }
+
+  /* Emit cmpdi, forcing operands into registers as required. */
+  create_input_operand ([0], x_lo, DImode);
+  create_input_operand ([1], y_lo, DImode);
+  expand_insn (CODE_FOR_cmpdi, 2, ops);
+
+  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+ emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+  gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+  GEN_INT (AARCH64_EQ)));
+ break;
+
+   case LTU:
+   case GEU:
+ /* For LTU, (x - y), as double-word arithmetic.  */
+ create_input_operand ([0], x_hi, DImode);
+ create_input_operand ([1], y_hi, DImode);
+ expand_insn (CODE_FOR_ucmpdi3_carryinC, 2, ops);
+ /* The result is entirely within the C bit. */
+ break;
+
+   case LT:
+   case GE:
+ /* For LT, (x - y), as double-word arithmetic.  */
+ create_input_operand ([0], x_hi, DImode);
+ create_input_operand ([1], y_hi, DImode);
+ 

[PATCH v2 6/9] aarch64: Introduce aarch64_expand_addsubti

2020-03-20 Thread Richard Henderson via Gcc-patches
Modify aarch64_expand_subvti into a form that handles all
addition and subtraction, whether modulo or with signed or
unsigned overflow checking.

Use expand_insn to put the operands into the proper form,
and do not force values into registers if not required.

* config/aarch64/aarch64.c (aarch64_ti_split) New.
(aarch64_addti_scratch_regs): Remove.
(aarch64_subvti_scratch_regs): Remove.
(aarch64_expand_subvti): Remove.
(aarch64_expand_addsubti): New.
* config/aarch64/aarch64-protos.h: Update to match.
* config/aarch64/aarch64.md (addti3): Use aarch64_expand_addsubti.
(addvti4, uaddvti4): Likewise.
(subvti4, usubvti4): Likewise.
(subti3): Likewise; accept immediates for operand 2.
---
 gcc/config/aarch64/aarch64-protos.h |  10 +-
 gcc/config/aarch64/aarch64.c| 136 
 gcc/config/aarch64/aarch64.md   | 125 ++---
 3 files changed, 67 insertions(+), 204 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d6d668ea920..787085b24d2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -630,16 +630,8 @@ void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
 bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
-rtx *, rtx *,
-rtx *, rtx *,
-rtx *);
-void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
- rtx *, rtx *,
- rtx *, rtx *, rtx *);
-void aarch64_expand_subvti (rtx, rtx, rtx,
-   rtx, rtx, rtx, rtx, bool);
 
+void aarch64_expand_addsubti (rtx, rtx, rtx, int, int, int);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..6263897c9a0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20241,117 +20241,61 @@ aarch64_gen_unlikely_cbranch (enum rtx_code code, 
machine_mode cc_mode,
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) addition.
+/* Generate DImode scratch registers for 128-bit (TImode) add/sub.
+   INPUT represents the TImode input operand
+   LO represents the low half (DImode) of the TImode operand
+   HI represents the high half (DImode) of the TImode operand.  */
 
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
-void
-aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
-   rtx *low_in1, rtx *low_in2,
-   rtx *high_dest, rtx *high_in1,
-   rtx *high_in2)
+static void
+aarch64_ti_split (rtx input, rtx *lo, rtx *hi)
 {
-  *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
-  *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-  subreg_highpart_offset (DImode, TImode));
+  *lo = simplify_gen_subreg (DImode, input, TImode,
+subreg_lowpart_offset (DImode, TImode));
+  *hi = simplify_gen_subreg (DImode, input, TImode,
+subreg_highpart_offset (DImode, TImode));
 }
 
-/* Generate DImode scratch registers for 128-bit (TImode) subtraction.
-
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
-   OP1 represents the TImode destination operand 1
-   OP2 represents the TImode destination operand 2
-   LOW_DEST represents the low half (DImode) of TImode operand 0
-   LOW_IN1 represents the low half (DImode) of TImode operand 1
-   LOW_IN2 represents the low half (DImode) of TImode operand 2
-   HIGH_DEST represents the high half (DImode) of TImode operand 0
-   HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   

[PATCH v2 1/9] aarch64: Accept 0 as first argument to compares

2020-03-20 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) use <Xn|SP>,
cmp (shifted register) uses <Xn>.  So we can perform cmp xzr, x0.

For ccmp, we only have <Xn> as an input.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..b9ae51e48dd 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"]
   ""
   "@
@@ -3961,14 +3961,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH v2 9/9] aarch64: Implement absti2

2020-03-20 Thread Richard Henderson via Gcc-patches
* config/aarch64/aarch64.md (absti2): New.
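
For reference, a sketch of the kind of sequence this expander aims for
(register allocation illustrative only), from a source such as
__int128 f (__int128 x) { return x < 0 ? -x : x; }:

        negs    x2, x0          // x3:x2 = -x, setting flags
        ngcs    x3, x1
        csel    x0, x2, x0, ge  // take the negation when it is
        csel    x1, x3, x1, ge  // non-negative, else the original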
---
 gcc/config/aarch64/aarch64.md | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 284a8038e28..7a112f89487 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3653,6 +3653,36 @@
   }
 )
 
+(define_expand "absti2"
+  [(match_operand:TI 0 "register_operand")
+   (match_operand:TI 1 "register_operand")]
+  ""
+  {
+rtx lo_op1 = gen_lowpart (DImode, operands[1]);
+rtx hi_op1 = gen_highpart (DImode, operands[1]);
+rtx lo_tmp = gen_reg_rtx (DImode);
+rtx hi_tmp = gen_reg_rtx (DImode);
+rtx x;
+
+emit_insn (gen_negdi_carryout (lo_tmp, lo_op1));
+emit_insn (gen_negvdi_carryinV (hi_tmp, hi_op1));
+
+rtx cc = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, lo_tmp, lo_op1);
+emit_insn (gen_rtx_SET (lo_tmp, x));
+
+x = gen_rtx_GE (VOIDmode, cc, const0_rtx);
+x = gen_rtx_IF_THEN_ELSE (DImode, x, hi_tmp, hi_op1);
+emit_insn (gen_rtx_SET (hi_tmp, x));
+
+emit_move_insn (gen_lowpart (DImode, operands[0]), lo_tmp);
+emit_move_insn (gen_highpart (DImode, operands[0]), hi_tmp);
+DONE;
+  }
+)
+
 (define_insn "neg2"
   [(set (match_operand:GPI 0 "register_operand" "=r,w")
(neg:GPI (match_operand:GPI 1 "register_operand" "r,w")))]
-- 
2.20.1



[PATCH v2 5/9] aarch64: Provide expander for sub3_compare1

2020-03-20 Thread Richard Henderson via Gcc-patches
In a couple of places we open-code a special case of this
pattern into the more specific sub3_compare1_imm.
Centralize that special case into an expander.
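
For illustration, the special case being centralized: with an immediate
second operand (value hypothetical), the expander now emits
sub<mode>3_compare1_imm with the immediate and its negation, so that

        subs    x0, x1, #16

provides both the difference and the flags in one insn, while register
operands continue through the plain compare pattern.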

* config/aarch64/aarch64.md (*sub3_compare1): Rename
from sub3_compare1.
(sub3_compare1): New expander.
---
 gcc/config/aarch64/aarch64.md | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 076158b0071..47eeba7311c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3120,7 +3120,7 @@
   [(set_attr "type" "alus_imm")]
 )
 
-(define_insn "sub3_compare1"
+(define_insn "*sub3_compare1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
  (match_operand:GPI 1 "aarch64_reg_or_zero" "rkZ")
@@ -3132,6 +3132,26 @@
   [(set_attr "type" "alus_sreg")]
 )
 
+(define_expand "sub3_compare1"
+  [(parallel
+[(set (reg:CC CC_REGNUM)
+ (compare:CC
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_reg_or_imm")))
+ (set (match_operand:GPI 0 "register_operand")
+ (minus:GPI (match_dup 1) (match_dup 2)))])]
+  ""
+{
+  if (aarch64_plus_immediate (operands[2], mode))
+{
+  emit_insn (gen_sub3_compare1_imm
+(operands[0], operands[1], operands[2],
+ GEN_INT (-INTVAL (operands[2];
+  DONE;
+}
+  operands[2] = force_reg (mode, operands[2]);
+})
+
 (define_peephole2
   [(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
-- 
2.20.1



[PATCH v2 3/9] aarch64: Add cmp_*_carryinC patterns

2020-03-20 Thread Richard Henderson via Gcc-patches
Duplicate all usub_*_carryinC, but use xzr for the output when we
only require the flags output.  The signed versions use sign_extend
instead of zero_extend for combine's benefit.

These will be used shortly for TImode comparisons.
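
A sketch (registers hypothetical) of the flags-only double-word compare
these enable, for a signed TImode x < y:

        cmp     x0, x2          // subtract low halves, setting carry
        sbcs    xzr, x1, x3     // subtract high halves plus borrow;
                                // only the flags survive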

* config/aarch64/aarch64.md (cmp3_carryinC): New.
(*cmp3_carryinC_z1): New.
(*cmp3_carryinC_z2): New.
(*cmp3_carryinC): New.
---
 gcc/config/aarch64/aarch64.md | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a996a5f1c39..9b1c3f797f9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3440,6 +3440,18 @@
""
 )
 
+(define_expand "cmp3_carryinC"
+   [(set (reg:CC CC_REGNUM)
+(compare:CC
+  (ANY_EXTEND:
+(match_operand:GPI 0 "register_operand"))
+  (plus:
+(ANY_EXTEND:
+  (match_operand:GPI 1 "register_operand"))
+(ltu: (reg:CC CC_REGNUM) (const_int 0)]
+   ""
+)
+
 (define_insn "*usub3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3457,6 +3469,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*cmp3_carryinC_z1"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (const_int 0)
+ (plus:
+   (ANY_EXTEND:
+ (match_operand:GPI 0 "register_operand" "r"))
+   (match_operand: 1 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, zr, %0"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3472,6 +3497,17 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*cmp3_carryinC_z2"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (ANY_EXTEND:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (match_operand: 1 "aarch64_borrow_operation" "")))]
+   ""
+   "sbcs\\tzr, %0, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3490,6 +3526,20 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*cmp3_carryinC"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (ANY_EXTEND:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (plus:
+   (ANY_EXTEND:
+ (match_operand:GPI 1 "register_operand" "r"))
+   (match_operand: 2 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, %0, %1"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_expand "sub3_carryinV"
   [(parallel
  [(set (reg:CC_V CC_REGNUM)
-- 
2.20.1



[PATCH v2 4/9] aarch64: Add cmp_carryinC_m2

2020-03-20 Thread Richard Henderson via Gcc-patches
Combine will fold immediate -1 differently than the other
*cmp*_carryinC* patterns.  In this case we can use adcs
with an xzr input, and it occurs frequently when comparing
128-bit values to small negative constants.
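
A sketch of the situation this catches (source hypothetical):

    int f (__int128 x) { return x < -1; }

        cmn     x0, #1          // compare low half against -1
        adcs    xzr, x1, xzr    // x1 - (-1) - borrow == x1 + carry

where combine folds the -1 away instead of forcing it into a register
for sbcs.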

* config/aarch64/aarch64.md (cmp_carryinC_m2): New.
---
 gcc/config/aarch64/aarch64.md | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9b1c3f797f9..076158b0071 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3452,6 +3452,7 @@
""
 )
 
+;; Substituting zero into the first input operand.
 (define_insn "*usub3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3482,6 +3483,7 @@
   [(set_attr "type" "adc_reg")]
 )
 
+;; Substituting zero into the second input operand.
 (define_insn "*usub3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3508,6 +3510,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+;; Substituting -1 into the second input operand.
+(define_insn "*cmp3_carryinC_m2"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (neg:
+   (match_operand: 1 "aarch64_carry_operation" ""))
+ (ANY_EXTEND:
+   (match_operand:GPI 0 "register_operand" "r"]
+   ""
+   "adcs\\tzr, %0, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC"
   [(set (reg:CC CC_REGNUM)
(compare:CC
-- 
2.20.1



[PATCH v2 2/9] aarch64: Accept zeros in add3_carryin

2020-03-20 Thread Richard Henderson via Gcc-patches
The expander and the insn pattern did not match, leading to
recognition failures in expand.
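
A sketch of source that trips this (types hypothetical):

    unsigned __int128 f (unsigned __int128 a, unsigned long b)
    {
      return a + b;     /* high half of b is a constant zero */
    }

which wants to expand the high half as adc x1, x1, xzr.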

* config/aarch64/aarch64.md (*add3_carryin): Accept zeros.
---
 gcc/config/aarch64/aarch64.md | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9ae51e48dd..a996a5f1c39 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2606,16 +2606,17 @@
""
 )
 
-;; Note that add with carry with two zero inputs is matched by cset,
-;; and that add with carry with one zero input is matched by cinc.
+;; While add with carry with two zero inputs will be folded to cset,
+;; and add with carry with one zero input will be folded to cinc,
+;; accept the zeros during initial expansion.
 
 (define_insn "*add3_carryin"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI
  (plus:GPI
(match_operand:GPI 3 "aarch64_carry_operation" "")
-   (match_operand:GPI 1 "register_operand" "r"))
- (match_operand:GPI 2 "register_operand" "r")))]
+   (match_operand:GPI 1 "aarch64_reg_or_zero" "rZ"))
+ (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")))]
""
"adc\\t%0, %1, %2"
   [(set_attr "type" "adc_reg")]
-- 
2.20.1



[PATCH v2 0/9] aarch64: Implement TImode comparisons

2020-03-20 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

Note that I'm no longer using ccmp for most of the TImode comparisons:
thanks to Wilco Dijkstra for pulling off my blinders and reminding me
that we can use subs+sbcs for (almost) all compares.

The first 5 patches clean up or add patterns to support the expansion
and not generate extraneous constant loads.

The aarch64_expand_addsubti patch tidies up the existing TImode
arithmetic expansions.

EXAMPLE __subvti3 (context diff is easier to read):

*** 12,27 
10: b7f800a3  tbnz  x3, #63, 24 <__subvti3+0x24>
!   14: eb02003f  cmp   x1, x2
!   18: 5400010c  b.gt  38 <__subvti3+0x38>
!   1c: 54000140  b.eq  44 <__subvti3+0x44>  // b.none
20: d65f03c0  ret
!   24: eb01005f  cmp   x2, x1
!   28: 5400008c  b.gt  38 <__subvti3+0x38>
!   2c: 54ffffa1  b.ne  20 <__subvti3+0x20>  // b.any
!   30: eb00009f  cmp   x4, x0
!   34: 54ffff69  b.ls  20 <__subvti3+0x20>  // b.plast
!   38: a9bf7bfd  stp   x29, x30, [sp, #-16]!
!   3c: 910003fd  mov   x29, sp
!   40: 94000000  bl    0 
!   44: eb04001f  cmp   x0, x4
!   48: 54ffff88  b.hi  38 <__subvti3+0x38>  // b.pmore
!   4c: d65f03c0  ret
--- 12,22 
10: b7f800a3  tbnz  x3, #63, 24 <__subvti3+0x24>
!   14: eb00009f  cmp   x4, x0
!   18: fa01005f  sbcs  xzr, x2, x1
!   1c: 540000ab  b.lt  30 <__subvti3+0x30>  // b.tstop
20: d65f03c0  ret
!   24: eb04001f  cmp   x0, x4
!   28: fa02003f  sbcs  xzr, x1, x2
!   2c: 54ffffaa  b.ge  20 <__subvti3+0x20>  // b.tcont
!   30: a9bf7bfd  stp   x29, x30, [sp, #-16]!
!   34: 910003fd  mov   x29, sp
!   38: 94000000  bl    0 

EXAMPLE from the PR:

void test3(__int128 a, uint64_t l)
{
if ((__int128_t)a - l <= 1)
doit();
}

*** 11,23 
subs x0, x0, x2
sbc x1, x1, xzr
!   cmp x1, 0
!   ble .L6
! .L1:
ret
.p2align 2,,3
- .L6:
-   bne .L4
-   cmp x0, 1
-   bhi .L1
  .L4:
b   doit
--- 11,19 
subs x0, x0, x2
sbc x1, x1, xzr
!   cmp x0, 2
!   sbcs xzr, x1, xzr
!   blt .L4
    ret
    .p2align 2,,3
  .L4:
b   doit


r~


Richard Henderson (9):
  aarch64: Accept 0 as first argument to compares
  aarch64: Accept zeros in add3_carryin
  aarch64: Add cmp_*_carryinC patterns
  aarch64: Add cmp_carryinC_m2
  aarch64: Provide expander for sub3_compare1
  aarch64: Introduce aarch64_expand_addsubti
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Implement TImode comparisons
  aarch64: Implement absti2

 gcc/config/aarch64/aarch64-protos.h   |  10 +-
 gcc/config/aarch64/aarch64.c  | 292 +---
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 389 +-
 5 files changed, 402 insertions(+), 312 deletions(-)

-- 
2.20.1



Re: [PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-19 Thread Richard Henderson via Gcc-patches
On 3/19/20 8:47 AM, Wilco Dijkstra wrote:
> Hi Richard,
> 
> Thanks for these patches - yes TI mode expansions can certainly be improved!
> So looking at your expansions for signed compares, why not copy the optimal
> sequence from 32-bit Arm?
> 
> Any compare can be done in at most 2 instructions:
> 
> void doit(void);
> void f(long long a)
> {
> if (a <= 1)
> doit();
> }
> 
> f:
> cmp r0, #2
> sbcs r3, r1, #0
> blt .L4

Well, this one requires that you be able to add 1 to an input and for that
input to not overflow.  But you're right that I should be using this sequence
for LT (not LE).
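
Concretely: x <= C can be rewritten as x < C + 1, which the subs+sbcs
sequence handles directly, but only when C + 1 does not overflow; for
C == INT64_MAX the rewrite is invalid, so LE cannot always be turned
into LT this way.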

I'll have another look.


r~


[PATCH 4/6] aarch64: Simplify @ccmp operands

2020-03-19 Thread Richard Henderson via Gcc-patches
The first two arguments were "reversed", in that operand 0 was not
the output, but the input cc_reg.  Remove operand 0 entirely, since
we can get the input cc_reg from within the operand 3 comparison
expression.  This moves the output operand to index 0.

* config/aarch64/aarch64.md (@ccmpcc): New expander; remove
operand 0; change operand 3 from match_operator to match_operand.
(*ccmpcc): Rename from @ccmp; swap numbers of operand 0 & 1.
(@ccmp, *ccmp): Likewise.
(@ccmpcc_rev, *ccmpcc_rev): Likewise.
(@ccmp_rev, *ccmp_rev): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Update to match.
(aarch64_gen_ccmp_next): Likewise.
---
 gcc/config/aarch64/aarch64.c  | 21 +-
 gcc/config/aarch64/aarch64.md | 76 +--
 2 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 619357fa210..16ff40fc267 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2349,7 +2349,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 
   rtx x_hi = operand_subword (x, 1, 0, TImode);
   rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, cc_reg, x_hi, y_hi,
+  emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
   gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
   GEN_INT (AARCH64_EQ)));
 }
@@ -20445,7 +20445,7 @@ aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn 
**gen_seq, rtx prev,
   machine_mode op_mode, cmp_mode, cc_mode = CCmode;
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (treeop0));
   insn_code icode;
-  struct expand_operand ops[6];
+  struct expand_operand ops[5];
   int aarch64_cond;
 
   push_to_sequence (*prep_seq);
@@ -20484,8 +20484,8 @@ aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn 
**gen_seq, rtx prev,
 
   icode = code_for_ccmp (cc_mode, cmp_mode);
 
-  op0 = prepare_operand (icode, op0, 2, op_mode, cmp_mode, unsignedp);
-  op1 = prepare_operand (icode, op1, 3, op_mode, cmp_mode, unsignedp);
+  op0 = prepare_operand (icode, op0, 1, op_mode, cmp_mode, unsignedp);
+  op1 = prepare_operand (icode, op1, 2, op_mode, cmp_mode, unsignedp);
   if (!op0 || !op1)
 {
   end_sequence ();
@@ -20517,15 +20517,14 @@ aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn 
**gen_seq, rtx prev,
   aarch64_cond = AARCH64_INVERSE_CONDITION_CODE (aarch64_cond);
 }
 
-  create_fixed_operand (&ops[0], XEXP (prev, 0));
-  create_fixed_operand (&ops[1], target);
-  create_fixed_operand (&ops[2], op0);
-  create_fixed_operand (&ops[3], op1);
-  create_fixed_operand (&ops[4], prev);
-  create_fixed_operand (&ops[5], GEN_INT (aarch64_cond));
+  create_fixed_operand (&ops[0], target);
+  create_fixed_operand (&ops[1], op0);
+  create_fixed_operand (&ops[2], op1);
+  create_fixed_operand (&ops[3], prev);
+  create_fixed_operand (&ops[4], GEN_INT (aarch64_cond));
 
   push_to_sequence (*gen_seq);
-  if (!maybe_expand_insn (icode, 6, ops))
+  if (!maybe_expand_insn (icode, 5, ops))
 {
   end_sequence ();
   return NULL_RTX;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0fe41117640..12213176103 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -495,11 +495,24 @@
   ""
   "")
 
-(define_insn "@ccmp"
-  [(set (match_operand:CC_ONLY 1 "cc_register" "")
+(define_expand "@ccmp"
+  [(set (match_operand:CC_ONLY 0 "cc_register")
+   (if_then_else:CC_ONLY
+ (match_operand 3 "aarch64_comparison_operator")
+ (compare:CC_ONLY
+   (match_operand:GPI 1 "aarch64_reg_or_zero")
+   (match_operand:GPI 2 "aarch64_ccmp_operand"))
+ (unspec:CC_ONLY
+   [(match_operand 4 "immediate_operand")]
+   UNSPEC_NZCV)))]
+  ""
+)
+
+(define_insn "*ccmp"
+  [(set (match_operand:CC_ONLY 0 "cc_register" "")
(if_then_else:CC_ONLY
  (match_operator 4 "aarch64_comparison_operator"
-  [(match_operand 0 "cc_register" "")
+  [(match_operand 1 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
(match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
@@ -515,11 +528,24 @@
   [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
 )
 
-(define_insn "@ccmp"
-  [(set (match_operand:CCFP_CCFPE 1 "cc_register" "")
+(define_expand "@ccmp"
+  [(set (match_operand:CCFP_CCFPE 0 "cc_register")
+   (if_then_else:CCFP_CCFPE
+ (match_operand 3 "aarch64_comparison_operator")
+ (compare:CCFP_CCFPE
+   (match_operand:GPF 1 "register_operand")
+   (match_operand:GPF 2 "register_operand"))
+ (unspec:CCFP_CCFPE
+   [(match_operand 4 "immediate_operand")]
+   UNSPEC_NZCV)))]
+  ""
+)
+
+(define_insn "*ccmp"
+  [(set (match_operand:CCFP_CCFPE 0 "cc_register" "")
(if_then_else:CCFP_CCFPE
  (match_operator 4 "aarch64_comparison_operator"
-  

[PATCH 2/6] aarch64: Adjust result of aarch64_gen_compare_reg

2020-03-19 Thread Richard Henderson via Gcc-patches
Return the entire comparison expression, not just the cc_reg.
This will allow the routine to adjust the comparison code as
needed for TImode comparisons.

Note that some users were passing e.g. EQ to aarch64_gen_compare_reg
and then using gen_rtx_NE.  Pass the proper code in the first place.
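
A before/after sketch of a typical caller (names hypothetical):

    /* Before: build the comparison by hand from the returned cc reg.  */
    rtx cc_reg = aarch64_gen_compare_reg (NE, x, y);
    rtx t = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);

    /* After: the comparison rtx comes back ready to use.  */
    rtx t = aarch64_gen_compare_reg (NE, x, y);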

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Return
the final comparison for code & cc_reg.
(aarch64_gen_compare_reg_maybe_ze): Likewise.
(aarch64_expand_compare_and_swap): Update to match -- do not
build the final comparison here, but PUT_MODE as necessary.
(aarch64_split_compare_and_swap): Use prebuilt comparison.
* config/aarch64/aarch64-simd.md (aarch64_cmdi): Likewise.
(aarch64_cmdi): Likewise.
(aarch64_cmtstdi): Likewise.
* config/aarch64/aarch64-speculation.cc
(aarch64_speculation_establish_tracker): Likewise.
* config/aarch64/aarch64.md (cbranch4, cbranch4): Likewise.
(mod3, abs2): Likewise.
(cstore4, cstore4): Likewise.
(cmov6, cmov6): Likewise.
(movcc, movcc, movcc): Likewise.
(cc): Likewise.
(ffs2): Likewise.
(cstorecc4): Remove redundant "".
---
 gcc/config/aarch64/aarch64.c  | 26 +++---
 gcc/config/aarch64/aarch64-simd.md| 18 ++---
 gcc/config/aarch64/aarch64-speculation.cc |  5 +-
 gcc/config/aarch64/aarch64.md | 96 ++-
 4 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c90de65de12..619357fa210 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2328,7 +2328,7 @@ emit_set_insn (rtx x, rtx y)
 }
 
 /* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  */
+   return the rtx for the CCmode comparison.  */
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
@@ -2359,7 +2359,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
   emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
 }
-  return cc_reg;
+  return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
 }
 
 /* Similarly, but maybe zero-extend Y if Y_MODE < SImode.  */
@@ -2382,7 +2382,7 @@ aarch64_gen_compare_reg_maybe_ze (RTX_CODE code, rtx x, 
rtx y,
  cc_mode = CC_SWPmode;
  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
  emit_set_insn (cc_reg, t);
- return cc_reg;
+ return gen_rtx_fmt_ee (code, VOIDmode, cc_reg, const0_rtx);
}
 }
 
@@ -18506,7 +18506,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else if (TARGET_OUTLINE_ATOMICS)
 {
@@ -18517,7 +18518,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
   rval = emit_library_call_value (func, NULL_RTX, LCT_NORMAL, r_mode,
  oldval, mode, newval, mode,
  XEXP (mem, 0), Pmode);
-  cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = aarch64_gen_compare_reg_maybe_ze (EQ, rval, oldval, mode);
+  PUT_MODE (x, SImode);
 }
   else
 {
@@ -18529,13 +18531,13 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (GEN_FCN (code) (rval, mem, oldval, newval,
 is_weak, mod_s, mod_f));
   cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
 }
 
   if (r_mode != mode)
 rval = gen_lowpart (mode, rval);
   emit_move_insn (operands[1], rval);
 
-  x = gen_rtx_EQ (SImode, cc_reg, const0_rtx);
   emit_insn (gen_rtx_SET (bval, x));
 }
 
@@ -18610,10 +18612,8 @@ aarch64_split_compare_and_swap (rtx operands[])
   if (strong_zero_p)
 x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
-{
-  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
-}
+x = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -18626,8 +18626,7 @@ aarch64_split_compare_and_swap (rtx operands[])
{
  /* Emit an explicit compare instruction, so that we can correctly
 track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+ x = aarch64_gen_compare_reg (NE, 

[PATCH 6/6] aarch64: Implement TImode comparisons

2020-03-19 Thread Richard Henderson via Gcc-patches
Use ccmp to perform all TImode comparisons branchless.
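
For instance, a sketch (registers hypothetical) of the branchless EQ
expansion this enables:

        cmp     x0, x2          // low halves
        ccmp    x1, x3, #0, eq  // high halves if the lows matched,
                                // else force Z clear (NE)
        cset    w0, eq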

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Expand all of
the comparisons for TImode, not just NE.
* config/aarch64/aarch64.md (cbranchti4, cstoreti4): New.
---
 gcc/config/aarch64/aarch64.c  | 182 +++---
 gcc/config/aarch64/aarch64.md |  28 ++
 2 files changed, 196 insertions(+), 14 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7899dad759..911dc1c91cd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2363,32 +2363,186 @@ rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
   machine_mode cmp_mode = GET_MODE (x);
-  machine_mode cc_mode;
   rtx cc_reg;
 
   if (cmp_mode == TImode)
 {
-  gcc_assert (code == NE);
-
-  cc_mode = CCmode;
-  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
-
   rtx x_lo = operand_subword (x, 0, 0, TImode);
-  rtx y_lo = operand_subword (y, 0, 0, TImode);
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
-
   rtx x_hi = operand_subword (x, 1, 0, TImode);
-  rtx y_hi = operand_subword (y, 1, 0, TImode);
-  emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
-  gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
+  rtx y_lo, y_hi, tmp;
+
+  if (y == const0_rtx)
+   {
+ y_lo = y_hi = y;
+ switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For equality, IOR the two halves together.  If this gets
+used for a branch, we expect this to fold to cbz/cbnz;
+otherwise it's no larger than cmp+ccmp below.  Beware of
+the compare-and-swap post-reload split and use cmp+ccmp.  */
+ if (!can_create_pseudo_p ())
+   break;
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+ emit_insn (gen_cmpdi (tmp, const0_rtx));
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ goto done;
+
+   case LT:
+   case GE:
+ /* Check only the sign bit.  Choose to expose this detail,
+lest something later tries to use a COMPARE in a way
+that doesn't correspond.  This is "tst".  */
+ cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+ tmp = gen_rtx_AND (DImode, x_hi, GEN_INT (INT64_MIN));
+ tmp = gen_rtx_COMPARE (CC_NZmode, tmp, const0_rtx);
+ emit_set_insn (cc_reg, tmp);
+ code = (code == LT ? NE : EQ);
+ goto done;
+
+   case LE:
+   case GT:
+ /* For GT, (x_hi >= 0) && ((x_hi | x_lo) != 0),
+and of course the inverse for LE.  */
+ emit_insn (gen_cmpdi (x_hi, const0_rtx));
+
+ tmp = gen_reg_rtx (DImode);
+ emit_insn (gen_iordi3 (tmp, x_hi, x_lo));
+
+ /* Combine the two terms:
+(GE ? (compare tmp 0) : EQ),
+so that the whole term is true for NE, false for EQ.  */
+ cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+ emit_insn (gen_ccmpccdi
+(cc_reg, tmp, const0_rtx,
+ gen_rtx_GE (VOIDmode, cc_reg, const0_rtx),
+ GEN_INT (aarch64_nzcv_codes[AARCH64_EQ])));
+
+ /* The result is entirely within the Z bit. */
+ code = (code == GT ? NE : EQ);
+ goto done;
+
+   default:
+ break;
+   }
+   }
+  else
+   {
+ y_lo = operand_subword (y, 0, 0, TImode);
+ y_hi = operand_subword (y, 1, 0, TImode);
+   }
+
+  cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  switch (code)
+   {
+   case EQ:
+   case NE:
+ /* For EQ, (x_lo == y_lo) && (x_hi == y_hi).  */
+ emit_insn (gen_cmpdi (x_lo, y_lo));
+ emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
+  gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+  GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
+ break;
+
+   case LEU:
+   case GTU:
+ std::swap (x_lo, y_lo);
+ std::swap (x_hi, y_hi);
+ code = swap_condition (code);
+ /* fall through */
+
+   case LTU:
+   case GEU:
+ /* For LTU, (x - y), as double-word arithmetic.  */
+ emit_insn (gen_cmpdi (x_lo, y_lo));
+ /* The ucmp*_carryinC pattern uses zero_extend, and so cannot
+take the constant 0 we allow elsewhere.  Force to reg now
+and allow combine to eliminate via simplification.  */
+ x_hi = force_reg (DImode, x_hi);
+ y_hi = force_reg (DImode, y_hi);
+ emit_insn (gen_ucmpdi3_carryinC(x_hi, y_hi));
+ /* The result is entirely within 

[PATCH 5/6] aarch64: Improve nzcv argument to ccmp

2020-03-19 Thread Richard Henderson via Gcc-patches
Currently we use %k to interpret an aarch64_cond_code value.
This interpretation is done via an array, aarch64_nzcv_codes.
The rtl is neither hindered nor harmed by using the proper
nzcv value itself, so index the array earlier than later.
This makes it easier to compare the rtl to the assembly.

It is slightly confusing in that aarch64_nzcv_codes has
values of nzcv which produce the inverse of the code that
is the index.  Invert those values.
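
A worked example of the indexing after this change: when chaining two
equality tests, the fallback nzcv must make the final EQ come out false
if the first compare fails, i.e. it must make NE true, and
aarch64_nzcv_codes[AARCH64_NE] is now 0 (Z clear):

        cmp     x0, x2
        ccmp    x1, x3, #0, eq  // #0 == aarch64_nzcv_codes[AARCH64_NE]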

* config/aarch64/aarch64.c (AARCH64_CC_{NZCV}): Move up.
(aarch64_nzcv_codes): Move up; reverse values of even/odd entries.
(aarch64_gen_compare_reg): Use aarch64_nzcv_codes in
gen_ccmpccdi generation.
(aarch64_print_operand): Remove case 'k'.
(aarch64_gen_ccmp_next): Invert condition for !AND, remove
inversion for AND; use aarch64_nzcv_codes.
* config/aarch64/aarch64.md (*ccmpcc): Remove %k from
all alternatives.
(*ccmpcc_rev, *ccmp, *ccmp_rev): Likewise.
---
 gcc/config/aarch64/aarch64.c  | 81 +++
 gcc/config/aarch64/aarch64.md | 16 +++
 2 files changed, 42 insertions(+), 55 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 16ff40fc267..d7899dad759 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1270,6 +1270,36 @@ aarch64_cc;
 
 #define AARCH64_INVERSE_CONDITION_CODE(X) ((aarch64_cc) (((int) X) ^ 1))
 
+/* N Z C V.  */
+#define AARCH64_CC_V 1
+#define AARCH64_CC_C (1 << 1)
+#define AARCH64_CC_Z (1 << 2)
+#define AARCH64_CC_N (1 << 3)
+
+/*
+ * N Z C V flags for ccmp.  Indexed by aarch64_cond_code.
+ * These are the flags to make the given code be *true*.
+ */
+static const int aarch64_nzcv_codes[] =
+{
+  AARCH64_CC_Z,/* EQ, Z == 1.  */
+  0,   /* NE, Z == 0.  */
+  AARCH64_CC_C,/* CS, C == 1.  */
+  0,   /* CC, C == 0.  */
+  AARCH64_CC_N,/* MI, N == 1.  */
+  0,   /* PL, N == 0.  */
+  AARCH64_CC_V,/* VS, V == 1.  */
+  0,   /* VC, V == 0.  */
+  AARCH64_CC_C,/* HI, C == 1 && Z == 0.  */
+  0,   /* LS, !(C == 1 && Z == 0).  */
+  0,   /* GE, N == V.  */
+  AARCH64_CC_V,/* LT, N != V.  */
+  0,   /* GT, Z == 0 && N == V.  */
+  AARCH64_CC_V,/* LE, !(Z == 0 && N == V).  */
+  0,   /* AL, Any.  */
+  0/* NV, Any.  */
+};
+
 struct aarch64_branch_protect_type
 {
   /* The type's name that the user passes to the branch-protection option
@@ -2351,7 +2381,7 @@ aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
   rtx y_hi = operand_subword (y, 1, 0, TImode);
   emit_insn (gen_ccmpccdi (cc_reg, x_hi, y_hi,
   gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
-  GEN_INT (AARCH64_EQ)));
+  GEN_INT (aarch64_nzcv_codes[AARCH64_NE])));
 }
   else
 {
@@ -9302,33 +9332,6 @@ aarch64_const_vec_all_in_range_p (rtx vec,
   return true;
 }
 
-/* N Z C V.  */
-#define AARCH64_CC_V 1
-#define AARCH64_CC_C (1 << 1)
-#define AARCH64_CC_Z (1 << 2)
-#define AARCH64_CC_N (1 << 3)
-
-/* N Z C V flags for ccmp.  Indexed by AARCH64_COND_CODE.  */
-static const int aarch64_nzcv_codes[] =
-{
-  0,   /* EQ, Z == 1.  */
-  AARCH64_CC_Z,/* NE, Z == 0.  */
-  0,   /* CS, C == 1.  */
-  AARCH64_CC_C,/* CC, C == 0.  */
-  0,   /* MI, N == 1.  */
-  AARCH64_CC_N, /* PL, N == 0.  */
-  0,   /* VS, V == 1.  */
-  AARCH64_CC_V, /* VC, V == 0.  */
-  0,   /* HI, C ==1 && Z == 0.  */
-  AARCH64_CC_C,/* LS, !(C == 1 && Z == 0).  */
-  AARCH64_CC_V,/* GE, N == V.  */
-  0,   /* LT, N != V.  */
-  AARCH64_CC_Z, /* GT, Z == 0 && N == V.  */
-  0,   /* LE, !(Z == 0 && N == V).  */
-  0,   /* AL, Any.  */
-  0/* NV, Any.  */
-};
-
 /* Print floating-point vector immediate operand X to F, negating it
first if NEGATE is true.  Return true on success, false if it isn't
a constant we can handle.  */
@@ -9416,7 +9419,6 @@ sizetochar (int size)
(32-bit or 64-bit).
  '0':  Print a normal operand, if it's a general register,
then we assume DImode.
- 'k':  Print NZCV for conditional compare instructions.
  'A':  Output address constant representing the first
argument of X, specifying a relocation offset
if appropriate.
@@ -9866,22 +9868,6 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   output_addr_const (asm_out_file, x);
   break;
 
-case 'k':
-  {
-   HOST_WIDE_INT cond_code;
-
-   if (!CONST_INT_P (x))
- {
-   output_operand_lossage ("invalid operand for '%%%c'", code);
-   return;
- }
-
-   cond_code = INTVAL (x);
-   gcc_assert (cond_code >= 0 && cond_code <= AARCH64_NV);
-  

[PATCH 1/6] aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC

2020-03-19 Thread Richard Henderson via Gcc-patches
Use xzr for the output when we only require the flags output.
This will be used shortly for TImode comparisons.
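
A sketch (registers hypothetical) of the flags-only use, for an
unsigned TImode x < y:

        cmp     x0, x2
        sbcs    xzr, x1, x3     // borrow chain ends in the flags;
                                // no result register is consumed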

* config/aarch64/aarch64.md (ucmp3_carryinC): New.
(*ucmp3_carryinC_z1): New.
(*ucmp3_carryinC_z2): New.
(*ucmp3_carryinC): New.
---
 gcc/config/aarch64/aarch64.md | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..fcc1ddafaec 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3439,6 +3439,18 @@
""
 )
 
+(define_expand "ucmp3_carryinC"
+   [(set (reg:CC CC_REGNUM)
+(compare:CC
+  (zero_extend:
+(match_operand:GPI 0 "register_operand"))
+  (plus:
+(zero_extend:
+  (match_operand:GPI 1 "register_operand"))
+(ltu: (reg:CC CC_REGNUM) (const_int 0)]
+   ""
+)
+
 (define_insn "*usub3_carryinC_z1"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3456,6 +3468,19 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*ucmp3_carryinC_z1"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (const_int 0)
+ (plus:
+   (zero_extend:
+ (match_operand:GPI 0 "register_operand" "r"))
+   (match_operand: 1 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, zr, %0"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC_z2"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3471,6 +3496,17 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*ucmp3_carryinC_z2"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (zero_extend:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (match_operand: 1 "aarch64_borrow_operation" "")))]
+   ""
+   "sbcs\\tzr, %0, zr"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_insn "*usub3_carryinC"
   [(set (reg:CC CC_REGNUM)
(compare:CC
@@ -3489,6 +3525,20 @@
   [(set_attr "type" "adc_reg")]
 )
 
+(define_insn "*ucmp3_carryinC"
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC
+ (zero_extend:
+   (match_operand:GPI 0 "register_operand" "r"))
+ (plus:
+   (zero_extend:
+ (match_operand:GPI 1 "register_operand" "r"))
+   (match_operand: 2 "aarch64_borrow_operation" ""]
+   ""
+   "sbcs\\tzr, %0, %1"
+  [(set_attr "type" "adc_reg")]
+)
+
 (define_expand "sub3_carryinV"
   [(parallel
  [(set (reg:CC_V CC_REGNUM)
-- 
2.20.1



[PATCH 3/6] aarch64: Accept 0 as first argument to compares

2020-03-19 Thread Richard Henderson via Gcc-patches
While cmp (extended register) and cmp (immediate) use <Wn|WSP>,
cmp (shifted register) uses <Wn>.  So we can perform cmp xzr, x0.

For ccmp, we only have <Wn> as an input.

* config/aarch64/aarch64.md (cmp): For operand 0, use
aarch64_reg_or_zero.  Shuffle reg/reg to last alternative
and accept Z.
(@ccmpcc): For operand 0, use aarch64_reg_or_zero and Z.
(@ccmpcc_rev): Likewise.
---
 gcc/config/aarch64/aarch64.md | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 29dfd6df30c..0fe41117640 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -502,7 +502,7 @@
   [(match_operand 0 "cc_register" "")
(const_int 0)])
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
  (unspec:CC_ONLY
[(match_operand 5 "immediate_operand")]
@@ -542,7 +542,7 @@
[(match_operand 5 "immediate_operand")]
UNSPEC_NZCV)
  (compare:CC_ONLY
-   (match_operand:GPI 2 "register_operand" "r,r,r")
+   (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ,rZ,rZ")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"]
   ""
   "@
@@ -4009,14 +4009,14 @@
 
 (define_insn "cmp"
   [(set (reg:CC CC_REGNUM)
-   (compare:CC (match_operand:GPI 0 "register_operand" "rk,rk,rk")
-   (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
+   (compare:CC (match_operand:GPI 0 "aarch64_reg_or_zero" "rk,rk,rkZ")
+   (match_operand:GPI 1 "aarch64_plus_operand" "I,J,rZ")))]
   ""
   "@
-   cmp\\t%0, %1
cmp\\t%0, %1
-   cmn\\t%0, #%n1"
-  [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
+   cmn\\t%0, #%n1
+   cmp\\t%0, %1"
+  [(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )
 
 (define_insn "fcmp"
-- 
2.20.1



[PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-19 Thread Richard Henderson via Gcc-patches
This is attacking case 3 of PR 94174.

The existing ccmp optimization happens at the gimple level,
which means that rtl expansion of TImode stuff cannot take
advantage of it.  But we can do even better than the existing
ccmp optimization.

This expansion is similar in size to our current branchful
expansion, but is all straight-line code.  I will assume in
general that the branch predictor will work better with
fewer branches.

E.g.

-  10:  b7f800a3  tbnz  x3, #63, 24 <__subvti3+0x24>
-  14:  eb02003f  cmp   x1, x2
-  18:  5400010c  b.gt  38 <__subvti3+0x38>
-  1c:  54000140  b.eq  44 <__subvti3+0x44>  // b.none
-  20:  d65f03c0  ret
-  24:  eb01005f  cmp   x2, x1
-  28:  5400008c  b.gt  38 <__subvti3+0x38>
-  2c:  54ffffa1  b.ne  20 <__subvti3+0x20>  // b.any
-  30:  eb00009f  cmp   x4, x0
-  34:  54ffff69  b.ls  20 <__subvti3+0x20>  // b.plast
-  38:  a9bf7bfd  stp   x29, x30, [sp, #-16]!
-  3c:  910003fd  mov   x29, sp
-  40:  94000000  bl    0 
-  44:  eb04001f  cmp   x0, x4
-  48:  54ffff88  b.hi  38 <__subvti3+0x38>  // b.pmore
-  4c:  d65f03c0  ret

+  10:  b7f800e3  tbnz  x3, #63, 2c <__subvti3+0x2c>
+  14:  eb01005f  cmp   x2, x1
+  18:  1a9fb7e2  cset  w2, ge  // ge = tcont
+  1c:  fa400080  ccmp  x4, x0, #0x0, eq  // eq = none
+  20:  7a40a844  ccmp  w2, #0x0, #0x4, ge  // ge = tcont
+  24:  540000e0  b.eq  40 <__subvti3+0x40>  // b.none
+  28:  d65f03c0  ret
+  2c:  eb01005f  cmp   x2, x1
+  30:  1a9fc7e2  cset  w2, le
+  34:  fa400081  ccmp  x4, x0, #0x1, eq  // eq = none
+  38:  7a40d844  ccmp  w2, #0x0, #0x4, le
+  3c:  54ffff60  b.eq  28 <__subvti3+0x28>  // b.none
+  40:  a9bf7bfd  stp   x29, x30, [sp, #-16]!
+  44:  910003fd  mov   x29, sp
+  48:  94000000  bl    0 

So one less insn, but 2 branches instead of 6.

As for the specific case of the PR,

void test_int128(__int128 a, uint64_t l)
{
if ((__int128_t)a - l <= 1)
doit();
}

    0:  eb020000  subs  x0, x0, x2
    4:  da1f0021  sbc   x1, x1, xzr
    8:  f100003f  cmp   x1, #0x0
-   c:  5400004d  b.le  14
-  10:  d65f03c0  ret
-  14:  54000061  b.ne  20   // b.any
-  18:  f100041f  cmp   x0, #0x1
-  1c:  54ffffa8  b.hi  10   // b.pmore
+   c:  1a9fc7e1  cset  w1, le
+  10:  fa410801  ccmp  x0, #0x1, #0x1, eq  // eq = none
+  14:  7a40d824  ccmp  w1, #0x0, #0x4, le
+  18:  54000041  b.ne  20   // b.any
+  1c:  d65f03c0  ret
   20:  14000000  b     0 


r~


Richard Henderson (6):
  aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Accept 0 as first argument to compares
  aarch64: Simplify @ccmp operands
  aarch64: Improve nzcv argument to ccmp
  aarch64: Implement TImode comparisons

 gcc/config/aarch64/aarch64.c  | 304 --
 gcc/config/aarch64/aarch64-simd.md|  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md | 280 ++--
 4 files changed, 429 insertions(+), 178 deletions(-)

-- 
2.20.1



[arm, v3] Follow up for asm-flags (thumb1, ilp32)

2019-11-19 Thread Richard Henderson
I'm not sure what happened to v2.  I can see it in my sent email, but it never
made it to the mailing list, and possibly not to Richard E. either.

So resending, with an extra testsuite fix for ilp32, spotted by Christophe.

Re thumb1, rather than an ifdef in config/arm/aarch-common.c, as I did in v1, I
am swapping out a targetm hook when changing into and out of thumb1 mode.
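
For reference, a minimal use of the feature being gated (entirely
hypothetical source):

    int f (int a, int b)
    {
      int le;
      __asm__ ("cmp %1, %2" : "=@ccle"(le) : "r"(a), "r"(b));
      return le;
    }

which must now draw a sorry() instead of an ICE when compiled for thumb1.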


r~


gcc/
* config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
to define __GCC_ASM_FLAG_OUTPUTS__.
* config/arm/arm.c (thumb1_md_asm_adjust): New function.
(arm_option_params_internal): Swap out targetm.md_asm_adjust
depending on TARGET_THUMB1.
* doc/extend.texi (FlagOutputOperands): Document thumb1 restriction.

gcc/testsuite/
* testsuite/gcc.target/arm/asm-flag-3.c: Skip for thumb1.
* testsuite/gcc.target/arm/asm-flag-5.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-6.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-4.c: New test.

* testsuite/gcc.target/aarch64/asm-flag-6.c: Use %w for
asm inputs to cmp instruction for ILP32.


diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c4485ce7af1..546b35a5cbd 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -122,7 +122,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   if (arm_arch_notm)
 builtin_define ("__ARM_ARCH_ISA_ARM");
   builtin_define ("__APCS_32__");
-  builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
+
+  def_or_undef_macro (pfile, "__GCC_ASM_FLAG_OUTPUTS__", !TARGET_THUMB1);
 
   def_or_undef_macro (pfile, "__thumb__", TARGET_THUMB);
   def_or_undef_macro (pfile, "__thumb2__", TARGET_THUMB2);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1fd30c238cd..a6b401b7f2e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -325,6 +325,9 @@ static unsigned int arm_hard_regno_nregs (unsigned int, 
machine_mode);
 static bool arm_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool arm_modes_tieable_p (machine_mode, machine_mode);
 static HOST_WIDE_INT arm_constant_alignment (const_tree, HOST_WIDE_INT);
+static rtx_insn * thumb1_md_asm_adjust (vec<rtx> &, vec<rtx> &,
+					vec<const char *> &, vec<rtx> &,
+					HARD_REG_SET &);
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -2941,6 +2944,11 @@ arm_option_params_internal (void)
   /* For THUMB2, we limit the conditional sequence to one IT block.  */
   if (TARGET_THUMB2)
 max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
+
+  if (TARGET_THUMB1)
+targetm.md_asm_adjust = thumb1_md_asm_adjust;
+  else
+targetm.md_asm_adjust = arm_md_asm_adjust;
 }
 
 /* True if -mflip-thumb should next add an attribute for the default
@@ -32528,6 +32536,23 @@ arm_run_selftests (void)
 #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
 #endif /* CHECKING_P */
 
+/* Worker function for TARGET_MD_ASM_ADJUST, while in thumb1 mode.
+   Unlike the arm version, we do NOT implement asm flag outputs.  */
+
+rtx_insn *
+thumb1_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+		      vec<const char *> &constraints,
+		      vec<rtx> &/*clobbers*/, HARD_REG_SET &/*clobbered_regs*/)
+{
+  for (unsigned i = 0, n = outputs.length (); i < n; ++i)
+if (strncmp (constraints[i], "=@cc", 4) == 0)
+  {
+   sorry ("asm flags not supported in thumb1 mode");
+   break;
+  }
+  return NULL;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-arm.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
index 963b5a48c70..54d7fbf317d 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -1,6 +1,12 @@
 /* Executable testcase for 'output flags.'  */
 /* { dg-do run } */
 
+#ifdef __LP64__
+#define W ""
+#else
+#define W "w"
+#endif
+
 int test_bits (long nzcv)
 {
   long n, z, c, v;
@@ -16,7 +22,7 @@ int test_cmps (long x, long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@ccgt"(gt), "=@cclt"(lt), "=@ccge"(ge), "=@ccle"(le)
   : [x] "r"(x), [y] "r"(y));
 
@@ -30,7 +36,7 @@ int test_cmpu (unsigned long x, unsigned long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@cchi"(gt), "=@cclo"(lt), "=@cchs"(ge), "=@ccls"(le)
   : [x] "r"(x), [y] "r"(y));
 
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-1.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
index 9707ebfcebb..97104d3ac73 100644
--- a/gcc/testsuite/gcc.target/arm/asm-flag-1.c
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
@@ -1,6 +1,7 @@
 /* Test the valid @cc asm flag outputs.  */
 /* { dg-do compile } */
 /* { dg-options "-O" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 
 #ifndef __GCC_ASM_FLAG_OUTPUTS__
 #error "missing preprocessor 

Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-19 Thread Richard Henderson
On 11/19/19 9:29 AM, Christophe Lyon wrote:
> On Mon, 18 Nov 2019 at 20:54, Richard Henderson
>  wrote:
>>
>> On 11/18/19 1:30 PM, Christophe Lyon wrote:
>>> I'm sorry to notice that the last test (asm-flag-6.c) fails to execute
>>> when compiling with -mabi=ilp32. I have less details than for Arm,
>>> because here I'm using the Foundation Model as simulator instead of
>>> Qemu. In addition, I'm using an old version of it, so maybe it's a
>>> simulator bug. Does it work on your side?
>>
>> I don't know how to test ilp32 with qemu.  Is there a distribution that uses
>> this mode, so that one can test in system mode?  We don't have user-only
>> support for ilp32.
>>
> 
> Sorry I wasn't clear: I test aarch64-elf with -mabi=ilp32, using newlib.

In the short term, can you please try this testsuite patch?


r~
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
index 963b5a48c70..54d7fbf317d 100644
--- a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -1,6 +1,12 @@
 /* Executable testcase for 'output flags.'  */
 /* { dg-do run } */
 
+#ifdef __LP64__
+#define W ""
+#else
+#define W "w"
+#endif
+
 int test_bits (long nzcv)
 {
   long n, z, c, v;
@@ -16,7 +22,7 @@ int test_cmps (long x, long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@ccgt"(gt), "=@cclt"(lt), "=@ccge"(ge), "=@ccle"(le)
   : [x] "r"(x), [y] "r"(y));
 
@@ -30,7 +36,7 @@ int test_cmpu (unsigned long x, unsigned long y)
 {
   long gt, lt, ge, le;
 
-  __asm__ ("cmp %[x], %[y]"
+  __asm__ ("cmp %"W"[x], %"W"[y]"
   : "=@cchi"(gt), "=@cclo"(lt), "=@cchs"(ge), "=@ccls"(le)
   : [x] "r"(x), [y] "r"(y));
 


Re: [PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-18 Thread Richard Henderson
On 11/18/19 1:30 PM, Christophe Lyon wrote:
> I'm sorry to notice that the last test (asm-flag-6.c) fails to execute
> when compiling with -mabi=ilp32. I have less details than for Arm,
> because here I'm using the Foundation Model as simulator instead of
> Qemu. In addition, I'm using an old version of it, so maybe it's a
> simulator bug. Does it work on your side?

I don't know how to test ilp32 with qemu.  Is there a distribution that uses
this mode, so that one can test in system mode?  We don't have user-only
support for ilp32.

I think I have reproduced this with newlib and aarch64-elf.  It could be
solvable by using either unsigned long long, or by using %w operand
modifiers on the two cmp instructions.

Except that I made that change and the failure didn't go away.  I'm having
trouble building a version of gdb that can debug this...


r~


Re: [PATCH v2 5/6] arm: Add testsuite checks for asm-flag

2019-11-18 Thread Richard Henderson
On 11/18/19 1:25 PM, Christophe Lyon wrote:
> Hi Richard
> 
> On Thu, 14 Nov 2019 at 11:08, Richard Henderson
>  wrote:
>>
>> Inspired by the tests in gcc.target/i386.  Testing code generation,
>> diagnostics, and execution.
>>
>> * gcc.target/arm/asm-flag-1.c: New test.
>> * gcc.target/arm/asm-flag-3.c: New test.
>> * gcc.target/arm/asm-flag-5.c: New test.
>> * gcc.target/arm/asm-flag-6.c: New test.
> 
> I've noticed ICEs when using -march=armv5t, but I believe these are
> fixed by your follow-up patch for thumb1.
> 
> However, I've also noticed that asm-flag-6 fails at execution time
> when generating code for cortex-m (I have m3, m4, and m33 in my list)
> QEMU complains with:
> qemu: fatal: v7m_msr 2048
> 
> Indeed, it crashes on
> 0x81c4:  f383 8800  msr  apsr, r3
> 
> Which looks like a qemu bug, probably similar to the vmsr one I fixed
> recently. While reading the ARM ARM, I also noticed that "apsr" is
> deprecated and should be "APSR_nzcvq" (as emitted by GCC), so it seems
> qemu's disassembler needs an update too. Do you want me to have a look
> at the qemu problems, or will you handle them?

I can handle them.  Thanks for the report.


r~


[arm] Follow up for asm-flags vs thumb1

2019-11-14 Thread Richard Henderson
What I committed today does in fact ICE for thumb1, as you suspected.

I'm currently testing the following vs

  arm-sim/
  arm-sim/-mthumb
  arm-sim/-mcpu=cortex-a15/-mthumb.

which, with the default cpu for arm-elf-eabi, should test all of arm, thumb1,
thumb2.

I'm not thrilled about the ifdef in aarch-common.c, but I don't see a different
way to catch this case for arm and still compile for aarch64.

Ideas?

Particularly ones that work with __attribute__((target("thumb")))?  Which, now
that I've thought about it, I really should be testing...
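
Something like this untested sketch, compiled with -marm for a v4t/v5t
cpu so that the attribute switches the function into thumb1:

    __attribute__((target("thumb")))
    int g (int x)
    {
      int z;
      __asm__ ("cmp %1, #0" : "=@ccne"(z) : "r"(x));
      return z;
    }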


r~
gcc/
* config/arm/aarch-common.c (arm_md_asm_adjust): Sorry
for asm flags in thumb1 mode.
* config/arm/arm-c.c (arm_cpu_builtins): Do not define
__GCC_ASM_FLAG_OUTPUTS__ in thumb1 mode.
* doc/extend.texi (FlagOutputOperands): Document thumb1 restriction.

gcc/testsuite/
* gcc.target/arm/asm-flag-1.c: Skip if arm_thumb1.
* gcc.target/arm/asm-flag-3.c: Skip if arm_thumb1.
* gcc.target/arm/asm-flag-5.c: Skip if arm_thumb1.
* gcc.target/arm/asm-flag-6.c: Skip if arm_thumb1.


diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 760ef6c9c0a..6f3db3838ba 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -544,6 +544,15 @@ arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
   if (strncmp (con, "=@cc", 4) != 0)
continue;
   con += 4;
+
+#ifdef TARGET_THUMB1
+  if (TARGET_THUMB1)
+   {
+ sorry ("asm flags not supported in thumb1 mode");
+ break;
+   }
+#endif
+
   if (strchr (con, ',') != NULL)
{
  error ("alternatives not allowed in %<asm%> flag output");
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c4485ce7af1..865c448d531 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -122,7 +122,9 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   if (arm_arch_notm)
 builtin_define ("__ARM_ARCH_ISA_ARM");
   builtin_define ("__APCS_32__");
-  builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
+
+  if (!TARGET_THUMB1)
+builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
 
   def_or_undef_macro (pfile, "__thumb__", TARGET_THUMB);
   def_or_undef_macro (pfile, "__thumb2__", TARGET_THUMB2);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 1c8ae0d5cd3..62a98e939c8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9810,6 +9810,8 @@ signed greater than
 signed less than equal
 @end table
 
+The flag output constraints are not supported in thumb1 mode.
+
 @item x86 family
 The flag output constraints for the x86 family are of the form
 @samp{=@@cc@var{cond}} where @var{cond} is one of the standard
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-1.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
index 9707ebfcebb..97104d3ac73 100644
--- a/gcc/testsuite/gcc.target/arm/asm-flag-1.c
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
@@ -1,6 +1,7 @@
 /* Test the valid @cc asm flag outputs.  */
 /* { dg-do compile } */
 /* { dg-options "-O" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 
 #ifndef __GCC_ASM_FLAG_OUTPUTS__
 #error "missing preprocessor define"
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-3.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-3.c
index e84e3431277..e2d616051cc 100644
--- a/gcc/testsuite/gcc.target/arm/asm-flag-3.c
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-3.c
@@ -1,6 +1,7 @@
 /* Test some of the valid @cc asm flag outputs.  */
 /* { dg-do compile } */
 /* { dg-options "-O" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 
 #define DO(C) \
 void f##C(void) { char x; asm("" : "=@cc"#C(x)); if (!x) asm(""); asm(""); }
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-5.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-5.c
index 4d4394e1478..9a8ff586c29 100644
--- a/gcc/testsuite/gcc.target/arm/asm-flag-5.c
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-5.c
@@ -1,6 +1,7 @@
 /* Test error conditions of asm flag outputs.  */
 /* { dg-do compile } */
 /* { dg-options "" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 
void f_B(void) { _Bool x; asm("" : "=@cccc"(x)); }
void f_c(void) { char x; asm("" : "=@cccc"(x)); }
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-6.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-6.c
index 09174e04ae6..d862db4e106 100644
--- a/gcc/testsuite/gcc.target/arm/asm-flag-6.c
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-6.c
@@ -1,5 +1,6 @@
 /* Executable testcase for 'output flags.'  */
 /* { dg-do run } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 
 int test_bits (long nzcv)
 {


Re: [PATCH v2 4/6] arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__

2019-11-14 Thread Richard Henderson
On 11/14/19 3:48 PM, Richard Earnshaw (lists) wrote:
> On 14/11/2019 10:07, Richard Henderson wrote:
>> Since all but a couple of lines is shared between the two targets,
>> enable them both at once.
>>
>> * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Declare.
>> * config/arm/aarch-common.c (arm_md_asm_adjust): New.
>> * config/arm/arm-c.c (arm_cpu_builtins): Define
>> __GCC_ASM_FLAG_OUTPUTS__.
>> * config/arm/arm.c (TARGET_MD_ASM_ADJUST): New.
>> * config/aarch64/aarch64-c.c (aarch64_define_unconditional_macros):
>> Define __GCC_ASM_FLAG_OUTPUTS__.
>> * config/aarch64/aarch64.c (TARGET_MD_ASM_ADJUST): New.
>> * doc/extend.texi (FlagOutputOperands): Add documentation
>> for ARM and AArch64.
> 
> In AArch64 when SVE is enabled, there are some additional condition names 
> which
> are more suited for describing the way conditions are set by the SVE
> instructions.  Do you plan to support those as well?

I did not, no.

I read the acle spec once at the beginning of the year, and vaguely recall that
it already covers pretty much all one wants to do.  I haven't given much
thought to sve in inline asm since.

I suppose I can add them if they're thought important.


r~


Re: [PATCH v2 4/6] arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__

2019-11-14 Thread Richard Henderson
On 11/14/19 3:39 PM, Richard Earnshaw (lists) wrote:
> Not had a chance to look at this in detail, but I don't see any support for
> 
> 1) Thumb1 where we do not expose the condition codes at all
> 2) Thumb2 where we need IT instructions along-side the conditional 
> instructions
> themselves.
> 
> How have you tested this for those targets?

I tested aarch64-linux and arm-elf-eabi (I'm currently 8 time zones away from
my arm-linux-eabihf box, so using sim).

I didn't know about the thumb1 restriction.  I had assumed somehow that we'd
just use branch insns to form whatever cstore* is required.  I suppose it's
easy enough to generate an error/sorry for asm-flags in thumb1 mode.

As for thumb2, correct behaviour comes from the existing cstore* patterns, and
the testsuite need not check for IT specifically because unified asm syntax
says that the insns that are conditional under the IT should still bear the
conditions themselves.
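
E.g. with unified syntax the conditional insns carry their condition
even inside the IT block, so a sequence like

        cmp     r0, r1
        ite     gt
        movgt   r2, #1
        movle   r2, #0

still matches scans for movgt/movle.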

I presume I can test both of these cases with arm-elf-eabi + -mthumb{1,2}?


r~


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-14 Thread Richard Henderson
On 11/13/19 8:35 PM, Jeff Law wrote:
> On 11/13/19 6:04 AM, Bernd Schmidt wrote:
>> The cc0 machinery allows for eliminating unnecessary comparisons by
>> examining the effect instructions have on the flags registers. I have
>> replicated that mechanism with a relatively modest amount of code based
>> on a final_postscan_insn hook, but it is now opt-in: an instruction
>> pattern can set the "flags_valid" attribute to a number of possible
>> values to indicate what effect it has. That should be more reliable (try
>> git log m68k.md to see recent sprinkling of CC_STATUS_INIT to squash
>> bugs with the previous mechanism).
> Yea, sounds like a reimplementation of the tst elimination bits, but
> buried in the backend.  Given the choice of dropping the port or burying
> this kind of stuff in there, I'd lean towards accepting the latter.

Indeed.  Even if we wanted an eventual transition to the tst elimination bits,
this is a better starting place than from cc0.


r~


Re: [PATCH v2 0/6] Implement asm flag outputs for arm + aarch64

2019-11-14 Thread Richard Henderson
On 11/14/19 2:08 PM, Kyrill Tkachov wrote:
> Hi Richard,
> 
> On 11/14/19 10:07 AM, Richard Henderson wrote:
>> I've put the implementation into config/arm/aarch-common.c, so
>> that it can be shared between the two targets.  This required
>> a little bit of cleanup to the CC modes and constraints to get
>> the two targets to match up.
>>
>> Changes for v2:
>>   * Document overflow flags.
>>   * Add "hs" and "lo" as aliases of "cs" and "cc".
>>   * Add unsigned cmp tests to asm-flag-6.c.
>>
>> Richard Sandiford has given his ack for the aarch64 side.
>> I'm still looking for an ack for the arm side.
>>
> The arm parts look good to me, there's not too much arm-specific stuff that's
> not shared with aarch64 thankfully.

Yes indeed.

Committed series ending in r278228.

Thanks,

r~


Re: [PATCH v2 2/6] arm: Fix the "c" constraint

2019-11-14 Thread Richard Henderson
On 11/14/19 2:07 PM, Kyrill Tkachov wrote:
> 
> On 11/14/19 10:07 AM, Richard Henderson wrote:
>> The existing definition using register class CC_REG does not
>> work because CC_REGNUM does not support normal modes, and so
>> fails to match register_operand.  Use a non-register constraint
>> and the cc_register predicate instead.
>>
>>     * config/arm/constraints.md (c): Use cc_register predicate.
> 
> 
> Ok.
> 
> Does this need a backport to the branches?

I don't think so, because it is currently unused.

I tried to track down if it was *ever* used and did not succeed.
The first reference I see is

commit cffb2a26c44c682185b6bb405d48fcbe1fbc0b37
Author: rearnsha 
Date:   Sat Apr 8 14:29:53 2000 +

Merge changes from merged-arm-thumb-backend-branch onto trunk.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@33028 \
138bc75d-0d04-0410-961f-82ee72b054a4

within REG_CLASS_FROM_LETTER.  But I cannot find a user of the constraint
within a checkout of that revision.

Unless I miss something, it seems to have been garbage for a very long time.


r~


[PATCH v2 5/6] arm: Add testsuite checks for asm-flag

2019-11-14 Thread Richard Henderson
Inspired by the tests in gcc.target/i386.  Testing code generation,
diagnostics, and execution.

* gcc.target/arm/asm-flag-1.c: New test.
* gcc.target/arm/asm-flag-3.c: New test.
* gcc.target/arm/asm-flag-5.c: New test.
* gcc.target/arm/asm-flag-6.c: New test.
---
 gcc/testsuite/gcc.target/arm/asm-flag-1.c | 36 +
 gcc/testsuite/gcc.target/arm/asm-flag-3.c | 38 ++
 gcc/testsuite/gcc.target/arm/asm-flag-5.c | 30 +++
 gcc/testsuite/gcc.target/arm/asm-flag-6.c | 62 +++
 4 files changed, 166 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-6.c

diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-1.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
new file mode 100644
index 000..9707ebfcebb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
@@ -0,0 +1,36 @@
+/* Test the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#ifndef __GCC_ASM_FLAG_OUTPUTS__
+#error "missing preprocessor define"
+#endif
+
+void f(char *out)
+{
+  asm(""
+  : "=@ccne"(out[0]), "=@cceq"(out[1]),
+   "=@cccs"(out[2]), "=@"(out[3]),
+   "=@ccmi"(out[4]), "=@ccpl"(out[5]),
+   "=@ccvs"(out[6]), "=@ccvc"(out[7]),
+   "=@cchi"(out[8]), "=@ccls"(out[9]),
+   "=@ccge"(out[10]), "=@cclt"(out[11]),
+   "=@ccgt"(out[12]), "=@ccle"(out[13]),
+   "=@cchs"(out[14]), "=@cclo"(out[15]));
+}
+
+/* There will be at least one of each.  */
+/* { dg-final { scan-assembler "movne" } } */
+/* { dg-final { scan-assembler "moveq" } } */
+/* { dg-final { scan-assembler "movcs" } } */
+/* { dg-final { scan-assembler "movcc" } } */
+/* { dg-final { scan-assembler "movmi" } } */
+/* { dg-final { scan-assembler "movpl" } } */
+/* { dg-final { scan-assembler "movvs" } } */
+/* { dg-final { scan-assembler "movvc" } } */
+/* { dg-final { scan-assembler "movhi" } } */
+/* { dg-final { scan-assembler "movls" } } */
+/* { dg-final { scan-assembler "movge" } } */
+/* { dg-final { scan-assembler "movls" } } */
+/* { dg-final { scan-assembler "movgt" } } */
+/* { dg-final { scan-assembler "movle" } } */
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-3.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-3.c
new file mode 100644
index 000..e84e3431277
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-3.c
@@ -0,0 +1,38 @@
+/* Test some of the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#define DO(C) \
+void f##C(void) { char x; asm("" : "=@cc"#C(x)); if (!x) asm(""); asm(""); }
+
+DO(ne)
+DO(eq)
+DO(cs)
+DO(cc)
+DO(hs)
+DO(lo)
+DO(mi)
+DO(pl)
+DO(vs)
+DO(vc)
+DO(hi)
+DO(ls)
+DO(ge)
+DO(lt)
+DO(gt)
+DO(le)
+
+/* { dg-final { scan-assembler "bne" } } */
+/* { dg-final { scan-assembler "beq" } } */
+/* { dg-final { scan-assembler "bcs" } } */
+/* { dg-final { scan-assembler "bcc" } } */
+/* { dg-final { scan-assembler "bmi" } } */
+/* { dg-final { scan-assembler "bpl" } } */
+/* { dg-final { scan-assembler "bvs" } } */
+/* { dg-final { scan-assembler "bvc" } } */
+/* { dg-final { scan-assembler "bhi" } } */
+/* { dg-final { scan-assembler "bls" } } */
+/* { dg-final { scan-assembler "bge" } } */
+/* { dg-final { scan-assembler "blt" } } */
+/* { dg-final { scan-assembler "bgt" } } */
+/* { dg-final { scan-assembler "ble" } } */
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-5.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-5.c
new file mode 100644
index 000..4d4394e1478
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-5.c
@@ -0,0 +1,30 @@
+/* Test error conditions of asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void f_B(void) { _Bool x; asm("" : "=@cccc"(x)); }
+void f_c(void) { char x; asm("" : "=@cccc"(x)); }
+void f_s(void) { short x; asm("" : "=@cccc"(x)); }
+void f_i(void) { int x; asm("" : "=@cccc"(x)); }
+void f_l(void) { long x; asm("" : "=@cccc"(x)); }
+void f_ll(void) { long long x; asm("" : "=@cccc"(x)); }
+
+void f_f(void)
+{
+  float x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+void f_d(void)
+{
+  double x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+struct S { int x[3]; };
+
+void f_S(void)
+{
+  struct S x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-6.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-6.c
new file mode 100644
index 000..09174e04ae6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-6.c
@@ -0,0 +1,62 @@
+/* Executable testcase for 'output flags.'  */
+/* { dg-do run } */
+
+int test_bits (long nzcv)
+{
+  long n, z, c, v;
+
+  __asm__ ("msr APSR_nzcvq, %[in]"
+  : "=@ccmi"(n), "=@cceq"(z), "=@cccs"(c), "=@ccvs"(v)
+  : [in] "r"(nzcv << 28));
+
+ 

[PATCH v2 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-14 Thread Richard Henderson
Inspired by the tests in gcc.target/i386.  Testing code generation,
diagnostics, and execution.

* gcc.target/aarch64/asm-flag-1.c: New test.
* gcc.target/aarch64/asm-flag-3.c: New test.
* gcc.target/aarch64/asm-flag-5.c: New test.
* gcc.target/aarch64/asm-flag-6.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c | 35 +++
 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c | 38 
 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c | 30 +
 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c | 62 +++
 4 files changed, 165 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c

diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
new file mode 100644
index 000..49901e59c38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
@@ -0,0 +1,35 @@
+/* Test the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#ifndef __GCC_ASM_FLAG_OUTPUTS__
+#error "missing preprocessor define"
+#endif
+
+void f(char *out)
+{
+  asm(""
+  : "=@ccne"(out[0]), "=@cceq"(out[1]),
+   "=@cccs"(out[2]), "=@"(out[3]),
+   "=@ccmi"(out[4]), "=@ccpl"(out[5]),
+   "=@ccvs"(out[6]), "=@ccvc"(out[7]),
+   "=@cchi"(out[8]), "=@ccls"(out[9]),
+   "=@ccge"(out[10]), "=@cclt"(out[11]),
+   "=@ccgt"(out[12]), "=@ccle"(out[13]),
+   "=@cchs"(out[14]), "=@cclo"(out[15]));
+}
+
+/* { dg-final { scan-assembler "cset.*, ne" } } */
+/* { dg-final { scan-assembler "cset.*, eq" } } */
+/* { dg-final { scan-assembler "cset.*, cs" } } */
+/* { dg-final { scan-assembler "cset.*, cc" } } */
+/* { dg-final { scan-assembler "cset.*, mi" } } */
+/* { dg-final { scan-assembler "cset.*, pl" } } */
+/* { dg-final { scan-assembler "cset.*, vs" } } */
+/* { dg-final { scan-assembler "cset.*, vc" } } */
+/* { dg-final { scan-assembler "cset.*, hi" } } */
+/* { dg-final { scan-assembler "cset.*, ls" } } */
+/* { dg-final { scan-assembler "cset.*, ge" } } */
+/* { dg-final { scan-assembler "cset.*, ls" } } */
+/* { dg-final { scan-assembler "cset.*, gt" } } */
+/* { dg-final { scan-assembler "cset.*, le" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-3.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
new file mode 100644
index 000..e84e3431277
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
@@ -0,0 +1,38 @@
+/* Test some of the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#define DO(C) \
+void f##C(void) { char x; asm("" : "=@cc"#C(x)); if (!x) asm(""); asm(""); }
+
+DO(ne)
+DO(eq)
+DO(cs)
+DO(cc)
+DO(hs)
+DO(lo)
+DO(mi)
+DO(pl)
+DO(vs)
+DO(vc)
+DO(hi)
+DO(ls)
+DO(ge)
+DO(lt)
+DO(gt)
+DO(le)
+
+/* { dg-final { scan-assembler "bne" } } */
+/* { dg-final { scan-assembler "beq" } } */
+/* { dg-final { scan-assembler "bcs" } } */
+/* { dg-final { scan-assembler "bcc" } } */
+/* { dg-final { scan-assembler "bmi" } } */
+/* { dg-final { scan-assembler "bpl" } } */
+/* { dg-final { scan-assembler "bvs" } } */
+/* { dg-final { scan-assembler "bvc" } } */
+/* { dg-final { scan-assembler "bhi" } } */
+/* { dg-final { scan-assembler "bls" } } */
+/* { dg-final { scan-assembler "bge" } } */
+/* { dg-final { scan-assembler "blt" } } */
+/* { dg-final { scan-assembler "bgt" } } */
+/* { dg-final { scan-assembler "ble" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-5.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
new file mode 100644
index 000..4d4394e1478
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
@@ -0,0 +1,30 @@
+/* Test error conditions of asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void f_B(void) { _Bool x; asm("" : "=@cccc"(x)); }
+void f_c(void) { char x; asm("" : "=@cccc"(x)); }
+void f_s(void) { short x; asm("" : "=@cccc"(x)); }
+void f_i(void) { int x; asm("" : "=@cccc"(x)); }
+void f_l(void) { long x; asm("" : "=@cccc"(x)); }
+void f_ll(void) { long long x; asm("" : "=@cccc"(x)); }
+
+void f_f(void)
+{
+  float x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+void f_d(void)
+{
+  double x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+struct S { int x[3]; };
+
+void f_S(void)
+{
+  struct S x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
new file mode 100644
index 000..963b5a48c70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -0,0 +1,62 @@
+/* Executable testcase for 'output flags.'  */
+/* { dg-do run } */
+
+int test_bits (long nzcv)
+{
+  long n, z, c, v;
+
+  __asm__ ("msr nzcv, 

[PATCH v2 3/6] arm: Rename CC_NOOVmode to CC_NZmode

2019-11-14 Thread Richard Henderson
CC_NZmode is a more accurate description of what we require
from the mode, and matches up with the definition in aarch64.

Rename noov_comparison_operator to nz_comparison_operator
in order to match.

* config/arm/arm-modes.def (CC_NZ): Rename from CC_NOOV.
* config/arm/predicates.md (nz_comparison_operator): Rename
from noov_comparison_operator.
* config/arm/arm.c (arm_select_cc_mode): Use CC_NZmode name.
(arm_gen_dicompare_reg): Likewise.
(maybe_get_arm_condition_code): Likewise.
(thumb1_final_prescan_insn): Likewise.
(arm_emit_coreregs_64bit_shift): Likewise.
* config/arm/arm.md (addsi3_compare0): Likewise.
(*addsi3_compare0_scratch, subsi3_compare0): Likewise.
(*mulsi3_compare0, *mulsi3_compare0_v6): Likewise.
(*mulsi3_compare0_scratch, *mulsi3_compare0_scratch_v6): Likewise.
(*mulsi3addsi_compare0, *mulsi3addsi_compare0_v6): Likewise.
(*mulsi3addsi_compare0_scratch): Likewise.
(*mulsi3addsi_compare0_scratch_v6): Likewise.
(*andsi3_compare0, *andsi3_compare0_scratch): Likewise.
(*zeroextractsi_compare0_scratch): Likewise.
(*ne_zeroextractsi, *ne_zeroextractsi_shifted): Likewise.
(*ite_ne_zeroextractsi, *ite_ne_zeroextractsi_shifted): Likewise.
(andsi_not_shiftsi_si_scc_no_reuse): Likewise.
(andsi_not_shiftsi_si_scc): Likewise.
(*andsi_notsi_si_compare0, *andsi_notsi_si_compare0_scratch): Likewise.
(*iorsi3_compare0, *iorsi3_compare0_scratch): Likewise.
(*xorsi3_compare0, *xorsi3_compare0_scratch): Likewise.
(*shiftsi3_compare0, *shiftsi3_compare0_scratch): Likewise.
(*not_shiftsi_compare0, *not_shiftsi_compare0_scratch): Likewise.
(*notsi_compare0, *notsi_compare0_scratch): Likewise.
(return_addr_mask, *check_arch2): Likewise.
(*arith_shiftsi_compare0, *arith_shiftsi_compare0_scratch): Likewise.
(*sub_shiftsi_compare0, *sub_shiftsi_compare0_scratch): Likewise.
(compare_scc splitters): Likewise.
(movcond_addsi): Likewise.
* config/arm/thumb2.md (thumb2_addsi3_compare0): Likewise.
(*thumb2_addsi3_compare0_scratch): Likewise.
(*thumb2_mulsi_short_compare0): Likewise.
(*thumb2_mulsi_short_compare0_scratch): Likewise.
(compare peephole2s): Likewise.
* config/arm/thumb1.md (thumb1_cbz): Use CC_NZmode and
nz_comparison_operator names.
(cbranchsi4_insn): Likewise.
---
 gcc/config/arm/arm.c |  12 +--
 gcc/config/arm/arm-modes.def |   4 +-
 gcc/config/arm/arm.md| 186 +--
 gcc/config/arm/predicates.md |   2 +-
 gcc/config/arm/thumb1.md |   8 +-
 gcc/config/arm/thumb2.md |  34 +++
 6 files changed, 123 insertions(+), 123 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9086cf65953..d996207853c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15376,7 +15376,7 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
  || GET_CODE (x) == ASHIFT || GET_CODE (x) == ASHIFTRT
  || GET_CODE (x) == ROTATERT
  || (TARGET_32BIT && GET_CODE (x) == ZERO_EXTRACT)))
-return CC_NOOVmode;
+return CC_NZmode;
 
   /* A comparison of ~reg with a const is really a special
  canoncialization of compare (~const, reg), which is a reverse
@@ -15492,11 +15492,11 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, 
rtx scratch)
  }
 
rtx clobber = gen_rtx_CLOBBER (VOIDmode, scratch);
-   cc_reg = gen_rtx_REG (CC_NOOVmode, CC_REGNUM);
+   cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
 
rtx set
  = gen_rtx_SET (cc_reg,
-gen_rtx_COMPARE (CC_NOOVmode,
+gen_rtx_COMPARE (CC_NZmode,
  gen_rtx_IOR (SImode, x_lo, x_hi),
  const0_rtx));
emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set,
@@ -23881,7 +23881,7 @@ maybe_get_arm_condition_code (rtx comparison)
return code;
   return ARM_NV;
 
-case E_CC_NOOVmode:
+case E_CC_NZmode:
   switch (comp_code)
{
case NE: return ARM_NE;
@@ -25304,7 +25304,7 @@ thumb1_final_prescan_insn (rtx_insn *insn)
  cfun->machine->thumb1_cc_insn = insn;
  cfun->machine->thumb1_cc_op0 = SET_DEST (set);
  cfun->machine->thumb1_cc_op1 = const0_rtx;
- cfun->machine->thumb1_cc_mode = CC_NOOVmode;
+ cfun->machine->thumb1_cc_mode = CC_NZmode;
  if (INSN_CODE (insn) == CODE_FOR_thumb1_subsi3_insn)
{
  rtx src1 = XEXP (SET_SRC (set), 1);
@@ -30486,7 +30486,7 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, rtx 
out, rtx in,
   else
 {
   /* We have a shift-by-register.  */
-  rtx cc_reg = gen_rtx_REG (CC_NOOVmode, CC_REGNUM);
+  

[PATCH v2 4/6] arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__

2019-11-14 Thread Richard Henderson
Since all but a couple of lines is shared between the two targets,
enable them both at once.

* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Declare.
* config/arm/aarch-common.c (arm_md_asm_adjust): New.
* config/arm/arm-c.c (arm_cpu_builtins): Define
__GCC_ASM_FLAG_OUTPUTS__.
* config/arm/arm.c (TARGET_MD_ASM_ADJUST): New.
* config/aarch64/aarch64-c.c (aarch64_define_unconditional_macros):
Define __GCC_ASM_FLAG_OUTPUTS__.
* config/aarch64/aarch64.c (TARGET_MD_ASM_ADJUST): New.
* doc/extend.texi (FlagOutputOperands): Add documentation
for ARM and AArch64.
---
 gcc/config/arm/aarch-common-protos.h |   6 ++
 gcc/config/aarch64/aarch64-c.c   |   2 +
 gcc/config/aarch64/aarch64.c |   3 +
 gcc/config/arm/aarch-common.c| 136 +++
 gcc/config/arm/arm-c.c   |   1 +
 gcc/config/arm/arm.c |   3 +
 gcc/doc/extend.texi  |  39 
 7 files changed, 190 insertions(+)

diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index 3bf38a104f6..f15cf336e9d 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -23,6 +23,8 @@
 #ifndef GCC_AARCH_COMMON_PROTOS_H
 #define GCC_AARCH_COMMON_PROTOS_H
 
+#include "hard-reg-set.h"
+
 extern int aarch_accumulator_forwarding (rtx_insn *, rtx_insn *);
 extern bool aarch_rev16_p (rtx);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
@@ -141,5 +143,9 @@ struct cpu_cost_table
   const struct vector_cost_table vect;
 };
 
+rtx_insn *
+arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+		   vec<const char *> &constraints,
+		   vec<rtx> &clobbers, HARD_REG_SET &clobbered_regs);
 
 #endif /* GCC_AARCH_COMMON_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 7c322ca0813..0af859f1c14 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -69,6 +69,8 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
   builtin_define ("__ARM_FEATURE_UNALIGNED");
   builtin_define ("__ARM_PCS_AAPCS64");
   builtin_define_with_int_value ("__ARM_SIZEOF_WCHAR_T", WCHAR_TYPE_SIZE / 8);
+
+  builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
 }
 
 /* Undefine/redefine macros that depend on the current backend state and may
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d2a3c7ef90a..9a5f27fea3a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21933,6 +21933,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_STRICT_ARGUMENT_NAMING
 #define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true
 
+#undef TARGET_MD_ASM_ADJUST
+#define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 965a07a43e3..760ef6c9c0a 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -26,10 +26,16 @@
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "insn-modes.h"
 #include "tm.h"
 #include "rtl.h"
 #include "rtl-iter.h"
 #include "memmodel.h"
+#include "diagnostic.h"
+#include "tree.h"
+#include "expr.h"
+#include "function.h"
+#include "emit-rtl.h"
 
 /* Return TRUE if X is either an arithmetic shift left, or
is a multiplication by a power of two.  */
@@ -520,3 +526,133 @@ arm_mac_accumulator_is_mul_result (rtx producer, rtx 
consumer)
   && !reg_overlap_mentioned_p (mul_result, mac_op0)
   && !reg_overlap_mentioned_p (mul_result, mac_op1));
 }
+
+/* Worker function for TARGET_MD_ASM_ADJUST.
+   We implement asm flag outputs.  */
+
+rtx_insn *
+arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+		   vec<const char *> &constraints,
+		   vec<rtx> &/*clobbers*/, HARD_REG_SET &/*clobbered_regs*/)
+{
+  bool saw_asm_flag = false;
+
+  start_sequence ();
+  for (unsigned i = 0, n = outputs.length (); i < n; ++i)
+{
+  const char *con = constraints[i];
+  if (strncmp (con, "=@cc", 4) != 0)
+   continue;
+  con += 4;
+  if (strchr (con, ',') != NULL)
+   {
+ error ("alternatives not allowed in % flag output");
+ continue;
+   }
+
+  machine_mode mode;
+  rtx_code code;
+  int con01 = 0;
+
+#define C(X, Y)  (unsigned char)(X) * 256 + (unsigned char)(Y)
+
+  /* All of the condition codes are two characters.  */
+  if (con[0] != 0 && con[1] != 0 && con[2] == 0)
+   con01 = C(con[0], con[1]);
+
+  switch (con01)
+   {
+   case C('c', 'c'):
+   case C('l', 'o'):
+ mode = CC_Cmode, code = GEU;
+ break;
+   case C('c', 's'):
+   case C('h', 's'):
+ mode = CC_Cmode, code = LTU;
+ break;
+   case C('e', 'q'):
+ mode = CC_NZmode, code = EQ;
+ break;
+   case C('g', 'e'):
+ mode = CCmode, code = GE;
+  

[PATCH v2 1/6] aarch64: Add "c" constraint

2019-11-14 Thread Richard Henderson
Mirror arm in letting "c" match the condition code register.

* config/aarch64/constraints.md (c): New constraint.
---
 gcc/config/aarch64/constraints.md | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/constraints.md 
b/gcc/config/aarch64/constraints.md
index d0c3dd5bc1f..b9e5d13e851 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -39,6 +39,10 @@
 (define_register_constraint "y" "FP_LO8_REGS"
   "Floating point and SIMD vector registers V0 - V7.")
 
+(define_constraint "c"
+ "@internal The condition code register."
+  (match_operand 0 "cc_register"))
+
 (define_constraint "I"
  "A constant that can be used with an ADD operation."
  (and (match_code "const_int")
-- 
2.17.1



[PATCH v2 2/6] arm: Fix the "c" constraint

2019-11-14 Thread Richard Henderson
The existing definition using register class CC_REG does not
work because CC_REGNUM does not support normal modes, and so
fails to match register_operand.  Use a non-register constraint
and the cc_register predicate instead.

* config/arm/constraints.md (c): Use cc_register predicate.
---
 gcc/config/arm/constraints.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index b76de81b85c..e02b678d26d 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -94,8 +94,9 @@
  "@internal
   Thumb only.  The union of the low registers and the stack register.")
 
-(define_register_constraint "c" "CC_REG"
- "@internal The condition code register.")
+(define_constraint "c"
+ "@internal The condition code register."
+ (match_operand 0 "cc_register"))
 
 (define_register_constraint "Cs" "CALLER_SAVE_REGS"
  "@internal The caller save registers.  Useful for sibcalls.")
-- 
2.17.1



[PATCH v2 0/6] Implement asm flag outputs for arm + aarch64

2019-11-14 Thread Richard Henderson
I've put the implementation into config/arm/aarch-common.c, so
that it can be shared between the two targets.  This required
a little bit of cleanup to the CC modes and constraints to get
the two targets to match up.

Changes for v2:
  * Document overflow flags.
  * Add "hs" and "lo" as aliases of "cs" and "cc".
  * Add unsigned cmp tests to asm-flag-6.c.

Richard Sandiford has given his ack for the aarch64 side.
I'm still looking for an ack for the arm side.


r~


Richard Henderson (6):
  aarch64: Add "c" constraint
  arm: Fix the "c" constraint
  arm: Rename CC_NOOVmode to CC_NZmode
  arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__
  arm: Add testsuite checks for asm-flag
  aarch64: Add testsuite checks for asm-flag

 gcc/config/arm/aarch-common-protos.h  |   6 +
 gcc/config/aarch64/aarch64-c.c|   2 +
 gcc/config/aarch64/aarch64.c  |   3 +
 gcc/config/arm/aarch-common.c | 136 +
 gcc/config/arm/arm-c.c|   1 +
 gcc/config/arm/arm.c  |  15 +-
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c |  35 
 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c |  38 
 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c |  30 +++
 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c |  62 ++
 gcc/testsuite/gcc.target/arm/asm-flag-1.c |  36 
 gcc/testsuite/gcc.target/arm/asm-flag-3.c |  38 
 gcc/testsuite/gcc.target/arm/asm-flag-5.c |  30 +++
 gcc/testsuite/gcc.target/arm/asm-flag-6.c |  62 ++
 gcc/config/aarch64/constraints.md |   4 +
 gcc/config/arm/arm-modes.def  |   4 +-
 gcc/config/arm/arm.md | 186 +-
 gcc/config/arm/constraints.md |   5 +-
 gcc/config/arm/predicates.md  |   2 +-
 gcc/config/arm/thumb1.md  |   8 +-
 gcc/config/arm/thumb2.md  |  34 ++--
 gcc/doc/extend.texi   |  39 
 22 files changed, 651 insertions(+), 125 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-6.c

-- 
2.17.1



Re: [PATCH 0/6] Implement asm flag outputs for arm + aarch64

2019-11-13 Thread Richard Henderson
On 11/12/19 9:21 PM, Richard Sandiford wrote:
> Apart from the vc/vs thing you mentioned in the follow-up for 4/6,
> it looks like 4/6, 5/6 and 6/6 are missing "hs" and "lo".  OK for
> aarch64 with those added.

Are those aliases for two of the other conditions?  They're not in the list
within the pseudocode for ConditionHolds in the ARM ARM.  Which is what I was
documenting to support...


r~


Re: [Committed] IBM Z: Add pattern for load truth value of comparison into reg

2019-11-11 Thread Richard Henderson
On 11/11/19 4:03 PM, Andreas Krebbel wrote:
> On 11.11.19 15:39, Richard Henderson wrote:
>> On 11/7/19 12:52 PM, Andreas Krebbel wrote:
>>> +; Such patterns get directly emitted by noce_emit_store_flag.
>>> +(define_insn_and_split "*cstorecc_z13"
>>> +  [(set (match_operand:GPR  0 "register_operand""=")
>>> +   (match_operator:GPR 1 "s390_comparison"
>>> +   [(match_operand 2 "cc_reg_operand""c")
>>> +(match_operand 3 "const_int_operand"  "")]))]
>>
>> The clobbered-output seems superfluous, since it can't overlap "c".
> I thought it would be "more" correct this way, but it might lead to an extra 
> reload being emitted - right?

Well, possibly no extra reloads either, since no input will overlap.

>> I believe the only valid const_int is 0, fwiw, so perhaps matching any
>> const_int is overkill.
> We also have CCRAW mode where that value is != 0.

Oh wow.  That's an interesting way to fold those combinations.

>> Does it help Z12 to allow the 3-insn sequence using LOC(G)R?
> Prior to z13 we prefer the variant using a conditional branch.

Ok, just checking.  Thanks,


r~


Re: [Committed] IBM Z: Add pattern for load truth value of comparison into reg

2019-11-11 Thread Richard Henderson
On 11/7/19 12:52 PM, Andreas Krebbel wrote:
> +; Such patterns get directly emitted by noce_emit_store_flag.
> +(define_insn_and_split "*cstorecc_z13"
> +  [(set (match_operand:GPR  0 "register_operand"    "=&d")
> +	(match_operator:GPR 1 "s390_comparison"
> +	    [(match_operand 2 "cc_reg_operand"    "c")
> +	     (match_operand 3 "const_int_operand"  "")]))]

The clobbered-output seems superfluous, since it can't overlap "c".
I believe the only valid const_int is 0, fwiw, so perhaps matching any
const_int is overkill.

Does it help Z12 to allow the 3-insn sequence using LOC(G)R?

> +  "TARGET_Z13"
> +  "#"
 > +  "reload_completed"
> +  [(set (match_dup 0) (const_int 0))
> +   (set (match_dup 0)
> + (if_then_else:GPR
> +  (match_op_dup 1 [(match_dup 2) (match_dup 3)])
> +  (const_int 1)
> +  (match_dup 0)))])
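
For reference, the kind of source that reaches this pattern via
noce_emit_store_flag (a sketch; the exact sequence depends on -march
and options):

  /* With -march=z13, ifcvt can emit the truth value directly:
     roughly a load of 0 followed by a load-on-condition, instead of
     a conditional branch around a constant load.  */
  int less_than (int a, int b)
  {
    return a < b;
  }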


r~


Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-09 Thread Richard Henderson
> +;; define_subst and associated attributes
> +
> +(define_subst "add_setq"
> +  [(set (match_operand:SI 0 "" "")
> +(match_operand:SI 1 "" ""))]
> +  ""
> +  [(set (match_dup 0)
> +(match_dup 1))
> +   (set (reg:CC APSRQ_REGNUM)
> + (unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))])
> +
> +(define_subst_attr "add_clobber_q_name" "add_setq" "" "_setq")
> +(define_subst_attr "add_clobber_q_pred" "add_setq" "!ARM_Q_BIT_READ"
> +"ARM_Q_BIT_READ")

Is there a good reason to use CCmode for the Q bit instead of BImode?

Is there a good reason not to represent the clobber of the Q bit when we're not
representing the set?

Although it probably doesn't matter, because of the unspec, the update of the Q
bit here in the subst isn't right.  Better would be

  (set (reg:BI APSRQ_REGNUM)
   (ior:BI (unspec:BI [(match_dup 1)] UNSPEC_Q_SET)
   (reg:BI APSRQ_REGNUM)))


> +/* Implement TARGET_CHECK_BUILTIN_CALL.  Record a read of the Q bit through
> +   intrinsics in the machine function.  */
> +bool
> +arm_check_builtin_call (location_t , vec<location_t> , tree fndecl,
> + tree, unsigned int, tree *)
> +{
> +  int fcode = DECL_MD_FUNCTION_CODE (fndecl);
> +  if (fcode == ARM_BUILTIN_saturation_occurred
> +  || fcode == ARM_BUILTIN_set_saturation)
> +{
> +  if (cfun && cfun->decl)
> + DECL_ATTRIBUTES (cfun->decl)
> +   = tree_cons (get_identifier ("acle qbit"), NULL_TREE,
> +DECL_ATTRIBUTES (cfun->decl));
> +}
> +  return true;
> +}

Where does this attribute affect inlining?  Or get merged into the signature of
a calling function during inlining?

This check happens way, way early in the C front end, while doing semantic
checks.  I think this is the wrong hook to use, especially since this is not a
semantic check of the arguments to the builtin.

I think you want to record the use of such a builtin during rtl expansion,
within the define_expand for the buitin, setting a bool in struct
machine_function.  Then use that bool in the arm.md predicates.

(I'll note that there are 5 "int" fields in the arm machine_function that
should be bool, now that we're not using C89.  Probably best to re-order all of
them to the end and add your new bool next to them.)
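
For context, the user-level shape of those builtins per ACLE (a sketch
assuming the intrinsics this series adds and a target with the
saturating instructions; not part of the patch):

  #include <arm_acle.h>

  int saturating_double (int x)
  {
    int y = __ssat (x + x, 16);        /* may set the Q bit */
    if (__saturation_occurred ())      /* the Q-bit read being tracked */
      __set_saturation_occurred (0);   /* clear Q for the next check */
    return y;
  }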

> +(define_insn "arm_set_apsr"
> +  [(set (reg:CC APSRQ_REGNUM)
> + (unspec_volatile:CC
> +   [(match_operand:SI 0 "s_register_operand" "r")] VUNSPEC_APSR_WRITE))]
> +  "TARGET_ARM_QBIT"
> +  "msr%?\tAPSR_nzcvq, %0"
> +  [(set_attr "predicable" "yes")
> +   (set_attr "conds" "set")]
> +)

This is missing a clobber (or set) of CC_REGNUM.
Why unspec_volatile and not unspec?

> +;; Read the APSR and set the Q bit (bit position 27) according to operand 0
> +(define_expand "arm_set_saturation"
> +  [(match_operand:SI 0 "reg_or_int_operand")]
> +  "TARGET_ARM_QBIT"
> +  {
> +rtx apsr = gen_reg_rtx (SImode);
> +emit_insn (gen_arm_get_apsr (apsr));
> +rtx to_insert = gen_reg_rtx (SImode);
> +if (CONST_INT_P (operands[0]))
> +  emit_move_insn (to_insert, operands[0] == CONST0_RTX (SImode)
> +  ? CONST0_RTX (SImode) : CONST1_RTX (SImode));
> +else
> +  {
> +rtx cmp = gen_rtx_NE (SImode, operands[0], CONST0_RTX (SImode));
> +emit_insn (gen_cstoresi4 (to_insert, cmp, operands[0],
> +   CONST0_RTX (SImode)));
> +  }
> +emit_insn (gen_insv (apsr, CONST1_RTX (SImode),
> +gen_int_mode (27, SImode), to_insert));
> +emit_insn (gen_arm_set_apsr (apsr));
> +DONE;
> +  }
> +)

Why are you preserving APSR.NZCV across this operation?  It should not be live
during the builtin that expands this.


r~


Re: [PATCH 4/6] arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__

2019-11-08 Thread Richard Henderson
On 11/8/19 11:54 AM, Richard Henderson wrote:
> +@table @code
> +@item eq
> +``equal'' or Z flag set
> +@item ne
> +``not equal'' or Z flag clear
> +@item cs
> +``carry'' or C flag set
> +@item cc
> +C flag clear
> +@item mi
> +``minus'' or N flag set
> +@item pl
> +``plus'' or N flag clear
> +@item hi
> +unsigned greater than

Dang, skipped right over vc/vs here.  Will fix.


r~


[PATCH 5/6] arm: Add testsuite checks for asm-flag

2019-11-08 Thread Richard Henderson
Inspired by the tests in gcc.target/i386.  Testing code generation,
diagnostics, and execution.

* gcc.target/arm/asm-flag-1.c: New test.
* gcc.target/arm/asm-flag-3.c: New test.
* gcc.target/arm/asm-flag-5.c: New test.
* gcc.target/arm/asm-flag-6.c: New test.
---
 gcc/testsuite/gcc.target/arm/asm-flag-1.c | 35 ++
 gcc/testsuite/gcc.target/arm/asm-flag-3.c | 36 +++
 gcc/testsuite/gcc.target/arm/asm-flag-5.c | 30 
 gcc/testsuite/gcc.target/arm/asm-flag-6.c | 43 +++
 4 files changed, 144 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-6.c

diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-1.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
new file mode 100644
index 000..e1ce4120d98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-1.c
@@ -0,0 +1,35 @@
+/* Test the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#ifndef __GCC_ASM_FLAG_OUTPUTS__
+#error "missing preprocessor define"
+#endif
+
+void f(char *out)
+{
+  asm(""
+  : "=@ccne"(out[0]), "=@cceq"(out[1]),
+   "=@cccs"(out[2]), "=@"(out[3]),
+   "=@ccmi"(out[4]), "=@ccpl"(out[5]),
+   "=@ccvs"(out[6]), "=@ccvc"(out[7]),
+   "=@cchi"(out[8]), "=@ccls"(out[9]),
+   "=@ccge"(out[10]), "=@cclt"(out[11]),
+   "=@ccgt"(out[12]), "=@ccle"(out[13]));
+}
+
+/* There will be at least one of each, probably two.  */
+/* { dg-final { scan-assembler "movne" } } */
+/* { dg-final { scan-assembler "moveq" } } */
+/* { dg-final { scan-assembler "movcs" } } */
+/* { dg-final { scan-assembler "movcc" } } */
+/* { dg-final { scan-assembler "movmi" } } */
+/* { dg-final { scan-assembler "movpl" } } */
+/* { dg-final { scan-assembler "movvs" } } */
+/* { dg-final { scan-assembler "movvc" } } */
+/* { dg-final { scan-assembler "movhi" } } */
+/* { dg-final { scan-assembler "movls" } } */
+/* { dg-final { scan-assembler "movge" } } */
+/* { dg-final { scan-assembler "movls" } } */
+/* { dg-final { scan-assembler "movgt" } } */
+/* { dg-final { scan-assembler "movle" } } */
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-3.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-3.c
new file mode 100644
index 000..8b0bd8a00f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-3.c
@@ -0,0 +1,36 @@
+/* Test some of the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#define DO(C) \
+void f##C(void) { char x; asm("" : "=@cc"#C(x)); if (!x) asm(""); asm(""); }
+
+DO(ne)
+DO(eq)
+DO(cs)
+DO(cc)
+DO(mi)
+DO(pl)
+DO(vs)
+DO(vc)
+DO(hi)
+DO(ls)
+DO(ge)
+DO(lt)
+DO(gt)
+DO(le)
+
+/* { dg-final { scan-assembler "bne" } } */
+/* { dg-final { scan-assembler "beq" } } */
+/* { dg-final { scan-assembler "bcs" } } */
+/* { dg-final { scan-assembler "bcc" } } */
+/* { dg-final { scan-assembler "bmi" } } */
+/* { dg-final { scan-assembler "bpl" } } */
+/* { dg-final { scan-assembler "bvs" } } */
+/* { dg-final { scan-assembler "bvc" } } */
+/* { dg-final { scan-assembler "bhi" } } */
+/* { dg-final { scan-assembler "bls" } } */
+/* { dg-final { scan-assembler "bge" } } */
+/* { dg-final { scan-assembler "blt" } } */
+/* { dg-final { scan-assembler "bgt" } } */
+/* { dg-final { scan-assembler "ble" } } */
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-5.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-5.c
new file mode 100644
index 000..4d4394e1478
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-5.c
@@ -0,0 +1,30 @@
+/* Test error conditions of asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void f_B(void) { _Bool x; asm("" : "=@cccc"(x)); }
+void f_c(void) { char x; asm("" : "=@cccc"(x)); }
+void f_s(void) { short x; asm("" : "=@cccc"(x)); }
+void f_i(void) { int x; asm("" : "=@cccc"(x)); }
+void f_l(void) { long x; asm("" : "=@cccc"(x)); }
+void f_ll(void) { long long x; asm("" : "=@cccc"(x)); }
+
+void f_f(void)
+{
+  float x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+void f_d(void)
+{
+  double x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+struct S { int x[3]; };
+
+void f_S(void)
+{
+  struct S x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-6.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-6.c
new file mode 100644
index 000..ef2e06afc37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-6.c
@@ -0,0 +1,43 @@
+/* Executable testcase for 'output flags.'  */
+/* { dg-do run } */
+
+int test_bits (long nzcv)
+{
+  long n, z, c, v;
+
+  __asm__ ("msr APSR_nzcvq, %[in]"
+  : "=@ccmi"(n), "=@cceq"(z), "=@cccs"(c), "=@ccvs"(v)
+  : [in] "r"(nzcv << 28));
+
+  return n * 8 + z * 4 + c * 2 + 

[PATCH 6/6] aarch64: Add testsuite checks for asm-flag

2019-11-08 Thread Richard Henderson
Inspired by the tests in gcc.target/i386.  Testing code generation,
diagnostics, and execution.

* gcc.target/aarch64/asm-flag-1.c: New test.
* gcc.target/aarch64/asm-flag-3.c: New test.
* gcc.target/aarch64/asm-flag-5.c: New test.
* gcc.target/aarch64/asm-flag-6.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c | 34 +++
 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c | 36 
 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c | 30 +
 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c | 43 +++
 4 files changed, 143 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c

diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
new file mode 100644
index 000..e3e79c29b8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
@@ -0,0 +1,34 @@
+/* Test the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#ifndef __GCC_ASM_FLAG_OUTPUTS__
+#error "missing preprocessor define"
+#endif
+
+void f(char *out)
+{
+  asm(""
+  : "=@ccne"(out[0]), "=@cceq"(out[1]),
+   "=@cccs"(out[2]), "=@"(out[3]),
+   "=@ccmi"(out[4]), "=@ccpl"(out[5]),
+   "=@ccvs"(out[6]), "=@ccvc"(out[7]),
+   "=@cchi"(out[8]), "=@ccls"(out[9]),
+   "=@ccge"(out[10]), "=@cclt"(out[11]),
+   "=@ccgt"(out[12]), "=@ccle"(out[13]));
+}
+
+/* { dg-final { scan-assembler "cset.*, ne" } } */
+/* { dg-final { scan-assembler "cset.*, eq" } } */
+/* { dg-final { scan-assembler "cset.*, cs" } } */
+/* { dg-final { scan-assembler "cset.*, cc" } } */
+/* { dg-final { scan-assembler "cset.*, mi" } } */
+/* { dg-final { scan-assembler "cset.*, pl" } } */
+/* { dg-final { scan-assembler "cset.*, vs" } } */
+/* { dg-final { scan-assembler "cset.*, vc" } } */
+/* { dg-final { scan-assembler "cset.*, hi" } } */
+/* { dg-final { scan-assembler "cset.*, ls" } } */
+/* { dg-final { scan-assembler "cset.*, ge" } } */
+/* { dg-final { scan-assembler "cset.*, ls" } } */
+/* { dg-final { scan-assembler "cset.*, gt" } } */
+/* { dg-final { scan-assembler "cset.*, le" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-3.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
new file mode 100644
index 000..8b0bd8a00f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
@@ -0,0 +1,36 @@
+/* Test some of the valid @cc asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#define DO(C) \
+void f##C(void) { char x; asm("" : "=@cc"#C(x)); if (!x) asm(""); asm(""); }
+
+DO(ne)
+DO(eq)
+DO(cs)
+DO(cc)
+DO(mi)
+DO(pl)
+DO(vs)
+DO(vc)
+DO(hi)
+DO(ls)
+DO(ge)
+DO(lt)
+DO(gt)
+DO(le)
+
+/* { dg-final { scan-assembler "bne" } } */
+/* { dg-final { scan-assembler "beq" } } */
+/* { dg-final { scan-assembler "bcs" } } */
+/* { dg-final { scan-assembler "bcc" } } */
+/* { dg-final { scan-assembler "bmi" } } */
+/* { dg-final { scan-assembler "bpl" } } */
+/* { dg-final { scan-assembler "bvs" } } */
+/* { dg-final { scan-assembler "bvc" } } */
+/* { dg-final { scan-assembler "bhi" } } */
+/* { dg-final { scan-assembler "bls" } } */
+/* { dg-final { scan-assembler "bge" } } */
+/* { dg-final { scan-assembler "blt" } } */
+/* { dg-final { scan-assembler "bgt" } } */
+/* { dg-final { scan-assembler "ble" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-5.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
new file mode 100644
index 000..4d4394e1478
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
@@ -0,0 +1,30 @@
+/* Test error conditions of asm flag outputs.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void f_B(void) { _Bool x; asm("" : "=@cccc"(x)); }
+void f_c(void) { char x; asm("" : "=@cccc"(x)); }
+void f_s(void) { short x; asm("" : "=@cccc"(x)); }
+void f_i(void) { int x; asm("" : "=@cccc"(x)); }
+void f_l(void) { long x; asm("" : "=@cccc"(x)); }
+void f_ll(void) { long long x; asm("" : "=@cccc"(x)); }
+
+void f_f(void)
+{
+  float x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+void f_d(void)
+{
+  double x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
+
+struct S { int x[3]; };
+
+void f_S(void)
+{
+  struct S x;
+  asm("" : "=@"(x)); /* { dg-error invalid type } */
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
new file mode 100644
index 000..d9b90b8e517
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
@@ -0,0 +1,43 @@
+/* Executable testcase for 'output flags.'  */
+/* { dg-do run } */
+
+int test_bits (long nzcv)
+{
+  long n, z, c, v;
+
+  __asm__ ("msr nzcv, %[in]"
+  : "=@ccmi"(n), "=@cceq"(z), 

[PATCH 4/6] arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__

2019-11-08 Thread Richard Henderson
Since all but a couple of lines is shared between the two targets,
enable them both at once.

* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Declare.
* config/arm/aarch-common.c (arm_md_asm_adjust): New.
* config/arm/arm-c.c (arm_cpu_builtins): Define
__GCC_ASM_FLAG_OUTPUTS__.
* config/arm/arm.c (TARGET_MD_ASM_ADJUST): New.
* config/aarch64/aarch64-c.c (aarch64_define_unconditional_macros):
Define __GCC_ASM_FLAG_OUTPUTS__.
* config/aarch64/aarch64.c (TARGET_MD_ASM_ADJUST): New.
* doc/extend.texi (FlagOutputOperands): Add documentation
for ARM and AArch64.
---
 gcc/config/arm/aarch-common-protos.h |   6 ++
 gcc/config/aarch64/aarch64-c.c   |   2 +
 gcc/config/aarch64/aarch64.c |   3 +
 gcc/config/arm/aarch-common.c| 131 +++
 gcc/config/arm/arm-c.c   |   1 +
 gcc/config/arm/arm.c |   3 +
 gcc/doc/extend.texi  |  33 +++
 7 files changed, 179 insertions(+)

diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index 3bf38a104f6..f15cf336e9d 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -23,6 +23,8 @@
 #ifndef GCC_AARCH_COMMON_PROTOS_H
 #define GCC_AARCH_COMMON_PROTOS_H
 
+#include "hard-reg-set.h"
+
 extern int aarch_accumulator_forwarding (rtx_insn *, rtx_insn *);
 extern bool aarch_rev16_p (rtx);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
@@ -141,5 +143,9 @@ struct cpu_cost_table
   const struct vector_cost_table vect;
 };
 
+rtx_insn *
+arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+		   vec<const char *> &constraints,
+		   vec<rtx> &clobbers, HARD_REG_SET &clobbered_regs);
 
 #endif /* GCC_AARCH_COMMON_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 7c322ca0813..0af859f1c14 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -69,6 +69,8 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
   builtin_define ("__ARM_FEATURE_UNALIGNED");
   builtin_define ("__ARM_PCS_AAPCS64");
   builtin_define_with_int_value ("__ARM_SIZEOF_WCHAR_T", WCHAR_TYPE_SIZE / 8);
+
+  builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
 }
 
 /* Undefine/redefine macros that depend on the current backend state and may
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1dfff331a5a..26de9879bc7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21947,6 +21947,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_STRICT_ARGUMENT_NAMING
 #define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true
 
+#undef TARGET_MD_ASM_ADJUST
+#define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 965a07a43e3..8b98c8d3802 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -26,10 +26,16 @@
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "insn-modes.h"
 #include "tm.h"
 #include "rtl.h"
 #include "rtl-iter.h"
 #include "memmodel.h"
+#include "errors.h"
+#include "tree.h"
+#include "expr.h"
+#include "function.h"
+#include "emit-rtl.h"
 
 /* Return TRUE if X is either an arithmetic shift left, or
is a multiplication by a power of two.  */
@@ -520,3 +526,128 @@ arm_mac_accumulator_is_mul_result (rtx producer, rtx 
consumer)
   && !reg_overlap_mentioned_p (mul_result, mac_op0)
   && !reg_overlap_mentioned_p (mul_result, mac_op1));
 }
+
+/* Worker function for TARGET_MD_ASM_ADJUST.
+   We implement asm flag outputs.  */
+
+rtx_insn *
+arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
+		   vec<const char *> &constraints,
+		   vec<rtx> &/*clobbers*/, HARD_REG_SET &/*clobbered_regs*/)
+{
+  bool saw_asm_flag = false;
+
+  start_sequence ();
+  for (unsigned i = 0, n = outputs.length (); i < n; ++i)
+{
+  const char *con = constraints[i];
+  if (strncmp (con, "=@cc", 4) != 0)
+   continue;
+  con += 4;
+  if (strchr (con, ',') != NULL)
+   {
+ error ("alternatives not allowed in % flag output");
+ continue;
+   }
+
+  machine_mode mode = CCmode;
+  rtx_code code = UNKNOWN;
+
+  switch (con[0])
+   {
+   case 'c':
+ if (con[1] == 'c' && con[2] == 0)
+   mode = CC_Cmode, code = GEU;
+ else if (con[1] == 's' && con[2] == 0)
+   mode = CC_Cmode, code = LTU;
+ break;
+   case 'e':
+ if (con[1] == 'q' && con[2] == 0)
+   mode = CC_NZmode, code = EQ;
+ break;
+   case 'g':
+ if (con[1] == 'e' && con[2] == 0)
+   mode = CCmode, code = GE;
+ else if (con[1] == 't' && con[2] == 0)
+   mode = CCmode, code = GT;
+ break;
+   case 'h':
+ if (con[1] == 'i' && 

[PATCH 2/6] arm: Fix the "c" constraint

2019-11-08 Thread Richard Henderson
The existing definition using register class CC_REG does not
work because CC_REGNUM does not support normal modes, and so
fails to match register_operand.  Use a non-register constraint
and the cc_register predicate instead.

* config/arm/constraints.md (c): Use cc_register predicate.
---
 gcc/config/arm/constraints.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index b76de81b85c..e02b678d26d 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -94,8 +94,9 @@
  "@internal
   Thumb only.  The union of the low registers and the stack register.")
 
-(define_register_constraint "c" "CC_REG"
- "@internal The condition code register.")
+(define_constraint "c"
+ "@internal The condition code register."
+ (match_operand 0 "cc_register"))
 
 (define_register_constraint "Cs" "CALLER_SAVE_REGS"
  "@internal The caller save registers.  Useful for sibcalls.")
-- 
2.17.1



[PATCH 3/6] arm: Rename CC_NOOVmode to CC_NZmode

2019-11-08 Thread Richard Henderson
CC_NZmode is a more accurate description of what we require
from the mode, and matches up with the definition in aarch64.

Rename noov_comparison_operator to nz_comparison_operator
in order to match.

* config/arm/arm-modes.def (CC_NZ): Rename from CC_NOOV.
* config/arm/predicates.md (nz_comparison_operator): Rename
from noov_comparison_operator.
* config/arm/arm.c (arm_select_cc_mode): Use CC_NZmode name.
(arm_gen_dicompare_reg): Likewise.
(maybe_get_arm_condition_code): Likewise.
(thumb1_final_prescan_insn): Likewise.
(arm_emit_coreregs_64bit_shift): Likewise.
* config/arm/arm.md (addsi3_compare0): Likewise.
(*addsi3_compare0_scratch, subsi3_compare0): Likewise.
(*mulsi3_compare0, *mulsi3_compare0_v6): Likewise.
(*mulsi3_compare0_scratch, *mulsi3_compare0_scratch_v6): Likewise.
(*mulsi3addsi_compare0, *mulsi3addsi_compare0_v6): Likewise.
(*mulsi3addsi_compare0_scratch): Likewise.
(*mulsi3addsi_compare0_scratch_v6): Likewise.
(*andsi3_compare0, *andsi3_compare0_scratch): Likewise.
(*zeroextractsi_compare0_scratch): Likewise.
(*ne_zeroextractsi, *ne_zeroextractsi_shifted): Likewise.
(*ite_ne_zeroextractsi, *ite_ne_zeroextractsi_shifted): Likewise.
(andsi_not_shiftsi_si_scc_no_reuse): Likewise.
(andsi_not_shiftsi_si_scc): Likewise.
(*andsi_notsi_si_compare0, *andsi_notsi_si_compare0_scratch): Likewise.
(*iorsi3_compare0, *iorsi3_compare0_scratch): Likewise.
(*xorsi3_compare0, *xorsi3_compare0_scratch): Likewise.
(*shiftsi3_compare0, *shiftsi3_compare0_scratch): Likewise.
(*not_shiftsi_compare0, *not_shiftsi_compare0_scratch): Likewise.
(*notsi_compare0, *notsi_compare0_scratch): Likewise.
(return_addr_mask, *check_arch2): Likewise.
(*arith_shiftsi_compare0, *arith_shiftsi_compare0_scratch): Likewise.
(*sub_shiftsi_compare0, *sub_shiftsi_compare0_scratch): Likewise.
(compare_scc splitters): Likewise.
(movcond_addsi): Likewise.
* config/arm/thumb2.md (thumb2_addsi3_compare0): Likewise.
(*thumb2_addsi3_compare0_scratch): Likewise.
(*thumb2_mulsi_short_compare0): Likewise.
(*thumb2_mulsi_short_compare0_scratch): Likewise.
(compare peephole2s): Likewise.
* config/arm/thumb1.md (thumb1_cbz): Use CC_NZmode and
nz_comparison_operator names.
(cbranchsi4_insn): Likewise.
---
 gcc/config/arm/arm.c |  12 +--
 gcc/config/arm/arm-modes.def |   4 +-
 gcc/config/arm/arm.md| 186 +--
 gcc/config/arm/predicates.md |   2 +-
 gcc/config/arm/thumb1.md |   8 +-
 gcc/config/arm/thumb2.md |  34 +++
 6 files changed, 123 insertions(+), 123 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index eddd3ca93ed..b620322318b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15379,7 +15379,7 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
  || GET_CODE (x) == ASHIFT || GET_CODE (x) == ASHIFTRT
  || GET_CODE (x) == ROTATERT
  || (TARGET_32BIT && GET_CODE (x) == ZERO_EXTRACT)))
-return CC_NOOVmode;
+return CC_NZmode;
 
   /* A comparison of ~reg with a const is really a special
  canoncialization of compare (~const, reg), which is a reverse
@@ -15495,11 +15495,11 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, 
rtx scratch)
  }
 
rtx clobber = gen_rtx_CLOBBER (VOIDmode, scratch);
-   cc_reg = gen_rtx_REG (CC_NOOVmode, CC_REGNUM);
+   cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
 
rtx set
  = gen_rtx_SET (cc_reg,
-gen_rtx_COMPARE (CC_NOOVmode,
+gen_rtx_COMPARE (CC_NZmode,
  gen_rtx_IOR (SImode, x_lo, x_hi),
  const0_rtx));
emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set,
@@ -23884,7 +23884,7 @@ maybe_get_arm_condition_code (rtx comparison)
return code;
   return ARM_NV;
 
-case E_CC_NOOVmode:
+case E_CC_NZmode:
   switch (comp_code)
{
case NE: return ARM_NE;
@@ -25307,7 +25307,7 @@ thumb1_final_prescan_insn (rtx_insn *insn)
  cfun->machine->thumb1_cc_insn = insn;
  cfun->machine->thumb1_cc_op0 = SET_DEST (set);
  cfun->machine->thumb1_cc_op1 = const0_rtx;
- cfun->machine->thumb1_cc_mode = CC_NOOVmode;
+ cfun->machine->thumb1_cc_mode = CC_NZmode;
  if (INSN_CODE (insn) == CODE_FOR_thumb1_subsi3_insn)
{
  rtx src1 = XEXP (SET_SRC (set), 1);
@@ -30484,7 +30484,7 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, rtx 
out, rtx in,
   else
 {
   /* We have a shift-by-register.  */
-  rtx cc_reg = gen_rtx_REG (CC_NOOVmode, CC_REGNUM);
+  

[PATCH 0/6] Implement asm flag outputs for arm + aarch64

2019-11-08 Thread Richard Henderson
I've put the implementation into config/arm/aarch-common.c, so
that it can be shared between the two targets.  This required
a little bit of cleanup to the CC modes and constraints to get
the two targets to match up.

I really should have done more than just x86 years ago, so that
it would be done now and I could just use it in the kernel...  ;-)
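
For the curious, the sort of use this enables (my sketch, not from the
series): the condition comes straight out of the asm, so a caller that
branches on the result can use the flags directly instead of going
through a materialised boolean.

  static inline int add_carried_out (unsigned long *d, unsigned long a)
  {
    int c;
    /* "adds" sets the flags; "=@cccs" hands the carry back to the
       compiler, so c is 1 exactly when the addition carried out.  */
    asm ("adds %0, %0, %2" : "+r" (*d), "=@cccs" (c) : "r" (a));
    return c;
  }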


r~


Richard Henderson (6):
  aarch64: Add "c" constraint
  arm: Fix the "c" constraint
  arm: Rename CC_NOOVmode to CC_NZmode
  arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__
  arm: Add testsuite checks for asm-flag
  aarch64: Add testsuite checks for asm-flag

 gcc/config/arm/aarch-common-protos.h  |   6 +
 gcc/config/aarch64/aarch64-c.c|   2 +
 gcc/config/aarch64/aarch64.c  |   3 +
 gcc/config/arm/aarch-common.c | 131 
 gcc/config/arm/arm-c.c|   1 +
 gcc/config/arm/arm.c  |  15 +-
 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c |  34 
 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c |  36 
 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c |  30 +++
 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c |  43 
 gcc/testsuite/gcc.target/arm/asm-flag-1.c |  35 
 gcc/testsuite/gcc.target/arm/asm-flag-3.c |  36 
 gcc/testsuite/gcc.target/arm/asm-flag-5.c |  30 +++
 gcc/testsuite/gcc.target/arm/asm-flag-6.c |  43 
 gcc/config/aarch64/constraints.md |   4 +
 gcc/config/arm/arm-modes.def  |   4 +-
 gcc/config/arm/arm.md | 186 +-
 gcc/config/arm/constraints.md |   5 +-
 gcc/config/arm/predicates.md  |   2 +-
 gcc/config/arm/thumb1.md  |   8 +-
 gcc/config/arm/thumb2.md  |  34 ++--
 gcc/doc/extend.texi   |  33 
 22 files changed, 596 insertions(+), 125 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-6.c

-- 
2.17.1



[PATCH 1/6] aarch64: Add "c" constraint

2019-11-08 Thread Richard Henderson
Mirror arm in letting "c" match the condition code register.

* config/aarch64/constraints.md (c): New constraint.
---
 gcc/config/aarch64/constraints.md | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/constraints.md 
b/gcc/config/aarch64/constraints.md
index d0c3dd5bc1f..b9e5d13e851 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -39,6 +39,10 @@
 (define_register_constraint "y" "FP_LO8_REGS"
   "Floating point and SIMD vector registers V0 - V7.")
 
+(define_constraint "c"
+ "@internal The condition code register."
+  (match_operand 0 "cc_register"))
+
 (define_constraint "I"
  "A constant that can be used with an ADD operation."
  (and (match_code "const_int")
-- 
2.17.1



Re: [PATCH, AArch64] PR target/91833

2019-09-25 Thread Richard Henderson
On 9/25/19 3:54 PM, Joseph Myers wrote:
> On Fri, 20 Sep 2019, Richard Henderson wrote:
> 
>> Tested on aarch64-linux (glibc) and aarch64-elf (installed newlib).
>>
>> The existing configure claims to be generated by 2.69, but there
>> are changes wrt the autoconf distributed with Ubuntu 18.  Nothing
>> that seems untoward though.
> 
> They're meant to be generated with *unmodified* 2.69 (they were when I did 
> the move to 2.69).  Not with a distribution version that may have some 
> patches, such as the runstatedir patch.

For the record, the first attachment here is the adjustment patch that I
committed over my incorrect rebuild.  The second attachment is the composite
autoconf diff against r276133.

Sorry for the noise.


r~


* config.in, configure: Re-rebuild with stock autoconf 2.69,
not the ubuntu modified 2.69.

diff --git a/libgcc/configure b/libgcc/configure
index 28c7394b3f9..117e9c97e57 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -675,7 +675,6 @@ infodir
 docdir
 oldincludedir
 includedir
-runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -766,7 +765,6 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
-runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1019,15 +1017,6 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
-  -runstatedir | --runstatedir | --runstatedi | --runstated \
-  | --runstate | --runstat | --runsta | --runst | --runs \
-  | --run | --ru | --r)
-ac_prev=runstatedir ;;
-  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
-  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
-  | --run=* | --ru=* | --r=*)
-runstatedir=$ac_optarg ;;
-
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1165,7 +1154,7 @@ fi
 for ac_var in  exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-   libdir localedir mandir runstatedir
+   libdir localedir mandir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1318,7 +1307,6 @@ Fine tuning of the installation directories:
   --sysconfdir=DIRread-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
-  --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIRobject code libraries [EPREFIX/lib]
   --includedir=DIRC header files [PREFIX/include]
   --oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -4185,7 +4173,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4231,7 +4219,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4255,7 +4243,7 @@ rm -f core conftest.err conftest.$ac_objext 
conftest.$ac_ext
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4300,7 +4288,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))

Re: [PATCH, AArch64] PR target/91833

2019-09-25 Thread Richard Henderson
On 9/25/19 3:54 PM, Joseph Myers wrote:
> On Fri, 20 Sep 2019, Richard Henderson wrote:
> 
>> Tested on aarch64-linux (glibc) and aarch64-elf (installed newlib).
>>
>> The existing configure claims to be generated by 2.69, but there
>> are changes wrt the autoconf distributed with Ubuntu 18.  Nothing
>> that seems untoward though.
> 
> They're meant to be generated with *unmodified* 2.69 (they were when I did 
> the move to 2.69).  Not with a distribution version that may have some 
> patches, such as the runstatedir patch.

Oops.  Well, I'll re-re-generate with stock 2.69.

That still retains the _DARWIN_USE_64_BIT_INODE, which
wasn't there before.  I presume that's an artifact of
a previous rebuild.


r~


[PATCH, AArch64] Fix PR target/91834

2019-09-21 Thread Richard Henderson
As diagnosed in the PR.

* config/aarch64/lse.S (LDNM): Ensure STXR output does not
overlap the inputs.
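
For reference, the hazard being avoided: the architecture makes a STXR whose
status register overlaps the data source (or the address register)
CONSTRAINED UNPREDICTABLE, so the status must land in a register of its own.
A minimal C rendering of the loop (illustrative only; the real helper is the
hand-written asm below):

/* Sketch of an LL/SC fetch-or; the status output is a third,
   early-clobbered register instead of reusing the intermediate result.  */
static inline unsigned int
llsc_fetch_or (unsigned int *ptr, unsigned int bits)
{
  unsigned int old, tmp, status;
  __asm__ volatile (
    "0:	ldxr	%w0, %3\n"		/* old = *ptr, exclusive */
    "	orr	%w1, %w0, %w4\n"	/* tmp = old | bits */
    "	stxr	%w2, %w1, %3\n"		/* status = 0 iff store succeeded */
    "	cbnz	%w2, 0b"		/* lost exclusivity: retry */
    : "=&r" (old), "=&r" (tmp), "=&r" (status), "+Q" (*ptr)
    : "r" (bits));
  return old;
}
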
---
 libgcc/config/aarch64/lse.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index a5f6673596c..c7979382ad7 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -227,8 +227,8 @@ STARTFN	NAME(LDNM)
 8:	mov	s(tmp0), s(0)
 0:	LDXR	s(0), [x1]
 	OP	s(tmp1), s(0), s(tmp0)
-	STXR	w(tmp1), s(tmp1), [x1]
-	cbnz	w(tmp1), 0b
+	STXR	w(tmp2), s(tmp1), [x1]
+	cbnz	w(tmp2), 0b
 	ret
 
 ENDFN  NAME(LDNM)
-- 
2.17.1



[PATCH, AArch64] PR target/91833

2019-09-21 Thread Richard Henderson
Tested on aarch64-linux (glibc) and aarch64-elf (installed newlib).

The existing configure claims to be generated by 2.69, but there
are changes wrt the autoconf distributed with Ubuntu 18.  Nothing
that seems untoward though.


r~


* config/aarch64/lse-init.c: Include auto-target.h.  Disable
initialization if !HAVE_SYS_AUXV_H.
* configure.ac (AC_CHECK_HEADERS): Add sys/auxv.h.
* config.in, configure: Rebuild.
---
 libgcc/config/aarch64/lse-init.c |  4 +++-
 libgcc/config.in |  8 
 libgcc/configure | 26 +++---
 libgcc/configure.ac  |  2 +-
 4 files changed, 31 insertions(+), 9 deletions(-)
 mode change 100644 => 100755 libgcc/configure

diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
index 33d29147479..1a8f4c55213 100644
--- a/libgcc/config/aarch64/lse-init.c
+++ b/libgcc/config/aarch64/lse-init.c
@@ -23,12 +23,14 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
+#include "auto-target.h"
+
 /* Define the symbol gating the LSE implementations.  */
 _Bool __aarch64_have_lse_atomics
   __attribute__((visibility("hidden"), nocommon));
 
 /* Disable initialization of __aarch64_have_lse_atomics during bootstrap.  */
-#ifndef inhibit_libc
+#if !defined(inhibit_libc) && defined(HAVE_SYS_AUXV_H)
 # include 
 
 /* Disable initialization if the system headers are too old.  */
diff --git a/libgcc/config.in b/libgcc/config.in
index d634af9d949..59a3d8daf52 100644
--- a/libgcc/config.in
+++ b/libgcc/config.in
@@ -43,6 +43,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_STRING_H
 
+/* Define to 1 if you have the  header file. */
+#undef HAVE_SYS_AUXV_H
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_SYS_STAT_H
 
@@ -82,6 +85,11 @@
 /* Define to 1 if the target use emutls for thread-local storage. */
 #undef USE_EMUTLS
 
+/* Enable large inode numbers on Mac OS X 10.5.  */
+#ifndef _DARWIN_USE_64_BIT_INODE
+# define _DARWIN_USE_64_BIT_INODE 1
+#endif
+
 /* Number of bits in a file offset, on hosts where this is settable. */
 #undef _FILE_OFFSET_BITS
 
diff --git a/libgcc/configure b/libgcc/configure
old mode 100644
new mode 100755
index 29f647319b4..28c7394b3f9
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -675,6 +675,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -765,6 +766,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1017,6 +1019,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1154,7 +1165,7 @@ fi
 for ac_var in  exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-   libdir localedir mandir
+   libdir localedir mandir runstatedir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1307,6 +1318,7 @@ Fine tuning of the installation directories:
   --sysconfdir=DIRread-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
+  --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIRobject code libraries [EPREFIX/lib]
   --includedir=DIRC header files [PREFIX/include]
   --oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -4173,7 +4185,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4219,7 +4231,7 @@ else
 We can't simply define LARGE_OFF_T to 

Re: [PATCH, AArch64 v4 0/6] LSE atomics out-of-line

2019-09-19 Thread Richard Henderson
On 9/18/19 5:58 AM, Kyrill Tkachov wrote:
> Thanks for this.
> 
> I've bootstrapped and tested this patch series on systems with and without LSE
> support, both with and without patch [6/6], so 4 setups in total.
> 
> It all looks clean for me.
> 
> I'm favour of this series going in (modulo patch 6/6, leaving the option to
> turn it on to the user).
> 
> I've got a couple of small comments on some of the patches that IMO can be
> fixed when committing.

Thanks.  Committed with the requested modifications.


r~


[PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics

2019-09-17 Thread Richard Henderson
* config/aarch64/aarch64.opt (-moutline-atomics): New.
* config/aarch64/aarch64.c (aarch64_atomic_ool_func): New.
(aarch64_ool_cas_names, aarch64_ool_swp_names): New.
(aarch64_ool_ldadd_names, aarch64_ool_ldset_names): New.
(aarch64_ool_ldclr_names, aarch64_ool_ldeor_names): New.
(aarch64_expand_compare_and_swap): Honor TARGET_OUTLINE_ATOMICS.
* config/aarch64/atomics.md (atomic_exchange<mode>): Likewise.
(atomic_<atomic_optab><mode>): Likewise.
(atomic_fetch_<atomic_optab><mode>): Likewise.
(atomic_<atomic_optab>_fetch<mode>): Likewise.
testsuite/
* gcc.target/aarch64/atomic-op-acq_rel.c: Use -mno-outline-atomics.
* gcc.target/aarch64/atomic-comp-swap-release-acquire.c: Likewise.
* gcc.target/aarch64/atomic-op-acquire.c: Likewise.
* gcc.target/aarch64/atomic-op-char.c: Likewise.
* gcc.target/aarch64/atomic-op-consume.c: Likewise.
* gcc.target/aarch64/atomic-op-imm.c: Likewise.
* gcc.target/aarch64/atomic-op-int.c: Likewise.
* gcc.target/aarch64/atomic-op-long.c: Likewise.
* gcc.target/aarch64/atomic-op-relaxed.c: Likewise.
* gcc.target/aarch64/atomic-op-release.c: Likewise.
* gcc.target/aarch64/atomic-op-seq_cst.c: Likewise.
* gcc.target/aarch64/atomic-op-short.c: Likewise.
* gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c: Likewise.
* gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: Likewise.
* gcc.target/aarch64/sync-comp-swap.c: Likewise.
* gcc.target/aarch64/sync-op-acquire.c: Likewise.
* gcc.target/aarch64/sync-op-full.c: Likewise.
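
As a codegen illustration (helper names per the 4/6 patch; the example is
mine, not part of the patch): compiled with -moutline-atomics for the plain
armv8-a baseline,

/* With -moutline-atomics this becomes "bl __aarch64_ldadd4_relax" rather
   than an inline LDXR/STXR loop or a direct LDADD.  */
int
fetch_add_relaxed (int *p, int v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
}

The helper then chooses at runtime between the LSE instruction and the
LL/SC fallback.
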
---
 gcc/config/aarch64/aarch64-protos.h   | 13 +++
 gcc/config/aarch64/aarch64.c  | 87 +
 .../atomic-comp-swap-release-acquire.c|  2 +-
 .../gcc.target/aarch64/atomic-op-acq_rel.c|  2 +-
 .../gcc.target/aarch64/atomic-op-acquire.c|  2 +-
 .../gcc.target/aarch64/atomic-op-char.c   |  2 +-
 .../gcc.target/aarch64/atomic-op-consume.c|  2 +-
 .../gcc.target/aarch64/atomic-op-imm.c|  2 +-
 .../gcc.target/aarch64/atomic-op-int.c|  2 +-
 .../gcc.target/aarch64/atomic-op-long.c   |  2 +-
 .../gcc.target/aarch64/atomic-op-relaxed.c|  2 +-
 .../gcc.target/aarch64/atomic-op-release.c|  2 +-
 .../gcc.target/aarch64/atomic-op-seq_cst.c|  2 +-
 .../gcc.target/aarch64/atomic-op-short.c  |  2 +-
 .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |  2 +-
 .../atomic_cmp_exchange_zero_strong_1.c   |  2 +-
 .../gcc.target/aarch64/sync-comp-swap.c   |  2 +-
 .../gcc.target/aarch64/sync-op-acquire.c  |  2 +-
 .../gcc.target/aarch64/sync-op-full.c |  2 +-
 gcc/config/aarch64/aarch64.opt|  3 +
 gcc/config/aarch64/atomics.md | 94 +--
 gcc/doc/invoke.texi   | 16 +++-
 22 files changed, 221 insertions(+), 26 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index c4b73d26df6..1c1aac7201a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -696,4 +696,17 @@ poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
 bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
 
+struct atomic_ool_names
+{
+const char *str[5][4];
+};
+
+rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+   const atomic_ool_names *names);
+extern const atomic_ool_names aarch64_ool_swp_names;
+extern const atomic_ool_names aarch64_ool_ldadd_names;
+extern const atomic_ool_names aarch64_ool_ldset_names;
+extern const atomic_ool_names aarch64_ool_ldclr_names;
+extern const atomic_ool_names aarch64_ool_ldeor_names;
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b937514e6f8..56a4a47db73 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16867,6 +16867,82 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
 }
 
+/* We store the names of the various atomic helpers in a 5x4 array.
+   Return the libcall function given MODE, MODEL and NAMES.  */
+
+rtx
+aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+   const atomic_ool_names *names)
+{
+  memmodel model = memmodel_base (INTVAL (model_rtx));
+  int mode_idx, model_idx;
+
+  switch (mode)
+{
+case E_QImode:
+  mode_idx = 0;
+  break;
+case E_HImode:
+  mode_idx = 1;
+  break;
+case E_SImode:
+  mode_idx = 2;
+  break;
+case E_DImode:
+  mode_idx = 3;
+  break;
+case E_TImode:
+  mode_idx = 4;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  switch (model)
+{
+case MEMMODEL_RELAXED:
+  model_idx = 0;
+  break;
+case MEMMODEL_CONSUME:
+case MEMMODEL_ACQUIRE:
+  model_idx = 1;
+  break;
+case MEMMODEL_RELEASE:
+  

[PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics

2019-09-17 Thread Richard Henderson
This is the libgcc part of the interface -- providing the functions.
Rationale is provided at the top of libgcc/config/aarch64/lse.S.
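
For orientation, the helpers behave as if declared with C prototypes along
these lines (a sketch inferred from the register usage in lse.S, not a
declared public API; the "4" infix is the access size in bytes and the
memory-model suffix is one of _relax/_acq/_rel/_acq_rel):

unsigned int __aarch64_cas4_acq (unsigned int expected, unsigned int desired,
				 unsigned int *ptr);	/* returns old value */
unsigned int __aarch64_swp4_relax (unsigned int newval, unsigned int *ptr);
unsigned int __aarch64_ldadd4_acq_rel (unsigned int addend, unsigned int *ptr);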

* config/aarch64/lse-init.c: New file.
* config/aarch64/lse.S: New file.
* config/aarch64/t-lse: New file.
* config.host: Add t-lse to all aarch64 tuples.
---
 libgcc/config/aarch64/lse-init.c |  45 ++
 libgcc/config.host   |   4 +
 libgcc/config/aarch64/lse.S  | 235 +++
 libgcc/config/aarch64/t-lse  |  44 ++
 4 files changed, 328 insertions(+)
 create mode 100644 libgcc/config/aarch64/lse-init.c
 create mode 100644 libgcc/config/aarch64/lse.S
 create mode 100644 libgcc/config/aarch64/t-lse

diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
new file mode 100644
index 000..51fb21d45c9
--- /dev/null
+++ b/libgcc/config/aarch64/lse-init.c
@@ -0,0 +1,45 @@
+/* Out-of-line LSE atomics for AArch64 architecture, Init.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* Define the symbol gating the LSE implementations.  */
+_Bool __aarch64_have_lse_atomics
+  __attribute__((visibility("hidden"), nocommon));
+
+/* Disable initialization of __aarch64_have_lse_atomics during bootstrap.  */
+#ifndef inhibit_libc
+# include 
+
+/* Disable initialization if the system headers are too old.  */
+# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
+
+static void __attribute__((constructor))
+init_have_lse_atomics (void)
+{
+  unsigned long hwcap = getauxval (AT_HWCAP);
+  __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
+}
+
+# endif /* HWCAP */
+#endif /* inhibit_libc */
diff --git a/libgcc/config.host b/libgcc/config.host
index 728e543ea39..122113fc519 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/aarch64-unwind.h
;;
 aarch64*-*-freebsd*)
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/freebsd-unwind.h
;;
@@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
;;
 aarch64*-*-fuchsia*)
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
;;
 aarch64*-*-linux*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/linux-unwind.h
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
;;
 alpha*-*-linux*)
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
new file mode 100644
index 000..c24a39242ca
--- /dev/null
+++ b/libgcc/config/aarch64/lse.S
@@ -0,0 +1,235 @@
+/* Out-of-line LSE atomics for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are 

[PATCH, AArch64 v4 6/6] TESTING: Enable -moutline-atomics by default

2019-09-17 Thread Richard Henderson
---
 gcc/common/config/aarch64/aarch64-common.c | 6 --
 gcc/config/aarch64/aarch64.c   | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 07c03253951..2bbf454eea9 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -32,9 +32,11 @@
 #include "diagnostic.h"
 #include "params.h"
 
-#ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef  TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_OUTLINE_ATOMICS)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_OUTLINE_ATOMICS)
 #endif
 
 #undef  TARGET_HANDLE_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 56a4a47db73..ca4363e7831 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20535,9 +20535,11 @@ aarch64_run_selftests (void)
 #undef TARGET_C_MODE_FOR_SUFFIX
 #define TARGET_C_MODE_FOR_SUFFIX aarch64_c_mode_for_suffix
 
-#ifdef TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
-#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END)
+#ifdef  TARGET_BIG_ENDIAN_DEFAULT
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_END | MASK_OUTLINE_ATOMICS)
+#else
+#define TARGET_DEFAULT_TARGET_FLAGS (MASK_OUTLINE_ATOMICS)
 #endif
 
 #undef TARGET_CLASS_MAX_NREGS
-- 
2.17.1



[PATCH, AArch64 v4 2/6] aarch64: Implement TImode compare-and-swap

2019-09-17 Thread Richard Henderson
This pattern will only be used with the __sync functions, because
we do not yet have a bare TImode atomic load.
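
As a concrete user-level illustration (my example, assuming this series is
applied), the new path is reachable with code such as:

/* A 16-byte compare-and-swap through the __sync builtins; absent LSE it
   expands through the LDXP/STXP loop added by this patch.  */
__int128
cas16 (__int128 *ptr, __int128 expected, __int128 desired)
{
  return __sync_val_compare_and_swap (ptr, expected, desired);
}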

* config/aarch64/aarch64.c (aarch64_gen_compare_reg): Add support
for NE comparison of TImode values.
(aarch64_emit_load_exclusive): Add support for TImode.
(aarch64_emit_store_exclusive): Likewise.
(aarch64_split_compare_and_swap): Disable strong_zero_p for TImode.
* config/aarch64/atomics.md (@atomic_compare_and_swap):
Change iterator from ALLI to ALLI_TI.
(@atomic_compare_and_swap): New.
(@atomic_compare_and_swap_lse): New.
(aarch64_load_exclusive_pair): New.
(aarch64_store_exclusive_pair): New.
* config/aarch64/iterators.md (JUST_TI): New.
---
 gcc/config/aarch64/aarch64.c| 48 ++---
 gcc/config/aarch64/atomics.md   | 93 +++--
 gcc/config/aarch64/iterators.md |  3 ++
 3 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 99d51e2aef9..a5c4f55627d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2039,10 +2039,33 @@ emit_set_insn (rtx x, rtx y)
 rtx
 aarch64_gen_compare_reg (RTX_CODE code, rtx x, rtx y)
 {
-  machine_mode mode = SELECT_CC_MODE (code, x, y);
-  rtx cc_reg = gen_rtx_REG (mode, CC_REGNUM);
+  machine_mode cmp_mode = GET_MODE (x);
+  machine_mode cc_mode;
+  rtx cc_reg;
 
-  emit_set_insn (cc_reg, gen_rtx_COMPARE (mode, x, y));
+  if (cmp_mode == TImode)
+{
+  gcc_assert (code == NE);
+
+  cc_mode = CCmode;
+  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+
+  rtx x_lo = operand_subword (x, 0, 0, TImode);
+  rtx y_lo = operand_subword (y, 0, 0, TImode);
+  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x_lo, y_lo));
+
+  rtx x_hi = operand_subword (x, 1, 0, TImode);
+  rtx y_hi = operand_subword (y, 1, 0, TImode);
+  emit_insn (gen_ccmpdi (cc_reg, cc_reg, x_hi, y_hi,
+gen_rtx_EQ (cc_mode, cc_reg, const0_rtx),
+GEN_INT (AARCH64_EQ)));
+}
+  else
+{
+  cc_mode = SELECT_CC_MODE (code, x, y);
+  cc_reg = gen_rtx_REG (cc_mode, CC_REGNUM);
+  emit_set_insn (cc_reg, gen_rtx_COMPARE (cc_mode, x, y));
+}
   return cc_reg;
 }
 
@@ -2593,7 +2616,6 @@ aarch64_zero_extend_const_eq (machine_mode xmode, rtx x,
   gcc_assert (r != NULL);
   return rtx_equal_p (x, r);
 }
- 
 
 /* Return TARGET if it is nonnull and a register of mode MODE.
Otherwise, return a fresh register of mode MODE if we can,
@@ -16814,16 +16836,26 @@ static void
 aarch64_emit_load_exclusive (machine_mode mode, rtx rval,
 rtx mem, rtx model_rtx)
 {
-  emit_insn (gen_aarch64_load_exclusive (mode, rval, mem, model_rtx));
+  if (mode == TImode)
+emit_insn (gen_aarch64_load_exclusive_pair (gen_lowpart (DImode, rval),
+   gen_highpart (DImode, rval),
+   mem, model_rtx));
+  else
+emit_insn (gen_aarch64_load_exclusive (mode, rval, mem, model_rtx));
 }
 
 /* Emit store exclusive.  */
 
 static void
 aarch64_emit_store_exclusive (machine_mode mode, rtx bval,
- rtx rval, rtx mem, rtx model_rtx)
+ rtx mem, rtx rval, rtx model_rtx)
 {
-  emit_insn (gen_aarch64_store_exclusive (mode, bval, rval, mem, model_rtx));
+  if (mode == TImode)
+emit_insn (gen_aarch64_store_exclusive_pair
+  (bval, mem, operand_subword (rval, 0, 0, TImode),
+   operand_subword (rval, 1, 0, TImode), model_rtx));
+  else
+emit_insn (gen_aarch64_store_exclusive (mode, bval, mem, rval, model_rtx));
 }
 
 /* Mark the previous jump instruction as unlikely.  */
@@ -16950,7 +16982,7 @@ aarch64_split_compare_and_swap (rtx operands[])
CBNZscratch, .label1
 .label2:
CMP rval, 0.  */
-  bool strong_zero_p = !is_weak && oldval == const0_rtx;
+  bool strong_zero_p = !is_weak && oldval == const0_rtx && mode != TImode;
 
   label1 = NULL;
   if (!is_weak)
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index a679270cd38..f8bdd048b37 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -21,11 +21,11 @@
 ;; Instruction patterns.
 
 (define_expand "@atomic_compare_and_swap"
-  [(match_operand:SI 0 "register_operand") ;; bool out
-   (match_operand:ALLI 1 "register_operand")   ;; val out
-   (match_operand:ALLI 2 "aarch64_sync_memory_operand");; 
memory
-   (match_operand:ALLI 3 "nonmemory_operand")  ;; expected
-   (match_operand:ALLI 4 "aarch64_reg_or_zero");; 
desired
+  [(match_operand:SI 0 "register_operand" "")  ;; bool out
+   (match_operand:ALLI_TI 1 "register_operand" "") 

[PATCH, AArch64 v4 3/6] aarch64: Tidy aarch64_split_compare_and_swap

2019-09-17 Thread Richard Henderson
With aarch64_track_speculation, we had extra code to do exactly what the
!strong_zero_p path already did.  The rest is reducing code duplication.

* config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Disable
strong_zero_p for aarch64_track_speculation; unify some code paths;
use aarch64_gen_compare_reg instead of open-coding.
---
 gcc/config/aarch64/aarch64.c | 50 ++--
 1 file changed, 14 insertions(+), 36 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a5c4f55627d..b937514e6f8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16955,13 +16955,11 @@ aarch64_emit_post_barrier (enum memmodel model)
 void
 aarch64_split_compare_and_swap (rtx operands[])
 {
-  rtx rval, mem, oldval, newval, scratch;
+  rtx rval, mem, oldval, newval, scratch, x, model_rtx;
   machine_mode mode;
   bool is_weak;
   rtx_code_label *label1, *label2;
-  rtx x, cond;
   enum memmodel model;
-  rtx model_rtx;
 
   rval = operands[0];
   mem = operands[1];
@@ -16982,7 +16980,8 @@ aarch64_split_compare_and_swap (rtx operands[])
CBNZscratch, .label1
 .label2:
CMP rval, 0.  */
-  bool strong_zero_p = !is_weak && oldval == const0_rtx && mode != TImode;
+  bool strong_zero_p = (!is_weak && !aarch64_track_speculation &&
+   oldval == const0_rtx && mode != TImode);
 
   label1 = NULL;
   if (!is_weak)
@@ -16995,35 +16994,20 @@ aarch64_split_compare_and_swap (rtx operands[])
   /* The initial load can be relaxed for a __sync operation since a final
  barrier will be emitted to stop code hoisting.  */
   if (is_mm_sync (model))
-aarch64_emit_load_exclusive (mode, rval, mem,
-GEN_INT (MEMMODEL_RELAXED));
+aarch64_emit_load_exclusive (mode, rval, mem, GEN_INT (MEMMODEL_RELAXED));
   else
 aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
 
   if (strong_zero_p)
-{
-  if (aarch64_track_speculation)
-   {
- /* Emit an explicit compare instruction, so that we can correctly
-track the condition codes.  */
- rtx cc_reg = aarch64_gen_compare_reg (NE, rval, const0_rtx);
- x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
-   }
-  else
-   x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
-
-  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
-   gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
-  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
-}
+x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
   else
 {
-  cond = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
-  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
-  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
-   gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
-  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+  rtx cc_reg = aarch64_gen_compare_reg_maybe_ze (NE, rval, oldval, mode);
+  x = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
 }
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+   gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
+  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 
   aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
 
@@ -17044,22 +17028,16 @@ aarch64_split_compare_and_swap (rtx operands[])
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 }
   else
-{
-  cond = gen_rtx_REG (CCmode, CC_REGNUM);
-  x = gen_rtx_COMPARE (CCmode, scratch, const0_rtx);
-  emit_insn (gen_rtx_SET (cond, x));
-}
+aarch64_gen_compare_reg (NE, scratch, const0_rtx);
 
   emit_label (label2);
+
   /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
  to set the condition flags.  If this is not used it will be removed by
  later passes.  */
   if (strong_zero_p)
-{
-  cond = gen_rtx_REG (CCmode, CC_REGNUM);
-  x = gen_rtx_COMPARE (CCmode, rval, const0_rtx);
-  emit_insn (gen_rtx_SET (cond, x));
-}
+aarch64_gen_compare_reg (NE, rval, const0_rtx);
+
   /* Emit any final barrier needed for a __sync operation.  */
   if (is_mm_sync (model))
 aarch64_emit_post_barrier (model);
-- 
2.17.1



[PATCH, AArch64 v4 1/6] aarch64: Extend %R for integer registers

2019-09-17 Thread Richard Henderson
* config/aarch64/aarch64.c (aarch64_print_operand): Allow integer
registers with %R.
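
The consumer is the TImode compare-and-swap in patch 2/6 of this series,
where a 128-bit value occupies an even/odd pair of general registers and
the odd half is printed with %R, along the lines of (paraphrased, not the
exact template from the patch):

;; TImode CASP: %x0/%R0 and %x2/%R2 name the halves of register pairs.
  "casp\t%x0, %R0, %x2, %R2, %1"
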
---
 gcc/config/aarch64/aarch64.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 232317d4a5a..99d51e2aef9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8420,7 +8420,7 @@ sizetochar (int size)
 'S/T/U/V':	Print a FP/SIMD register name for a register list.
The register printed is the FP/SIMD register name
of X + 0/1/2/3 for S/T/U/V.
- 'R':  Print a scalar FP/SIMD register name + 1.
+ 'R':  Print a scalar Integer/FP/SIMD register name + 1.
  'X':  Print bottom 16 bits of integer constant in hex.
  'w/x':Print a general register name or the zero register
(32-bit or 64-bit).
@@ -8623,12 +8623,13 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   break;
 
 case 'R':
-  if (!REG_P (x) || !FP_REGNUM_P (REGNO (x)))
-   {
- output_operand_lossage ("incompatible floating point / vector 
register operand for '%%%c'", code);
- return;
-   }
-  asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+  if (REG_P (x) && FP_REGNUM_P (REGNO (x)))
+   asm_fprintf (f, "q%d", REGNO (x) - V0_REGNUM + 1);
+  else if (REG_P (x) && GP_REGNUM_P (REGNO (x)))
+   asm_fprintf (f, "x%d", REGNO (x) - R0_REGNUM + 1);
+  else
+   output_operand_lossage ("incompatible register operand for '%%%c'",
+   code);
   break;
 
 case 'X':
-- 
2.17.1



[PATCH, AArch64 v4 0/6] LSE atomics out-of-line

2019-09-17 Thread Richard Henderson
Version 3 was back in November:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html

Changes since v3:
  * Do not swap_commutative_operands_p in aarch64_gen_compare_reg.
This is the probable cause of the bootstrap problem that Kyrill reported.
  * Add unwind markers to the out-of-line functions.
  * Use uxt{8,16} instead of mov in CAS functions,
in preference to including the uxt with the cmp.
  * Prefer the lse case in the out-of-line fallthru (Wilco).
  * Name the option -moutline-atomics (Wilco)
  * Name the variable __aarch64_have_lse_atomics (Wilco);
fix the definition in lse-init.c.
  * Rename the functions s/__aa64/__aarch64/ (Seemed sensible to match prev)
  * Always use Pmode for the address for libcalls, fixing ilp32 (Kyrill).

Still not done is a custom calling convention during code generation,
but that can come later as an optimization.

Tested aarch64-linux on a thunder x1.
I have not run tests on any platform supporting LSE, even qemu.


r~


Richard Henderson (6):
  aarch64: Extend %R for integer registers
  aarch64: Implement TImode compare-and-swap
  aarch64: Tidy aarch64_split_compare_and_swap
  aarch64: Add out-of-line functions for LSE atomics
  aarch64: Implement -moutline-atomics
  TESTING: Enable -moutline-atomics by default

 gcc/config/aarch64/aarch64-protos.h   |  13 +
 gcc/common/config/aarch64/aarch64-common.c|   6 +-
 gcc/config/aarch64/aarch64.c  | 204 +++
 .../atomic-comp-swap-release-acquire.c|   2 +-
 .../gcc.target/aarch64/atomic-op-acq_rel.c|   2 +-
 .../gcc.target/aarch64/atomic-op-acquire.c|   2 +-
 .../gcc.target/aarch64/atomic-op-char.c   |   2 +-
 .../gcc.target/aarch64/atomic-op-consume.c|   2 +-
 .../gcc.target/aarch64/atomic-op-imm.c|   2 +-
 .../gcc.target/aarch64/atomic-op-int.c|   2 +-
 .../gcc.target/aarch64/atomic-op-long.c   |   2 +-
 .../gcc.target/aarch64/atomic-op-relaxed.c|   2 +-
 .../gcc.target/aarch64/atomic-op-release.c|   2 +-
 .../gcc.target/aarch64/atomic-op-seq_cst.c|   2 +-
 .../gcc.target/aarch64/atomic-op-short.c  |   2 +-
 .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
 .../atomic_cmp_exchange_zero_strong_1.c   |   2 +-
 .../gcc.target/aarch64/sync-comp-swap.c   |   2 +-
 .../gcc.target/aarch64/sync-op-acquire.c  |   2 +-
 .../gcc.target/aarch64/sync-op-full.c |   2 +-
 libgcc/config/aarch64/lse-init.c  |  45 
 gcc/config/aarch64/aarch64.opt|   3 +
 gcc/config/aarch64/atomics.md | 187 +-
 gcc/config/aarch64/iterators.md   |   3 +
 gcc/doc/invoke.texi   |  16 +-
 libgcc/config.host|   4 +
 libgcc/config/aarch64/lse.S   | 235 ++
 libgcc/config/aarch64/t-lse   |  44 
 28 files changed, 709 insertions(+), 85 deletions(-)
 create mode 100644 libgcc/config/aarch64/lse-init.c
 create mode 100644 libgcc/config/aarch64/lse.S
 create mode 100644 libgcc/config/aarch64/t-lse

-- 
2.17.1



Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2019-09-17 Thread Richard Henderson
On 9/17/19 6:55 AM, Wilco Dijkstra wrote:
> Hi Kyrill,
> 
>>> When you select a CPU the goal is that we optimize and schedule for that
>>> specific microarchitecture. That implies using atomics that work best for
>>> that core rather than outlining them.
>>
>> I think we want to go ahead with this framework to enable the portable 
>> deployment of LSE atomics.
>>
>> More CPU-specific fine-tuning can come later separately.
> 
> I'm not talking about CPU-specific fine-tuning, but ensuring we don't penalize
> performance when a user selects the specific CPU their application will run 
> on.
> And in that case outlining is unnecessary.

From aarch64_override_options:

Given both -march=foo -mcpu=bar, then the architecture will be foo and -mcpu
will be treated as -mtune=bar, but will not use any insn not in foo.

Given only -mcpu=foo, then the architecture will be the one supported by foo.

So if foo supports LSE, then we will not outline the functions, no matter how
we arrive at foo.


r~


Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2019-09-14 Thread Richard Henderson
On 9/5/19 10:35 AM, Wilco Dijkstra wrote:
> Agreed. I've got a couple of general comments:
> 
> * The option name -matomic-ool sounds too abbreviated. I think eg.
> -moutline-atomics is more descriptive and user friendlier.

Changed.

> * Similarly the exported __aa64_have_atomics variable could be named
>   __aarch64_have_lse_atomics so it's clear that it is about LSE atomics.

Changed.

> +@item -matomic-ool
> +@itemx -mno-atomic-ool
> +Enable or disable calls to out-of-line helpers to implement atomic operations.
> +These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
> +should be used; if not, they will use the load/store-exclusive instructions
> +that are present in the base ARMv8.0 ISA.
> +
> +This option is only applicable when compiling for the base ARMv8.0
> +instruction set.  If using a later revision, e.g. @option{-march=armv8.1-a}
> +or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
> +used directly. 
> 
> So what is the behaviour when you explicitly select a specific CPU?

Selecting a specific cpu selects the specific architecture that the cpu
supports, does it not?  Thus the architecture example above still applies.

Unless I don't understand what distinction that you're making?

> +/* Branch to LABEL if LSE is enabled.
> +   The branch should be easily predicted, in that it will, after constructors,
> +   always branch the same way.  The expectation is that systems that implement
> +   ARMv8.1-Atomics are "beefier" than those that omit the extension.
> +   By arranging for the fall-through path to use load-store-exclusive insns,
> +   we aid the branch predictor of the smallest cpus.  */ 
> 
> I'd say that by the time GCC10 is released and used in distros, systems 
> without
> LSE atomics would be practically non-existent. So we should favour LSE atomics
> by default.

I suppose.  Does it not continue to be true that an a53 is more impacted by the
branch prediction than an a76?


r~

