Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-10-30 Thread Kyrill Tkachov


On 30/10/17 13:54, Wilco Dijkstra wrote:

Kyrill Tkachov wrote:

On 16/10/17 12:30, Wilco Dijkstra wrote:

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

... but it's still used, and the patterns were put there for a reason.
Even if GCC itself doesn't use them much they may be used by other
applications.

So I'd support removing the left shift 1-bit expansions, but not the
right shift ones.

The purpose of removing the shift-by-1 cases is not just to cleanup code
but also to improve code generation. These shifts cannot expand early
and thus don't benefit from optimization (like shift merging). They also
suffer from the DImode register allocation issues.

As a simple example this loop runs >20% faster with my patch on both
Cortex-A53 and Cortex-A57 when built with -mfpu=vfp:

long long loop1 (long long x, long long y, int n)
{
   int i;
for (i = 0; i < n; i++)
{
   x >>= 1;
   x |= y;
   x >>= 1;
   x |= y;
   x >>= 1;
   x |= y;
   x >>= 1;
   x |= y;
}
return x;
}

So given these shifts are bad for performance, why have them at all?


Thanks, that makes sense.
Ok for trunk.

Thanks,
Kyrill


Wilco




Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-10-30 Thread Wilco Dijkstra
Kyrill Tkachov wrote:
> On 16/10/17 12:30, Wilco Dijkstra wrote:

> > DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
> > so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
> > patterns.
>
> ... but it's still used, and the patterns were put there for a reason.
> Even if GCC itself doesn't use them much they may be used by other 
> applications.
> 
> So I'd support removing the left shift 1-bit expansions, but not the 
> right shift ones.

The purpose of removing the shift-by-1 cases is not just to cleanup code
but also to improve code generation. These shifts cannot expand early
and thus don't benefit from optimization (like shift merging). They also
suffer from the DImode register allocation issues.

As a simple example this loop runs >20% faster with my patch on both
Cortex-A53 and Cortex-A57 when built with -mfpu=vfp:

long long loop1 (long long x, long long y, int n)
{
  int i;
   for (i = 0; i < n; i++)
   {
  x >>= 1;
  x |= y;
  x >>= 1;
  x |= y;
  x >>= 1;
  x |= y;
  x >>= 1;
  x |= y;
   }
   return x;
}

So given these shifts are bad for performance, why have them at all?

Wilco

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-10-27 Thread Kyrill Tkachov

Hi Wilco,

Sorry for the delay.

On 16/10/17 12:30, Wilco Dijkstra wrote:
 


ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
 
A left shift of 1 can always be done using an add, so slightly adjust rtx

cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.


I agree...


DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.


... but it's still used, and the patterns were put there for a reason.
Even if GCC itself doesn't use them much they may be used by other 
applications.


So I'd support removing the left shift 1-bit expansions, but not the 
right shift ones.


Thanks,
Kyrill


Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

 * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
 (arm_ashldi3_1bit): Remove pattern.
 (ashrdi3): Remove shift by 1 expansion.
 (arm_ashrdi3_1bit): Remove pattern.
 (lshrdi3): Remove shift by 1 expansion.
 (arm_lshrdi3_1bit): Remove pattern.
 * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
 cost of ashldi3 by 1.
 * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
 (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
 + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
if (speed_p)
  *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
return true;
  }
else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
  {
rtx scratch1, scratch2;
  
-  if (operands[2] == CONST1_RTX (SImode))

-{
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
/* Ideally we should use iwmmxt here if we could know that operands[1]
   ends up already living in an iwmmxt register. Otherwise it's
   cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
"
  )
  
-(define_insn "arm_ashldi3_1bit"

-  [(set (match_operand:DI0 "s_register_operand" "=r,")
-(ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
  (define_expand "ashlsi3"
[(set (match_operand:SI0 "s_register_operand" "")
  (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
  {
rtx scratch1, scratch2;
  
-  if (operands[2] == CONST1_RTX (SImode))

-{
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
/* Ideally we should use iwmmxt here if we could know that operands[1]
   ends up already living in an iwmmxt register. Otherwise it's
   cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
"
  )
  
-(define_insn "arm_ashrdi3_1bit"

-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-(ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
  (define_expand "ashrsi3"
[(set (match_operand:SI  0 "s_register_operand" "")
  (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
  {
rtx scratch1, scratch2;
  
-  if (operands[2] == CONST1_RTX (SImode))

-{
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
/* Ideally we should use iwmmxt here if we could know that operands[1]
   ends up already living in an iwmmxt register. Otherwise it's
   cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-10-16 Thread Wilco Dijkstra
    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" 

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-06-27 Thread Wilco Dijkstra
    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" 

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-06-13 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-04-20 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-02-23 Thread Wilco Dijkstra
    

ping



From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" 

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-02-02 Thread Wilco Dijkstra

ping



From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-01-17 Thread Wilco Dijkstra
kugan wrote:
> Wilco Dijkstra wrote:
> > +   /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
> > +   if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))

> Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is 
> done only for SImode. Am I missing something?

The diff doesn't show enough context, but this is inside an if that checks
for DImode shifts. Note the shift count is SImode. 

> Also, what was the motivation for this patch. Is that to improve the 
> maintainability of the arm back-end?

These particular patterns should never have existed. Optimized
expansions should be added to arm_emit_coreregs_64bit_shift.

You may have noticed a few patches have been proposed recently to 
improve the generated code of DImode operations (PR77308).
The key realization was that GCC will generate absolutely terrible code 
unless either all DImode operations are split before register allocation,
or we only use Neon instructions. There is no middle ground here, trying
to allocate DImode registers from only 5 available register pairs (if lucky)
just isn't going to work.

So the goal is to enable early splitting in all DImode patterns. Removing
no-split multi-instruction patterns helps -  these are a bad idea anyway.

Wilco

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-01-17 Thread kugan

Hi Wilco,

On 18/01/17 06:23, Wilco Dijkstra wrote:

ChangeLog:
2017-01-17  Wilco Dijkstra  

* config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
(arm_ashldi3_1bit): Remove pattern.
(ashrdi3): Remove shift by 1 expansion.
(arm_ashrdi3_1bit): Remove pattern.
(lshrdi3): Remove shift by 1 expansion.
(arm_lshrdi3_1bit): Remove pattern.
* config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
cost of ashldi3 by 1.
* config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
(di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
   + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
  if (speed_p)
*cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
  return true;
}
Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is 
done only for SImode. Am I missing something?


Also, what was the motivation for this patch. Is that to improve the 
maintainability of the arm back-end?


Thanks,
Kugan