Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
On 30/10/17 13:54, Wilco Dijkstra wrote: Kyrill Tkachov wrote: On 16/10/17 12:30, Wilco Dijkstra wrote: DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. ... but it's still used, and the patterns were put there for a reason. Even if GCC itself doesn't use them much they may be used by other applications. So I'd support removing the left shift 1-bit expansions, but not the right shift ones. The purpose of removing the shift-by-1 cases is not just to cleanup code but also to improve code generation. These shifts cannot expand early and thus don't benefit from optimization (like shift merging). They also suffer from the DImode register allocation issues. As a simple example this loop runs >20% faster with my patch on both Cortex-A53 and Cortex-A57 when built with -mfpu=vfp: long long loop1 (long long x, long long y, int n) { int i; for (i = 0; i < n; i++) { x >>= 1; x |= y; x >>= 1; x |= y; x >>= 1; x |= y; x >>= 1; x |= y; } return x; } So given these shifts are bad for performance, why have them at all? Thanks, that makes sense. Ok for trunk. Thanks, Kyrill Wilco
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
Kyrill Tkachov wrote: > On 16/10/17 12:30, Wilco Dijkstra wrote: > > DImode right shifts of 1 are rarely used (6 in total in the GCC binary), > > so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit > > patterns. > > ... but it's still used, and the patterns were put there for a reason. > Even if GCC itself doesn't use them much they may be used by other > applications. > > So I'd support removing the left shift 1-bit expansions, but not the > right shift ones. The purpose of removing the shift-by-1 cases is not just to cleanup code but also to improve code generation. These shifts cannot expand early and thus don't benefit from optimization (like shift merging). They also suffer from the DImode register allocation issues. As a simple example this loop runs >20% faster with my patch on both Cortex-A53 and Cortex-A57 when built with -mfpu=vfp: long long loop1 (long long x, long long y, int n) { int i; for (i = 0; i < n; i++) { x >>= 1; x |= y; x >>= 1; x |= y; x >>= 1; x |= y; x >>= 1; x |= y; } return x; } So given these shifts are bad for performance, why have them at all? Wilco
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
Hi Wilco, Sorry for the delay. On 16/10/17 12:30, Wilco Dijkstra wrote: ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that adddi3 is preferred in all cases, and the arm_ashldi3_1bit is redundant. I agree... DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. ... but it's still used, and the patterns were put there for a reason. Even if GCC itself doesn't use them much they may be used by other applications. So I'd support removing the left shift 1-bit expansions, but not the right shift ones. Thanks, Kyrill Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } else if (mode == SImode) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4061,12 +4061,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) -{ - emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1])); - DONE; -} - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4083,18 +4077,6 @@ " ) -(define_insn "arm_ashldi3_1bit" - [(set (match_operand:DI0 "s_register_operand" "=r,") -(ashift:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashlsi3" [(set (match_operand:SI0 "s_register_operand" "") (ashift:SI (match_operand:SI 1 "s_register_operand" "") @@ -4130,12 +4112,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) -{ - emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1])); - DONE; -} - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4152,18 +4128,6 @@ " ) -(define_insn "arm_ashrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") -(ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashrsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "") @@ -4196,12 +4160,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) -{ - emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1])); - DONE; -} - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4218,18 +4176,6 @@
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that adddi3 is preferred in all cases, and the arm_ashldi3_1bit is redundant. DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } else if (mode == SImode) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4061,12 +4061,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4083,18 +4077,6 @@ " ) -(define_insn "arm_ashldi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashift:SI (match_operand:SI 1 "s_register_operand" "") @@ -4130,12 +4112,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4152,18 +4128,6 @@ " ) -(define_insn "arm_ashrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashrsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "") @@ -4196,12 +4160,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4218,18 +4176,6 @@ " ) -(define_insn "arm_lshrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type"
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that adddi3 is preferred in all cases, and the arm_ashldi3_1bit is redundant. DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } else if (mode == SImode) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4061,12 +4061,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4083,18 +4077,6 @@ " ) -(define_insn "arm_ashldi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashift:SI (match_operand:SI 1 "s_register_operand" "") @@ -4130,12 +4112,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4152,18 +4128,6 @@ " ) -(define_insn "arm_ashrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashrsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "") @@ -4196,12 +4160,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4218,18 +4176,6 @@ " ) -(define_insn "arm_lshrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type"
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that adddi3 is preferred in all cases, and the arm_ashldi3_1bit is redundant. DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } else if (mode == SImode) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4061,12 +4061,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4083,18 +4077,6 @@ " ) -(define_insn "arm_ashldi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashift:SI (match_operand:SI 1 "s_register_operand" "") @@ -4130,12 +4112,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4152,18 +4128,6 @@ " ) -(define_insn "arm_ashrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashrsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "") @@ -4196,12 +4160,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4218,18 +4176,6 @@ " ) -(define_insn "arm_lshrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")]
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that adddi3 is preferred in all cases, and the arm_ashldi3_1bit is redundant. DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } else if (mode == SImode) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4061,12 +4061,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4083,18 +4077,6 @@ " ) -(define_insn "arm_ashldi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashift:SI (match_operand:SI 1 "s_register_operand" "") @@ -4130,12 +4112,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4152,18 +4128,6 @@ " ) -(define_insn "arm_ashrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashrsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "") @@ -4196,12 +4160,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4218,18 +4176,6 @@ " ) -(define_insn "arm_lshrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")]
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that adddi3 is preferred in all cases, and the arm_ashldi3_1bit is redundant. DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } else if (mode == SImode) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4061,12 +4061,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4083,18 +4077,6 @@ " ) -(define_insn "arm_ashldi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashift:SI (match_operand:SI 1 "s_register_operand" "") @@ -4130,12 +4112,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4152,18 +4128,6 @@ " ) -(define_insn "arm_ashrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashrsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "") @@ -4196,12 +4160,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4218,18 +4176,6 @@ " ) -(define_insn "arm_lshrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type"
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that adddi3 is preferred in all cases, and the arm_ashldi3_1bit is redundant. DImode right shifts of 1 are rarely used (6 in total in the GCC binary), so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } else if (mode == SImode) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4061,12 +4061,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4083,18 +4077,6 @@ " ) -(define_insn "arm_ashldi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashift:SI (match_operand:SI 1 "s_register_operand" "") @@ -4130,12 +4112,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4152,18 +4128,6 @@ " ) -(define_insn "arm_ashrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")] -) - (define_expand "ashrsi3" [(set (match_operand:SI 0 "s_register_operand" "") (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "") @@ -4196,12 +4160,6 @@ { rtx scratch1, scratch2; - if (operands[2] == CONST1_RTX (SImode)) - { - emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1])); - DONE; - } - /* Ideally we should use iwmmxt here if we could know that operands[1] ends up already living in an iwmmxt register. Otherwise it's cheaper to have the alternate code being generated than moving @@ -4218,18 +4176,6 @@ " ) -(define_insn "arm_lshrdi3_1bit" - [(set (match_operand:DI 0 "s_register_operand" "=r,") - (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r") - (const_int 1))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_32BIT" - "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx" - [(set_attr "conds" "clob") - (set_attr "length" "8") - (set_attr "type" "multiple")]
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
kugan wrote: > Wilco Dijkstra wrote: > > + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ > > + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) > Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is > done only for SImode. Am I missing something? The diff doesn't show enough context, but this is inside an if that checks for DImode shifts. Note the shift count is SImode. > Also, what was the motivation for this patch. Is that to improve the > maintainability of the arm back-end? These particular patterns should never have existed. Optimized expansions should be added to arm_emit_coreregs_64bit_shift. You may have noticed a few patches have been proposed recently to improve the generated code of DImode operations (PR77308). The key realization was that GCC will generate absolutely terrible code unless either all DImode operations are split before register allocation, or we only use Neon instructions. There is no middle ground here, trying to allocate DImode registers from only 5 available register pairs (if lucky) just isn't going to work. So the goal is to enable early splitting in all DImode patterns. Removing no-split multi-instruction patterns helps - these are a bad idea anyway. Wilco
Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
Hi Wilco, On 18/01/17 06:23, Wilco Dijkstra wrote: ChangeLog: 2017-01-17 Wilco Dijkstra* config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion. (arm_ashrdi3_1bit): Remove pattern. (lshrdi3): Remove shift by 1 expansion. (arm_lshrdi3_1bit): Remove pattern. * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase cost of ashldi3 by 1. * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion. (di3_neon): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p)); if (speed_p) *cost += 2 * extra_cost->alu.shift; + /* Slightly disparage left shift by 1 at so we prefer adddi3. */ + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) + *cost += 1; return true; } Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is done only for SImode. Am I missing something? Also, what was the motivation for this patch. Is that to improve the maintainability of the arm back-end? Thanks, Kugan