Re: [PATCH] [Aarch64] Variable shift count truncation issues
Okay, I will take a look at this.

Michael Collison

> On Jun 30, 2017, at 11:04 AM, Andreas Schwab wrote:
>
>> On Jun 23 2017, Michael Collison wrote:
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c b/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c
>> new file mode 100644
>> index 000..e2b020e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c
>> @@ -0,0 +1,61 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +/* The integer variable shift and rotate instructions truncate their
>> +   shift amounts by the datasize.  Make sure that we don't emit a redundant
>> +   masking operation.  */
>> +
>> +unsigned
>> +f1 (unsigned x, int y)
>> +{
>> +  return x << (y & 31);
>> +}
>> +
>> +unsigned long
>> +f2 (unsigned long x, int y)
>> +{
>> +  return x << (y & 63);
>> +}
>> +
>> +unsigned long
>> +f3 (unsigned long bit_addr, int y)
>> +{
>> +  unsigned long bitnumb = bit_addr & 63;
>> +  return (1L << bitnumb);
>> +}
>> +
>> +unsigned int
>> +f4 (unsigned int x, unsigned int y)
>> +{
>> +  y &= 31;
>> +  return x >> y | (x << (32 - y));
>> +}
>> +
>> +unsigned long
>> +f5 (unsigned long x, unsigned long y)
>> +{
>> +  y &= 63;
>> +  return x >> y | (x << (64 - y));
>> +}
>> +
>> +unsigned long
>> +f6 (unsigned long x, unsigned long y)
>> +{
>> +  return (x << (64 - (y & 63)));
>> +}
>> +
>> +unsigned long
>> +f7 (unsigned long x, unsigned long y)
>> +{
>> +  return (x << -(y & 63));
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "lsl\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
>> +/* { dg-final { scan-assembler-times "lsl\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 4 } } */
>> +/* { dg-final { scan-assembler-times "ror\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
>> +/* { dg-final { scan-assembler-times "ror\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 1 } } */
>> +/* { dg-final { scan-assembler-not "and\tw\[0-9\]+, w\[0-9\]+, 31" } } */
>> +/* { dg-final { scan-assembler-not "and\tx\[0-9\]+, x\[0-9\]+, 63" } } */
>> +/* { dg-final { scan-assembler-not "sub\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
>
> That fails with -mabi=ilp32.
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
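[Editor's note: the testcase above relies on the property that AArch64 variable shifts use only the low log2(width) bits of the count register, which is why the explicit AND in the source is redundant. The following is a minimal C sketch of that property, written for this archive; the helper names are illustrative, not part of the patch.]

```c
#include <stdint.h>

/* Model of an AArch64 variable shift (LSL by register): the count is
   truncated modulo the register width by the instruction itself.  */
static uint32_t lsl_w (uint32_t x, uint32_t count) { return x << (count & 31); }
static uint64_t lsl_x (uint64_t x, uint64_t count) { return x << (count & 63); }

/* f1 from the testcase: the source-level mask duplicates what the
   instruction already does, so a single "lsl w0, w0, w1" suffices.  */
static uint32_t f1_model (uint32_t x, int y)
{
  return lsl_w (x, (uint32_t) (y & 31));  /* same result as lsl_w (x, y) */
}

/* f7 from the testcase: -(y & 63) is congruent to 64 - (y & 63) mod 64,
   so the truncating shift sees the same count and no SUB is needed.  */
static uint64_t f7_model (uint64_t x, uint64_t y)
{
  return lsl_x (x, -(y & 63));
}
```

Under this model, `f1_model (1, 33)` and `lsl_w (1, 33)` both shift by 33 & 31 = 1, which is exactly the redundancy the scan-assembler-not directives check for.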
Re: [PATCH] [Aarch64] Variable shift count truncation issues
On Jun 23 2017, Michael Collison wrote:

> diff --git a/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c b/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c
> new file mode 100644
> index 000..e2b020e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/var_shift_mask_1.c
> @@ -0,0 +1,61 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* The integer variable shift and rotate instructions truncate their
> +   shift amounts by the datasize.  Make sure that we don't emit a redundant
> +   masking operation.  */
> +
> +unsigned
> +f1 (unsigned x, int y)
> +{
> +  return x << (y & 31);
> +}
> +
> +unsigned long
> +f2 (unsigned long x, int y)
> +{
> +  return x << (y & 63);
> +}
> +
> +unsigned long
> +f3 (unsigned long bit_addr, int y)
> +{
> +  unsigned long bitnumb = bit_addr & 63;
> +  return (1L << bitnumb);
> +}
> +
> +unsigned int
> +f4 (unsigned int x, unsigned int y)
> +{
> +  y &= 31;
> +  return x >> y | (x << (32 - y));
> +}
> +
> +unsigned long
> +f5 (unsigned long x, unsigned long y)
> +{
> +  y &= 63;
> +  return x >> y | (x << (64 - y));
> +}
> +
> +unsigned long
> +f6 (unsigned long x, unsigned long y)
> +{
> +  return (x << (64 - (y & 63)));
> +}
> +
> +unsigned long
> +f7 (unsigned long x, unsigned long y)
> +{
> +  return (x << -(y & 63));
> +}
> +
> +/* { dg-final { scan-assembler-times "lsl\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "lsl\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 4 } } */
> +/* { dg-final { scan-assembler-times "ror\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "ror\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-not "and\tw\[0-9\]+, w\[0-9\]+, 31" } } */
> +/* { dg-final { scan-assembler-not "and\tx\[0-9\]+, x\[0-9\]+, 63" } } */
> +/* { dg-final { scan-assembler-not "sub\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */

That fails with -mabi=ilp32.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [PATCH] [Aarch64] Variable shift count truncation issues
On Fri, Jun 23, 2017 at 10:27:55AM +0100, Michael Collison wrote:
> Fixed the "nitpick" issues pointed out by James. Okay for trunk?
>
> > I have a few comments below, which are closer to nitpicking than structural
> > issues with the patch.
> >
> > With those fixed, this is OK to commit.

This is still OK for trunk.

Thanks,
James

> 2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>             Michael Collison  <michael.colli...@arm.com>
>
>     PR target/70119
>     * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
>     New pattern.
>     (*aarch64_reg_3_neg_mask2): New pattern.
>     (*aarch64_reg_3_minus_mask): New pattern.
>     (*aarch64__reg_di3_mask2): New pattern.
>     * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
>     of shift when the shift amount is masked with constant equal to
>     the size of the mode.
>     * config/aarch64/predicates.md (subreg_lowpart_operator): New
>     predicate.
>
> 2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>             Michael Collison  <michael.colli...@arm.com>
>
>     PR target/70119
>     * gcc.target/aarch64/var_shift_mask_1.c: New test.
RE: [PATCH] [Aarch64] Variable shift count truncation issues
Fixed the "nitpick" issues pointed out by James. Okay for trunk?

2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
    New pattern.
    (*aarch64_reg_3_neg_mask2): New pattern.
    (*aarch64_reg_3_minus_mask): New pattern.
    (*aarch64__reg_di3_mask2): New pattern.
    * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
    of shift when the shift amount is masked with constant equal to
    the size of the mode.
    * config/aarch64/predicates.md (subreg_lowpart_operator): New
    predicate.

2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * gcc.target/aarch64/var_shift_mask_1.c: New test.

-----Original Message-----
From: James Greenhalgh [mailto:james.greenha...@arm.com]
Sent: Thursday, June 22, 2017 3:17 AM
To: Michael Collison <michael.colli...@arm.com>; Wilco Dijkstra <wilco.dijks...@arm.com>; Christophe Lyon <christophe.l...@linaro.org>; GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; richard.sandif...@linaro.org
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

On Wed, Jun 21, 2017 at 04:42:00PM +0100, Richard Sandiford wrote:
> Michael Collison <michael.colli...@arm.com> writes:
> > Updated the patch per Richard's suggestions to allow scheduling of
> > instructions before reload.
>
> Thanks, this looks good to me FWIW, but obviously I can't approve it.

Thanks for the review Richard, that gives me good confidence in this patch.

I have a few comments below, which are closer to nitpicking than structural issues with the patch.

With those fixed, this is OK to commit.

> > 2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> >             Michael Collison  <michael.colli...@arm.com>

With the work you've done, you can probably place yourself first on the ChangeLog now ;)

> >
> >     PR target/70119
> >     * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
> >     New pattern.
> >     (*aarch64_reg_3_neg_mask2): New pattern.
> >     (*aarch64_reg_3_minus_mask): New pattern.
> >     (*aarch64__reg_di3_mask2): New pattern.
> >     * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
> >     of shift when the shift amount is masked with constant equal to
> >     the size of the mode.
> >     * config/aarch64/predicates.md (subreg_lowpart_operator): New
> >     predicate.
> >
> > 2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> >             Michael Collison  <michael.colli...@arm.com>
> >
> >     PR target/70119
> >     * gcc.target/aarch64/var_shift_mask_1.c: New test.
>
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index d89df66..ff5720c 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -3942,6 +3942,97 @@
> >    }
> > )
> >
> > +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
> > +;; they truncate the shift/rotate amount by the size of the registers they
> > +;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away

Is this "63" a typo? Should it be 64?

> > +;; such redundant masking instructions.  GCC can do that automatically when
> > +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
> > +;; because some of the SISD shift alternatives don't perform this truncations.
> > +;; So this pattern exists to catch such cases.
> > +
> > +(define_insn "*aarch64__reg_3_mask1"
> > +  [(set (match_operand:GPI 0 "register_operand" "=r")
> > +    (SHIFT:GPI
> > +      (match_operand:GPI 1 "register_operand" "r")
> > +      (match_operator 4 "subreg_lowpart_operator"
> > +       [(and:GPI (match_operand:GPI 2 "register_operand" "r")
> > +                 (match_operand 3 "const_int_operand" "n"))])))]
> > +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0"

Spaces around "-"

> > +  "\t%0, %1, %2"
> > +  [(set_attr "type" "shift_reg")]
> > +)
> > +
> > +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
> > +  [(set (match_operand:GPI 0 "register_operand" "=")
> > +    (SHIFT:GPI
> > +      (match_operand:GPI 1 "register_operand"
Re: [PATCH] [Aarch64] Variable shift count truncation issues
Resending for the list, as the last copy got bounced.

Thanks,
James

----- Forwarded message from James Greenhalgh <james.greenha...@arm.com> -----

Date: Thu, 22 Jun 2017 11:16:38 +0100
From: James Greenhalgh <james.greenha...@arm.com>
To: Michael Collison <michael.colli...@arm.com>, Wilco Dijkstra <wilco.dijks...@arm.com>, Christophe Lyon <christophe.l...@linaro.org>, GCC Patches <gcc-patches@gcc.gnu.org>, nd <n...@arm.com>, richard.sandif...@linaro.org
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues
User-Agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jun 21, 2017 at 04:42:00PM +0100, Richard Sandiford wrote:
> Michael Collison <michael.colli...@arm.com> writes:
> > Updated the patch per Richard's suggestions to allow scheduling of
> > instructions before reload.
>
> Thanks, this looks good to me FWIW, but obviously I can't approve it.

Thanks for the review Richard, that gives me good confidence in this patch.

I have a few comments below, which are closer to nitpicking than structural issues with the patch.

With those fixed, this is OK to commit.

> > 2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> >             Michael Collison  <michael.colli...@arm.com>

With the work you've done, you can probably place yourself first on the ChangeLog now ;)

> >
> >     PR target/70119
> >     * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
> >     New pattern.
> >     (*aarch64_reg_3_neg_mask2): New pattern.
> >     (*aarch64_reg_3_minus_mask): New pattern.
> >     (*aarch64__reg_di3_mask2): New pattern.
> >     * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
> >     of shift when the shift amount is masked with constant equal to
> >     the size of the mode.
> >     * config/aarch64/predicates.md (subreg_lowpart_operator): New
> >     predicate.
> >
> > 2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> >             Michael Collison  <michael.colli...@arm.com>
> >
> >     PR target/70119
> >     * gcc.target/aarch64/var_shift_mask_1.c: New test.

> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index d89df66..ff5720c 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -3942,6 +3942,97 @@
> >    }
> > )
> >
> > +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
> > +;; they truncate the shift/rotate amount by the size of the registers they
> > +;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away

Is this "63" a typo? Should it be 64?

> > +;; such redundant masking instructions.  GCC can do that automatically when
> > +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
> > +;; because some of the SISD shift alternatives don't perform this truncations.
> > +;; So this pattern exists to catch such cases.
> > +
> > +(define_insn "*aarch64__reg_3_mask1"
> > +  [(set (match_operand:GPI 0 "register_operand" "=r")
> > +    (SHIFT:GPI
> > +      (match_operand:GPI 1 "register_operand" "r")
> > +      (match_operator 4 "subreg_lowpart_operator"
> > +       [(and:GPI (match_operand:GPI 2 "register_operand" "r")
> > +                 (match_operand 3 "const_int_operand" "n"))])))]
> > +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0"

Spaces around "-"

> > +  "\t%0, %1, %2"
> > +  [(set_attr "type" "shift_reg")]
> > +)
> > +
> > +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
> > +  [(set (match_operand:GPI 0 "register_operand" "=")
> > +    (SHIFT:GPI
> > +      (match_operand:GPI 1 "register_operand" "r")
> > +      (match_operator 4 "subreg_lowpart_operator"
> > +       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> > +                        (match_operand 3 "const_int_operand" "n")))])))]
> > +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
> > +  "#"
> > +  "&& 1"

I'd prefer "true" to "1"

> > +  [(const_int 0)]
> > +  {
> > +    rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
> > +               : operands[0]);
> > +    emit_insn (gen_negsi2 (tmp, operands[2]));
> > +
> > +    rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
Re: [PATCH] [Aarch64] Variable shift count truncation issues
Michael Collison <michael.colli...@arm.com> writes:
> Updated the patch per Richard's suggestions to allow scheduling of
> instructions before reload.

Thanks, this looks good to me FWIW, but obviously I can't approve it.

Richard

> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?
>
> 2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>             Michael Collison  <michael.colli...@arm.com>
>
>     PR target/70119
>     * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
>     New pattern.
>     (*aarch64_reg_3_neg_mask2): New pattern.
>     (*aarch64_reg_3_minus_mask): New pattern.
>     (*aarch64__reg_di3_mask2): New pattern.
>     * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
>     of shift when the shift amount is masked with constant equal to
>     the size of the mode.
>     * config/aarch64/predicates.md (subreg_lowpart_operator): New
>     predicate.
>
> 2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>             Michael Collison  <michael.colli...@arm.com>
>
>     PR target/70119
>     * gcc.target/aarch64/var_shift_mask_1.c: New test.
>
> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@linaro.org]
> Sent: Thursday, June 15, 2017 6:40 AM
> To: Michael Collison <michael.colli...@arm.com>
> Cc: Wilco Dijkstra <wilco.dijks...@arm.com>; Christophe Lyon <christophe.l...@linaro.org>; GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
> Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues
>
> Michael Collison <michael.colli...@arm.com> writes:
>> +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +    (SHIFT:GPI
>> +      (match_operand:GPI 1 "register_operand" "r")
>> +      (match_operator 4 "subreg_lowpart_operator"
>> +       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
>> +                        (match_operand 3 "const_int_operand" "n")))])))
>> +   (clobber (match_scratch:SI 5 "="))]
>> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>> +  {
>> +    emit_insn (gen_negsi2 (operands[5], operands[2]));
>> +
>> +    rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
>> +    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
>> +                                     SUBREG_BYTE (operands[4]));
>> +    emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
>> +    DONE;
>> +  }
>> +)
>
> Thanks, I agree this looks correct from the split/reload_completed POV.
> I think we can go one better though, either:
>
> (a) Still allow the split when !reload_completed, and use:
>
>       if (GET_MODE (operands[5]) == SCRATCH)
>         operands[5] = gen_reg_rtx (SImode);
>
> This will allow the individual instructions to be scheduled by sched1.
>
> (b) Continue to restrict the split to reload_completed, change operand 0
> to = so that it can be used as a temporary, and drop operand 5 entirely.
>
> Or perhaps do both:
>
> (define_insn_and_split "*aarch64_reg_3_neg_mask2"
>   [(set (match_operand:GPI 0 "register_operand" "=")
>     (SHIFT:GPI
>       (match_operand:GPI 1 "register_operand" "r")
>       (match_operator 4 "subreg_lowpart_operator"
>        [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
>                         (match_operand 3 "const_int_operand" "n")))])))]
>   "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
>   "#"
>   "&& 1"
>   [(const_int 0)]
>   {
>     rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (mode)
>                : operands[0]);
>     emit_insn (gen_negsi2 (tmp, operands[2]));
>
>     rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
>     rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
>                                      SUBREG_BYTE (operands[4]));
>     emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
>     DONE;
>   }
> )
>
> Sorry for the run-around.  I should have realised earlier that these patterns didn't really need a distinct register after RA.
>
> Thanks,
> Richard
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index
RE: [PATCH] [Aarch64] Variable shift count truncation issues
Updated the patch per Richard's suggestions to allow scheduling of instructions before reload.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
    New pattern.
    (*aarch64_reg_3_neg_mask2): New pattern.
    (*aarch64_reg_3_minus_mask): New pattern.
    (*aarch64__reg_di3_mask2): New pattern.
    * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
    of shift when the shift amount is masked with constant equal to
    the size of the mode.
    * config/aarch64/predicates.md (subreg_lowpart_operator): New
    predicate.

2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * gcc.target/aarch64/var_shift_mask_1.c: New test.

-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@linaro.org]
Sent: Thursday, June 15, 2017 6:40 AM
To: Michael Collison <michael.colli...@arm.com>
Cc: Wilco Dijkstra <wilco.dijks...@arm.com>; Christophe Lyon <christophe.l...@linaro.org>; GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

Michael Collison <michael.colli...@arm.com> writes:
> +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +    (SHIFT:GPI
> +      (match_operand:GPI 1 "register_operand" "r")
> +      (match_operator 4 "subreg_lowpart_operator"
> +       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> +                        (match_operand 3 "const_int_operand" "n")))])))
> +   (clobber (match_scratch:SI 5 "="))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_negsi2 (operands[5], operands[2]));
> +
> +    rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
> +    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
> +                                     SUBREG_BYTE (operands[4]));
> +    emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
> +    DONE;
> +  }
> +)

Thanks, I agree this looks correct from the split/reload_completed POV. I think we can go one better though, either:

(a) Still allow the split when !reload_completed, and use:

      if (GET_MODE (operands[5]) == SCRATCH)
        operands[5] = gen_reg_rtx (SImode);

This will allow the individual instructions to be scheduled by sched1.

(b) Continue to restrict the split to reload_completed, change operand 0 to = so that it can be used as a temporary, and drop operand 5 entirely.

Or perhaps do both:

(define_insn_and_split "*aarch64_reg_3_neg_mask2"
  [(set (match_operand:GPI 0 "register_operand" "=")
    (SHIFT:GPI
      (match_operand:GPI 1 "register_operand" "r")
      (match_operator 4 "subreg_lowpart_operator"
       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
                        (match_operand 3 "const_int_operand" "n")))])))]
  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
  "#"
  "&& 1"
  [(const_int 0)]
  {
    rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (mode)
               : operands[0]);
    emit_insn (gen_negsi2 (tmp, operands[2]));

    rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
                                     SUBREG_BYTE (operands[4]));
    emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
    DONE;
  }
)

Sorry for the run-around.  I should have realised earlier that these patterns didn't really need a distinct register after RA.

Thanks,
Richard

pr5546v5.patch
Description: pr5546v5.patch
Re: [PATCH] [Aarch64] Variable shift count truncation issues
Michael Collison writes:
> +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +    (SHIFT:GPI
> +      (match_operand:GPI 1 "register_operand" "r")
> +      (match_operator 4 "subreg_lowpart_operator"
> +       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> +                        (match_operand 3 "const_int_operand" "n")))])))
> +   (clobber (match_scratch:SI 5 "="))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_negsi2 (operands[5], operands[2]));
> +
> +    rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
> +    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
> +                                     SUBREG_BYTE (operands[4]));
> +    emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
> +    DONE;
> +  }
> +)

Thanks, I agree this looks correct from the split/reload_completed POV. I think we can go one better though, either:

(a) Still allow the split when !reload_completed, and use:

      if (GET_MODE (operands[5]) == SCRATCH)
        operands[5] = gen_reg_rtx (SImode);

This will allow the individual instructions to be scheduled by sched1.

(b) Continue to restrict the split to reload_completed, change operand 0 to = so that it can be used as a temporary, and drop operand 5 entirely.

Or perhaps do both:

(define_insn_and_split "*aarch64_reg_3_neg_mask2"
  [(set (match_operand:GPI 0 "register_operand" "=")
    (SHIFT:GPI
      (match_operand:GPI 1 "register_operand" "r")
      (match_operator 4 "subreg_lowpart_operator"
       [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
                        (match_operand 3 "const_int_operand" "n")))])))]
  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
  "#"
  "&& 1"
  [(const_int 0)]
  {
    rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (mode)
               : operands[0]);
    emit_insn (gen_negsi2 (tmp, operands[2]));

    rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
                                     SUBREG_BYTE (operands[4]));
    emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
    DONE;
  }
)

Sorry for the run-around.  I should have realised earlier that these patterns didn't really need a distinct register after RA.

Thanks,
Richard
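[Editor's note: the split above replaces a shift by `neg (and y mask)` with a plain NEG into a temporary followed by a variable shift, relying on the instruction's truncation of the count. A hedged C model of why the two forms agree, written for this archive (the pattern's iterator names are elided by the archive, and the helper names below are illustrative only):]

```c
#include <stdint.h>

/* What the combined RTL pattern describes: shift by the (truncated)
   negation of a masked count, as in f7 of the testcase.  */
static uint64_t pattern_form (uint64_t x, uint32_t y)
{
  uint32_t masked = y & 63;
  return x << ((uint32_t) -masked & 63);  /* count truncated mod 64 */
}

/* What the split emits: gen_negsi2 into a temporary (no AND at all),
   then a variable shift whose count the hardware truncates.  */
static uint64_t split_form (uint64_t x, uint32_t y)
{
  uint32_t tmp = -y;        /* neg w_tmp, w_y */
  return x << (tmp & 63);   /* lsl x0, x1, x_tmp: only low 6 bits used */
}
```

The forms agree because -(y & 63) and -y are congruent modulo 64, so once the shifter truncates the count, the intermediate AND is dead.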
RE: [PATCH] [Aarch64] Variable shift count truncation issues
Updated the patch to more fully address the reload_completed issues raised by Richard, to ensure that the SIMD patterns are not used for shifts, and fixed a code generation bug.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
    New pattern.
    (*aarch64_reg_3_neg_mask2): New pattern.
    (*aarch64_reg_3_minus_mask): New pattern.
    (*aarch64__reg_di3_mask2): New pattern.
    * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
    of shift when the shift amount is masked with constant equal to
    the size of the mode.
    * config/aarch64/predicates.md (subreg_lowpart_operator): New
    predicate.

2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * gcc.target/aarch64/var_shift_mask_1.c: New test.

-----Original Message-----
From: Michael Collison
Sent: Tuesday, May 23, 2017 4:01 PM
To: 'Richard Sandiford' <richard.sandif...@linaro.org>; Wilco Dijkstra <wilco.dijks...@arm.com>; 'Christophe Lyon' <christophe.l...@linaro.org>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: RE: [PATCH] [Aarch64] Variable shift count truncation issues

I updated the patch to resolve the big endian issues as recommended by Richard. I also changed all uses of 'can_create_pseudo_p' to '!reload_completed' to make the intent clearer.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
    New pattern.
    (*aarch64_reg_3_neg_mask2): New pattern.
    (*aarch64_reg_3_minus_mask): New pattern.
    (*aarch64__reg_di3_mask2): New pattern.
    * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
    of shift when the shift amount is masked with constant equal to
    the size of the mode.
    * config/aarch64/predicates.md (subreg_lowpart_operator): New
    predicate.

2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * gcc.target/aarch64/var_shift_mask_1.c: New test.

-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@linaro.org]
Sent: Tuesday, May 23, 2017 7:25 AM
To: Wilco Dijkstra <wilco.dijks...@arm.com>
Cc: Michael Collison <michael.colli...@arm.com>; GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Richard Sandiford wrote:
>
>> Insn patterns shouldn't check can_create_pseudo_p, because there's no
>> guarantee that the associated split happens before RA.  In this case
>> it should be safe to reuse operand 0 after RA if you change it to:
>
> The goal is to only create and split this pattern before register allocation.
> It's a transient pattern, combine creates it, and it is split immediately
> after.
>
> Maybe !reload_completed is clearer as that is what several other
> define_insn_and_split patterns do?

But the concept of a transient pattern doesn't really exist.  We shouldn't rely for correctness on a split being applied before RA.

If all we want to do is match and split something that combine generates, it would be better to do it as a pure define_split.

Thanks,
Richard

pr5546v4.patch
Description: pr5546v4.patch
RE: [PATCH] [Aarch64] Variable shift count truncation issues
I updated the patch to resolve the big endian issues as recommended by Richard. I also changed all uses of 'can_create_pseudo_p' to '!reload_completed' to make the intent clearer.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
    New pattern.
    (*aarch64_reg_3_neg_mask2): New pattern.
    (*aarch64_reg_3_minus_mask): New pattern.
    (*aarch64__reg_di3_mask2): New pattern.
    * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
    of shift when the shift amount is masked with constant equal to
    the size of the mode.
    * config/aarch64/predicates.md (subreg_lowpart_operator): New
    predicate.

2016-05-22  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison  <michael.colli...@arm.com>

    PR target/70119
    * gcc.target/aarch64/var_shift_mask_1.c: New test.

-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@linaro.org]
Sent: Tuesday, May 23, 2017 7:25 AM
To: Wilco Dijkstra <wilco.dijks...@arm.com>
Cc: Michael Collison <michael.colli...@arm.com>; GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Richard Sandiford wrote:
>
>> Insn patterns shouldn't check can_create_pseudo_p, because there's no
>> guarantee that the associated split happens before RA.  In this case
>> it should be safe to reuse operand 0 after RA if you change it to:
>
> The goal is to only create and split this pattern before register allocation.
> It's a transient pattern, combine creates it, and it is split immediately
> after.
>
> Maybe !reload_completed is clearer as that is what several other
> define_insn_and_split patterns do?

But the concept of a transient pattern doesn't really exist.  We shouldn't rely for correctness on a split being applied before RA.

If all we want to do is match and split something that combine generates, it would be better to do it as a pure define_split.

Thanks,
Richard

pr5546v3.patch
Description: pr5546v3.patch
Re: [PATCH] [Aarch64] Variable shift count truncation issues
Wilco Dijkstra writes:
> Richard Sandiford wrote:
>
>> Insn patterns shouldn't check can_create_pseudo_p, because there's no
>> guarantee that the associated split happens before RA.  In this case it
>> should be safe to reuse operand 0 after RA if you change it to:
>
> The goal is to only create and split this pattern before register allocation.
> It's a transient pattern, combine creates it, and it is split immediately
> after.
>
> Maybe !reload_completed is clearer as that is what several other
> define_insn_and_split patterns do?

But the concept of a transient pattern doesn't really exist.  We shouldn't rely for correctness on a split being applied before RA.

If all we want to do is match and split something that combine generates, it would be better to do it as a pure define_split.

Thanks,
Richard
Re: [PATCH] [Aarch64] Variable shift count truncation issues
Richard Sandiford wrote:

> Insn patterns shouldn't check can_create_pseudo_p, because there's no
> guarantee that the associated split happens before RA.  In this case it
> should be safe to reuse operand 0 after RA if you change it to:

The goal is to only create and split this pattern before register allocation. It's a transient pattern, combine creates it, and it is split immediately after.

Maybe !reload_completed is clearer as that is what several other define_insn_and_split patterns do?

Wilco
Re: [PATCH] [Aarch64] Variable shift count truncation issues
Hi Michael, On 19 May 2017 at 09:21, Richard Sandifordwrote: > Thanks for doing this. Just a couple of comments about the .md stuff: > > Michael Collison writes: >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >> index 5adc5ed..c6ae670 100644 >> --- a/gcc/config/aarch64/aarch64.md >> +++ b/gcc/config/aarch64/aarch64.md >> @@ -3999,6 +3999,92 @@ >>} >> ) >> >> +;; When the LSL, LSR, ASR, ROR instructions operate on all register >> arguments >> +;; they truncate the shift/rotate amount by the size of the registers they >> +;; operate on: 32 for W-regs, 63 for X-regs. This allows us to optimise >> away >> +;; such redundant masking instructions. GCC can do that automatically when >> +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD >> +;; because some of the SISD shift alternatives don't perform this >> truncations. >> +;; So this pattern exists to catch such cases. >> + >> +(define_insn "*aarch64__reg_3_mask1" >> + [(set (match_operand:GPI 0 "register_operand" "=r") >> + (SHIFT:GPI >> + (match_operand:GPI 1 "register_operand" "r") >> + (subreg:QI (and:GPI >> + (match_operand:GPI 2 "register_operand" "r") >> + (match_operand 3 "const_int_operand" "n")) 0)))] >> + "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0" >> + "\t%0, %1, %2" >> + [(set_attr "type" "shift_reg")] >> +) > > (subreg:QI (...) 0) is only correct for little endian. For big endian > it needs to be 3 for SI or 7 for DI. You could probably handle that > using something like: > > (match_operator:QI 2 "lowpart_subreg" > [(and:GPI ...)]) > > and defining a lowpart_subreg predicate that checks the SUBREG_BYTE. > Or just leave the subreg as-is and add !BYTES_BIG_ENDIAN to the insn > condition. > I can confirm that the new tests pass on little-endian, but fail on big. 
Thanks,
Christophe

>> +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +	(SHIFT:GPI
>> +	  (match_operand:GPI 1 "register_operand" "r")
>> +	  (subreg:QI (neg:SI (and:SI
>> +			       (match_operand:SI 2 "register_operand" "r")
>> +			       (match_operand 3 "const_int_operand" "n"))) 0)))]
>> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)
>> +   && can_create_pseudo_p ()"
>> +  "#"
>> +  "&& true"
>> +  [(const_int 0)]
>> +  {
>> +    rtx tmp = gen_reg_rtx (SImode);
>> +
>> +    emit_insn (gen_negsi2 (tmp, operands[2]));
>> +    rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
>> +    emit_insn (gen_3 (operands[0], operands[1], tmp2));
>> +    DONE;
>> +  }
>> +)
>
> Insn patterns shouldn't check can_create_pseudo_p, because there's no
> guarantee that the associated split happens before RA.  In this case it
> should be safe to reuse operand 0 after RA if you change it to:
>
>   [(set (match_operand:GPI 0 "register_operand" "=")
>	...)]
>
> and then:
>
>   rtx tmp = (can_create_pseudo_p ()
>	       ? gen_reg_rtx (SImode)
>	       : gen_lowpart (SImode, operands[0]));
>
> Thanks,
> Richard
Re: [PATCH] [Aarch64] Variable shift count truncation issues
Thanks for doing this.  Just a couple of comments about the .md stuff:

Michael Collison writes:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5adc5ed..c6ae670 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3999,6 +3999,92 @@
>    }
> )
>
> +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
> +;; they truncate the shift/rotate amount by the size of the registers they
> +;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away
> +;; such redundant masking instructions.  GCC can do that automatically when
> +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
> +;; because some of the SISD shift alternatives don't perform this truncations.
> +;; So this pattern exists to catch such cases.
> +
> +(define_insn "*aarch64__reg_3_mask1"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +	(SHIFT:GPI
> +	  (match_operand:GPI 1 "register_operand" "r")
> +	  (subreg:QI (and:GPI
> +		       (match_operand:GPI 2 "register_operand" "r")
> +		       (match_operand 3 "const_int_operand" "n")) 0)))]
> +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0"
> +  "\t%0, %1, %2"
> +  [(set_attr "type" "shift_reg")]
> +)

(subreg:QI (...) 0) is only correct for little endian.  For big endian
it needs to be 3 for SI or 7 for DI.  You could probably handle that
using something like:

  (match_operator:QI 2 "lowpart_subreg" [(and:GPI ...)])

and defining a lowpart_subreg predicate that checks the SUBREG_BYTE.
Or just leave the subreg as-is and add !BYTES_BIG_ENDIAN to the insn
condition.
> +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +	(SHIFT:GPI
> +	  (match_operand:GPI 1 "register_operand" "r")
> +	  (subreg:QI (neg:SI (and:SI
> +			       (match_operand:SI 2 "register_operand" "r")
> +			       (match_operand 3 "const_int_operand" "n"))) 0)))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& true"
> +  [(const_int 0)]
> +  {
> +    rtx tmp = gen_reg_rtx (SImode);
> +
> +    emit_insn (gen_negsi2 (tmp, operands[2]));
> +    rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
> +    emit_insn (gen_3 (operands[0], operands[1], tmp2));
> +    DONE;
> +  }
> +)

Insn patterns shouldn't check can_create_pseudo_p, because there's no
guarantee that the associated split happens before RA.  In this case it
should be safe to reuse operand 0 after RA if you change it to:

  [(set (match_operand:GPI 0 "register_operand" "=")
	...)]

and then:

  rtx tmp = (can_create_pseudo_p ()
	     ? gen_reg_rtx (SImode)
	     : gen_lowpart (SImode, operands[0]));

Thanks,
Richard
[PATCH] [Aarch64] Variable shift count truncation issues
This patch improves code generation for variable shifts whose count is
masked by an AND operation that can be omitted given the size of the
operation.  AArch64 variable shifts only use the low 5 (W-reg) or 6
(X-reg) bits of the shift count, so masking operations that clear the
higher bits can be removed.

When the shift instructions operate on all register arguments they
truncate the shift/rotate amount by the size of the registers they
operate on: 32 for W-regs, 64 for X-regs.  This allows us to optimize
away such redundant masking instructions.  The patch only applies to
shifts on integer instructions; vector shifts are excluded.

unsigned int
f1 (unsigned int x, int y)
{
  return x << (y & 31);
}

unsigned long
f3 (unsigned long bit_addr)
{
  unsigned long bitnumb = bit_addr & 63;
  return (1L << bitnumb);
}

With trunk at -O2 we generate:

f1:
	and	w1, w1, 31
	lsl	w0, w0, w1
	ret
	.size	f1, .-f1
	.align	2
	.p2align 3,,7
	.global	f3
	.type	f3, %function
f3:
	and	x0, x0, 63
	mov	x1, 1
	lsl	x0, x1, x0
	ret

With the patch we generate:

f1:
	lsl	w0, w0, w1
	ret

f3:
	mov	x1, 1
	lsl	x0, x1, x0
	ret

Okay for trunk?

2017-05-17  Kyrylo Tkachov
	    Michael Collison

	PR target/70119
	* config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
	New pattern.
	(*aarch64_reg_3_neg_mask2): New pattern.
	(*aarch64_reg_3_minus_mask): New pattern.
	(*aarch64__reg_di3_mask2): New pattern.
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
	of shift when the shift amount is masked with constant equal to
	the size of the mode.

2017-05-17  Kyrylo Tkachov
	    Michael Collison

	PR target/70119
	* gcc.target/aarch64/var_shift_mask_1.c: New test.

pr70119.patch