Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
On Mon, 2017-11-20 at 15:24 +0100, Christophe Lyon wrote: > > As a result of this patch, we now have: > XPASS: gfortran.dg/pr45636.f90 -O scan-tree-dump-times forwprop2 > "memset" 0 > instead of: > XFAIL: gfortran.dg/pr45636.f90 -O scan-tree-dump-times forwprop2 > "memset" 0 (found 2 times) > > Is it expected? > > Christophe I think it is. I hadn't seen this failure before but I may not have tested my change with Fortran. I am going to consider fixing this an obvious change and will check in the fix (removing aarch64 from the list of xfails) today. Steve Ellcey sell...@cavium.com
Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
On 17 November 2017 at 22:41, James Greenhalghwrote: > On Fri, Sep 15, 2017 at 07:22:39PM +0100, Steve Ellcey wrote: >> PR 81356 points out that doing a __builtin_strcpy of an empty string on >> aarch64 does a copy from memory instead of just writing out a zero byte. >> In looking at this I found that it was because of >> aarch64_use_by_pieces_infrastructure_p, which returns false for >> STORE_BY_PIECES. The comment says: >> >> /* STORE_BY_PIECES can be used when copying a constant string, but >> in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). >> For now we always fail this and let the move_by_pieces code copy >> the string from read-only memory. */ >> >> But this doesn't seem to be the case anymore. When I remove this function >> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code >> for __builtin_strcpy of a constant string seems to be either better or the >> same. The only time I got more instructions after removing this function >> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk >> instructions to create the source followed by a store instead of doing a >> load/store of 8 bytes. The comment may have been applicable for >> -mstrict-align at one time but it doesn't seem to be the case now. I still >> get better code without this routine under that option as well. > > I've been trying to remember why this is the way it is, It looks like it dates > back to the initial port, and the only note I have on it points at an > incorrect PR number. So, I think this is probably a safe and sensible > choice. > > OK. > As a result of this patch, we now have: XPASS: gfortran.dg/pr45636.f90 -O scan-tree-dump-times forwprop2 "memset" 0 instead of: XFAIL: gfortran.dg/pr45636.f90 -O scan-tree-dump-times forwprop2 "memset" 0 (found 2 times) Is it expected? Christophe > Reviewed-by: James Greenhalgh > > Thanks, > James > >> >> Bootstrapped and tested without regressions, OK to checkin? >> >> Steve Ellcey >> sell...@cavium.com >> >> >> >> 2017-09-15 Steve Ellcey >> >> PR target/81356 >> * config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p): >> Remove. >> (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define. >> >> >> 2017-09-15 Steve Ellcey >> >> * gcc.target/aarch64/pr81356.c: New test. > >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 1c14008..fc72236 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void) >>return (HOST_WIDE_INT_1 << 36); >> } >> >> -static bool >> -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size, >> - unsigned int align, >> - enum by_pieces_operation op, >> - bool speed_p) >> -{ >> - /* STORE_BY_PIECES can be used when copying a constant string, but >> - in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). >> - For now we always fail this and let the move_by_pieces code copy >> - the string from read-only memory. */ >> - if (op == STORE_BY_PIECES) >> -return false; >> - >> - return default_use_by_pieces_infrastructure_p (size, align, op, speed_p); >> -} >> - >> static rtx >> aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq, >> int code, tree treeop0, tree treeop1) >> @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p >> #undef TARGET_LEGITIMIZE_ADDRESS >> #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address >> >> -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P >> -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ >> - aarch64_use_by_pieces_infrastructure_p >> - >> #undef TARGET_SCHED_CAN_SPECULATE_INSN >> #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn >> > >> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c >> b/gcc/testsuite/gcc.target/aarch64/pr81356.c >> index e69de29..9fd6baa 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c >> +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c >> @@ -0,0 +1,9 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +void f(char *a) >> +{ >> + __builtin_strcpy (a, ""); >> +} >> + >> +/* { dg-final { scan-assembler-not "ldrb" } } */ >
Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
On Fri, Sep 15, 2017 at 07:22:39PM +0100, Steve Ellcey wrote: > PR 81356 points out that doing a __builtin_strcpy of an empty string on > aarch64 does a copy from memory instead of just writing out a zero byte. > In looking at this I found that it was because of > aarch64_use_by_pieces_infrastructure_p, which returns false for > STORE_BY_PIECES. The comment says: > > /* STORE_BY_PIECES can be used when copying a constant string, but > in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). > For now we always fail this and let the move_by_pieces code copy > the string from read-only memory. */ > > But this doesn't seem to be the case anymore. When I remove this function > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code > for __builtin_strcpy of a constant string seems to be either better or the > same. The only time I got more instructions after removing this function > was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk > instructions to create the source followed by a store instead of doing a > load/store of 8 bytes. The comment may have been applicable for > -mstrict-align at one time but it doesn't seem to be the case now. I still > get better code without this routine under that option as well. I've been trying to remember why this is the way it is, It looks like it dates back to the initial port, and the only note I have on it points at an incorrect PR number. So, I think this is probably a safe and sensible choice. OK. Reviewed-by: James GreenhalghThanks, James > > Bootstrapped and tested without regressions, OK to checkin? > > Steve Ellcey > sell...@cavium.com > > > > 2017-09-15 Steve Ellcey > > PR target/81356 > * config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p): > Remove. > (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define. > > > 2017-09-15 Steve Ellcey > > * gcc.target/aarch64/pr81356.c: New test. > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 1c14008..fc72236 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void) >return (HOST_WIDE_INT_1 << 36); > } > > -static bool > -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size, > - unsigned int align, > - enum by_pieces_operation op, > - bool speed_p) > -{ > - /* STORE_BY_PIECES can be used when copying a constant string, but > - in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). > - For now we always fail this and let the move_by_pieces code copy > - the string from read-only memory. */ > - if (op == STORE_BY_PIECES) > -return false; > - > - return default_use_by_pieces_infrastructure_p (size, align, op, speed_p); > -} > - > static rtx > aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq, > int code, tree treeop0, tree treeop1) > @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p > #undef TARGET_LEGITIMIZE_ADDRESS > #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address > > -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P > -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ > - aarch64_use_by_pieces_infrastructure_p > - > #undef TARGET_SCHED_CAN_SPECULATE_INSN > #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c > b/gcc/testsuite/gcc.target/aarch64/pr81356.c > index e69de29..9fd6baa 100644 > --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c > +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +void f(char *a) > +{ > + __builtin_strcpy (a, ""); > +} > + > +/* { dg-final { scan-assembler-not "ldrb" } } */
Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
re-re-ping. Steve Ellcey sell...@cavium.com On Tue, 2017-10-24 at 11:16 -0700, Steve Ellcey wrote: > Re-ping. > > Steve Ellcey > sell...@cavium.com > > On Mon, 2017-09-25 at 10:36 -0700, Steve Ellcey wrote: > > > > Ping. > > > > Steve Ellcey > > sell...@cavium.com > > > > > > On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote: > > > > > > > > > PR 81356 points out that doing a __builtin_strcpy of an empty > > > string on > > > aarch64 does a copy from memory instead of just writing out a > > > zero byte. > > > In looking at this I found that it was because of > > > aarch64_use_by_pieces_infrastructure_p, which returns false for > > > STORE_BY_PIECES. The comment says: > > > > > > /* STORE_BY_PIECES can be used when copying a constant string, > > > but > > > in that case each 64-bit chunk takes 5 insns instead of 2 > > > (LDR/STR). > > > For now we always fail this and let the move_by_pieces code > > > copy > > > the string from read-only memory. */ > > > > > > But this doesn't seem to be the case anymore. When I remove this > > > function > > > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it > > > the code > > > for __builtin_strcpy of a constant string seems to be either > > > better or the > > > same. The only time I got more instructions after removing this > > > function > > > was on an 8 byte __builtin_strcpy where we now generate a mov and > > > 3 movk > > > instructions to create the source followed by a store instead of > > > doing a > > > load/store of 8 bytes. The comment may have been applicable for > > > -mstrict-align at one time but it doesn't seem to be the case > > > now. I still > > > get better code without this routine under that option as well. > > > > > > Bootstrapped and tested without regressions, OK to checkin? > > > > > > Steve Ellcey > > > sell...@cavium.com > > > > > > > > > > > > 2017-09-15 Steve Ellcey> > > > > > PR target/81356 > > > * config/aarch64/aarch64.c > > > (aarch64_use_by_pieces_infrastructure_p): > > > Remove. > > > (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define. > > > > > > > > > 2017-09-15 Steve Ellcey > > > > > > * gcc.target/aarch64/pr81356.c: New test.
Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
Re-ping. Steve Ellcey sell...@cavium.com On Mon, 2017-09-25 at 10:36 -0700, Steve Ellcey wrote: > Ping. > > Steve Ellcey > sell...@cavium.com > > > On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote: > > > > PR 81356 points out that doing a __builtin_strcpy of an empty string on > > aarch64 does a copy from memory instead of just writing out a zero byte. > > In looking at this I found that it was because of > > aarch64_use_by_pieces_infrastructure_p, which returns false for > > STORE_BY_PIECES. The comment says: > > > > /* STORE_BY_PIECES can be used when copying a constant string, but > > in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). > > For now we always fail this and let the move_by_pieces code copy > > the string from read-only memory. */ > > > > But this doesn't seem to be the case anymore. When I remove this function > > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code > > for __builtin_strcpy of a constant string seems to be either better or the > > same. The only time I got more instructions after removing this function > > was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk > > instructions to create the source followed by a store instead of doing a > > load/store of 8 bytes. The comment may have been applicable for > > -mstrict-align at one time but it doesn't seem to be the case now. I still > > get better code without this routine under that option as well. > > > > Bootstrapped and tested without regressions, OK to checkin? > > > > Steve Ellcey > > sell...@cavium.com > > > > > > > > 2017-09-15 Steve Ellcey> > > > PR target/81356 > > * config/aarch64/aarch64.c > > (aarch64_use_by_pieces_infrastructure_p): > > Remove. > > (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define. > > > > > > 2017-09-15 Steve Ellcey > > > > * gcc.target/aarch64/pr81356.c: New test.
Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
I think the change is good. But I don’t have the permission for approval… Qing > On Sep 25, 2017, at 12:36 PM, Steve Ellceywrote: > > Ping. > > Steve Ellcey > sell...@cavium.com > > > On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote: >> PR 81356 points out that doing a __builtin_strcpy of an empty string on >> aarch64 does a copy from memory instead of just writing out a zero byte. >> In looking at this I found that it was because of >> aarch64_use_by_pieces_infrastructure_p, which returns false for >> STORE_BY_PIECES. The comment says: >> >> /* STORE_BY_PIECES can be used when copying a constant string, but >> in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). >> For now we always fail this and let the move_by_pieces code copy >> the string from read-only memory. */ >> >> But this doesn't seem to be the case anymore. When I remove this function >> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code >> for __builtin_strcpy of a constant string seems to be either better or the >> same. The only time I got more instructions after removing this function >> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk >> instructions to create the source followed by a store instead of doing a >> load/store of 8 bytes. The comment may have been applicable for >> -mstrict-align at one time but it doesn't seem to be the case now. I still >> get better code without this routine under that option as well. >> >> Bootstrapped and tested without regressions, OK to checkin? >> >> Steve Ellcey >> sell...@cavium.com >> >> >> >> 2017-09-15 Steve Ellcey >> >> PR target/81356 >> * config/aarch64/aarch64.c >> (aarch64_use_by_pieces_infrastructure_p): >> Remove. >> (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define. >> >> >> 2017-09-15 Steve Ellcey >> >> * gcc.target/aarch64/pr81356.c: New test.
Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
Ping. Steve Ellcey sell...@cavium.com On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote: > PR 81356 points out that doing a __builtin_strcpy of an empty string on > aarch64 does a copy from memory instead of just writing out a zero byte. > In looking at this I found that it was because of > aarch64_use_by_pieces_infrastructure_p, which returns false for > STORE_BY_PIECES. The comment says: > > /* STORE_BY_PIECES can be used when copying a constant string, but > in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). > For now we always fail this and let the move_by_pieces code copy > the string from read-only memory. */ > > But this doesn't seem to be the case anymore. When I remove this function > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code > for __builtin_strcpy of a constant string seems to be either better or the > same. The only time I got more instructions after removing this function > was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk > instructions to create the source followed by a store instead of doing a > load/store of 8 bytes. The comment may have been applicable for > -mstrict-align at one time but it doesn't seem to be the case now. I still > get better code without this routine under that option as well. > > Bootstrapped and tested without regressions, OK to checkin? > > Steve Ellcey > sell...@cavium.com > > > > 2017-09-15 Steve Ellcey> > PR target/81356 > * config/aarch64/aarch64.c > (aarch64_use_by_pieces_infrastructure_p): > Remove. > (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define. > > > 2017-09-15 Steve Ellcey > > * gcc.target/aarch64/pr81356.c: New test.
[PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb
PR 81356 points out that doing a __builtin_strcpy of an empty string on aarch64 does a copy from memory instead of just writing out a zero byte. In looking at this I found that it was because of aarch64_use_by_pieces_infrastructure_p, which returns false for STORE_BY_PIECES. The comment says: /* STORE_BY_PIECES can be used when copying a constant string, but in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). For now we always fail this and let the move_by_pieces code copy the string from read-only memory. */ But this doesn't seem to be the case anymore. When I remove this function and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code for __builtin_strcpy of a constant string seems to be either better or the same. The only time I got more instructions after removing this function was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk instructions to create the source followed by a store instead of doing a load/store of 8 bytes. The comment may have been applicable for -mstrict-align at one time but it doesn't seem to be the case now. I still get better code without this routine under that option as well. Bootstrapped and tested without regressions, OK to checkin? Steve Ellcey sell...@cavium.com 2017-09-15 Steve EllceyPR target/81356 * config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p): Remove. (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define. 2017-09-15 Steve Ellcey * gcc.target/aarch64/pr81356.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1c14008..fc72236 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void) return (HOST_WIDE_INT_1 << 36); } -static bool -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size, - unsigned int align, - enum by_pieces_operation op, - bool speed_p) -{ - /* STORE_BY_PIECES can be used when copying a constant string, but - in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR). - For now we always fail this and let the move_by_pieces code copy - the string from read-only memory. */ - if (op == STORE_BY_PIECES) -return false; - - return default_use_by_pieces_infrastructure_p (size, align, op, speed_p); -} - static rtx aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq, int code, tree treeop0, tree treeop1) @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_LEGITIMIZE_ADDRESS #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ - aarch64_use_by_pieces_infrastructure_p - #undef TARGET_SCHED_CAN_SPECULATE_INSN #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c b/gcc/testsuite/gcc.target/aarch64/pr81356.c index e69de29..9fd6baa 100644 --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +void f(char *a) +{ + __builtin_strcpy (a, ""); +} + +/* { dg-final { scan-assembler-not "ldrb" } } */