Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-11-20 Thread Steve Ellcey
On Mon, 2017-11-20 at 15:24 +0100, Christophe Lyon wrote:
> 
> As a result of this patch, we now have:
> XPASS: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2
> "memset" 0
> instead of:
> XFAIL: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2
> "memset" 0 (found 2 times)
> 
> Is it expected?
> 
> Christophe

I think it is.  I hadn't seen this failure before but I may not have
tested my change with Fortran.  I am going to consider fixing this an
obvious change and will check in the fix (removing aarch64 from the
list of xfails) today.

Steve Ellcey
sell...@cavium.com


Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-11-20 Thread Christophe Lyon
On 17 November 2017 at 22:41, James Greenhalgh  wrote:
> On Fri, Sep 15, 2017 at 07:22:39PM +0100, Steve Ellcey wrote:
>> PR 81356 points out that doing a __builtin_strcpy of an empty string on
>> aarch64 does a copy from memory instead of just writing out a zero byte.
>> In looking at this I found that it was because of
>> aarch64_use_by_pieces_infrastructure_p, which returns false for
>> STORE_BY_PIECES.  The comment says:
>>
>>   /* STORE_BY_PIECES can be used when copying a constant string, but
>>  in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>>  For now we always fail this and let the move_by_pieces code copy
>>  the string from read-only memory.  */
>>
>> But this doesn't seem to be the case anymore.  When I remove this function
>> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
>> for __builtin_strcpy of a constant string seems to be either better or the
>> same.  The only time I got more instructions after removing this function
>> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
>> instructions to create the source followed by a store instead of doing a
>> load/store of 8 bytes.  The comment may have been applicable for
>> -mstrict-align at one time but it doesn't seem to be the case now.  I still
>> get better code without this routine under that option as well.
>
> I've been trying to remember why this is the way it is, It looks like it dates
> back to the initial port, and the only note I have on it points at an
> incorrect PR number. So, I think this is probably a safe and sensible
> choice.
>
> OK.
>

As a result of this patch, we now have:
XPASS: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2 "memset" 0
instead of:
XFAIL: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2
"memset" 0 (found 2 times)

Is it expected?

Christophe


> Reviewed-by: James Greenhalgh 
>
> Thanks,
> James
>
>>
>> Bootstrapped and tested without regressions, OK to checkin?
>>
>> Steve Ellcey
>> sell...@cavium.com
>>
>>
>>
>> 2017-09-15  Steve Ellcey  
>>
>>   PR target/81356
>>   * config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
>>   Remove.
>>   (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
>>
>>
>> 2017-09-15  Steve Ellcey  
>>
>>   * gcc.target/aarch64/pr81356.c: New test.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 1c14008..fc72236 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void)
>>return (HOST_WIDE_INT_1 << 36);
>>  }
>>
>> -static bool
>> -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
>> - unsigned int align,
>> - enum by_pieces_operation op,
>> - bool speed_p)
>> -{
>> -  /* STORE_BY_PIECES can be used when copying a constant string, but
>> - in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>> - For now we always fail this and let the move_by_pieces code copy
>> - the string from read-only memory.  */
>> -  if (op == STORE_BY_PIECES)
>> -return false;
>> -
>> -  return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
>> -}
>> -
>>  static rtx
>>  aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
>>   int code, tree treeop0, tree treeop1)
>> @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p
>>  #undef TARGET_LEGITIMIZE_ADDRESS
>>  #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address
>>
>> -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>> -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>> -  aarch64_use_by_pieces_infrastructure_p
>> -
>>  #undef TARGET_SCHED_CAN_SPECULATE_INSN
>>  #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
>>
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c 
>> b/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> index e69de29..9fd6baa 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +void f(char *a)
>> +{
>> +  __builtin_strcpy (a, "");
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "ldrb" } } */
>


Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-11-17 Thread James Greenhalgh
On Fri, Sep 15, 2017 at 07:22:39PM +0100, Steve Ellcey wrote:
> PR 81356 points out that doing a __builtin_strcpy of an empty string on
> aarch64 does a copy from memory instead of just writing out a zero byte.
> In looking at this I found that it was because of
> aarch64_use_by_pieces_infrastructure_p, which returns false for
> STORE_BY_PIECES.  The comment says:
> 
>   /* STORE_BY_PIECES can be used when copying a constant string, but
>  in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>  For now we always fail this and let the move_by_pieces code copy
>  the string from read-only memory.  */
> 
> But this doesn't seem to be the case anymore.  When I remove this function
> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
> for __builtin_strcpy of a constant string seems to be either better or the
> same.  The only time I got more instructions after removing this function
> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
> instructions to create the source followed by a store instead of doing a
> load/store of 8 bytes.  The comment may have been applicable for
> -mstrict-align at one time but it doesn't seem to be the case now.  I still
> get better code without this routine under that option as well.

I've been trying to remember why this is the way it is, It looks like it dates
back to the initial port, and the only note I have on it points at an
incorrect PR number. So, I think this is probably a safe and sensible
choice.

OK.

Reviewed-by: James Greenhalgh 

Thanks,
James

> 
> Bootstrapped and tested without regressions, OK to checkin?
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 
> 2017-09-15  Steve Ellcey  
> 
>   PR target/81356
>   * config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
>   Remove.
>   (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> 
> 
> 2017-09-15  Steve Ellcey  
> 
>   * gcc.target/aarch64/pr81356.c: New test.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1c14008..fc72236 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void)
>return (HOST_WIDE_INT_1 << 36);
>  }
>  
> -static bool
> -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
> - unsigned int align,
> - enum by_pieces_operation op,
> - bool speed_p)
> -{
> -  /* STORE_BY_PIECES can be used when copying a constant string, but
> - in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
> - For now we always fail this and let the move_by_pieces code copy
> - the string from read-only memory.  */
> -  if (op == STORE_BY_PIECES)
> -return false;
> -
> -  return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
> -}
> -
>  static rtx
>  aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
>   int code, tree treeop0, tree treeop1)
> @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_LEGITIMIZE_ADDRESS
>  #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address
>  
> -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
> -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
> -  aarch64_use_by_pieces_infrastructure_p
> -
>  #undef TARGET_SCHED_CAN_SPECULATE_INSN
>  #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
>  

> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c 
> b/gcc/testsuite/gcc.target/aarch64/pr81356.c
> index e69de29..9fd6baa 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void f(char *a)
> +{
> +  __builtin_strcpy (a, "");
> +}
> +
> +/* { dg-final { scan-assembler-not "ldrb" } } */



Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-11-15 Thread Steve Ellcey
re-re-ping.

Steve Ellcey
sell...@cavium.com

On Tue, 2017-10-24 at 11:16 -0700, Steve Ellcey wrote:
> Re-ping.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> On Mon, 2017-09-25 at 10:36 -0700, Steve Ellcey wrote:
> > 
> > Ping.
> > 
> > Steve Ellcey
> > sell...@cavium.com
> > 
> > 
> > On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
> > > 
> > > 
> > > PR 81356 points out that doing a __builtin_strcpy of an empty
> > > string on
> > > aarch64 does a copy from memory instead of just writing out a
> > > zero byte.
> > > In looking at this I found that it was because of
> > > aarch64_use_by_pieces_infrastructure_p, which returns false for
> > > STORE_BY_PIECES.  The comment says:
> > > 
> > >   /* STORE_BY_PIECES can be used when copying a constant string,
> > > but
> > >  in that case each 64-bit chunk takes 5 insns instead of 2
> > > (LDR/STR).
> > >  For now we always fail this and let the move_by_pieces code
> > > copy
> > >  the string from read-only memory.  */
> > > 
> > > But this doesn't seem to be the case anymore.  When I remove this
> > > function
> > > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it
> > > the code
> > > for __builtin_strcpy of a constant string seems to be either
> > > better or the
> > > same.  The only time I got more instructions after removing this
> > > function
> > > was on an 8 byte __builtin_strcpy where we now generate a mov and
> > > 3 movk
> > > instructions to create the source followed by a store instead of
> > > doing a
> > > load/store of 8 bytes.  The comment may have been applicable for
> > > -mstrict-align at one time but it doesn't seem to be the case
> > > now.  I still
> > > get better code without this routine under that option as well.
> > > 
> > > Bootstrapped and tested without regressions, OK to checkin?
> > > 
> > > Steve Ellcey
> > > sell...@cavium.com
> > > 
> > > 
> > > 
> > > 2017-09-15  Steve Ellcey  
> > > 
> > >   PR target/81356
> > >   * config/aarch64/aarch64.c
> > > (aarch64_use_by_pieces_infrastructure_p):
> > >   Remove.
> > >   (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> > > 
> > > 
> > > 2017-09-15  Steve Ellcey  
> > > 
> > >   * gcc.target/aarch64/pr81356.c: New test.


Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-10-24 Thread Steve Ellcey
Re-ping.

Steve Ellcey
sell...@cavium.com

On Mon, 2017-09-25 at 10:36 -0700, Steve Ellcey wrote:
> Ping.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
> > 
> > PR 81356 points out that doing a __builtin_strcpy of an empty string on
> > aarch64 does a copy from memory instead of just writing out a zero byte.
> > In looking at this I found that it was because of
> > aarch64_use_by_pieces_infrastructure_p, which returns false for
> > STORE_BY_PIECES.  The comment says:
> > 
> >   /* STORE_BY_PIECES can be used when copying a constant string, but
> >  in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
> >  For now we always fail this and let the move_by_pieces code copy
> >  the string from read-only memory.  */
> > 
> > But this doesn't seem to be the case anymore.  When I remove this function
> > and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
> > for __builtin_strcpy of a constant string seems to be either better or the
> > same.  The only time I got more instructions after removing this function
> > was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
> > instructions to create the source followed by a store instead of doing a
> > load/store of 8 bytes.  The comment may have been applicable for
> > -mstrict-align at one time but it doesn't seem to be the case now.  I still
> > get better code without this routine under that option as well.
> > 
> > Bootstrapped and tested without regressions, OK to checkin?
> > 
> > Steve Ellcey
> > sell...@cavium.com
> > 
> > 
> > 
> > 2017-09-15  Steve Ellcey  
> > 
> > PR target/81356
> > * config/aarch64/aarch64.c
> > (aarch64_use_by_pieces_infrastructure_p):
> > Remove.
> > (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> > 
> > 
> > 2017-09-15  Steve Ellcey  
> > 
> > * gcc.target/aarch64/pr81356.c: New test.


Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-10-03 Thread Qing Zhao
I think the change is good.  But I don’t have the permission for approval…

Qing
> On Sep 25, 2017, at 12:36 PM, Steve Ellcey  wrote:
> 
> Ping.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
>> PR 81356 points out that doing a __builtin_strcpy of an empty string on
>> aarch64 does a copy from memory instead of just writing out a zero byte.
>> In looking at this I found that it was because of
>> aarch64_use_by_pieces_infrastructure_p, which returns false for
>> STORE_BY_PIECES.  The comment says:
>> 
>>   /* STORE_BY_PIECES can be used when copying a constant string, but
>>  in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>>  For now we always fail this and let the move_by_pieces code copy
>>  the string from read-only memory.  */
>> 
>> But this doesn't seem to be the case anymore.  When I remove this function
>> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
>> for __builtin_strcpy of a constant string seems to be either better or the
>> same.  The only time I got more instructions after removing this function
>> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
>> instructions to create the source followed by a store instead of doing a
>> load/store of 8 bytes.  The comment may have been applicable for
>> -mstrict-align at one time but it doesn't seem to be the case now.  I still
>> get better code without this routine under that option as well.
>> 
>> Bootstrapped and tested without regressions, OK to checkin?
>> 
>> Steve Ellcey
>> sell...@cavium.com
>> 
>> 
>> 
>> 2017-09-15  Steve Ellcey  
>> 
>>  PR target/81356
>>  * config/aarch64/aarch64.c
>> (aarch64_use_by_pieces_infrastructure_p):
>>  Remove.
>>  (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
>> 
>> 
>> 2017-09-15  Steve Ellcey  
>> 
>>  * gcc.target/aarch64/pr81356.c: New test.



Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-09-25 Thread Steve Ellcey
Ping.

Steve Ellcey
sell...@cavium.com


On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
> PR 81356 points out that doing a __builtin_strcpy of an empty string on
> aarch64 does a copy from memory instead of just writing out a zero byte.
> In looking at this I found that it was because of
> aarch64_use_by_pieces_infrastructure_p, which returns false for
> STORE_BY_PIECES.  The comment says:
> 
>   /* STORE_BY_PIECES can be used when copying a constant string, but
>  in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>  For now we always fail this and let the move_by_pieces code copy
>  the string from read-only memory.  */
> 
> But this doesn't seem to be the case anymore.  When I remove this function
> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
> for __builtin_strcpy of a constant string seems to be either better or the
> same.  The only time I got more instructions after removing this function
> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
> instructions to create the source followed by a store instead of doing a
> load/store of 8 bytes.  The comment may have been applicable for
> -mstrict-align at one time but it doesn't seem to be the case now.  I still
> get better code without this routine under that option as well.
> 
> Bootstrapped and tested without regressions, OK to checkin?
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 
> 2017-09-15  Steve Ellcey  
> 
>   PR target/81356
>   * config/aarch64/aarch64.c
> (aarch64_use_by_pieces_infrastructure_p):
>   Remove.
>   (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
> 
> 
> 2017-09-15  Steve Ellcey  
> 
>   * gcc.target/aarch64/pr81356.c: New test.


[PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-09-15 Thread Steve Ellcey
PR 81356 points out that doing a __builtin_strcpy of an empty string on
aarch64 does a copy from memory instead of just writing out a zero byte.
In looking at this I found that it was because of
aarch64_use_by_pieces_infrastructure_p, which returns false for
STORE_BY_PIECES.  The comment says:

  /* STORE_BY_PIECES can be used when copying a constant string, but
 in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
 For now we always fail this and let the move_by_pieces code copy
 the string from read-only memory.  */

But this doesn't seem to be the case anymore.  When I remove this function
and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
for __builtin_strcpy of a constant string seems to be either better or the
same.  The only time I got more instructions after removing this function
was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
instructions to create the source followed by a store instead of doing a
load/store of 8 bytes.  The comment may have been applicable for
-mstrict-align at one time but it doesn't seem to be the case now.  I still
get better code without this routine under that option as well.

Bootstrapped and tested without regressions, OK to checkin?

Steve Ellcey
sell...@cavium.com



2017-09-15  Steve Ellcey  

PR target/81356
* config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
Remove.
(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.


2017-09-15  Steve Ellcey  

* gcc.target/aarch64/pr81356.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c14008..fc72236 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void)
   return (HOST_WIDE_INT_1 << 36);
 }
 
-static bool
-aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
-	unsigned int align,
-	enum by_pieces_operation op,
-	bool speed_p)
-{
-  /* STORE_BY_PIECES can be used when copying a constant string, but
- in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
- For now we always fail this and let the move_by_pieces code copy
- the string from read-only memory.  */
-  if (op == STORE_BY_PIECES)
-return false;
-
-  return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
-}
-
 static rtx
 aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
 			int code, tree treeop0, tree treeop1)
@@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address
 
-#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
-#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
-  aarch64_use_by_pieces_infrastructure_p
-
 #undef TARGET_SCHED_CAN_SPECULATE_INSN
 #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c b/gcc/testsuite/gcc.target/aarch64/pr81356.c
index e69de29..9fd6baa 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void f(char *a)
+{
+  __builtin_strcpy (a, "");
+}
+
+/* { dg-final { scan-assembler-not "ldrb" } } */