Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-02 Thread Bin.Cheng
On Wed, Aug 2, 2017 at 10:54 AM, Martin Liška  wrote:
> On 08/02/2017 11:45 AM, Bin.Cheng wrote:
>> Hi Martin,
>> With r250771, GCC failed to build glibc for arm/aarch64 linux cross 
>> toolchain:
>
> Hi.
>
> Sorry for the breakage, I accidentally installed wrong version of patch.
> Should be fixed in r250789.
Thanks!

Thanks,
bin
>
> M.


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-02 Thread Martin Liška
On 08/02/2017 11:45 AM, Bin.Cheng wrote:
> Hi Martin,
> With r250771, GCC failed to build glibc for arm/aarch64 linux cross toolchain:

Hi.

Sorry for the breakage, I accidentally installed wrong version of patch.
Should be fixed in r250789.

M.


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-02 Thread Bin.Cheng
On Wed, Aug 2, 2017 at 8:26 AM, Martin Liška  wrote:
> On 08/02/2017 09:16 AM, Jakub Jelinek wrote:
>> On Wed, Aug 02, 2017 at 09:13:40AM +0200, Martin Liška wrote:
>>> On 08/01/2017 09:50 PM, Jakub Jelinek wrote:
 On Thu, Jul 20, 2017 at 08:59:29AM +0200, Martin Liška wrote:
> Hello.
>
> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
> (with a known constant as source)
> so that we use same type of expansion (direct insns emission, direct 
> emission with a loop instruction and
> library call). As mentioned in the PR, glibc does not provide an 
> optimized version for majority of targets.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

 This broke e.g.
 FAIL: gcc.dg/20050503-1.c scan-assembler-not call
 on i686-linux, the result is significantly worse.
 Also, while perhaps majority of targets don't provide optimized version,
 some targets do, including i?86/x86_64, and if the memcpy would be expanded
 as a call, it is much better to just emit mempcpy call instead.
 Just look at the testcase, because of this misoptimization we suddenly 
 can't
 use a tail call.

 Jakub

>>>
>>> I see. That said, should I introduce some target hook that will tell 
>>> whether to expand to
>>> 'return memcpy(dst, src,l) + dst;' or call library mempcpy routine?
>>
>> If some targets aren't willing to provide fast mempcpy in libc, then yes I
>> guess.  And, for -Os you should never do the former, that isn't going to be
>> shorter (at least unless the memcpy is expanded inline and is shorter than
>> the call + addition).
>
> Good, I will work on that.
>
>>
>> BTW, do we have folding of mempcpy to memcpy if the result is ignored (no
>> lhs)?
>
> Yes, we do it, I've just verified that.
>
> Martin

Hi Martin,
With r250771, GCC failed to build glibc for arm/aarch64 linux cross toolchain:

during RTL pass: expand
loadlocale.c: In function ‘_nl_load_locale’:
loadlocale.c:199:7: internal compiler error: in emit_move_insn, at expr.c:3704
   __mempcpy (__mempcpy (__mempcpy (newp, file->filename, filenamelen),
   ^~~~
0x80902b emit_move_insn(rtx_def*, rtx_def*)
/test/source/gcc/gcc/expr.c:3703
0x6d2271 expand_builtin_memory_copy_args
/test/source/gcc/gcc/builtins.c:3514
0x6d48d7 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
/test/source/gcc/gcc/builtins.c:6847
0x80454c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/test/source/gcc/gcc/expr.c:10848
0x6f8a9c expand_expr
/test/source/gcc/gcc/expr.h:276
0x6f8a9c expand_call_stmt
/test/source/gcc/gcc/cfgexpand.c:2664
0x6f8a9c expand_gimple_stmt_1
/test/source/gcc/gcc/cfgexpand.c:3583
0x6f8a9c expand_gimple_stmt
/test/source/gcc/gcc/cfgexpand.c:3749
0x6f9c1a expand_gimple_basic_block
/test/source/gcc/gcc/cfgexpand.c:5751
0x6ff986 execute
/test/source/gcc/gcc/cfgexpand.c:6358
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

I filed PR81666 for tracking.

Thanks,
bin


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-02 Thread Martin Liška
On 08/02/2017 09:16 AM, Jakub Jelinek wrote:
> On Wed, Aug 02, 2017 at 09:13:40AM +0200, Martin Liška wrote:
>> On 08/01/2017 09:50 PM, Jakub Jelinek wrote:
>>> On Thu, Jul 20, 2017 at 08:59:29AM +0200, Martin Liška wrote:
 Hello.

 Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
 (with a known constant as source)
 so that we use same type of expansion (direct insns emission, direct 
 emission with a loop instruction and
 library call). As mentioned in the PR, glibc does not provide an optimized 
 version for majority of targets.

 Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> This broke e.g.
>>> FAIL: gcc.dg/20050503-1.c scan-assembler-not call
>>> on i686-linux, the result is significantly worse.
>>> Also, while perhaps majority of targets don't provide optimized version,
>>> some targets do, including i?86/x86_64, and if the memcpy would be expanded
>>> as a call, it is much better to just emit mempcpy call instead.
>>> Just look at the testcase, because of this misoptimization we suddenly can't
>>> use a tail call.
>>>
>>> Jakub
>>>
>>
>> I see. That said, should I introduce some target hook that will tell whether 
>> to expand to
>> 'return memcpy(dst, src,l) + dst;' or call library mempcpy routine?
> 
> If some targets aren't willing to provide fast mempcpy in libc, then yes I
> guess.  And, for -Os you should never do the former, that isn't going to be
> shorter (at least unless the memcpy is expanded inline and is shorter than
> the call + addition).

Good, I will work on that.

> 
> BTW, do we have folding of mempcpy to memcpy if the result is ignored (no
> lhs)?

Yes, we do it, I've just verified that.

Martin

> 
>   Jakub
> 



Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-02 Thread Jakub Jelinek
On Wed, Aug 02, 2017 at 09:13:40AM +0200, Martin Liška wrote:
> On 08/01/2017 09:50 PM, Jakub Jelinek wrote:
> > On Thu, Jul 20, 2017 at 08:59:29AM +0200, Martin Liška wrote:
> >> Hello.
> >>
> >> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
> >> (with a known constant as source)
> >> so that we use same type of expansion (direct insns emission, direct 
> >> emission with a loop instruction and
> >> library call). As mentioned in the PR, glibc does not provide an optimized 
> >> version for majority of targets.
> >>
> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> > 
> > This broke e.g.
> > FAIL: gcc.dg/20050503-1.c scan-assembler-not call
> > on i686-linux, the result is significantly worse.
> > Also, while perhaps majority of targets don't provide optimized version,
> > some targets do, including i?86/x86_64, and if the memcpy would be expanded
> > as a call, it is much better to just emit mempcpy call instead.
> > Just look at the testcase, because of this misoptimization we suddenly can't
> > use a tail call.
> > 
> > Jakub
> > 
> 
> I see. That said, should I introduce some target hook that will tell whether 
> to expand to
> 'return memcpy(dst, src,l) + dst;' or call library mempcpy routine?

If some targets aren't willing to provide fast mempcpy in libc, then yes I
guess.  And, for -Os you should never do the former, that isn't going to be
shorter (at least unless the memcpy is expanded inline and is shorter than
the call + addition).

BTW, do we have folding of mempcpy to memcpy if the result is ignored (no
lhs)?

Jakub


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-02 Thread Martin Liška
On 08/01/2017 09:50 PM, Jakub Jelinek wrote:
> On Thu, Jul 20, 2017 at 08:59:29AM +0200, Martin Liška wrote:
>> Hello.
>>
>> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
>> (with a known constant as source)
>> so that we use same type of expansion (direct insns emission, direct 
>> emission with a loop instruction and
>> library call). As mentioned in the PR, glibc does not provide an optimized 
>> version for majority of targets.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> This broke e.g.
> FAIL: gcc.dg/20050503-1.c scan-assembler-not call
> on i686-linux, the result is significantly worse.
> Also, while perhaps majority of targets don't provide optimized version,
> some targets do, including i?86/x86_64, and if the memcpy would be expanded
> as a call, it is much better to just emit mempcpy call instead.
> Just look at the testcase, because of this misoptimization we suddenly can't
> use a tail call.
> 
>   Jakub
> 

I see. That said, should I introduce some target hook that will tell whether to 
expand to
'return memcpy(dst, src,l) + dst;' or call library mempcpy routine?

Martin


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-01 Thread H.J. Lu
On Tue, Aug 1, 2017 at 12:50 PM, Jakub Jelinek  wrote:
> On Thu, Jul 20, 2017 at 08:59:29AM +0200, Martin Liška wrote:
>> Hello.
>>
>> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
>> (with a known constant as source)
>> so that we use same type of expansion (direct insns emission, direct 
>> emission with a loop instruction and
>> library call). As mentioned in the PR, glibc does not provide an optimized 
>> version for majority of targets.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> This broke e.g.
> FAIL: gcc.dg/20050503-1.c scan-assembler-not call
> on i686-linux, the result is significantly worse.
> Also, while perhaps majority of targets don't provide optimized version,
> some targets do, including i?86/x86_64, and if the memcpy would be expanded
> as a call, it is much better to just emit mempcpy call instead.
> Just look at the testcase, because of this misoptimization we suddenly can't
> use a tail call.
>
> Jakub

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81657

-- 
H.J.


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-01 Thread Jakub Jelinek
On Thu, Jul 20, 2017 at 08:59:29AM +0200, Martin Liška wrote:
> Hello.
> 
> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
> (with a known constant as source)
> so that we use same type of expansion (direct insns emission, direct emission 
> with a loop instruction and
> library call). As mentioned in the PR, glibc does not provide an optimized 
> version for majority of targets.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

This broke e.g.
FAIL: gcc.dg/20050503-1.c scan-assembler-not call
on i686-linux, the result is significantly worse.
Also, while perhaps majority of targets don't provide optimized version,
some targets do, including i?86/x86_64, and if the memcpy would be expanded
as a call, it is much better to just emit mempcpy call instead.
Just look at the testcase, because of this misoptimization we suddenly can't
use a tail call.

Jakub


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-08-01 Thread H.J. Lu
On Wed, Jul 19, 2017 at 11:59 PM, Martin Liška  wrote:
> Hello.
>
> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
> (with a known constant as source)
> so that we use same type of expansion (direct insns emission, direct emission 
> with a loop instruction and
> library call). As mentioned in the PR, glibc does not provide an optimized 
> version for majority of targets.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?
> Martin
>
> gcc/testsuite/ChangeLog:
>
> 2017-07-17  Martin Liska  
>
> PR middle-end/70140
> * gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
>
> gcc/ChangeLog:
>
> 2017-07-17  Martin Liska  
>
> PR middle-end/70140
> * builtins.c (expand_builtin_memcpy_args): Remove.
> (expand_builtin_memcpy): Call newly added function
> expand_builtin_memory_copy_args.
> (expand_builtin_memcpy_with_bounds): Likewise.
> (expand_builtin_mempcpy): Remove last argument.
> (expand_builtin_mempcpy_with_bounds): Likewise.
> (expand_builtin_memory_copy_args): New function created from
> expand_builtin_mempcpy_args with small modifications.
> (expand_builtin_mempcpy_args): Remove.
> (expand_builtin_stpcpy): Remove unused argument.
> (expand_builtin): Likewise.
> (expand_builtin_with_bounds): Likewise.
> ---
>  gcc/builtins.c  | 268 
> ++--
>  gcc/testsuite/gcc.dg/string-opt-1.c |  51 ++-
>  2 files changed, 147 insertions(+), 172 deletions(-)
>
>

I got many mempcpy test failures on Fedora 26/x86-64:

#0  0xf7ddef50 in abort () from /lib/libc.so.6
#1  0x08048a95 in test (buf3=0x804b060 , buf4=0x804b0e0  "",
buf6=0x8048d3c "rstuvwxyz", n=0)
at 
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy-2.c:49
#2  0x08048b77 in main_test ()
at 
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy-2.c:152
#3  0x08048bc7 in main ()
at 
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/lib/main.c:10
(gdb) f 1
#1  0x08048a95 in test (buf3=0x804b060 , buf4=0x804b0e0  "",
buf6=0x8048d3c "rstuvwxyz", n=0)
at 
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy-2.c:49
49abort ();
(gdb)


-- 
H.J.


Re: [PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-07-31 Thread Jeff Law
On 07/20/2017 12:59 AM, Martin Liška wrote:
> Hello.
> 
> Following patch does sharing of expansion for mem{p,}cpy and also strpcy 
> (with a known constant as source)
> so that we use same type of expansion (direct insns emission, direct emission 
> with a loop instruction and
> library call). As mentioned in the PR, glibc does not provide an optimized 
> version for majority of targets.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-07-17  Martin Liska  
> 
>   PR middle-end/70140
>   * gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
> 
> gcc/ChangeLog:
> 
> 2017-07-17  Martin Liska  
> 
>   PR middle-end/70140
>   * builtins.c (expand_builtin_memcpy_args): Remove.
>   (expand_builtin_memcpy): Call newly added function
>   expand_builtin_memory_copy_args.
>   (expand_builtin_memcpy_with_bounds): Likewise.
>   (expand_builtin_mempcpy): Remove last argument.
>   (expand_builtin_mempcpy_with_bounds): Likewise.
>   (expand_builtin_memory_copy_args): New function created from
>   expand_builtin_mempcpy_args with small modifications.
>   (expand_builtin_mempcpy_args): Remove.
>   (expand_builtin_stpcpy): Remove unused argument.
>   (expand_builtin): Likewise.
>   (expand_builtin_with_bounds): Likewise.


> 
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 2deef725620..016f68d2cb6 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -3493,94 +3419,103 @@ expand_builtin_mempcpy_with_bounds (tree exp, rtx 
> target, machine_mode mode)
>  }
>  }
>  
> -/* Helper function to do the actual work for expand_builtin_mempcpy.  The
> -   arguments to the builtin_mempcpy call DEST, SRC, and LEN are broken out
> -   so that this can also be called without constructing an actual CALL_EXPR.
> -   The other arguments and return value are the same as for
> -   expand_builtin_mempcpy.  */
> +/* Helper function to do the actual work for expand of memory copy family
> +   functions (memcpy, mempcpy, stpcpy).  Expansing should assign LEN bytes
"Expansing"?

OK with the nit fixed.


Jeff




[PATCH] Make mempcpy more optimal (PR middle-end/70140).

2017-07-20 Thread Martin Liška
Hello.

Following patch does sharing of expansion for mem{p,}cpy and also strpcy (with 
a known constant as source)
so that we use same type of expansion (direct insns emission, direct emission 
with a loop instruction and
library call). As mentioned in the PR, glibc does not provide an optimized 
version for majority of targets.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/testsuite/ChangeLog:

2017-07-17  Martin Liska  

PR middle-end/70140
* gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.

gcc/ChangeLog:

2017-07-17  Martin Liska  

PR middle-end/70140
* builtins.c (expand_builtin_memcpy_args): Remove.
(expand_builtin_memcpy): Call newly added function
expand_builtin_memory_copy_args.
(expand_builtin_memcpy_with_bounds): Likewise.
(expand_builtin_mempcpy): Remove last argument.
(expand_builtin_mempcpy_with_bounds): Likewise.
(expand_builtin_memory_copy_args): New function created from
expand_builtin_mempcpy_args with small modifications.
(expand_builtin_mempcpy_args): Remove.
(expand_builtin_stpcpy): Remove unused argument.
(expand_builtin): Likewise.
(expand_builtin_with_bounds): Likewise.
---
 gcc/builtins.c  | 268 ++--
 gcc/testsuite/gcc.dg/string-opt-1.c |  51 ++-
 2 files changed, 147 insertions(+), 172 deletions(-)


diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2deef725620..016f68d2cb6 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -121,12 +121,12 @@ static rtx builtin_memcpy_read_str (void *, HOST_WIDE_INT, machine_mode);
 static rtx expand_builtin_memchr (tree, rtx);
 static rtx expand_builtin_memcpy (tree, rtx);
 static rtx expand_builtin_memcpy_with_bounds (tree, rtx);
-static rtx expand_builtin_memcpy_args (tree, tree, tree, rtx, tree);
+static rtx expand_builtin_memory_copy_args (tree dest, tree src, tree len,
+	rtx target, tree exp, int endp);
 static rtx expand_builtin_memmove (tree, rtx);
-static rtx expand_builtin_mempcpy (tree, rtx, machine_mode);
-static rtx expand_builtin_mempcpy_with_bounds (tree, rtx, machine_mode);
-static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx,
-	machine_mode, int, tree);
+static rtx expand_builtin_mempcpy (tree, rtx);
+static rtx expand_builtin_mempcpy_with_bounds (tree, rtx);
+static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx, tree, int);
 static rtx expand_builtin_strcat (tree, rtx);
 static rtx expand_builtin_strcpy (tree, rtx);
 static rtx expand_builtin_strcpy_args (tree, tree, rtx);
@@ -2961,81 +2961,6 @@ determine_block_size (tree len, rtx len_rtx,
 			  GET_MODE_MASK (GET_MODE (len_rtx)));
 }
 
-/* Helper function to do the actual work for expand_builtin_memcpy.  */
-
-static rtx
-expand_builtin_memcpy_args (tree dest, tree src, tree len, rtx target, tree exp)
-{
-  const char *src_str;
-  unsigned int src_align = get_pointer_alignment (src);
-  unsigned int dest_align = get_pointer_alignment (dest);
-  rtx dest_mem, src_mem, dest_addr, len_rtx;
-  HOST_WIDE_INT expected_size = -1;
-  unsigned int expected_align = 0;
-  unsigned HOST_WIDE_INT min_size;
-  unsigned HOST_WIDE_INT max_size;
-  unsigned HOST_WIDE_INT probable_max_size;
-
-  /* If DEST is not a pointer type, call the normal function.  */
-  if (dest_align == 0)
-return NULL_RTX;
-
-  /* If either SRC is not a pointer type, don't do this
- operation in-line.  */
-  if (src_align == 0)
-return NULL_RTX;
-
-  if (currently_expanding_gimple_stmt)
-stringop_block_profile (currently_expanding_gimple_stmt,
-			_align, _size);
-
-  if (expected_align < dest_align)
-expected_align = dest_align;
-  dest_mem = get_memory_rtx (dest, len);
-  set_mem_align (dest_mem, dest_align);
-  len_rtx = expand_normal (len);
-  determine_block_size (len, len_rtx, _size, _size,
-			_max_size);
-  src_str = c_getstr (src);
-
-  /* If SRC is a string constant and block move would be done
- by pieces, we can avoid loading the string from memory
- and only stored the computed constants.  */
-  if (src_str
-  && CONST_INT_P (len_rtx)
-  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
-  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
-			  CONST_CAST (char *, src_str),
-			  dest_align, false))
-{
-  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
-  builtin_memcpy_read_str,
-  CONST_CAST (char *, src_str),
-  dest_align, false, 0);
-  dest_mem = force_operand (XEXP (dest_mem, 0), target);
-  dest_mem = convert_memory_address (ptr_mode, dest_mem);
-  return dest_mem;
-}
-
-  src_mem = get_memory_rtx (src, len);
-  set_mem_align (src_mem, src_align);
-
-  /* Copy word part most expediently.  */
-  dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,