Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-07-05 Thread Jeff Law
On 04/10/2018 06:27 AM, Martin Liška wrote:
> On 04/10/2018 11:19 AM, Jakub Jelinek wrote:
>> On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin Liška wrote:
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-03-28  Martin Liska  
>>>
>>> * gcc.dg/string-opt-1.c:
>> I guess you really didn't mean to keep the above entry around, just the one
>> below, right?
> Sure, fixed.
> 
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-03-14  Martin Liska  
>>>
>>> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>>> and others.
>>> --- a/gcc/config.gcc
>>> +++ b/gcc/config.gcc
>>> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>>> x86_64-*-linux*)
>>> tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h 
>>> i386/linux64.h"
>>> extra_options="${extra_options} linux-android.opt"
>>> +   extra_objs="${extra_objs} x86-linux.o"
>>> ;;
>> The should go into the i[34567]86-*-linux*) case too (outside of the
>> if test x$enable_targets = xall; then conditional).
>> Or maybe better, remove the above and do it in:
>> i[34567]86-*-linux* | x86_64-*-linux*)
>> extra_objs="${extra_objs} cet.o"
>> tmake_file="$tmake_file i386/t-linux i386/t-cet"
>> ;;
>> spot, just add x86-linux.o next to cet.o.
> Done.
> 
>>> --- a/gcc/config/i386/linux.h
>>> +++ b/gcc/config/i386/linux.h
>>> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>>>  
>>>  #undef MUSL_DYNAMIC_LINKER
>>>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
>>> +
>>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>>> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
>>> index f2d913e30ac..d855f5cc239 100644
>>> --- a/gcc/config/i386/linux64.h
>>> +++ b/gcc/config/i386/linux64.h
>>> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
>>> If not, see
>>>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>>>  #undef MUSL_DYNAMIC_LINKERX32
>>>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
>>> +
>>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>> And the above two changes should be replaced by a change in
>> gcc/config/i386/linux-common.h.
> Likewise.
> 
>>> +#include "coretypes.h"
>>> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */
>> Why do you need cp/cp-tree.h?  That is just too weird.
>> The function just uses libc_speed (in core-types.h, built_in_function
>> (likewise), OPTION_GLIBC (config/linux.h).
> I ended up with minimal set of includes:
> 
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> #include "backend.h"
> #include "tree.h"
> 
> I'm retesting the patch.
> 
> Martin
> 
>>  Jakub
>>
> 
> 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch
> 
> 
> From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 14 Mar 2018 09:44:18 +0100
> Subject: [PATCH] Introduce new libc_func_speed target hook (PR
>  middle-end/81657).
> 
> gcc/ChangeLog:
> 
> 2018-03-14  Martin Liska  
> 
>   PR middle-end/81657
>   * builtins.c (expand_builtin_memory_copy_args): Handle situation
>   when libc library provides a fast mempcpy implementation/
>   * config/linux-protos.h (ix86_linux_libc_func_speed): New.
>   (TARGET_LIBC_FUNC_SPEED): Likewise.
>   * config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define
>   macro.
>   * config/i386/t-linux: Add x86-linux.o.
>   * config.gcc: Likewise.
>   * config/i386/x86-linux.c: New file.
>   * coretypes.h (enum libc_speed): Likewise.
>   * doc/tm.texi: Document new target hook.
>   * doc/tm.texi.in: Likewise.
>   * expr.c (emit_block_move_hints): Handle libc bail out argument.
>   * expr.h (emit_block_move_hints): Add new parameters.
>   * target.def: Add new hook.
>   * targhooks.c (enum libc_speed): New enum.
>   (default_libc_func_speed): Provide a default hook
>   implementation.
>   * targhooks.h (default_libc_func_speed): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  
> 
>   * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>   and others.
This looks pretty reasonable now.  Let's go with it.  If we need to
adjust other targets we certainly can fault them in as their properties
are discovered/updated.

jeff


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread Jakub Jelinek
On Fri, Apr 13, 2018 at 12:10:33PM +, Wilco Dijkstra wrote:
> > Anyway, here is what I think Richard was asking for, that I'm currently
> > bootstrapping/regtesting.  It can be easily combined with Martin's target
> > hook if needed, or do it only for
> > endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)
> 
> This patch causes regressions on AArch64 since it now always calls mempcpy
> again, so yes either it would need to be done only for tailcalls (which fixes 
> the

No, it only uses mempcpy if we'd otherwise call memcpy library function and user
wrote mempcpy.  AFAIK 7.x and earlier behaved that way too, so it isn't a
regression, regression is only from released GCC versions.  And, a fix is
easy, just implement fast mempcpy on aarch64 ;)

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread H.J. Lu
On Fri, Apr 13, 2018 at 5:10 AM, Wilco Dijkstra  wrote:

> This patch causes regressions on AArch64 since it now always calls mempcpy
> again, so yes either it would need to be done only for tailcalls (which fixes 
> the
> reported regression) or we need Martin's change too so each target can state 
> whether
> they have a fast mempcpy or not.
>

You should open a bug report to track it.

-- 
H.J.


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread Wilco Dijkstra
Jakub Jelinek wrote:  
>On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote:
>> > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got 
>> > added,
>> > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
>> > isn't hard if you have asm version of memcpy and one spare register.
>> 
>> More mempcpy implementations have been added in recent years indeed, but 
>> almost all
>> add an extra copy of the memcpy code rather than using a single combined 
>> implementation.
>> That means it is still better to call memcpy (which is frequently used and 
>> thus likely in L1/L2)
>> rather than mempcpy (which is more likely to be cold and thus not cached).
>
> That really depends, usually when some app uses mempcpy, it uses it very
> heavily.

But it would have to not use memcpy nearby. Do you have examples of apps using
mempcpy significantly? GLIBC is the only case I've seen that uses mempcpy.

> And all the proposed patches do is honor what the user asked, if
> you use memcpy () + n, we aren't transforming that into mempcpy behind the
> user's back.

We transform a lot of calls behind the user's back so that's not a plausible 
argument
for "honoring" the original call. Eg. (void)mempcpy always gets changed to 
memcpy,
bcopy to memmove, bzero to memset, strchr (s, 0) into strlen(s) + s - the list 
is long
and there are plenty cases where these expansions block tailcalls.

> Anyway, here is what I think Richard was asking for, that I'm currently
> bootstrapping/regtesting.  It can be easily combined with Martin's target
> hook if needed, or do it only for
> endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)

This patch causes regressions on AArch64 since it now always calls mempcpy
again, so yes either it would need to be done only for tailcalls (which fixes 
the
reported regression) or we need Martin's change too so each target can state 
whether
they have a fast mempcpy or not.

Wilco

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread Richard Biener
On April 13, 2018 12:35:54 AM GMT+02:00, Jakub Jelinek  wrote:
>On Thu, Apr 12, 2018 at 07:37:22PM +0200, Jakub Jelinek wrote:
>> On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote:
>> > > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy
>got added,
>> > > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent
>mempcpy
>> > > isn't hard if you have asm version of memcpy and one spare
>register.
>> > 
>> > More mempcpy implementations have been added in recent years
>indeed, but almost all
>> > add an extra copy of the memcpy code rather than using a single
>combined implementation.
>> > That means it is still better to call memcpy (which is frequently
>used and thus likely in L1/L2)
>> > rather than mempcpy (which is more likely to be cold and thus not
>cached).
>> 
>> That really depends, usually when some app uses mempcpy, it uses it
>very
>> heavily.  And all the proposed patches do is honor what the user
>asked, if
>> you use memcpy () + n, we aren't transforming that into mempcpy
>behind the
>> user's back.
>> 
>> Anyway, here is what I think Richard was asking for, that I'm
>currently
>> bootstrapping/regtesting.  It can be easily combined with Martin's
>target
>> hook if needed, or do it only for
>> endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)
>> etc.
>> 
>> 2018-04-12  Martin Liska  
>>  Jakub Jelinek  
>> 
>>  PR middle-end/81657
>>  * expr.h (enum block_op_methods): Add BLOCK_OP_NO_LIBCALL_RET.
>>  * expr.c (emit_block_move_hints): Handle BLOCK_OP_NO_LIBCALL_RET.
>>  * builtins.c (expand_builtin_memory_copy_args): Use
>>  BLOCK_OP_NO_LIBCALL_RET method for mempcpy with non-ignored target,
>>  handle dest_addr == pc_rtx.
>> 
>>  * gcc.dg/string-opt-1.c: Remove bogus comment.  Expect a mempcpy
>>  call.
>
>Successfully bootstrapped/regtested on x86_64-linux and i686-linux.

OK. 

Thanks, 
Richard. 

>   Jakub



Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2018 at 07:37:22PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote:
> > > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got 
> > > added,
> > > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
> > > isn't hard if you have asm version of memcpy and one spare register.
> > 
> > More mempcpy implementations have been added in recent years indeed, but 
> > almost all
> > add an extra copy of the memcpy code rather than using a single combined 
> > implementation.
> > That means it is still better to call memcpy (which is frequently used and 
> > thus likely in L1/L2)
> > rather than mempcpy (which is more likely to be cold and thus not cached).
> 
> That really depends, usually when some app uses mempcpy, it uses it very
> heavily.  And all the proposed patches do is honor what the user asked, if
> you use memcpy () + n, we aren't transforming that into mempcpy behind the
> user's back.
> 
> Anyway, here is what I think Richard was asking for, that I'm currently
> bootstrapping/regtesting.  It can be easily combined with Martin's target
> hook if needed, or do it only for
> endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)
> etc.
> 
> 2018-04-12  Martin Liska  
>   Jakub Jelinek  
> 
>   PR middle-end/81657
>   * expr.h (enum block_op_methods): Add BLOCK_OP_NO_LIBCALL_RET.
>   * expr.c (emit_block_move_hints): Handle BLOCK_OP_NO_LIBCALL_RET.
>   * builtins.c (expand_builtin_memory_copy_args): Use
>   BLOCK_OP_NO_LIBCALL_RET method for mempcpy with non-ignored target,
>   handle dest_addr == pc_rtx.
> 
>   * gcc.dg/string-opt-1.c: Remove bogus comment.  Expect a mempcpy
>   call.

Successfully bootstrapped/regtested on x86_64-linux and i686-linux.

Jakub


[PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2018 at 05:29:35PM +, Wilco Dijkstra wrote:
> > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added,
> > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
> > isn't hard if you have asm version of memcpy and one spare register.
> 
> More mempcpy implementations have been added in recent years indeed, but 
> almost all
> add an extra copy of the memcpy code rather than using a single combined 
> implementation.
> That means it is still better to call memcpy (which is frequently used and 
> thus likely in L1/L2)
> rather than mempcpy (which is more likely to be cold and thus not cached).

That really depends, usually when some app uses mempcpy, it uses it very
heavily.  And all the proposed patches do is honor what the user asked, if
you use memcpy () + n, we aren't transforming that into mempcpy behind the
user's back.

Anyway, here is what I think Richard was asking for, that I'm currently
bootstrapping/regtesting.  It can be easily combined with Martin's target
hook if needed, or do it only for
endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)
etc.

2018-04-12  Martin Liska  
Jakub Jelinek  

PR middle-end/81657
* expr.h (enum block_op_methods): Add BLOCK_OP_NO_LIBCALL_RET.
* expr.c (emit_block_move_hints): Handle BLOCK_OP_NO_LIBCALL_RET.
* builtins.c (expand_builtin_memory_copy_args): Use
BLOCK_OP_NO_LIBCALL_RET method for mempcpy with non-ignored target,
handle dest_addr == pc_rtx.

* gcc.dg/string-opt-1.c: Remove bogus comment.  Expect a mempcpy
call.

--- gcc/expr.h.jj   2018-01-12 11:35:51.424222835 +0100
+++ gcc/expr.h  2018-04-12 18:38:07.377464114 +0200
@@ -100,7 +100,11 @@ enum block_op_methods
   BLOCK_OP_NO_LIBCALL,
   BLOCK_OP_CALL_PARM,
   /* Like BLOCK_OP_NORMAL, but the libcall can be tail call optimized.  */
-  BLOCK_OP_TAILCALL
+  BLOCK_OP_TAILCALL,
+  /* Like BLOCK_OP_NO_LIBCALL, but instead of emitting a libcall return
+ pc_rtx to indicate nothing has been emitted and let the caller handle
+ it.  */
+  BLOCK_OP_NO_LIBCALL_RET
 };
 
 typedef rtx (*by_pieces_constfn) (void *, HOST_WIDE_INT, scalar_int_mode);
--- gcc/expr.c.jj   2018-04-06 19:19:14.954130838 +0200
+++ gcc/expr.c  2018-04-12 18:39:58.866536619 +0200
@@ -1565,7 +1565,7 @@ emit_block_move_hints (rtx x, rtx y, rtx
   unsigned HOST_WIDE_INT max_size,
   unsigned HOST_WIDE_INT probable_max_size)
 {
-  bool may_use_call;
+  int may_use_call;
   rtx retval = 0;
   unsigned int align;
 
@@ -1577,7 +1577,7 @@ emit_block_move_hints (rtx x, rtx y, rtx
 {
 case BLOCK_OP_NORMAL:
 case BLOCK_OP_TAILCALL:
-  may_use_call = true;
+  may_use_call = 1;
   break;
 
 case BLOCK_OP_CALL_PARM:
@@ -1589,7 +1589,11 @@ emit_block_move_hints (rtx x, rtx y, rtx
   break;
 
 case BLOCK_OP_NO_LIBCALL:
-  may_use_call = false;
+  may_use_call = 0;
+  break;
+
+case BLOCK_OP_NO_LIBCALL_RET:
+  may_use_call = -1;
   break;
 
 default:
@@ -1625,6 +1629,9 @@ emit_block_move_hints (rtx x, rtx y, rtx
   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
 {
+  if (may_use_call < 0)
+   return pc_rtx;
+
   /* Since x and y are passed to a libcall, mark the corresponding
 tree EXPR as addressable.  */
   tree y_expr = MEM_EXPR (y);
--- gcc/builtins.c.jj   2018-04-12 13:35:34.328395156 +0200
+++ gcc/builtins.c  2018-04-12 18:42:01.846616598 +0200
@@ -3650,12 +3650,16 @@ expand_builtin_memory_copy_args (tree de
   set_mem_align (src_mem, src_align);
 
   /* Copy word part most expediently.  */
-  dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
-CALL_EXPR_TAILCALL (exp)
-&& (endp == 0 || target == const0_rtx)
-? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
+  enum block_op_methods method = BLOCK_OP_NORMAL;
+  if (CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx))
+method = BLOCK_OP_TAILCALL;
+  if (endp == 1 && target != const0_rtx)
+method = BLOCK_OP_NO_LIBCALL_RET;
+  dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, method,
 expected_align, expected_size,
 min_size, max_size, probable_max_size);
+  if (dest_addr == pc_rtx)
+return NULL_RTX;
 
   if (dest_addr == 0)
 {
--- gcc/testsuite/gcc.dg/string-opt-1.c.jj  2017-08-01 19:23:09.923512205 
+0200
+++ gcc/testsuite/gcc.dg/string-opt-1.c 2018-04-12 18:57:10.940217129 +0200
@@ -1,4 +1,3 @@
-/* Ensure mempcpy is "optimized" into memcpy followed by addition.  */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 
@@ -48,5 +47,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { 

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2018 at 04:30:07PM +, Wilco Dijkstra wrote:
> Jakub Jelinek wrote:
> > On Thu, Apr 12, 2018 at 03:53:13PM +, Wilco Dijkstra wrote:
> 
> >> The tailcall issue is just a distraction. Historically the handling of 
> >> mempcpy 
> >> has been horribly inefficient in both GCC and GLIBC for practically all 
> >> targets.
> >> This is why it was decided to defer to memcpy.
> >
> > I guess we need to agree to disagree.  But we have a P1 PR that we need to
> > resolve and it is one of the last 6 blockers we have.  I'm not suggesting to
> > revert PR70140, just let use mempcpy libcall if it is what the user wrote 
> > and
> > we aren't expanding it inline.
> 
> Frankly I don't see why it is a P1 regression. Do you have a benchmark that

That is how regression priorities are defined.

> >> So generally it's a good idea to change mempcpy into memcpy by default. 
> >> It's
> >> not slower than calling mempcpy even if you have a fast implementation, 
> >> it's faster
> >> if you use an up to date GLIBC which calls memcpy, and it's significantly 
> >> better
> >> when using an old GLIBC.
> >
> > mempcpy is quite good on many targets even in old GLIBCs.
> 
> Only true if with "many" you mean x86, x86_64 and IIRC sparc.

Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added,
in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
isn't hard if you have asm version of memcpy and one spare register.

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote:
> >For -Os that is easily measurable regression, for -O2 it depends on the
> >relative speed of memcpy vs. mempcpy and whether one or both of them
> >are in
> >I-cache or not.
> 
> Well, then simply unconditionally not generate a libcall from the move 
> expander? 

We need to generate libcall for many callers and in fact, we don't have a
mode nor a way to tell the caller that we haven't emitted anything.

What we could do is add another enumerator to enum block_op_methods that
would be like BLOCK_OP_NO_LIBCALL, but would not use emit_block_move_via_loop
if move_by_pieces nor emit_block_move_via_movmem can be used, and say
instead return const0_rtx or pc_rtx or some way to tell the caller that
it hasn't emitted anything and in expand_builtin_memory_copy_args
pass for endp == 1 && target != const0_rtx that new BLOCK_OP_NO_LIBCALL_LOOP
to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or pc_rtx
or whatever is chosen).

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Wilco Dijkstra
Jakub Jelinek wrote:
>On Thu, Apr 12, 2018 at 04:30:07PM +, Wilco Dijkstra wrote:
>> Jakub Jelinek wrote:

>> Frankly I don't see why it is a P1 regression. Do you have a benchmark that
>
>That is how regression priorities are defined.

How can one justify considering this a release blocker without hard numbers?
If this is a 1% regression on a large body of code it would be very serious, if 
0.01% - 
not so much.

>> >> So generally it's a good idea to change mempcpy into memcpy by default. 
>> >> It's
>> >> not slower than calling mempcpy even if you have a fast implementation, 
>> >> it's faster
>> >> if you use an up to date GLIBC which calls memcpy, and it's significantly 
>> >> better
>> >> when using an old GLIBC.
>> >
>> > mempcpy is quite good on many targets even in old GLIBCs.
>> 
>> Only true if with "many" you mean x86, x86_64 and IIRC sparc.
>
> Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added,
> in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
> isn't hard if you have asm version of memcpy and one spare register.

More mempcpy implementations have been added in recent years indeed, but almost 
all
add an extra copy of the memcpy code rather than using a single combined 
implementation.
That means it is still better to call memcpy (which is frequently used and thus 
likely in L1/L2)
rather than mempcpy (which is more likely to be cold and thus not cached).

Wilco

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2018 at 03:53:13PM +, Wilco Dijkstra wrote:
> Jakub Jelinek wrote:
> > On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> >> Not sure if I missed some important part of the discussion but
> >> for the testcase we want to preserve the tailcall, right?  So
> >> it would be enough to set avoid_libcall to
> >> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> >> stpcpy)?
> 
> The tailcall issue is just a distraction. Historically the handling of 
> mempcpy  
> has been horribly inefficient in both GCC and GLIBC for practically all 
> targets.
> This is why it was decided to defer to memcpy.

I guess we need to agree to disagree.  But we have a P1 PR that we need to
resolve and it is one of the last 6 blockers we have.  I'm not suggesting to
revert PR70140, just let use mempcpy libcall if it is what the user wrote and
we aren't expanding it inline.

> So generally it's a good idea to change mempcpy into memcpy by default. It's

No.

> not slower than calling mempcpy even if you have a fast implementation, it's 
> faster
> if you use an up to date GLIBC which calls memcpy, and it's significantly 
> better
> when using an old GLIBC.

mempcpy is quite good on many targets even in old GLIBCs.

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Wilco Dijkstra
Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 03:53:13PM +, Wilco Dijkstra wrote:

>> The tailcall issue is just a distraction. Historically the handling of 
>> mempcpy 
>> has been horribly inefficient in both GCC and GLIBC for practically all 
>> targets.
>> This is why it was decided to defer to memcpy.
>
> I guess we need to agree to disagree.  But we have a P1 PR that we need to
> resolve and it is one of the last 6 blockers we have.  I'm not suggesting to
> revert PR70140, just let use mempcpy libcall if it is what the user wrote and
> we aren't expanding it inline.

Frankly I don't see why it is a P1 regression. Do you have a benchmark that
regresses significantly (a few percent, not by a few bytes)? I already showed
the AArch64 results for GLIBC, do you have x86 results that prove things are
much worse?

>> So generally it's a good idea to change mempcpy into memcpy by default. It's
>> not slower than calling mempcpy even if you have a fast implementation, it's 
>> faster
>> if you use an up to date GLIBC which calls memcpy, and it's significantly 
>> better
>> when using an old GLIBC.
>
> mempcpy is quite good on many targets even in old GLIBCs.

Only true if with "many" you mean x86, x86_64 and IIRC sparc.

Wilco


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread H.J. Lu
On Thu, Apr 12, 2018 at 8:53 AM, Wilco Dijkstra  wrote:

> So generally it's a good idea to change mempcpy into memcpy by default. It's
> not slower than calling mempcpy even if you have a fast implementation, it's 
> faster
> if you use an up to date GLIBC which calls memcpy, and it's significantly 
> better
> when using an old GLIBC.
>

It is a BAD idea for x86.   We don't want to turn mempcpy to to memcpy on
x86.  PERIOD.

-- 
H.J.


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Richard Biener
On April 12, 2018 5:38:44 PM GMT+02:00, Jakub Jelinek  wrote:
>On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote:
>> >For -Os that is easily measurable regression, for -O2 it depends on
>the
>> >relative speed of memcpy vs. mempcpy and whether one or both of them
>> >are in
>> >I-cache or not.
>> 
>> Well, then simply unconditionally not generate a libcall from the
>move expander? 
>
>We need to generate libcall for many callers and in fact, we don't have
>a
>mode nor a way to tell the caller that we haven't emitted anything.
>
>What we could do is add another enumerator to enum block_op_methods
>that
>would be like BLOCK_OP_NO_LIBCALL, but would not use
>emit_block_move_via_loop
>if move_by_pieces nor emit_block_move_via_movmem can be used, and say
>instead return const0_rtx or pc_rtx or some way to tell the caller that
>it hasn't emitted anything and in expand_builtin_memory_copy_args
>pass for endp == 1 && target != const0_rtx that new
>BLOCK_OP_NO_LIBCALL_LOOP
>to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or
>pc_rtx
>or whatever is chosen).

Yes. Emit the "original" call whenever inline expansion fails. 

Richard. 

>   Jakub



Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Wilco Dijkstra
Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
>> Not sure if I missed some important part of the discussion but
>> for the testcase we want to preserve the tailcall, right?  So
>> it would be enough to set avoid_libcall to
>> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
>> stpcpy)?

The tailcall issue is just a distraction. Historically the handling of mempcpy  
has been horribly inefficient in both GCC and GLIBC for practically all targets.
This is why it was decided to defer to memcpy.

For example small constant mempcpy was not expanded inline like memcpy
until PR70140 was fixed. Except for a few targets which have added an
optimized mempcpy, the default mempcpy implementation in almost all
released GLIBCs is much slower than memcpy (due to using a badly written
C implementation).

Recent GLIBCs now call the optimized memcpy - this is better but still adds
extra call/return overheads. So to improve that the GLIBC headers have an
inline that changes any call to mempcpy into memcpy (this is the default but
can be disabled on a per-target basis).

Obviously it is best to do this optimization in GCC, which is what we finally do
in GCC8. Inlining mempcpy means you sometimes miss a tailcall, but this is
not common - in all of GLIBC the inlining on AArch64 adds 166 extra instructions
and 12 callee-save registers. This is a small codesize cost to avoid the 
overhead
of calling the generic C version.

> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.

The question is who is going to write the 30+ mempcpy implementations for all
those targets which don't have one? And who says doing this is actually going 
to 
improve performance? Having mempcpy+memcpy typically means more Icache
misses in code that uses both.

So generally it's a good idea to change mempcpy into memcpy by default. It's
not slower than calling mempcpy even if you have a fast implementation, it's 
faster
if you use an up to date GLIBC which calls memcpy, and it's significantly better
when using an old GLIBC.

Wilco


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Richard Biener
On April 12, 2018 4:31:12 PM GMT+02:00, Jakub Jelinek  wrote:
>On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
>> Well, but that wouldn't be a fix for a regression and IMHO there's
>> no reason for a really lame mempcpy.  If targets disgree well,
>
>It is a regression as well, in the past we've emitted mempcpy when user
>wrote mempcpy, now we don't.
>
>E.g.
>extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
>void bar (void *, void *, void *);
>
>void
>foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
>{
>  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
>}
>
>is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in
>foo, while
>on the trunk uses 3 memcpy calls and 96 bytes in foo.
>
>For -Os that is easily measurable regression, for -O2 it depends on the
>relative speed of memcpy vs. mempcpy and whether one or both of them
>are in
>I-cache or not.

Well, then simply unconditionally not generate a libcall from the move 
expander? 

>
>> then they get what they deserve.
>> 
>> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
>> the default non-stupid one kicks in (it exactly looks like my C
>> version)
>
>   Jakub



Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
> Well, but that wouldn't be a fix for a regression and IMHO there's
> no reason for a really lame mempcpy.  If targets disgree well,

It is a regression as well, in the past we've emitted mempcpy when user
wrote mempcpy, now we don't.

E.g.
extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
void bar (void *, void *, void *);

void
foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
{
  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
}

is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in foo, 
while
on the trunk uses 3 memcpy calls and 96 bytes in foo.

For -Os that is easily measurable regression, for -O2 it depends on the
relative speed of memcpy vs. mempcpy and whether one or both of them are in
I-cache or not.

> then they get what they deserve.
> 
> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
> the default non-stupid one kicks in (it exactly looks like my C
> version)

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> Not sure if I missed some important part of the discussion but
> for the testcase we want to preserve the tailcall, right?  So
> it would be enough to set avoid_libcall to
> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> stpcpy)?

For the testcase yes.  There the question is if some targets have so lame
mempcpy that using a tailcall to mempcpy is slower over avoiding the
tailcall (and on aarch64 it looked like maintainer's choice to have lame
mempcpy and hope the compiler will avoid it at all costs).  On the other
side, that change has been forced over to all targets, even when they don't
have lame mempcpy.
So, the tailcall is one issue, and we can either use mempcpy if endp
and CALL_EXPR_TAILCALL, or only do that if -Os.

And another issue is mempcpy uses in other contexts, here again I think x86
has good enough mempcpy that if I use
foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call,
but not so on targets with lame mempcpy.

My preference would be to have non-lame mempcpy etc. on all targets, but the
aarch64 folks disagree.

So, wonder e.g. about Martin's patch, which would use mempcpy if endp and
either FAST_SPEED for mempcpy (regardless of the context), or not
SLOW_SPEED and CALL_EXPR_TAILCALL.  That way, targets could signal they have
so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask
for it to be used every time it makes sense from caller POV, and have the
default something in between (only use it in tail calls).

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Richard Biener
On Thu, 12 Apr 2018, Jakub Jelinek wrote:

> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> > Not sure if I missed some important part of the discussion but
> > for the testcase we want to preserve the tailcall, right?  So
> > it would be enough to set avoid_libcall to
> > endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> > stpcpy)?
> 
> For the testcase yes.  There the question is if some targets have so lame
> mempcpy that using a tailcall to mempcpy is slower over avoiding the
> tailcall (and on aarch64 it looked like maintainer's choice to have lame
> mempcpy and hope the compiler will avoid it at all costs).  On the other
> side, that change has been forced over to all targets, even when they don't
> have lame mempcpy.
> So, the tailcall is one issue, and we can either use mempcpy if endp
> and CALL_EXPR_TAILCALL, or only do that if -Os.
> 
> And another issue is mempcpy uses in other contexts, here again I think x86
> has good enough mempcpy that if I use
> foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call,
> but not so on targets with lame mempcpy.
> 
> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.
> 
> So, wonder e.g. about Martin's patch, which would use mempcpy if endp and
> either FAST_SPEED for mempcpy (regardless of the context), or not
> SLOW_SPEED and CALL_EXPR_TAILCALL.  That way, targets could signal they have
> so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask
> for it to be used every time it makes sense from caller POV, and have the
> default something in between (only use it in tail calls).

Well, but that wouldn't be a fix for a regression and IMHO there's
no reason for a really lame mempcpy.  If targets disgree well,
then they get what they deserve.

I don't see any aarch64 specific mempcpy in glibc btw so hopefully
the default non-stupid one kicks in (it exactly looks like my C
version)

Richard.

>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Richard Biener
On Thu, 12 Apr 2018, Martin Liška wrote:

> Hi.
> 
> I'm reminding review request from Richi for generic part
> and Uros/Honza for target part.

Not sure if I missed some important part of the discussion but
for the testcase we want to preserve the tailcall, right?  So
it would be enough to set avoid_libcall to
endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
stpcpy)?

I'm not sure I like the interfacing of that to emit_block_move_hints
very much.  I'd have used sth like BLOCK_OP_ABORT_ON_LIBCALL
and extend the interface in a way to return what kind of method it
chose rather than just a bool.

Not sure what gcc.dg/20050503-1.c did on non-x86 targets - the
test runs on all archs but only x86 is ever tested for a result.

So - I think tail-calling is prefered, and somehow in the PR
the discussion wandered off to whether there's fast implementations
or not - but the testcase looks for a tailcall where the source
was a tailcall, that should be authorative for the "default"
decision when the hook isn't implemented or doesn't cover the case.

IMO target libraries have to be quite stupid if they have anything
slower than

void *mempcpy (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n) + n;
}

which should be not (very much) slower than a non-tailcall memcpy call.

So -- remove the hook and instead use CALL_EXPR_TAILCALL (exp) instead
of its result.

Thanks,
Richard.

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Jan Hubicka
> Hi.
> 
> I'm reminding review request from Richi for generic part
> and Uros/Honza for target part.

OK for i386 bits.
Honza
> 
> Thanks,
> Martin


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Martin Liška
Hi.

I'm reminding review request from Richi for generic part
and Uros/Honza for target part.

Thanks,
Martin


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-10 Thread Martin Liška
On 04/10/2018 11:19 AM, Jakub Jelinek wrote:
> On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin Liška wrote:
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-28  Martin Liska  
>>
>>  * gcc.dg/string-opt-1.c:
> 
> I guess you really didn't mean to keep the above entry around, just the one
> below, right?

Sure, fixed.

> 
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-14  Martin Liska  
>>
>>  * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>>  and others.
> 
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>>  x86_64-*-linux*)
>>  tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h 
>> i386/linux64.h"
>>  extra_options="${extra_options} linux-android.opt"
>> +extra_objs="${extra_objs} x86-linux.o"
>>  ;;
> 
> The should go into the i[34567]86-*-linux*) case too (outside of the
> if test x$enable_targets = xall; then conditional).
> Or maybe better, remove the above and do it in:
> i[34567]86-*-linux* | x86_64-*-linux*)
> extra_objs="${extra_objs} cet.o"
> tmake_file="$tmake_file i386/t-linux i386/t-cet"
> ;;
> spot, just add x86-linux.o next to cet.o.

Done.

> 
>> --- a/gcc/config/i386/linux.h
>> +++ b/gcc/config/i386/linux.h
>> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>>  
>>  #undef MUSL_DYNAMIC_LINKER
>>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
>> +
>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
>> index f2d913e30ac..d855f5cc239 100644
>> --- a/gcc/config/i386/linux64.h
>> +++ b/gcc/config/i386/linux64.h
>> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
>> If not, see
>>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>>  #undef MUSL_DYNAMIC_LINKERX32
>>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
>> +
>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
> 
> And the above two changes should be replaced by a change in
> gcc/config/i386/linux-common.h.

Likewise.

> 
>> +#include "coretypes.h"
>> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */
> 
> Why do you need cp/cp-tree.h?  That is just too weird.
> The function just uses libc_speed (in core-types.h, built_in_function
> (likewise), OPTION_GLIBC (config/linux.h).

I ended up with minimal set of includes:

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"

I'm retesting the patch.

Martin

> 
>   Jakub
> 

>From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* config/i386/t-linux: Add x86-linux.o.
	* config.gcc: Likewise.
	* config/i386/x86-linux.c: New file.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c  | 15 ++-
 gcc/config.gcc  |  2 +-
 gcc/config/i386/i386.c  |  5 
 gcc/config/i386/linux-common.h  |  2 ++
 gcc/config/i386/t-linux |  6 +
 gcc/config/i386/x86-linux.c | 52 +
 gcc/config/linux-protos.h   |  1 +
 gcc/coretypes.h |  7 +
 gcc/doc/tm.texi |  4 +++
 gcc/doc/tm.texi.in  |  1 +
 gcc/expr.c  | 11 +++-
 gcc/expr.h  |  3 ++-
 gcc/target.def  |  7 +
 gcc/targhooks.c |  9 +++
 gcc/targhooks.h |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 ++--
 16 files changed, 125 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/i386/x86-linux.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ 

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-10 Thread Jakub Jelinek
On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin Liška wrote:
> gcc/testsuite/ChangeLog:
> 
> 2018-03-28  Martin Liska  
> 
>   * gcc.dg/string-opt-1.c:

I guess you really didn't mean to keep the above entry around, just the one
below, right?

> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  
> 
>   * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>   and others.

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>   x86_64-*-linux*)
>   tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h 
> i386/linux64.h"
>   extra_options="${extra_options} linux-android.opt"
> + extra_objs="${extra_objs} x86-linux.o"
>   ;;

The should go into the i[34567]86-*-linux*) case too (outside of the
if test x$enable_targets = xall; then conditional).
Or maybe better, remove the above and do it in:
i[34567]86-*-linux* | x86_64-*-linux*)
extra_objs="${extra_objs} cet.o"
tmake_file="$tmake_file i386/t-linux i386/t-cet"
;;
spot, just add x86-linux.o next to cet.o.

> --- a/gcc/config/i386/linux.h
> +++ b/gcc/config/i386/linux.h
> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #undef MUSL_DYNAMIC_LINKER
>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
> +
> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
> index f2d913e30ac..d855f5cc239 100644
> --- a/gcc/config/i386/linux64.h
> +++ b/gcc/config/i386/linux64.h
> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>  #undef MUSL_DYNAMIC_LINKERX32
>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
> +
> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed

And the above two changes should be replaced by a change in
gcc/config/i386/linux-common.h.

> +#include "coretypes.h"
> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */

Why do you need cp/cp-tree.h?  That is just too weird.
The function just uses libc_speed (in core-types.h, built_in_function
(likewise), OPTION_GLIBC (config/linux.h).

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-09 Thread Martin Liška
Hi.

I'm sending updated version of the patch.

Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux and survives 
regression tests.

Martin
>From 6d19b1bf0c28a683e1458e16943e34bb547d36d6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* config/i386/linux.h (SUBTARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/t-linux: Add x86-linux.o.
	* config.gcc: Likewise.
	* config/i386/x86-linux.c: New file.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-28  Martin Liska  

	* gcc.dg/string-opt-1.c:

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c  | 15 ++-
 gcc/config.gcc  |  1 +
 gcc/config/i386/i386.c  |  5 
 gcc/config/i386/linux.h |  2 ++
 gcc/config/i386/linux64.h   |  2 ++
 gcc/config/i386/t-linux |  6 +
 gcc/config/i386/x86-linux.c | 54 +
 gcc/config/linux-protos.h   |  1 +
 gcc/coretypes.h |  7 +
 gcc/doc/tm.texi |  4 +++
 gcc/doc/tm.texi.in  |  1 +
 gcc/expr.c  | 11 +++-
 gcc/expr.h  |  3 ++-
 gcc/target.def  |  7 +
 gcc/targhooks.c |  9 +++
 gcc/targhooks.h |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 ++--
 17 files changed, 129 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/i386/x86-linux.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+ In situations when a libc library provides fast implementation
+ of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+= (endp == 1
+   && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+   && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
  CALL_EXPR_TAILCALL (exp)
  && (endp == 0 || target == const0_rtx)
  ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
  expected_align, expected_size,
- min_size, max_size, probable_max_size);
+ min_size, max_size, probable_max_size,
+ avoid_libcall ? _avoided : NULL);
+
+  if (libcall_avoided)
+return NULL_RTX;
 
   if (dest_addr == 0)
 {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1b58c060a92..6445ff569b3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
 	x86_64-*-linux*)
 		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
 		extra_options="${extra_options} linux-android.opt"
+		extra_objs="${extra_objs} x86-linux.o"
 	  	;;
 	x86_64-*-kfreebsd*-gnu)
 		tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h"
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index 69f97f15b0d..6c59cbd6d62 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-04 Thread Martin Liška
PING^1

On 03/29/2018 02:31 PM, Martin Liška wrote:
> On 03/29/2018 02:25 PM, Jakub Jelinek wrote:
>> On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin Liška wrote:
>>> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
 On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
> --- a/gcc/config/linux.c
> +++ b/gcc/config/linux.c
> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>  return false;
>    }
> +
> +/* This hook determines whether a function from libc has a fast 
> implementation
> +   FN is present at the runtime.  We override it for i386 and glibc C 
> library
> +   as this combination provides fast implementation of mempcpy function. 
>  */
> +
> +enum libc_speed
> +ix86_linux_libc_func_speed (int fn)

 Putting a ix86_ function into config/linux.c used by most linux targets is
 weird.  Either we multiple linux targets with mempcpy fast, then name it
 somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
 And yes, we do care about i?86-linux.  Or it is for x86 only, and then
 it shouldn't be in config/linux.c, but either e.g. static inline in
 config/i386/linux.h, or we need config/i386/linux.c if we don't have it
 already.
>>>
>>> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can 
>>> you please
>>
>> Can't you just put it into gcc/config/i386/linux-common.h as static inline,
>> so that it is optimized away whenever not needed?
> 
> I would like to put it there, but:
> 
> g++ -c   -g  -DIN_GCC -fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
> -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc 
> -I../../gcc/build -I../../gcc/../include  -I../../gcc/../libcpp/include  \
> -o build/genpreds.o ../../gcc/genpreds.c
> In file included from ./tm.h:38:0,
>  from ../../gcc/genpreds.c:26:
> ../../gcc/config/i386/linux-common.h: In function ‘libc_speed 
> ix86_linux_libc_func_speed(int)’:
> ../../gcc/config/i386/linux-common.h:137:8: error: use of enum 
> ‘built_in_function’ without previous declaration
>    enum built_in_function f = (built_in_function)fn;
>     ^
> ../../gcc/config/i386/linux-common.h:137:31: error: ‘built_in_function’ was 
> not declared in this scope
>    enum built_in_function f = (built_in_function)fn;
>    ^
> ../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: 
> ‘machine_function’
>    enum built_in_function f = (built_in_function)fn;
>    ^
>    machine_function
> ../../gcc/config/i386/linux-common.h:144:12: error: ‘BUILT_IN_MEMPCPY’ was 
> not declared in this scope
>    case BUILT_IN_MEMPCPY:
>     ^~~~
> Martin
> 
>>
>> If you really want to add a c file, it better not be called linux.c, because
>> linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
>> you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
>> rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
>> config/t-linux).
>>
>>> help me how to conditionally build the file?
>>
>> Jakub
>>
> 



Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-29 Thread Jakub Jelinek
On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin Liška wrote:
> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
> > On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
> > > --- a/gcc/config/linux.c
> > > +++ b/gcc/config/linux.c
> > > @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
> > > return false;
> > >   }
> > > +
> > > +/* This hook determines whether a function from libc has a fast 
> > > implementation
> > > +   FN is present at the runtime.  We override it for i386 and glibc C 
> > > library
> > > +   as this combination provides fast implementation of mempcpy function. 
> > >  */
> > > +
> > > +enum libc_speed
> > > +ix86_linux_libc_func_speed (int fn)
> > 
> > Putting a ix86_ function into config/linux.c used by most linux targets is
> > weird.  Either we multiple linux targets with mempcpy fast, then name it
> > somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
> > And yes, we do care about i?86-linux.  Or it is for x86 only, and then
> > it shouldn't be in config/linux.c, but either e.g. static inline in
> > config/i386/linux.h, or we need config/i386/linux.c if we don't have it
> > already.
> 
> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can 
> you please

Can't you just put it into gcc/config/i386/linux-common.h as static inline,
so that it is optimized away whenever not needed?

If you really want to add a c file, it better not be called linux.c, because
linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
config/t-linux).

> help me how to conditionally build the file?

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-29 Thread Martin Liška

On 03/29/2018 02:25 PM, Jakub Jelinek wrote:

On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin Liška wrote:

On 03/28/2018 06:36 PM, Jakub Jelinek wrote:

On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:

--- a/gcc/config/linux.c
+++ b/gcc/config/linux.c
@@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
 return false;
   }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)


Putting a ix86_ function into config/linux.c used by most linux targets is
weird.  Either we multiple linux targets with mempcpy fast, then name it
somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
And yes, we do care about i?86-linux.  Or it is for x86 only, and then
it shouldn't be in config/linux.c, but either e.g. static inline in
config/i386/linux.h, or we need config/i386/linux.c if we don't have it
already.


I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you 
please


Can't you just put it into gcc/config/i386/linux-common.h as static inline,
so that it is optimized away whenever not needed?


I would like to put it there, but:

g++ -c   -g  -DIN_GCC -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
-DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc 
-I../../gcc/build -I../../gcc/../include  -I../../gcc/../libcpp/include  \
-o build/genpreds.o ../../gcc/genpreds.c
In file included from ./tm.h:38:0,
 from ../../gcc/genpreds.c:26:
../../gcc/config/i386/linux-common.h: In function ‘libc_speed 
ix86_linux_libc_func_speed(int)’:
../../gcc/config/i386/linux-common.h:137:8: error: use of enum 
‘built_in_function’ without previous declaration
   enum built_in_function f = (built_in_function)fn;
^
../../gcc/config/i386/linux-common.h:137:31: error: ‘built_in_function’ was not 
declared in this scope
   enum built_in_function f = (built_in_function)fn;
   ^
../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: 
‘machine_function’
   enum built_in_function f = (built_in_function)fn;
   ^
   machine_function
../../gcc/config/i386/linux-common.h:144:12: error: ‘BUILT_IN_MEMPCPY’ was not 
declared in this scope
   case BUILT_IN_MEMPCPY:
^~~~
Martin



If you really want to add a c file, it better not be called linux.c, because
linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
config/t-linux).


help me how to conditionally build the file?


Jakub





Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-29 Thread H.J. Lu
On Thu, Mar 29, 2018 at 4:28 AM, Martin Liška  wrote:
> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
>>
>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
>>>
>>> --- a/gcc/config/linux.c
>>> +++ b/gcc/config/linux.c
>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>>   return false;
>>>   }
>>> +
>>> +/* This hook determines whether a function from libc has a fast
>>> implementation
>>> +   FN is present at the runtime.  We override it for i386 and glibc C
>>> library
>>> +   as this combination provides fast implementation of mempcpy function.
>>> */
>>> +
>>> +enum libc_speed
>>> +ix86_linux_libc_func_speed (int fn)
>>
>>
>> Putting a ix86_ function into config/linux.c used by most linux targets is
>> weird.  Either we multiple linux targets with mempcpy fast, then name it
>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
>> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
>> it shouldn't be in config/linux.c, but either e.g. static inline in
>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
>> already.
>
>
> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can
> you please
> help me how to conditionally build the file?

config.gcc:  extra_objs="${extra_objs} linux.o"
config.gcc: extra_objs="$extra_objs powerpcspe-linux.o"
config.gcc: extra_objs="$extra_objs rs6000-linux.o"


-- 
H.J.


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-29 Thread Martin Liška

On 03/28/2018 06:36 PM, Jakub Jelinek wrote:

On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:

--- a/gcc/config/linux.c
+++ b/gcc/config/linux.c
@@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
  
return false;

  }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)


Putting a ix86_ function into config/linux.c used by most linux targets is
weird.  Either we multiple linux targets with mempcpy fast, then name it
somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
And yes, we do care about i?86-linux.  Or it is for x86 only, and then
it shouldn't be in config/linux.c, but either e.g. static inline in
config/i386/linux.h, or we need config/i386/linux.c if we don't have it
already.


I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you 
please
help me how to conditionally build the file?

Martin



Jakub





Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-28 Thread Jakub Jelinek
On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
> --- a/gcc/config/linux.c
> +++ b/gcc/config/linux.c
> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>  
>return false;
>  }
> +
> +/* This hook determines whether a function from libc has a fast 
> implementation
> +   FN is present at the runtime.  We override it for i386 and glibc C library
> +   as this combination provides fast implementation of mempcpy function.  */
> +
> +enum libc_speed
> +ix86_linux_libc_func_speed (int fn)

Putting a ix86_ function into config/linux.c used by most linux targets is
weird.  Either we multiple linux targets with mempcpy fast, then name it
somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
And yes, we do care about i?86-linux.  Or it is for x86 only, and then
it shouldn't be in config/linux.c, but either e.g. static inline in
config/i386/linux.h, or we need config/i386/linux.c if we don't have it
already.

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-28 Thread Martin Liška

On 03/28/2018 04:31 PM, Jakub Jelinek wrote:

On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin Liška wrote:

2018-03-14  Martin Liska  

PR middle-end/81657
* builtins.c (expand_builtin_memory_copy_args): Handle situation
when libc library provides a fast mempcpy implementation/
* config/i386/i386-protos.h (gnu_libc_func_speed): New.
* config/i386/i386.c (enum libc_speed): Likewise.
(ix86_libc_func_speed): Likewise.
(TARGET_LIBC_FUNC_SPEED): Likewise.
* coretypes.h (enum libc_speed): Likewise.
* doc/tm.texi: Document new target hook.
* doc/tm.texi.in: Likewise.
* expr.c (emit_block_move_hints): Handle libc bail out argument.
* expr.h (emit_block_move_hints): Add new parameters.
* target.def: Add new hook.
* targhooks.c (enum libc_speed): New enum.
(default_libc_func_speed): Provide a default hook
implementation.
* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  

* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
and others.
/* Copy word part most expediently.  */
+  bool libcall_avoided = false;
dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 CALL_EXPR_TAILCALL (exp)
 && (endp == 0 || target == const0_rtx)
 ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 expected_align, expected_size,
-min_size, max_size, probable_max_size);
+min_size, max_size, probable_max_size,
+avoid_libcall ? _avoided: NULL);


Missing space before :.


Fixed.


+
+  if (libcall_avoided)
+return NULL_RTX;
  
--- a/gcc/config/i386/i386-protos.h

+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
  
  extern bool ix86_using_red_zone (void);
  
+extern enum libc_speed gnu_libc_func_speed (int fn);


Here you declare gnu_libc_func_speed.


--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
  && (!cfun->machine->has_local_indirect_jump
  || cfun->machine->indirect_branch_type == indirect_branch_keep));
  }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_libc_func_speed (int fn)


But define a different function.


You are right.




+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+return LIBC_UNKNOWN_SPEED;


OPTION_GLIBC is only defined if linux.h is included, so I think you break
all other x86 targets this way.


That said the proper way is probably to define the function in linux64.c
(should we also use it on i386, or nobody cares?). And corresponding declaration
will be in config/linux-protos.h.

The hook should have linux in the name,

perhaps defined only in config/i386/linux.h and redefined in i386.c through
#ifdef SUBTARGET_LIBC_FUNC_SPEED
#undef TARGET_LIBC_FUNC_SPEED
#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
#endif
or something similar.


I adopted this suggested mechanism.
Thanks for review.

Martin



Otherwise LGTM, but please get approval also from Richi (for the generic part) 
and
Uros (for the backend side).

Jakub



>From 4378fa93aafac3d28eaae48c9a41a2f2ef5e1d18 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	* config/linux.c (ix86_linux_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-28  Martin Liska  

	* gcc.dg/string-opt-1.c:

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and 

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-28 Thread Jakub Jelinek
On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin Liška wrote:
> 2018-03-14  Martin Liska  
> 
>   PR middle-end/81657
>   * builtins.c (expand_builtin_memory_copy_args): Handle situation
>   when libc library provides a fast mempcpy implementation/
>   * config/i386/i386-protos.h (gnu_libc_func_speed): New.
>   * config/i386/i386.c (enum libc_speed): Likewise.
>   (ix86_libc_func_speed): Likewise.
>   (TARGET_LIBC_FUNC_SPEED): Likewise.
>   * coretypes.h (enum libc_speed): Likewise.
>   * doc/tm.texi: Document new target hook.
>   * doc/tm.texi.in: Likewise.
>   * expr.c (emit_block_move_hints): Handle libc bail out argument.
>   * expr.h (emit_block_move_hints): Add new parameters.
>   * target.def: Add new hook.
>   * targhooks.c (enum libc_speed): New enum.
>   (default_libc_func_speed): Provide a default hook
>   implementation.
>   * targhooks.h (default_libc_func_speed): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  
> 
>   * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>   and others.
>/* Copy word part most expediently.  */
> +  bool libcall_avoided = false;
>dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>CALL_EXPR_TAILCALL (exp)
>&& (endp == 0 || target == const0_rtx)
>? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>expected_align, expected_size,
> -  min_size, max_size, probable_max_size);
> +  min_size, max_size, probable_max_size,
> +  avoid_libcall ? _avoided: NULL);

Missing space before :.
> +
> +  if (libcall_avoided)
> +return NULL_RTX;
>  
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
>  
>  extern bool ix86_using_red_zone (void);
>  
> +extern enum libc_speed gnu_libc_func_speed (int fn);

Here you declare gnu_libc_func_speed.

> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
> && (!cfun->machine->has_local_indirect_jump
> || cfun->machine->indirect_branch_type == indirect_branch_keep));
>  }
> +
> +/* This hook determines whether a function from libc has a fast 
> implementation
> +   FN is present at the runtime.  We override it for i386 and glibc C library
> +   as this combination provides fast implementation of mempcpy function.  */
> +
> +enum libc_speed
> +ix86_libc_func_speed (int fn)

But define a different function.

> +{
> +  enum built_in_function f = (built_in_function)fn;
> +
> +  if (!OPTION_GLIBC)
> +return LIBC_UNKNOWN_SPEED;

OPTION_GLIBC is only defined if linux.h is included, so I think you break
all other x86 targets this way.  The hook should have linux in the name,
perhaps defined only in config/i386/linux.h and redefined in i386.c through
#ifdef SUBTARGET_LIBC_FUNC_SPEED
#undef TARGET_LIBC_FUNC_SPEED
#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
#endif
or something similar.

Otherwise LGTM, but please get approval also from Richi (for the generic part) 
and
Uros (for the backend side).

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-28 Thread Martin Liška

On 03/21/2018 11:34 AM, Jakub Jelinek wrote:

On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin Liška wrote:

The variable is not named very well, shouldn't it be avoid_libcall or
something similar?  Perhaps the variable should have a comment describing
what it is.  Do you need separate argument for that bool and
is_move_done, rather than just the flag being that some pointer to bool is
non-NULL?


Can you please explain me how to replace the 2 new arguments?


So you have one bool arg and one pointer arg into which the function
stores true and optionally based on that other bool arg and other conditions
stores false.
I'm suggesting just one bool * argument, which is NULL if the bool arg would
be false, and non-NULL otherwise.  The single argument still could be
called bool *avoid_libcall, and you'd just
if (avoid_libcall) { *avoid_libcall = true; return retval; }
instead of emitting a libcall, the caller would initialize the bool
variable to false.

Jakub



Got it. I'm sending updated version of the patch. Hope I've had enough fantasy 
to
write it nice.

Tested on both ppc64le and x86_64.

Martin
>From d766330364aa2a23512f7d4e60491b634c5d0523 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c  | 15 ++-
 gcc/config/i386/i386-protos.h   |  2 ++
 gcc/config/i386/i386.c  | 24 
 gcc/coretypes.h |  7 +++
 gcc/doc/tm.texi |  4 
 gcc/doc/tm.texi.in  |  1 +
 gcc/expr.c  | 11 ++-
 gcc/expr.h  |  3 ++-
 gcc/target.def  |  7 +++
 gcc/targhooks.c |  9 +
 gcc/targhooks.h |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 +++--
 12 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..d3cd93ffbfa 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+ In situations when a libc library provides fast implementation
+ of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+= (endp == 1
+   && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+   && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
  CALL_EXPR_TAILCALL (exp)
  && (endp == 0 || target == const0_rtx)
  ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
  expected_align, expected_size,
- min_size, max_size, probable_max_size);
+ min_size, max_size, probable_max_size,
+ avoid_libcall ? _avoided: NULL);
+
+  if (libcall_avoided)
+return NULL_RTX;
 
   if (dest_addr == 0)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ef7c818986f..d3fc515845b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern enum libc_speed gnu_libc_func_speed (int fn);
+
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..aadf9fa5ac3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
 	  && (!cfun->machine->has_local_indirect_jump
 	  || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
+
+/* This hook 

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-21 Thread Jakub Jelinek
On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin Liška wrote:
> > The variable is not named very well, shouldn't it be avoid_libcall or
> > something similar?  Perhaps the variable should have a comment describing
> > what it is.  Do you need separate argument for that bool and
> > is_move_done, rather than just the flag being that some pointer to bool is
> > non-NULL?
> 
> Can you please explain me how to replace the 2 new arguments?

So you have one bool arg and one pointer arg into which the function
stores true and optionally based on that other bool arg and other conditions
stores false.
I'm suggesting just one bool * argument, which is NULL if the bool arg would
be false, and non-NULL otherwise.  The single argument still could be
called bool *avoid_libcall, and you'd just
if (avoid_libcall) { *avoid_libcall = true; return retval; }
instead of emitting a libcall, the caller would initialize the bool
variable to false.

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-18 Thread Martin Liška

PING^1

On 03/14/2018 02:54 PM, Martin Liška wrote:

On 03/14/2018 02:07 PM, Jakub Jelinek wrote:

On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin Liška wrote:

--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, 
tree len,
src_mem = get_memory_rtx (src, len);
set_mem_align (src_mem, src_align);
  
+  bool is_move_done;

+
/* Copy word part most expediently.  */


This comment supposedly belongs right above the emit_block_move_hints call.


+  bool bail_out_libcall = endp == 1
+&& targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;


Formatting.  && belongs below endp.  So either:
   bool bail_out_libcall
 = (endp == 1
&& ...);
or
   bool bail_out_libcall = false;
   if (endp == 1
   && ...)
 bail_out_libcall = true;
?
The variable is not named very well, shouldn't it be avoid_libcall or
something similar?  Perhaps the variable should have a comment describing
what it is.  Do you need separate argument for that bool and
is_move_done, rather than just the flag being that some pointer to bool is
non-NULL?


Can you please explain me how to replace the 2 new arguments?




dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 CALL_EXPR_TAILCALL (exp)
 && (endp == 0 || target == const0_rtx)
 ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 expected_align, expected_size,
-min_size, max_size, probable_max_size);
+min_size, max_size, probable_max_size,
+bail_out_libcall, _move_done);
+
+  /* Bail out when a mempcpy call would be expanded as libcall and when
+ we have a target that provides a fast implementation
+ of mempcpy routine.  */
+  if (!is_move_done)
+return NULL_RTX;
  
if (dest_addr == 0)

  {
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
  && (!cfun->machine->has_local_indirect_jump
  || cfun->machine->indirect_branch_type == indirect_branch_keep));
  }
+


Missing function comment here.  For target hooks, usually there is a copy of
the target.def comment, perhaps with further details on why it is overridden
and what it does.

--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
EXCESS_PRECISION_TYPE_FAST
  };
  


Missing comment describing what it is, plus it the enumerators are too
generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?

--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class 
ATTRIBUTE_UNUSED)
return false;
  }
  


Again, missing function comment.


+enum libc_speed
+default_libc_func_speed (int)
+{
+  return UNKNOWN_SPEED;
+}
+



--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,5 @@ main (void)
return 0;
  }
  
-/* { dg-final { scan-assembler-not "\" } } */

-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\" { target { i?86-*-* 
x86_64-*-* } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } 
} } } } */


First of all, I don't really understand this, I'd expect both the
two old dg-final lines to be used as is for non-x86 targets and another two
for x86_64, and more importantly, the target hook is only for glibc, not for
musl/bionic etc., nor for non-linux, so you probably need some effective
target for it for whether it is glibc rather than musl/bionic, and use
...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
patterns so that it succeeds on both variants of the target hook, but then
does the test test anything at all?


I fixed that by preserving the old 2 old-finals and then I added a new for 
x86_64 target.
Apart from the comment above I've fixed all nits and mempcpy is not used when 
LHS == NULL.

Martin



Jakub





Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-14 Thread Martin Liška
On 03/14/2018 02:07 PM, Jakub Jelinek wrote:
> On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin Liška wrote:
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree 
>> src, tree len,
>>src_mem = get_memory_rtx (src, len);
>>set_mem_align (src_mem, src_align);
>>  
>> +  bool is_move_done;
>> +
>>/* Copy word part most expediently.  */
> 
> This comment supposedly belongs right above the emit_block_move_hints call.
> 
>> +  bool bail_out_libcall = endp == 1
>> +&& targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
> 
> Formatting.  && belongs below endp.  So either:
>   bool bail_out_libcall
> = (endp == 1
>&& ...);
> or
>   bool bail_out_libcall = false;
>   if (endp == 1
>   && ...)
> bail_out_libcall = true;
> ?
> The variable is not named very well, shouldn't it be avoid_libcall or
> something similar?  Perhaps the variable should have a comment describing
> what it is.  Do you need separate argument for that bool and
> is_move_done, rather than just the flag being that some pointer to bool is
> non-NULL?

Can you please explain me how to replace the 2 new arguments?

> 
>>dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>>   CALL_EXPR_TAILCALL (exp)
>>   && (endp == 0 || target == const0_rtx)
>>   ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>>   expected_align, expected_size,
>> - min_size, max_size, probable_max_size);
>> + min_size, max_size, probable_max_size,
>> + bail_out_libcall, _move_done);
>> +
>> +  /* Bail out when a mempcpy call would be expanded as libcall and when
>> + we have a target that provides a fast implementation
>> + of mempcpy routine.  */
>> +  if (!is_move_done)
>> +return NULL_RTX;
>>  
>>if (dest_addr == 0)
>>  {
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
>>&& (!cfun->machine->has_local_indirect_jump
>>|| cfun->machine->indirect_branch_type == indirect_branch_keep));
>>  }
>> +
> 
> Missing function comment here.  For target hooks, usually there is a copy of
> the target.def comment, perhaps with further details on why it is overridden
> and what it does.
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -384,6 +384,13 @@ enum excess_precision_type
>>EXCESS_PRECISION_TYPE_FAST
>>  };
>>  
> 
> Missing comment describing what it is, plus it the enumerators are too
> generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
>> --- a/gcc/targhooks.c
>> +++ b/gcc/targhooks.c
>> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class 
>> fn_class ATTRIBUTE_UNUSED)
>>return false;
>>  }
>>  
> 
> Again, missing function comment.
> 
>> +enum libc_speed
>> +default_libc_func_speed (int)
>> +{
>> +  return UNKNOWN_SPEED;
>> +}
>> +
> 
>> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
>> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
>> @@ -48,5 +48,5 @@ main (void)
>>return 0;
>>  }
>>  
>> -/* { dg-final { scan-assembler-not "\" } } */
>> -/* { dg-final { scan-assembler "memcpy" } } */
>> +/* { dg-final { scan-assembler-not "\" { target { i?86-*-* 
>> x86_64-*-* } } } } */
>> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* 
>> } } } } } */
> 
> First of all, I don't really understand this, I'd expect both the
> two old dg-final lines to be used as is for non-x86 targets and another two
> for x86_64, and more importantly, the target hook is only for glibc, not for
> musl/bionic etc., nor for non-linux, so you probably need some effective
> target for it for whether it is glibc rather than musl/bionic, and use
> ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
> patterns so that it succeeds on both variants of the target hook, but then
> does the test test anything at all?

I fixed that by preserving the old 2 old-finals and then I added a new for 
x86_64 target.
Apart from the comment above I've fixed all nits and mempcpy is not used when 
LHS == NULL.

Martin

> 
>   Jakub
> 

>From 26979038ce9500015f957afd896146022a38490b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): 

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-14 Thread Jakub Jelinek
On Wed, Mar 14, 2018 at 02:08:07PM +0100, Martin Liška wrote:
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > index d82e2232d7b..91e1c87f83f 100644
> > --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > @@ -62,7 +62,7 @@ main_test (void)
> >mempcpy (p + 5, s3, 1);
> >if (memcmp (p, "ABCDEFg", 8))
> >  abort ();
> > -  mempcpy (p + 6, s1 + 1, l1);
> > +  memcpy (p + 6, s1 + 1, l1);
> >if (memcmp (p, "ABCDEF2", 8))
> >  abort ();
> >  }
> > 
> > This is a mempcpy test.  Why is mempcpy changed to memcpy?
> > 
> 
> Because this mempcpy is not optimized out to memcpy and it then aborts.

Why it isn't optimized to memcpy?  If the lhs is unused, it always should be
folded to memcpy, regardless of whether mempcpy is fast or not (I assume no
target has slow memcpy and fast mempcpy).

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-14 Thread Jakub Jelinek
On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin Liška wrote:
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, 
> tree len,
>src_mem = get_memory_rtx (src, len);
>set_mem_align (src_mem, src_align);
>  
> +  bool is_move_done;
> +
>/* Copy word part most expediently.  */

This comment supposedly belongs right above the emit_block_move_hints call.

> +  bool bail_out_libcall = endp == 1
> +&& targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;

Formatting.  && belongs below endp.  So either:
  bool bail_out_libcall
= (endp == 1
   && ...);
or
  bool bail_out_libcall = false;
  if (endp == 1
  && ...)
bail_out_libcall = true;
?
The variable is not named very well, shouldn't it be avoid_libcall or
something similar?  Perhaps the variable should have a comment describing
what it is.  Do you need separate argument for that bool and
is_move_done, rather than just the flag being that some pointer to bool is
non-NULL?

>dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>CALL_EXPR_TAILCALL (exp)
>&& (endp == 0 || target == const0_rtx)
>? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>expected_align, expected_size,
> -  min_size, max_size, probable_max_size);
> +  min_size, max_size, probable_max_size,
> +  bail_out_libcall, _move_done);
> +
> +  /* Bail out when a mempcpy call would be expanded as libcall and when
> + we have a target that provides a fast implementation
> + of mempcpy routine.  */
> +  if (!is_move_done)
> +return NULL_RTX;
>  
>if (dest_addr == 0)
>  {
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
> && (!cfun->machine->has_local_indirect_jump
> || cfun->machine->indirect_branch_type == indirect_branch_keep));
>  }
> +

Missing function comment here.  For target hooks, usually there is a copy of
the target.def comment, perhaps with further details on why it is overridden
and what it does.
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -384,6 +384,13 @@ enum excess_precision_type
>EXCESS_PRECISION_TYPE_FAST
>  };
>  

Missing comment describing what it is, plus it the enumerators are too
generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class 
> ATTRIBUTE_UNUSED)
>return false;
>  }
>  

Again, missing function comment.

> +enum libc_speed
> +default_libc_func_speed (int)
> +{
> +  return UNKNOWN_SPEED;
> +}
> +

> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
> @@ -48,5 +48,5 @@ main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-assembler-not "\" } } */
> -/* { dg-final { scan-assembler "memcpy" } } */
> +/* { dg-final { scan-assembler-not "\" { target { i?86-*-* 
> x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* 
> } } } } } */

First of all, I don't really understand this, I'd expect both the
two old dg-final lines to be used as is for non-x86 targets and another two
for x86_64, and more importantly, the target hook is only for glibc, not for
musl/bionic etc., nor for non-linux, so you probably need some effective
target for it for whether it is glibc rather than musl/bionic, and use
...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
patterns so that it succeeds on both variants of the target hook, but then
does the test test anything at all?

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-14 Thread Martin Liška
On 03/14/2018 01:57 PM, H.J. Lu wrote:
> On Wed, Mar 14, 2018 at 5:54 AM, Martin Liška  wrote:
>> On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
>>> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> where in particular linux_libc_has_function deals with various C 
> libraries.
> Of course, in this case you need another target hook, that is dependent 
> both
> on the target backend and C library.
>
> It would be nice to make the target hook a little bit more generic as 
> well,
> e.g. pass it enum builtin_function and query if it is fast, slow or
> unknown, or even some kind of cost, where the caller could ask for cost of
> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative 
> costs.

 Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've 
 added new hook
 definition to gcc/config/gnu-user.h that will point to 
 gnu_libc_function_implementation.
 I would like to implement the function in gcc/targhooks.c, but I don't 
 know how to
 make ifdef according to target?
>>>
>>> Put there just the default implementation (everything is UNKNOWN?).
>>>
 One another issue is that built_in_function is enum defined in tree.h. 
 Thus I'll replace the
 callback argument with int, that will be casted. One last issue: am I 
 right that I'll have to define
 TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to 
 no_c99_libc_has_function)?
>>>
>>> And define the i386/x86_64 glibc one in config/i386/*.h, check there
>>> OPTION_GLIBC and only in that case return something other than UNKNOWN.
>>>
>>> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
>>>
>>>   Jakub
>>>
>>
>> Hi.
>>
>> I'm sending V2 that can survive bootstrap and regression tests on both 
>> x86_64 and ppc64le.
>>
>> Martin
> 
> diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> index d82e2232d7b..91e1c87f83f 100644
> --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> @@ -62,7 +62,7 @@ main_test (void)
>mempcpy (p + 5, s3, 1);
>if (memcmp (p, "ABCDEFg", 8))
>  abort ();
> -  mempcpy (p + 6, s1 + 1, l1);
> +  memcpy (p + 6, s1 + 1, l1);
>if (memcmp (p, "ABCDEF2", 8))
>  abort ();
>  }
> 
> This is a mempcpy test.  Why is mempcpy changed to memcpy?
> 

Because this mempcpy is not optimized out to memcpy and it then aborts.
It's proper to leave here mempcpy I believe.

Martin


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-14 Thread H.J. Lu
On Wed, Mar 14, 2018 at 5:54 AM, Martin Liška  wrote:
> On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
>> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
 Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
 where in particular linux_libc_has_function deals with various C libraries.
 Of course, in this case you need another target hook, that is dependent 
 both
 on the target backend and C library.

 It would be nice to make the target hook a little bit more generic as well,
 e.g. pass it enum builtin_function and query if it is fast, slow or
 unknown, or even some kind of cost, where the caller could ask for cost of
 BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative 
 costs.
>>>
>>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added 
>>> new hook
>>> definition to gcc/config/gnu-user.h that will point to 
>>> gnu_libc_function_implementation.
>>> I would like to implement the function in gcc/targhooks.c, but I don't know 
>>> how to
>>> make ifdef according to target?
>>
>> Put there just the default implementation (everything is UNKNOWN?).
>>
>>> One another issue is that built_in_function is enum defined in tree.h. Thus 
>>> I'll replace the
>>> callback argument with int, that will be casted. One last issue: am I right 
>>> that I'll have to define
>>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to 
>>> no_c99_libc_has_function)?
>>
>> And define the i386/x86_64 glibc one in config/i386/*.h, check there
>> OPTION_GLIBC and only in that case return something other than UNKNOWN.
>>
>> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
>>
>>   Jakub
>>
>
> Hi.
>
> I'm sending V2 that can survive bootstrap and regression tests on both x86_64 
> and ppc64le.
>
> Martin

diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
index d82e2232d7b..91e1c87f83f 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
@@ -62,7 +62,7 @@ main_test (void)
   mempcpy (p + 5, s3, 1);
   if (memcmp (p, "ABCDEFg", 8))
 abort ();
-  mempcpy (p + 6, s1 + 1, l1);
+  memcpy (p + 6, s1 + 1, l1);
   if (memcmp (p, "ABCDEF2", 8))
 abort ();
 }

This is a mempcpy test.  Why is mempcpy changed to memcpy?

-- 
H.J.


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-14 Thread Martin Liška
On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
>>> where in particular linux_libc_has_function deals with various C libraries.
>>> Of course, in this case you need another target hook, that is dependent both
>>> on the target backend and C library.
>>>
>>> It would be nice to make the target hook a little bit more generic as well,
>>> e.g. pass it enum builtin_function and query if it is fast, slow or
>>> unknown, or even some kind of cost, where the caller could ask for cost of
>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
>>
>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added 
>> new hook
>> definition to gcc/config/gnu-user.h that will point to 
>> gnu_libc_function_implementation.
>> I would like to implement the function in gcc/targhooks.c, but I don't know 
>> how to
>> make ifdef according to target?
> 
> Put there just the default implementation (everything is UNKNOWN?).
> 
>> One another issue is that built_in_function is enum defined in tree.h. Thus 
>> I'll replace the
>> callback argument with int, that will be casted. One last issue: am I right 
>> that I'll have to define
>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to 
>> no_c99_libc_has_function)?
> 
> And define the i386/x86_64 glibc one in config/i386/*.h, check there
> OPTION_GLIBC and only in that case return something other than UNKNOWN.
> 
> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
> 
>   Jakub
> 

Hi.

I'm sending V2 that can survive bootstrap and regression tests on both x86_64 
and ppc64le.

Martin
>From 222c7c205a7afc144dc123d2b378a057dcf8816f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  

	* gcc.c-torture/execute/builtins/mempcpy.c (main_test): Adjust
	to not use mempcpy.
	* gcc.dg/string-opt-1.c: Adjust for i386 target.
---
 gcc/builtins.c   | 13 -
 gcc/config/i386/i386-protos.h|  2 ++
 gcc/config/i386/i386.c   | 20 
 gcc/coretypes.h  |  7 +++
 gcc/doc/tm.texi  |  4 
 gcc/doc/tm.texi.in   |  1 +
 gcc/expr.c   | 16 +++-
 gcc/expr.h   |  4 +++-
 gcc/target.def   |  7 +++
 gcc/targhooks.c  |  6 ++
 gcc/targhooks.h  |  1 +
 .../gcc.c-torture/execute/builtins/mempcpy.c |  2 +-
 gcc/testsuite/gcc.dg/string-opt-1.c  |  4 ++--
 13 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..eb038dd45b3 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
   /* Copy word part most expediently.  */
+  bool bail_out_libcall = endp == 1
+&& targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
  CALL_EXPR_TAILCALL (exp)
  && (endp == 0 || target == const0_rtx)
  ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
  expected_align, expected_size,
- min_size, max_size, probable_max_size);
+ min_size, max_size, probable_max_size,
+ bail_out_libcall, _move_done);
+
+  /* Bail out when a mempcpy call would be expanded as libcall and when
+ we have a target that provides a fast implementation
+ of mempcpy routine.  */
+  if (!is_move_done)
+return NULL_RTX;
 
   if (dest_addr == 0)
 {
diff 

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-13 Thread Jakub Jelinek
On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
> > Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> > where in particular linux_libc_has_function deals with various C libraries.
> > Of course, in this case you need another target hook, that is dependent both
> > on the target backend and C library.
> > 
> > It would be nice to make the target hook a little bit more generic as well,
> > e.g. pass it enum builtin_function and query if it is fast, slow or
> > unknown, or even some kind of cost, where the caller could ask for cost of
> > BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
> 
> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added 
> new hook
> definition to gcc/config/gnu-user.h that will point to 
> gnu_libc_function_implementation.
> I would like to implement the function in gcc/targhooks.c, but I don't know 
> how to
> make ifdef according to target?

Put there just the default implementation (everything is UNKNOWN?).

> One another issue is that built_in_function is enum defined in tree.h. Thus 
> I'll replace the
> callback argument with int, that will be casted. One last issue: am I right 
> that I'll have to define
> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to 
> no_c99_libc_has_function)?

And define the i386/x86_64 glibc one in config/i386/*.h, check there
OPTION_GLIBC and only in that case return something other than UNKNOWN.

And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-13 Thread Martin Liška

On 03/13/2018 09:32 AM, Jakub Jelinek wrote:

On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote:

On 03/12/2018 10:39 AM, Marc Glisse wrote:

On Mon, 12 Mar 2018, Martin Liška wrote:


This is fix for the PR that introduces a new target macro. Using the macro
one can say that a target has a fast mempcpy and thus it's preferred to be used
if possible.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I also tested on x86_64-linux-gnu.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-03-08  Martin Liska  

 PR middle-end/81657
 * builtins.c (expand_builtin_memory_copy_args): Add new
 arguments.
 * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
 New macro.


Shouldn't the macro be defined in a more specific case, for instance glibc on 
x86? Or do all known libc on x86 happen to provide a fast mempcpy?


That's Marc a very good question. Do we already have a glibc-related target 
macros/hooks?
If so, I would add this as one of these.


Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
where in particular linux_libc_has_function deals with various C libraries.
Of course, in this case you need another target hook, that is dependent both
on the target backend and C library.

It would be nice to make the target hook a little bit more generic as well,
e.g. pass it enum builtin_function and query if it is fast, slow or
unknown, or even some kind of cost, where the caller could ask for cost of
BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.


Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new 
hook
definition to gcc/config/gnu-user.h that will point to 
gnu_libc_function_implementation.
I would like to implement the function in gcc/targhooks.c, but I don't know how 
to
make ifdef according to target?

One another issue is that built_in_function is enum defined in tree.h. Thus 
I'll replace the
callback argument with int, that will be casted. One last issue: am I right 
that I'll have to define
TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to 
no_c99_libc_has_function)?

Thanks,
Martin



Jakub





Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-13 Thread Richard Biener
On Tue, Mar 13, 2018 at 9:32 AM, Jakub Jelinek  wrote:
> On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote:
>> On 03/12/2018 10:39 AM, Marc Glisse wrote:
>> > On Mon, 12 Mar 2018, Martin Liška wrote:
>> >
>> > > This is fix for the PR that introduces a new target macro. Using the 
>> > > macro
>> > > one can say that a target has a fast mempcpy and thus it's preferred to 
>> > > be used
>> > > if possible.
>> > >
>> > > Patch can bootstrap on ppc64le-redhat-linux and survives regression 
>> > > tests.
>> > > I also tested on x86_64-linux-gnu.
>> > >
>> > > Ready to be installed?
>> > > Martin
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > > 2018-03-08  Martin Liska  
>> > >
>> > > PR middle-end/81657
>> > > * builtins.c (expand_builtin_memory_copy_args): Add new
>> > > arguments.
>> > > * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>> > > New macro.
>> >
>> > Shouldn't the macro be defined in a more specific case, for instance glibc 
>> > on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
>>
>> That's Marc a very good question. Do we already have a glibc-related target 
>> macros/hooks?
>> If so, I would add this as one of these.
>
> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> where in particular linux_libc_has_function deals with various C libraries.
> Of course, in this case you need another target hook, that is dependent both
> on the target backend and C library.
>
> It would be nice to make the target hook a little bit more generic as well,
> e.g. pass it enum builtin_function and query if it is fast, slow or
> unknown, or even some kind of cost, where the caller could ask for cost of
> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

Or maybe a hook returning the alternative to use?  Pass it BUILT_IN_X
and get back
the same or related builtin?

Richard.

> Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-13 Thread Jakub Jelinek
On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote:
> On 03/12/2018 10:39 AM, Marc Glisse wrote:
> > On Mon, 12 Mar 2018, Martin Liška wrote:
> > 
> > > This is fix for the PR that introduces a new target macro. Using the macro
> > > one can say that a target has a fast mempcpy and thus it's preferred to 
> > > be used
> > > if possible.
> > > 
> > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> > > I also tested on x86_64-linux-gnu.
> > > 
> > > Ready to be installed?
> > > Martin
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2018-03-08  Martin Liska  
> > > 
> > > PR middle-end/81657
> > > * builtins.c (expand_builtin_memory_copy_args): Add new
> > > arguments.
> > > * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
> > > New macro.
> > 
> > Shouldn't the macro be defined in a more specific case, for instance glibc 
> > on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
> 
> That's Marc a very good question. Do we already have a glibc-related target 
> macros/hooks?
> If so, I would add this as one of these.

Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
where in particular linux_libc_has_function deals with various C libraries.
Of course, in this case you need another target hook, that is dependent both
on the target backend and C library.

It would be nice to make the target hook a little bit more generic as well,
e.g. pass it enum builtin_function and query if it is fast, slow or
unknown, or even some kind of cost, where the caller could ask for cost of
BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-13 Thread Martin Liška

On 03/12/2018 10:39 AM, Marc Glisse wrote:

On Mon, 12 Mar 2018, Martin Liška wrote:


This is fix for the PR that introduces a new target macro. Using the macro
one can say that a target has a fast mempcpy and thus it's preferred to be used
if possible.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I also tested on x86_64-linux-gnu.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-03-08  Martin Liska  

PR middle-end/81657
* builtins.c (expand_builtin_memory_copy_args): Add new
arguments.
* config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
New macro.


Shouldn't the macro be defined in a more specific case, for instance glibc on 
x86? Or do all known libc on x86 happen to provide a fast mempcpy?


That's Marc a very good question. Do we already have a glibc-related target 
macros/hooks?
If so, I would add this as one of these.

Thanks,
Martin




* defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
* doc/tm.texi: Likewise.
* doc/tm.texi.in: Likewise.
* expr.c (compare_by_pieces): Add support for bail out.
(emit_block_move_hints): Likewise.
* expr.h (emit_block_move_hints): Add new arguments.




Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-12 Thread Jakub Jelinek
On Mon, Mar 12, 2018 at 09:57:09AM +0100, Martin Liška wrote:
> Hi.
> 
> This is fix for the PR that introduces a new target macro. Using the macro
> one can say that a target has a fast mempcpy and thus it's preferred to be 
> used
> if possible.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.

I think we shouldn't be adding new target macros, better would be a target
hook that returns this information.

Jakub


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-12 Thread Richard Biener
On Mon, Mar 12, 2018 at 9:57 AM, Martin Liška  wrote:
> Hi.
>
> This is fix for the PR that introduces a new target macro. Using the macro

Don't add new target macros, use target hooks.

> one can say that a target has a fast mempcpy and thus it's preferred to be 
> used
> if possible.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.
>
> Ready to be installed?
> Martin
>
> gcc/ChangeLog:
>
> 2018-03-08  Martin Liska  
>
> PR middle-end/81657
> * builtins.c (expand_builtin_memory_copy_args): Add new
> arguments.
> * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
> New macro.
> * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
> * doc/tm.texi: Likewise.
> * doc/tm.texi.in: Likewise.
> * expr.c (compare_by_pieces): Add support for bail out.
> (emit_block_move_hints): Likewise.
> * expr.h (emit_block_move_hints): Add new arguments.
>
> gcc/testsuite/ChangeLog:
>
> 2018-03-09  Martin Liska  
>
> PR middle-end/81657
> * gcc.dg/string-opt-1.c: Adjust test to run only on non-x86
> target.
> ---
>  gcc/builtins.c  | 13 -
>  gcc/config/i386/i386.h  |  3 +++
>  gcc/defaults.h  |  7 +++
>  gcc/doc/tm.texi |  5 +
>  gcc/doc/tm.texi.in  |  5 +
>  gcc/expr.c  | 16 +++-
>  gcc/expr.h  |  4 +++-
>  gcc/testsuite/gcc.dg/string-opt-1.c |  4 ++--
>  8 files changed, 52 insertions(+), 5 deletions(-)
>
>


Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-12 Thread Marc Glisse

On Mon, 12 Mar 2018, Martin Liška wrote:


This is fix for the PR that introduces a new target macro. Using the macro
one can say that a target has a fast mempcpy and thus it's preferred to be used
if possible.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I also tested on x86_64-linux-gnu.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-03-08  Martin Liska  

PR middle-end/81657
* builtins.c (expand_builtin_memory_copy_args): Add new
arguments.
* config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
New macro.


Shouldn't the macro be defined in a more specific case, for instance glibc 
on x86? Or do all known libc on x86 happen to provide a fast mempcpy?



* defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
* doc/tm.texi: Likewise.
* doc/tm.texi.in: Likewise.
* expr.c (compare_by_pieces): Add support for bail out.
(emit_block_move_hints): Likewise.
* expr.h (emit_block_move_hints): Add new arguments.


--
Marc Glisse


[PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-03-12 Thread Martin Liška
Hi.

This is fix for the PR that introduces a new target macro. Using the macro
one can say that a target has a fast mempcpy and thus it's preferred to be used
if possible.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I also tested on x86_64-linux-gnu.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-03-08  Martin Liska  

PR middle-end/81657
* builtins.c (expand_builtin_memory_copy_args): Add new
arguments.
* config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
New macro.
* defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
* doc/tm.texi: Likewise.
* doc/tm.texi.in: Likewise.
* expr.c (compare_by_pieces): Add support for bail out.
(emit_block_move_hints): Likewise.
* expr.h (emit_block_move_hints): Add new arguments.

gcc/testsuite/ChangeLog:

2018-03-09  Martin Liska  

PR middle-end/81657
* gcc.dg/string-opt-1.c: Adjust test to run only on non-x86
target.
---
 gcc/builtins.c  | 13 -
 gcc/config/i386/i386.h  |  3 +++
 gcc/defaults.h  |  7 +++
 gcc/doc/tm.texi |  5 +
 gcc/doc/tm.texi.in  |  5 +
 gcc/expr.c  | 16 +++-
 gcc/expr.h  |  4 +++-
 gcc/testsuite/gcc.dg/string-opt-1.c |  4 ++--
 8 files changed, 52 insertions(+), 5 deletions(-)


diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..c2ca36934f7 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
   /* Copy word part most expediently.  */
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
  CALL_EXPR_TAILCALL (exp)
  && (endp == 0 || target == const0_rtx)
  ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
  expected_align, expected_size,
- min_size, max_size, probable_max_size);
+ min_size, max_size, probable_max_size,
+ TARGET_HAS_FAST_MEMPCPY_ROUTINE
+ && endp == 1,
+ _move_done);
+
+  /* Bail out when a mempcpy call would be expanded as libcall and when
+ we have a target that provides a fast implementation
+ of mempcpy routine.  */
+  if (!is_move_done)
+return NULL_RTX;
 
   if (dest_addr == 0)
 {
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e43edd77b56..8744d706fd7 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1914,6 +1914,9 @@ typedef struct ix86_args {
 
 #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
 
+/* C library provides fast implementation of mempcpy function.  */
+#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 1
+
 /* Define if shifts truncate the shift count which implies one can
omit a sign-extension or zero-extension of a shift count.
 
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 78a08a33f12..2e5caac8dcd 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1340,6 +1340,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define SET_RATIO(speed) MOVE_RATIO (speed)
 #endif
 
+/* By default do not generate libcall to mempcpy and rather use
+   libcall to memcpy and adjustment of return value.  */
+
+#ifndef TARGET_HAS_FAST_MEMPCPY_ROUTINE
+#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 0
+#endif
+
 /* Supply a default definition of STACK_SAVEAREA_MODE for emit_stack_save.
Normally move_insn, so Pmode stack pointer.  */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0c8a3f3298c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6627,6 +6627,11 @@ optimized for speed rather than size.
 If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
 @end defmac
 
+@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE
+By default do not generate libcall to mempcpy and rather use
+libcall to memcpy and adjustment of return value.
+@end defmac
+
 @defmac USE_LOAD_POST_INCREMENT (@var{mode})
 A C expression used to determine whether a load postincrement is a good
 thing to use for a given mode.  Defaults to the value of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..e7ef85ab78e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4560,6 +4560,11 @@ optimized for speed rather than size.
 If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
 @end defmac
 
+@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE
+By default do not generate libcall to mempcpy and rather use
+libcall to memcpy and adjustment of return value.
+@end defmac
+
 @defmac USE_LOAD_POST_INCREMENT (@var{mode})
 A C expression used to determine whether a load postincrement is a good
 thing to use for a given mode.  Defaults to the value of
diff --git a/gcc/expr.c b/gcc/expr.c