Re: [PATCH] rs6000: Testcases for rl*i*

2016-11-25 Thread David Edelsohn
On Fri, Nov 25, 2016 at 8:51 PM, Segher Boessenkool
 wrote:
> These testcases test that we generate the expected code for all of the
> rl*i* instructions, that is, rotate-and-mask and rotate-and-mask-insert
> for immediate rotation counts.  All the testcases do rotate, shift left,
> as well as shift right; if that results in an instruction that does not
> exist the testcases generate a multiplication instead, so that we can
> detect if this is handled properly.
>
> Many 32-bit instructions zero-extend their result properly in 64-bit
> mode, but the rs6000 port does not yet know.  These testcases test the
> status quo, so they will need updating when ever we handle this.
>
> The rlwimi-*.c tests all generate suboptimal code in 64-bit mode (some
> end up as xor-and-xor, some as separate rotate and insert).  A patch to
> fix this will hit gcc-patches soon; these testcases will fail until it
> is fixed.
>
> Is this okay for trunk?
>
>
> Segher
>
>
> 2016-11-26  Segher Boessenkool  
>
> gcc/testsuite/
> * gcc.target/powerpc/rldic-0.c: New testcase.
> * gcc.target/powerpc/rldic-1.c: New testcase.
> * gcc.target/powerpc/rldic-2.c: New testcase.
> * gcc.target/powerpc/rldicl-0.c: New testcase.
> * gcc.target/powerpc/rldicl-1.c: New testcase.
> * gcc.target/powerpc/rldicl-2.c: New testcase.
> * gcc.target/powerpc/rldicr-0.c: New testcase.
> * gcc.target/powerpc/rldicr-1.c: New testcase.
> * gcc.target/powerpc/rldicr-2.c: New testcase.
> * gcc.target/powerpc/rldicx.h: New file.
> * gcc.target/powerpc/rldimi-0.c: New testcase.
> * gcc.target/powerpc/rldimi-1.c: New testcase.
> * gcc.target/powerpc/rldimi-2.c: New testcase.
> * gcc.target/powerpc/rldimi.h: New file.
> * gcc.target/powerpc/rlwimi-0.c: New testcase.
> * gcc.target/powerpc/rlwimi-1.c: New testcase.
> * gcc.target/powerpc/rlwimi-2.c: New testcase.
> * gcc.target/powerpc/rlwimi.h: New file.
> * gcc.target/powerpc/rlwinm-0.c: New testcase.
> * gcc.target/powerpc/rlwinm-1.c: New testcase.
> * gcc.target/powerpc/rlwinm-2.c: New testcase.
> * gcc.target/powerpc/rlwinm.h: New file.

Okay.

I assume that you want to add the testcases after the fix for rlwimi.

Thanks, David


[PATCH] rs6000: Testcases for rl*i*

2016-11-25 Thread Segher Boessenkool
These testcases test that we generate the expected code for all of the
rl*i* instructions, that is, rotate-and-mask and rotate-and-mask-insert
for immediate rotation counts.  All the testcases do rotate, shift left,
as well as shift right; if that results in an instruction that does not
exist the testcases generate a multiplication instead, so that we can
detect if this is handled properly.

Many 32-bit instructions zero-extend their result properly in 64-bit
mode, but the rs6000 port does not yet know.  These testcases test the
status quo, so they will need updating when ever we handle this.

The rlwimi-*.c tests all generate suboptimal code in 64-bit mode (some
end up as xor-and-xor, some as separate rotate and insert).  A patch to
fix this will hit gcc-patches soon; these testcases will fail until it
is fixed.

Is this okay for trunk?


Segher


2016-11-26  Segher Boessenkool  

gcc/testsuite/
* gcc.target/powerpc/rldic-0.c: New testcase.
* gcc.target/powerpc/rldic-1.c: New testcase.
* gcc.target/powerpc/rldic-2.c: New testcase.
* gcc.target/powerpc/rldicl-0.c: New testcase.
* gcc.target/powerpc/rldicl-1.c: New testcase.
* gcc.target/powerpc/rldicl-2.c: New testcase.
* gcc.target/powerpc/rldicr-0.c: New testcase.
* gcc.target/powerpc/rldicr-1.c: New testcase.
* gcc.target/powerpc/rldicr-2.c: New testcase.
* gcc.target/powerpc/rldicx.h: New file.
* gcc.target/powerpc/rldimi-0.c: New testcase.
* gcc.target/powerpc/rldimi-1.c: New testcase.
* gcc.target/powerpc/rldimi-2.c: New testcase.
* gcc.target/powerpc/rldimi.h: New file.
* gcc.target/powerpc/rlwimi-0.c: New testcase.
* gcc.target/powerpc/rlwimi-1.c: New testcase.
* gcc.target/powerpc/rlwimi-2.c: New testcase.
* gcc.target/powerpc/rlwimi.h: New file.
* gcc.target/powerpc/rlwinm-0.c: New testcase.
* gcc.target/powerpc/rlwinm-1.c: New testcase.
* gcc.target/powerpc/rlwinm-2.c: New testcase.
* gcc.target/powerpc/rlwinm.h: New file.

---
 gcc/testsuite/gcc.target/powerpc/rldic-0.c  |  16 
 gcc/testsuite/gcc.target/powerpc/rldic-1.c  |  17 
 gcc/testsuite/gcc.target/powerpc/rldic-2.c  |  16 
 gcc/testsuite/gcc.target/powerpc/rldicl-0.c |  17 
 gcc/testsuite/gcc.target/powerpc/rldicl-1.c |  16 
 gcc/testsuite/gcc.target/powerpc/rldicl-2.c |  17 
 gcc/testsuite/gcc.target/powerpc/rldicr-0.c |  15 
 gcc/testsuite/gcc.target/powerpc/rldicr-1.c |  16 
 gcc/testsuite/gcc.target/powerpc/rldicr-2.c |  15 
 gcc/testsuite/gcc.target/powerpc/rldicx.h   | 117 
 gcc/testsuite/gcc.target/powerpc/rldimi-0.c |  15 
 gcc/testsuite/gcc.target/powerpc/rldimi-1.c |  15 
 gcc/testsuite/gcc.target/powerpc/rldimi-2.c |  15 
 gcc/testsuite/gcc.target/powerpc/rldimi.h   | 106 +
 gcc/testsuite/gcc.target/powerpc/rlwimi-0.c |  20 +
 gcc/testsuite/gcc.target/powerpc/rlwimi-1.c |  20 +
 gcc/testsuite/gcc.target/powerpc/rlwimi-2.c |  18 +
 gcc/testsuite/gcc.target/powerpc/rlwimi.h   |  91 ++
 gcc/testsuite/gcc.target/powerpc/rlwinm-0.c |  19 +
 gcc/testsuite/gcc.target/powerpc/rlwinm-1.c |  19 +
 gcc/testsuite/gcc.target/powerpc/rlwinm-2.c |  19 +
 gcc/testsuite/gcc.target/powerpc/rlwinm.h   |  92 ++
 22 files changed, 711 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldic-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldic-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldic-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicl-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicl-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicl-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicr-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicr-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicr-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicx.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm.h

diff --git a/gcc/testsuite/gcc.target/powerpc/rldic-0.c 
b/gcc/testsuite/gcc.target/powerpc/rldic-0.c
new file mode 

Re: [PATCH 3/9] Add option -moutline-msabi-xlogues

2016-11-25 Thread Daniel Santos

Thank you very much for your review!

On 11/25/2016 05:51 PM, Sandra Loosemore wrote:

On 11/22/2016 10:19 PM, Daniel Santos wrote:


diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9eef558..f556978 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -528,6 +528,11 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI)
  EnumValue
  Enum(calling_abi) String(ms) Value(MS_ABI)

+moutline-msabi-xlogues
+Target Report Mask(OUTLINE_MSABI_XLOGUES) Save
+Reduces function size by using out-of-line stubs to save & restore 
registers

+clobberd by differences in Microsoft and System V ABIs.
+


Just as a suggestion (I'm not an i386 maintainer), I'd recommend 
spelling the name of this option -mno-inline-msabi-xlogues instead of 
-moutline-msabi-xlogues, and making the default -minline-msabi-xlogues.


That is certainly more intuitive ("outline" or "out-of-line" vs 
"inline"). I guess I'm not sure what concept is more important, that 
it's not inline or that it's using stubs. Probably the former.





diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8e2f466..4706085 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25004,6 +25004,15 @@ You can control this behavior for specific 
functions by

  using the function attributes @code{ms_abi} and @code{sysv_abi}.
  @xref{Function Attributes}.

+@item -moutline-msabi-xlogues
+@itemx -mno-outline-msabi-xlogues


By convention, we only list the option form that is not the default


OK. This is my first time to work on gcc, so still learning all of the 
conventions.





+@opindex moutline-msabi-xlogues


...but we should have index entries for both.


So basically I got that backwards. :)



+Due to differences in 64-bit ABIs, any Microsoft ABI function that 
calls a
+SysV ABI function must consider RSI, RDI and XMM6-15 as clobbered, 
emitting
+fairly lengthy prologues & epilogues.  This option generates 
prologues &
+epilogues that instead call stubs in libgcc to perform these saves & 
restores,

+thus reducing function size at the cost of a few extra instructions.


Please use the word "and" in all three locations here, instead of "&".

-Sandra


Now that I think of it, would "System V" be better than just "SysV"?

This is very helpful, thanks again. I'm working on a meaningful 
testsuite, so I suppose I should just submit a v3 with these changes 
when I'm done with the tests and hopefully I'll have a little more 
feedback on the back end & assembly as well.


Daniel


Re: [PATCH 3/9] Add option -moutline-msabi-xlogues

2016-11-25 Thread Sandra Loosemore

On 11/22/2016 10:19 PM, Daniel Santos wrote:


diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9eef558..f556978 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -528,6 +528,11 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI)
  EnumValue
  Enum(calling_abi) String(ms) Value(MS_ABI)

+moutline-msabi-xlogues
+Target Report Mask(OUTLINE_MSABI_XLOGUES) Save
+Reduces function size by using out-of-line stubs to save & restore registers
+clobberd by differences in Microsoft and System V ABIs.
+


Just as a suggestion (I'm not an i386 maintainer), I'd recommend 
spelling the name of this option -mno-inline-msabi-xlogues instead of 
-moutline-msabi-xlogues, and making the default -minline-msabi-xlogues.



diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8e2f466..4706085 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25004,6 +25004,15 @@ You can control this behavior for specific functions by
  using the function attributes @code{ms_abi} and @code{sysv_abi}.
  @xref{Function Attributes}.

+@item -moutline-msabi-xlogues
+@itemx -mno-outline-msabi-xlogues


By convention, we only list the option form that is not the default


+@opindex moutline-msabi-xlogues


...but we should have index entries for both.


+Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a
+SysV ABI function must consider RSI, RDI and XMM6-15 as clobbered, emitting
+fairly lengthy prologues & epilogues.  This option generates prologues &
+epilogues that instead call stubs in libgcc to perform these saves & restores,
+thus reducing function size at the cost of a few extra instructions.


Please use the word "and" in all three locations here, instead of "&".

-Sandra



Re: [PATCH] avoid calling alloca(0)

2016-11-25 Thread Jeff Law

On 11/23/2016 06:15 PM, Martin Sebor wrote:


gcc_assert works only in some instances (e.g., in c-ada-spec.c:191)
but not in others because some actually do make the alloca(0) call
at runtime: at a minimum, lto.c:3285, reg-stack.c:2008, and
tree-ssa-threadedge.c:344 assert during bootstrap.
You might have the wrong line number of reg-stack.c and lto.  You've 
pointed to the start of subst_asm_stack_regs and lto_main respectively. 
It'd probably be better if you posted the line with a bit of context.


I can trivially fix the threadedge issue.

Jeff


Re: [PATCH] avoid calling alloca(0)

2016-11-25 Thread Jeff Law

On 11/24/2016 12:42 AM, Jakub Jelinek wrote:

After reviewing a few more of the XALLOCAVEC calls in the affected
files I suspect that those to alloca(0) pointed out by the warning
may be just a subset that GCC happens to see thanks to constant
propagation.  If that's so then changing this subset to
alloca(N + !N) or some such is probably not the right approach
because it will miss all the other calls where GCC doesn't see that
N is zero.  I think the most robust solution is to do as Bernd
suggests: change XALLOCAVEC as shown above.  That will let us
prevent any and all unsafe assumptions about the result of
alloca(0) being either non-null or distinct.  The other approach
would be to change XALLOCAVEC to add 1 to the byte count.  That
would be in line with what XMALLOC does.


I still fail to see why you want to change anything at least for
hosts where you know XALLOCAVEC is __builtin_alloca.
Show me one place which assumes the result of alloca (0) in gcc sources is
distinct, or non-NULL.  For 0 elements the pointer simply isn't used.
And whether for the malloc based alloca it GCs or not really doesn't matter
for us.
I think for host/build, we ought to assume that alloca is 
__builtin_alloca.  I think we stopped supporting the 
alloca-on-top-of-malloc host/build systems long ago.


But I still think we ought to be "clean" in regard to zero sized 
allocations.  It sounds like an assert may not be sufficient, so we need 
to look at another approach.


Jeff




Re: Pointer Bounds Checker and trailing arrays (PR68270)

2016-11-25 Thread Ilya Enkovich
2016-11-25 15:47 GMT+03:00 Alexander Ivchenko :
> Hi,
>
> The patch below addresses PR68270. could you please take a look?
>
> 2016-11-25  Alexander Ivchenko  
>
>* c-family/c.opt (flag_chkp_flexible_struct_trailing_arrays):
>Add new option.
>* tree-chkp.c (chkp_parse_array_and_component_ref): Forbid
>narrowing when chkp_parse_array_and_component_ref is used and
>the ARRAY_REF points to an array in the end of the struct.
>
>
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 7d8a726..e45d6a2 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1166,6 +1166,11 @@ C ObjC C++ ObjC++ LTO RejectNegative Report
> Var(flag_chkp_narrow_to_innermost_ar
>  Forces Pointer Bounds Checker to use bounds of the innermost arrays in case 
> of
>  nested static arryas access.  By default outermost array is used.
>
> +fchkp-flexible-struct-trailing-arrays
> +C ObjC C++ ObjC++ LTO RejectNegative Report
> Var(flag_chkp_flexible_struct_trailing_arrays)
> +Allow Pointer Bounds Checker to treat all trailing arrays in structures as
> +possibly flexible.

Words 'allow' and 'possibly' are confusing here. This option is about to force
checker to do something, not to give him a choice.

New option has to be documented in invoke.texi. It would also be nice to reflect
changes on GCC MPX wiki page.

We have an attribute to change compiler behavior when this option is not set.
But we have no way to make exceptions when this option is used. Should we
add one?

> +
>  fchkp-optimize
>  C ObjC C++ ObjC++ LTO Report Var(flag_chkp_optimize) Init(-1)
>  Allow Pointer Bounds Checker optimizations.  By default allowed
> diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
> index 2769682..40f99c3 100644
> --- a/gcc/tree-chkp.c
> +++ b/gcc/tree-chkp.c
> @@ -3425,7 +3425,9 @@ chkp_parse_array_and_component_ref (tree node, tree 
> *ptr,
>if (flag_chkp_narrow_bounds
>&& !flag_chkp_narrow_to_innermost_arrray
>&& (!last_comp
> -  || chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1
> +  || (chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1))
> +  && !(flag_chkp_flexible_struct_trailing_arrays
> +   && array_at_struct_end_p (var)

This is incorrect place for fix. Consider code

struct S {
  int a;
  int b[10];
};

struct S s;
int *p = s.b;

Here you need to compute bounds for p and you want your option to take effect
but in this case you won't event reach your new check because there is no
ARRAY_REF. And even if we change it to

int *p = s.b[5];

then it still would be narrowed because s.b would still be written
into 'comp_to_narrow'
variable. Correct place for fix is in chkp_may_narrow_to_field.

Also you should consider fchkp-narrow-to-innermost-arrray option. Should it
be more powerfull or not? I think fchkp-narrow-to-innermost-arrray shouldn't
narrow to variable sized fields. BTW looks like right now bnd_variable_size
attribute is ignored by fchkp-narrow-to-innermost-arrray. This is another
problem and may be fixed in another patch though.

Also patch lacks tests for various situations (with option and without, with
ARRAY_REF and without). In case of new attribute and fix for
fchkp-narrow-to-innermost-arrray behavior additional tests are required.

--
Ilya

>  {
>comp_to_narrow = last_comp;
>break;


[PATCH, fortran, committed] Fix documentation typo

2016-11-25 Thread Janne Blomqvist
Hi,

committed the following as obvious:

Index: ChangeLog
===
--- ChangeLog   (revision 242883)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2016-11-25  Janne Blomqvist  
+
+   * intrinsic.texi: Fix ptrdiff_t typo in ISO_C_BINDING constants
+   table.
+
 2016-11-25  Janus Weil  

PR fortran/60853
Index: intrinsic.texi
===
--- intrinsic.texi  (revision 242883)
+++ intrinsic.texi  (working copy)
@@ -14697,7 +14697,7 @@ Furthermore, if @code{__float128} is sup
 @item @code{INTEGER}@tab @code{C_INT_FAST128_T} @tab
@code{int_fast128_t} @tab Ext.
 @item @code{INTEGER}@tab @code{C_INTMAX_T}  @tab @code{intmax_t}
 @item @code{INTEGER}@tab @code{C_INTPTR_T}  @tab @code{intptr_t}
-@item @code{INTEGER}@tab @code{C_PTRDIFF_T} @tab @code{intptr_t}
@tab TS 29113
+@item @code{INTEGER}@tab @code{C_PTRDIFF_T} @tab @code{ptrdiff_t}
@tab TS 29113
 @item @code{REAL}   @tab @code{C_FLOAT} @tab @code{float}
 @item @code{REAL}   @tab @code{C_DOUBLE}@tab @code{double}
 @item @code{REAL}   @tab @code{C_LONG_DOUBLE}   @tab @code{long double}


-- 
Janne Blomqvist


Hurd port for gcc go PATCH 19-23(23)

2016-11-25 Thread Svante Signell
* src_libgo_runtime_netpoll.goc.diff: Fix restricted word bug.
  Rename variable errno to errno1.
* src_libgo_go_os_os_test.go.diff: Allow EFBIG return code to big
seeks.
* src_libgo_go_syscall_syscall_gnu_test.go: New file:
  Define Type and Whence as 32bit in syscall.Flock_t
* src_libgo_testsuite_gotest.diff: Remove comm option for ps -o.
* add-gnu-to-libgo-headers.diff:
  Add gnu to +build entry in file headers included in the build.
  FIXME:  Index: gcc-6-6.2.1-4.1/src/libgo/go/archive/tar/stat_atim.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/archive/tar/stat_atim.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/archive/tar/stat_atim.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build linux dragonfly openbsd solaris
+// +build linux dragonfly openbsd solaris gnu
 
 package tar
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_resnew.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/cgo_resnew.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_resnew.go
@@ -3,7 +3,7 @@
 // license that can be found in the LICENSE file.
 
 // +build cgo,!netgo
-// +build darwin linux,!android netbsd solaris
+// +build darwin linux,!android netbsd solaris gnu
 
 package net
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_socknew.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/cgo_socknew.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_socknew.go
@@ -3,7 +3,7 @@
 // license that can be found in the LICENSE file.
 
 // +build cgo,!netgo
-// +build android linux solaris
+// +build android linux solaris gnu
 
 package net
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/hook_cloexec.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/hook_cloexec.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/hook_cloexec.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build freebsd linux
+// +build freebsd linux gnu
 
 package net
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/main_unix_test.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/main_unix_test.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/main_unix_test.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux nacl netbsd openbsd solaris
+// +build darwin dragonfly freebsd linux nacl netbsd openbsd solaris gnu
 
 package net
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sock_cloexec.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/sock_cloexec.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/sock_cloexec.go
@@ -5,7 +5,7 @@
 // This file implements sysSocket and accept for platforms that
 // provide a fast path for setting SetNonblock and CloseOnExec.
 
-// +build freebsd linux
+// +build freebsd linux gnu
 
 package net
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sockoptip_posix.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/sockoptip_posix.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/sockoptip_posix.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux netbsd openbsd windows
+// +build darwin dragonfly freebsd linux netbsd openbsd windows gnu
 
 package net
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/syscall/exec_unix.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/syscall/exec_unix.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/syscall/exec_unix.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build darwin dragonfly freebsd linux netbsd openbsd solaris gnu
 
 // Fork, exec, wait, etc.
 
Index: gcc-6-6.2.1-4.1/src/libgo/runtime/netpoll_select.c
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/runtime/netpoll_select.c
+++ gcc-6-6.2.1-4.1/src/libgo/runtime/netpoll_select.c
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build solaris
+// +build solaris gnu
 
 #include "config.h"
 
Index: gcc-6-6.2.1-4.1/src/libgo/go/os/exec/lp_unix.go
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/os/exec/lp_unix.go
+++ gcc-6-6.2.1-4.1/src/libgo/go/os/exec/lp_unix.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be 

Hurd port for gcc go PATCH 11-18(23)

2016-11-25 Thread Svante Signell
* src_libgo_go_net_sendfile_gnu.go.diff: New file
* src_libgo_go_net_sock_gnu.go.diff: New file
* src_libgo_go_net_sockopt_gnu.go.diff: New file
* src_libgo_go_net_sockoptip_gnu.go.diff: New file
* src_libgo_go_syscall_libcall_gnu_386.go.diff: New file
* src_libgo_go_syscall_libcall_gnu.go.diff: New file
* src_libgo_go_syscall_libcall_posix-1.go.diff: New file derived from
libcall_posix.go
  Removed the mount call for GNU/Hurd, it exists but use translators.
  Removed the mlockall/munlockall calls for GNU/Hurd, not yet
implemented.
  Removed the madvise call for GNU/Hurd, not yet implemented.
* src_libgo_go_syscall_socket_gnu.go.diff: New fileIndex: gcc-6-6.2.1-4.1/src/libgo/go/net/sendfile_gnu.go
===
--- /dev/null
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/sendfile_gnu.go
@@ -0,0 +1,79 @@
+// Copyright 2011 The Go Authors.  All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package net
+
+import (
+	"io"
+	"os"
+	"syscall"
+)
+
+// maxSendfileSize is the largest chunk size we ask the kernel to copy
+// at a time.
+const maxSendfileSize int = 4 << 20
+
+// sendFile copies the contents of r to c using the sendfile
+// system call to minimize copies.
+//
+// if handled == true, sendFile returns the number of bytes copied and any
+// non-EOF error.
+//
+// if handled == false, sendFile performed no work.
+func sendFile(c *netFD, r io.Reader) (written int64, err error, handled bool) {
+	var remain int64 = 1 << 62 // by default, copy until EOF
+
+	lr, ok := r.(*io.LimitedReader)
+	if ok {
+		remain, r = lr.N, lr.R
+		if remain <= 0 {
+			return 0, nil, true
+		}
+	}
+	f, ok := r.(*os.File)
+	if !ok {
+		return 0, nil, false
+	}
+
+	if err := c.writeLock(); err != nil {
+		return 0, err, true
+	}
+	defer c.writeUnlock()
+
+	dst := c.sysfd
+	src := int(f.Fd())
+	for remain > 0 {
+		n := maxSendfileSize
+		if int64(n) > remain {
+			n = int(remain)
+		}
+		n, err1 := syscall.Sendfile(dst, src, nil, n)
+		if n > 0 {
+			written += int64(n)
+			remain -= int64(n)
+		}
+		if n == 0 && err1 == nil {
+			break
+		}
+		if err1 == syscall.EAGAIN {
+			if err1 = c.pd.WaitWrite(); err1 == nil {
+continue
+			}
+		}
+		if err1 != nil {
+			// This includes syscall.ENOSYS (no kernel
+			// support) and syscall.EINVAL (fd types which
+			// don't implement sendfile)
+			err = err1
+			break
+		}
+	}
+	if lr != nil {
+		lr.N = remain
+	}
+	if err != nil {
+		err = os.NewSyscallError("sendfile", err)
+	}
+	return written, err, written > 0
+}
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sock_gnu.go
===
--- /dev/null
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/sock_gnu.go
@@ -0,0 +1,14 @@
+// Copyright 2014 The Go Authors.  All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build gnu
+
+package net
+
+import "syscall"
+
+func maxListenerBacklog() int {
+   // From /usr/include/i386-gnu/bits/socket.h
+   return syscall.SOMAXCONN
+}
Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sockopt_gnu.go
===
--- /dev/null
+++ gcc-6-6.2.1-4.1/src/libgo/go/net/sockopt_gnu.go
@@ -0,0 +1,45 @@
+// Copyright 2011 The Go Authors.  All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build gnu
+
+package net
+
+import (
+	"os"
+	"syscall"
+)
+
+func setDefaultSockopts(s, family, sotype int, ipv6only bool) error {
+	if family == syscall.AF_INET6 && sotype != syscall.SOCK_RAW {
+		// Allow both IP versions even if the OS default
+		// is otherwise.  Note that some operating systems
+		// never admit this option.
+		syscall.SetsockoptInt(s, syscall.IPPROTO_IPV6, syscall.IPV6_V6ONLY, boolint(ipv6only))
+	}
+	// Allow broadcast.
+	return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_BROADCAST, 1))
+}
+
+func setDefaultListenerSockopts(s int) error {
+	// Allow reuse of recently-used addresses.
+	return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1))
+}
+
+func setDefaultMulticastSockopts(s int) error {
+	// Allow multicast UDP and raw IP datagram sockets to listen
+	// concurrently across multiple listeners.
+	if err := syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1); err != nil {
+		return os.NewSyscallError("setsockopt", err)
+	}
+	// Allow reuse of recently-used ports.
+	// This option is supported only in descendants of 4.4BSD,
+	// to make an effective multicast application that requires
+	// quick draw possible.
+	// Not supported on GNU/Hurd
+	//if syscall.SO_REUSEPORT != 0 {
+	//	return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEPORT, 1))
+	//}
+	return nil
+}

Hurd port for gcc go PATCH 1-4(23)

2016-11-25 Thread Svante Signell
Hi,

Attached are patches to enable gccgo to build properly on Debian
GNU/Hurd on gcc-6 (6-6.2.1-5).

The first three patches are Debian-specific:

* debian_rules.defs.diff: Enables build of gccgo for GNU/Hurd

Define patches for the generated series file:
* debian_rules.patch.diff:  Enables split-stack support.
* debian_rules.patch.diff:  Does not enable split-stack support.

* src_gcc_config_i386_gnu.h.diff: Enable split-stack support

The test suite results are as follows:
Without split-stack support:
              === go Summary ===
# of expected passes7359
# of unexpected failures7
# of expected failures  1
# of untested testcases 10
# of unsupported tests  2
                === libgo Summary ===
# of expected passes121
# of unexpected failures13

With split-stack support:

                === go Summary ===
# of expected passes7366
# of unexpected failures8
# of expected failures  1
# of untested testcases 6
# of unsupported tests  2
                === libgo Summary ===
# of expected passes120
# of unexpected failures14

All failing go tests and more than half of the libgo tests are runtime
errors due to exception handling not working as expected:
Aborted
runtime_sighandler ...
fatal error: unexpected signal during runtime execution
panic: runtime error: invalid memory address or nil pointer dereference

The above problems are probably due to some remaining issues in
gnumach/hurd/glibc to be solved. According to Samuel Thibault these
problems can be handled late when the patches are accepted upstream or
in Debian gcc. Another more annoying gnumch/hurd/glibc bug is that the
built program go (go-6 in Debian) gets killed when executed from the
shell vi path, but not when issued directly: /usr/bin/go-6 works fine.
 go-6
Segmentation fault (core dumped)
gdb /usr/bin/go-6 -c ./core
warning: Unexpected size of section `.reg2/16883' in core file.
Core was generated by `go-6'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Unexpected size of section `.reg2/16883' in core file.
#0  0x01e854ae in ?? ()
(gdb) bt
#0  0x01e854ae in ?? ()
#1  0x in ?? ()

Nevertheless, it seems like not so much is left for gccgo working
properly on GNU/Hurd.

Thanks!

--- a/debian/rules.defs	2014-01-07 11:10:44.0 +0100
+++ b/debian/rules.defs	2014-01-07 11:23:47.0 +0100
@@ -933,7 +933,7 @@
 ifeq (,$(filter $(distrelease),lenny etch squeeze dapper hardy jaunty karmic lucid maverick natty oneiric))
   go_no_cpus := $(filter-out arm, $(go_no_cpus))
 endif
-go_no_systems := gnu kfreebsd-gnu
+go_no_systems := kfreebsd
 
 ifneq ($(with_base_only),yes)
   ifneq ($(separate_lang),yes)
@@ -943,7 +943,7 @@
 ifneq (,$(filter $(DEB_TARGET_ARCH_CPU),$(go_no_cpus)))
   with_go := disabled for cpu $(DEB_TARGET_ARCH_CPU)
 endif
-ifneq (,$(findstring $(DEB_TARGET_GNU_SYSTEM),$(go_no_systems)))
+ifneq (,$(findstring $(DEB_TARGET_ARCH_OS),$(go_no_systems)))
   with_go := disabled for system $(DEB_TARGET_GNU_SYSTEM)
 endif
 ifeq ($(go_no_cross)-$(DEB_CROSS),yes-yes)
--- a/debian/rules.patch.orig	2016-11-12 16:50:46.0 +0100
+++ b/debian/rules.patch	2016-11-12 16:55:24.0 +0100
@@ -276,6 +276,27 @@
 
 ifeq ($(DEB_TARGET_ARCH_OS),hurd)
   debian_patches += hurd-changes
+  debian_patches += \
+ src_gcc_config_i386_gnu.h \
+ src_libgo_configure.ac \
+ src_libgo_go_net_sendfile_gnu.go \
+ src_libgo_go_net_sock_gnu.go \
+ src_libgo_go_net_sockopt_gnu.go \
+ src_libgo_go_net_sockoptip_gnu.go \
+ src_libgo_go_syscall_libcall_gnu_386.go \
+ src_libgo_go_syscall_libcall_gnu.go \
+ src_libgo_go_syscall_libcall_posix-1.go \
+ src_libgo_go_syscall_socket_gnu.go \
+ src_libgo_go_syscall_wait.c \
+ src_libgo_Makefile.am \
+ src_libgo_Makefile.in \
+ src_libgo_mksysinfo.sh \
+ src_libgo_runtime_getncpu-gnu.c \
+ src_libgo_runtime_netpoll.goc \
+		  src_libgo_go_os_os_test.go \
+ src_libgo_go_syscall_syscall_gnu_test.go \
+ src_libgo_testsuite_gotest \
+		  add-gnu-to-libgo-headers
 endif
 
 debian_patches += gcc-ice-dump
--- a/debian/rules.patch.orig	2016-11-12 16:50:46.0 +0100
+++ b/debian/rules.patch	2016-11-12 16:55:24.0 +0100
@@ -276,6 +276,26 @@
 
 ifeq ($(DEB_TARGET_ARCH_OS),hurd)
   debian_patches += hurd-changes
+  debian_patches += \
+ src_libgo_configure.ac \
+ src_libgo_go_net_sendfile_gnu.go \
+ src_libgo_go_net_sock_gnu.go \
+ src_libgo_go_net_sockopt_gnu.go \
+ src_libgo_go_net_sockoptip_gnu.go \
+ src_libgo_go_syscall_libcall_gnu_386.go \
+ 

Hurd port for gcc go PATCH 5-10(23)

2016-11-25 Thread Svante Signell
* src_libgo_configure.ac.diff: Define GOOS=gnu and LIBGO_IS_GNU
* src_libgo_Makefile.am.diff: Add support for GNU/Hurd.
* src_libgo_Makefile.in.diff: Update accordingly with autreconf2.64
* src_libgo_mksysinfo.sh.diff: Define SYS_IOCTL to 0 if not defined,
 Fix: #define EWOULDBLOCK EAGAIN in 
 Fix: #define st_dev st_fsid in Index: gcc-6-6.2.1-4.1/src/libgo/configure.ac
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/configure.ac
+++ gcc-6-6.2.1-4.1/src/libgo/configure.ac
@@ -151,6 +151,7 @@ is_openbsd=no
 is_dragonfly=no
 is_rtems=no
 is_solaris=no
+is_gnu=no
 GOOS=unknown
 case ${host} in
   *-*-darwin*)   is_darwin=yes;  GOOS=darwin ;;
@@ -162,6 +163,7 @@ case ${host} in
   *-*-dragonfly*) is_dragonfly=yes; GOOS=dragonfly ;;
   *-*-rtems*)is_rtems=yes;   GOOS=rtems ;;
   *-*-solaris2*) is_solaris=yes; GOOS=solaris ;;
+  *-*-gnu*)  is_gnu=yes; GOOS=gnu ;;
 esac
 AM_CONDITIONAL(LIBGO_IS_DARWIN, test $is_darwin = yes)
 AM_CONDITIONAL(LIBGO_IS_FREEBSD, test $is_freebsd = yes)
@@ -172,6 +174,7 @@ AM_CONDITIONAL(LIBGO_IS_OPENBSD, test $i
 AM_CONDITIONAL(LIBGO_IS_DRAGONFLY, test $is_dragonfly = yes)
 AM_CONDITIONAL(LIBGO_IS_RTEMS, test $is_rtems = yes)
 AM_CONDITIONAL(LIBGO_IS_SOLARIS, test $is_solaris = yes)
+AM_CONDITIONAL(LIBGO_IS_GNU, test $is_gnu = yes)
 AC_SUBST(GOOS)
 
 dnl Test whether we need to use DejaGNU or whether we can use the
Index: gcc-6-6.2.1-4.1/src/libgo/go/syscall/wait.c
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/go/syscall/wait.c
+++ gcc-6-6.2.1-4.1/src/libgo/go/syscall/wait.c
@@ -8,6 +8,9 @@
OS-independent.  */
 
 #include 
+#ifndef WCONTINUED
+#define WCONTINUED 0
+#endif
 #include 
 
 #include "runtime.h"
Index: gcc-6-6.2.1-4.1/src/libgo/Makefile.am
===
--- gcc-6-6.2.1-4.1.orig/src/libgo/Makefile.am
+++ gcc-6-6.2.1-4.1/src/libgo/Makefile.am
@@ -419,6 +419,9 @@ else
 if LIBGO_IS_NETBSD
 runtime_getncpu_file = runtime/getncpu-bsd.c
 else
+if LIBGO_IS_GNU
+runtime_getncpu_file = runtime/getncpu-gnu.c
+else
 runtime_getncpu_file = runtime/getncpu-none.c
 endif
 endif
@@ -426,6 +429,7 @@ endif
 endif
 endif
 endif
+endif
 
 if LIBGO_IS_LINUX
 runtime_netpoll_files = runtime/netpoll_epoll.c
@@ -433,9 +437,13 @@ else
 if LIBGO_IS_SOLARIS
 runtime_netpoll_files = runtime/netpoll_select.c
 else
+if LIBGO_IS_GNU
+runtime_netpoll_files = runtime/netpoll_select.c
+else
 runtime_netpoll_files = runtime/netpoll_kqueue.c
 endif
 endif
+endif
 
 runtime_files = \
 	runtime/go-append.c \
@@ -744,6 +752,14 @@ go_net_sockoptip_file = go/net/sockoptip
 go_net_cgo_sock_file = go/net/cgo_sockold.go
 go_net_cgo_res_file = go/net/cgo_resnew.go
 else
+if LIBGO_IS_GNU
+go_net_cgo_file = go/net/cgo_linux.go
+go_net_sock_file = go/net/sock_gnu.go
+go_net_sockopt_file = go/net/sockopt_gnu.go
+go_net_sockoptip_file = go/net/sockoptip_gnu.go go/net/sockoptip_posix.go
+go_net_cgo_sock_file = go/net/cgo_socknew.go
+go_net_cgo_res_file = go/net/cgo_resnew.go
+else
 go_net_cgo_file = go/net/cgo_bsd.go
 go_net_sock_file = go/net/sock_bsd.go
 go_net_sockopt_file = go/net/sockopt_bsd.go
@@ -755,6 +771,7 @@ endif
 endif
 endif
 endif
+endif
 
 if LIBGO_IS_LINUX
 go_net_sendfile_file = go/net/sendfile_linux.go
@@ -768,11 +785,15 @@ else
 if LIBGO_IS_SOLARIS
 go_net_sendfile_file = go/net/sendfile_solaris.go
 else
+if LIBGO_IS_GNU
+go_net_sendfile_file = go/net/sendfile_gnu.go
+else
 go_net_sendfile_file = go/net/sendfile_stub.go
 endif
 endif
 endif
 endif
+endif
 
 if LIBGO_IS_LINUX
 go_net_interface_file = go/net/interface_linux.go
@@ -794,9 +815,13 @@ else
 if LIBGO_IS_FREEBSD
 go_net_cloexec_file = go/net/sock_cloexec.go go/net/hook_cloexec.go
 else
+if LIBGO_IS_GNU
+go_net_cloexec_file = go/net/sock_cloexec.go go/net/hook_cloexec.go
+else
 go_net_cloexec_file = go/net/sys_cloexec.go
 endif
 endif
+endif
 
 if LIBGO_IS_OPENBSD
 go_net_tcpsockopt_file = go/net/tcpsockopt_openbsd.go
@@ -889,9 +914,13 @@ else
 if LIBGO_IS_LINUX
 go_os_dir_file = go/os/dir_largefile.go
 else
+if LIBGO_IS_GNU
+go_os_dir_file = go/os/dir_largefile.go
+else
 go_os_dir_file = go/os/dir_regfile.go
 endif
 endif
+endif
 
 if LIBGO_IS_DARWIN
 go_os_getwd_file = go/os/getwd_darwin.go
@@ -911,11 +940,15 @@ else
 if LIBGO_IS_RTEMS
 go_os_sys_file = go/os/sys_uname.go
 else
+if LIBGO_IS_GNU
+go_os_sys_file = go/os/sys_uname.go
+else
 go_os_sys_file = go/os/sys_bsd.go
 endif
 endif
 endif
 endif
+endif
 
 if LIBGO_IS_FREEBSD
 go_os_cloexec_file = go/os/sys_freebsd.go
@@ -937,6 +970,9 @@ else
 if LIBGO_IS_LINUX
 go_os_stat_file = go/os/stat_atim.go
 else
+if LIBGO_IS_GNU
+go_os_stat_file = go/os/stat_atim.go
+else
 if LIBGO_IS_OPENBSD
 go_os_stat_file = go/os/stat_atim.go
 else
@@ -960,12 +996,17 @@ endif
 endif
 endif
 endif
+endif
 
 if LIBGO_IS_LINUX
 go_os_pipe_file = go/os/pipe_linux.go
 else
+if LIBGO_IS_GNU
+go_os_pipe_file = go/os/pipe_linux.go
+else
 

Re: [PATCH] avoid calling alloca(0)

2016-11-25 Thread Jeff Law

On 11/24/2016 12:47 AM, Jakub Jelinek wrote:

On Wed, Nov 23, 2016 at 02:27:05PM -0700, Jeff Law wrote:

I believe we should be warning on trying to allocation 0 bytes of memory via
malloc, realloc or alloca, with the exception of a non-builtin alloca with
no return value, but I think we've covered that elsewhere and Martin's code
will DTRT more by accident than design.


But we aren't going to warn on all places which might call alloca at runtime
with 0 arguments, you would need runtime instrumentation for that (like
ubsan).  You are going to warn about explicit cases and random subset of the
other ones where the user is just unlucky enough that the compiler threaded
some jumps etc.  For the explicit ptr = alloca (0) it is easy to get rid of
it, for the implicit one one has to pessimize code if they want to avoid the
warning.

No, it's a compile-time warning when the 0 size argument can be exposed.

It's not unlike many other warnings in GCC.

jeff


[PATCH] Optimiza aggregate a = b = c = {} (PR c/78408)

2016-11-25 Thread Jakub Jelinek
Hi!

This patch optimizes a = {}; b = a; into a = {}; b = {};
and similarly for memset instead of the first stmt and/or memcpy
instead of the second one.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-25  Jakub Jelinek  

PR c/78408
* tree-ssa-ccp.c: Include tree-dfa.h.
(optimize_memcpy): New function.
(pass_fold_builtins::execute): Use it.  Remove useless conditional
break after BUILT_IN_VA_*.

* gcc.dg/pr78408.c: New test.

--- gcc/tree-ssa-ccp.c.jj   2016-11-18 20:04:27.0 +0100
+++ gcc/tree-ssa-ccp.c  2016-11-25 17:54:26.862166658 +0100
@@ -143,6 +143,7 @@ along with GCC; see the file COPYING3.
 #include "stor-layout.h"
 #include "optabs-query.h"
 #include "tree-ssa-ccp.h"
+#include "tree-dfa.h"
 
 /* Possible lattice values.  */
 typedef enum
@@ -2928,6 +2929,113 @@ optimize_atomic_bit_test_and (gimple_stm
   release_ssa_name (lhs);
 }
 
+/* Optimize
+   a = {};
+   b = a;
+   into
+   a = {};
+   b = {};
+   Similarly for memset (, ..., sizeof (a)); instead of a = {};
+   and/or memcpy (, , sizeof (a)); instead of b = a;  */
+
+static void
+optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src)
+{
+  gimple *stmt = gsi_stmt (*gsip);
+  if (gimple_has_volatile_ops (stmt)
+  || TREE_THIS_VOLATILE (dest)
+  || TREE_THIS_VOLATILE (src))
+return;
+
+  tree vuse = gimple_vuse (stmt);
+  if (vuse == NULL)
+return;
+
+  gimple *defstmt = SSA_NAME_DEF_STMT (vuse);
+  tree src2 = NULL_TREE;
+  tree val = integer_zero_node;
+  if (gimple_store_p (defstmt)
+  && gimple_assign_single_p (defstmt)
+  && TREE_CODE (gimple_assign_rhs1 (defstmt)) == CONSTRUCTOR
+  && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (defstmt)) == 0
+  && !gimple_clobber_p (defstmt))
+src2 = gimple_assign_lhs (defstmt);
+  else if (gimple_call_builtin_p (defstmt, BUILT_IN_MEMSET)
+  && TREE_CODE (gimple_call_arg (defstmt, 0)) == ADDR_EXPR
+  && TREE_CODE (gimple_call_arg (defstmt, 1)) == INTEGER_CST
+  && TREE_CODE (gimple_call_arg (defstmt, 2)) == INTEGER_CST)
+{
+  HOST_WIDE_INT ssize, max_size, off;
+  bool reverse;
+  src2 = TREE_OPERAND (gimple_call_arg (defstmt, 0), 0);
+  get_ref_base_and_extent (src2, , , _size, );
+  if (ssize != max_size
+ || (ssize % BITS_PER_UNIT) != 0
+ || !wi::eq_p (gimple_call_arg (defstmt, 2), ssize / BITS_PER_UNIT))
+   src2 = NULL_TREE;
+  else
+   {
+ val = gimple_call_arg (defstmt, 1);
+ if (!integer_zerop (val) && is_gimple_assign (stmt))
+   src2 = NULL_TREE;
+   }
+}
+
+  if (src2 == NULL_TREE)
+return;
+
+  if (!operand_equal_p (src, src2, 0))
+{
+  /* Handle also
+a = {};
+MEM[(char * {ref-all})] = MEM[(char * {ref-all})];  */
+  if (is_gimple_assign (stmt)
+ && TREE_CODE (src) == MEM_REF
+ && integer_zerop (TREE_OPERAND (src, 1))
+ && TREE_CODE (TREE_OPERAND (src, 0)) == ADDR_EXPR
+ && DECL_P (src2)
+ && operand_equal_p (TREE_OPERAND (TREE_OPERAND (src, 0), 0),
+ src2, 0)
+ && tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (src)),
+DECL_SIZE (src2)))
+   src = TREE_OPERAND (TREE_OPERAND (src, 0), 0);
+  else
+   return;
+}
+  if (refs_may_alias_p (dest, src))
+return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Simplified\n  ");
+  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
+  fprintf (dump_file, "after previous\n  ");
+  print_gimple_stmt (dump_file, defstmt, 0, dump_flags);
+}
+
+  if (is_gimple_assign (stmt))
+{
+  tree ctor = build_constructor (TREE_TYPE (dest), NULL);
+  gimple_assign_set_rhs_from_tree (gsip, ctor);
+  update_stmt (stmt);
+}
+  else
+{
+  gcall *call = as_a  (stmt);
+  tree fndecl = builtin_decl_implicit (BUILT_IN_MEMSET);
+  gimple_call_set_fndecl (call, fndecl);
+  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
+  gimple_call_set_arg (call, 1, val);
+  update_stmt (stmt);
+}
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "into\n  ");
+  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
+}
+}
+
 /* A simple pass that attempts to fold all builtin functions.  This pass
is run after we've propagated as many constants as we can.  */
 
@@ -2994,6 +3102,9 @@ pass_fold_builtins::execute (function *f
  continue;
}
}
+ else if (gimple_assign_load_p (stmt) && gimple_store_p (stmt))
+   optimize_memcpy (, gimple_assign_lhs (stmt),
+gimple_assign_rhs1 (stmt));
  gsi_next ();
  continue;
}
@@ -3109,14 +3220,39 @@ pass_fold_builtins::execute (function *f
 

[committed] Add 2 tests

2016-11-25 Thread Jakub Jelinek
Hi!

In order to close these 2 PRs fixed recently by Segher, I've added the
corresponding tests and am going to close them as fixed.

2016-11-25  Jakub Jelinek  

PR rtl-optimization/78438
PR rtl-optimization/78477
* gcc.c-torture/execute/pr78438.c: New test.
* gcc.c-torture/execute/pr78477.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr78438.c.jj2016-11-25 
20:17:29.448019008 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr78438.c   2016-11-25 
20:17:05.0 +0100
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/78438 */
+
+char a = 0;
+int b = 197412621;
+
+__attribute__ ((noinline, noclone))
+void foo ()
+{
+  a = 0 > (short) (b >> 11);
+}
+
+int
+main ()
+{
+  asm volatile ("" : : : "memory");
+  if (__CHAR_BIT__ != 8 || sizeof (short) != 2 || sizeof (int) < 4)
+return 0;
+  foo ();
+  if (a != 0)
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr78477.c.jj2016-11-25 
20:20:26.593705268 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr78477.c   2016-11-25 
20:20:14.0 +0100
@@ -0,0 +1,27 @@
+/* PR rtl-optimization/78477 */
+
+unsigned a;
+unsigned short b;
+
+unsigned
+foo (unsigned x)
+{
+  b = x;
+  a >>= (b & 1);
+  b = 1 | (b << 5);
+  b >>= 15;
+  x = (unsigned char) b > ((2 - (unsigned char) b) & 1);
+  b = 0;
+  return x;
+}
+
+int
+main ()
+{
+  if (__CHAR_BIT__ != 8 || sizeof (short) != 2 || sizeof (int) < 4)
+return 0;
+  unsigned x = foo (12345);
+  if (x != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub


Re: [PATCH] Fix ICE on CONST_WIDE_INT in simplify_immed_subreg (PR rtl-optimization/78526)

2016-11-25 Thread Jakub Jelinek
On Fri, Nov 25, 2016 at 06:50:17PM +, Richard Sandiford wrote:
> I think this is still wrong for modes that are wide enough to use
> CONST_WIDE_INT but aren't a multiple of a HWI in size.  In that

The loop iterates by value_bit, which is 8 bits.  Thus, indeed, it will
misbehave if there is a CONST_WIDE_INT with a precision not multiple of 8.

> Does that sound like an improvement?  If so I'll file a PR.
> I might have time before GCC 7 if it qualifies.

Yeah, it might be an improvement.

Jakub


Re: [PATCH] Fix ICE on CONST_WIDE_INT in simplify_immed_subreg (PR rtl-optimization/78526)

2016-11-25 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> If we try to simplify a paradoxical subreg of a CONST_WIDE_INT,
> we can ICE, because wi::extract_uhwi doesn't like accessing
> bytes beyond the precision.
>   for (i = 0; i < elem_bitsize; i += value_bit)
> *vp++ = wi::extract_uhwi (val, i, value_bit);
>   for (; i < elem_bitsize; i += value_bit)
> *vp++ = extend;
> looks wrong, because the second loop is useless.  But, we actually
> do want the second loop, to handle the paradoxical bytes.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-11-25  Jakub Jelinek  
>
>   PR rtl-optimization/78526
>   * simplify-rtx.c (simplify_immed_subreg): Don't use wi::extract_uhwi
>   beyond val's precision.
>
>   * gcc.dg/pr78526.c: New test.

OK, thanks.

I think this is still wrong for modes that are wide enough to use
CONST_WIDE_INT but aren't a multiple of a HWI in size.  In that
case we'll still try to access the integer in a wider precision for the
final iteration of the first loop.  I'm not sure we have any targets
like that though, and either way, I agree this is a strict improvement.

Perhaps we should have extract_uhwi that works for arbitrary bitpos
and width and zero-extends beyond the precision, and an extract_shwi
that does the same but sign-extending beyond the precision?  Then code
like this could use a single loop and could safely access any wide-int
representation (regardless of whether it's stored in sign-extended form).
The extract_*hwi routines would become a bit more expensive though.

Does that sound like an improvement?  If so I'll file a PR.
I might have time before GCC 7 if it qualifies.

Thanks,
Richard


Re: [patch,avr]: Improve code as of PR41076

2016-11-25 Thread Denis Chertykov
2016-11-25 18:34 GMT+03:00 Georg-Johann Lay :
> Mentioned PR is about composing 16-bit values out of 8-bit values which due
> to integer promotions likes to work on 16-bit values.
>
> The patch adds 3 combiner patterns to catch such situations and then split
> after reload.  Some more splitting is performed after reload for:
>
> and, ior, xor of reg-reg operations,
>
> HImode reg = 0
>
> and 3- and 4-byte registers moves (and 2-byte moved if !MOVW).
>
> This gived better code for almost all test cases, yet some test cases could
> still be improved.
>
> I also tried pre-reload splits, but the generated SUBREGs seem to disturb
> register allocation, hence I used post-reload splits.
>
> There is also PR60145 which has a test case where a 4-byte value is composed
> out of 4 bytes, but the patterns to catch that are going to be insane... so
> for now, such 4-byte cases will still result in bloated code.
>
> Patch tested against trunk without regressions.
>
> Ok for trunk?

Ok.
Please apply.

>
> Johann
>
>
> PR 41076
> * config/avr/avr.md SPLIT34): New mode iterator.
> (bitop): New code iterator.
> (*iorhi3.ashift8-*). New insn-and-split patterns.
> (*movhi): Post-reload split reg = 0.
> (*movhi) [!MOVW]: Post-reload split reg = reg.
> (*mov) [SI,SF,PSI,SQ,USQ,SA,USA]: Post-reload split reg = reg.
> (andhi3, andpsi3, andsi3): Post-reload split reg-reg operations.
> (iorhi3, iorpsi3, iorsi3): Same.
> (xorhi3, xorpsi3, xorsi3): Same.
> * config/avr/avr.c (avr_rtx_costs_1) [IOR && HImode]: Adjust rtx
> costs to *iorhi3.ashift8-* patterns.
>


Re: Ping: Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-25 Thread Denis Chertykov
I'm sorry for delay.

I have a problem with the patch:
(Stripping trailing CRs from patch; use --binary to disable.)
patching file avr-arch.h
(Stripping trailing CRs from patch; use --binary to disable.)
patching file avr-devices.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file avr-mcus.def
Hunk #1 FAILED at 62.
1 out of 1 hunk FAILED -- saving rejects to file avr-mcus.def.rej
(Stripping trailing CRs from patch; use --binary to disable.)
patching file gen-avr-mmcu-specs.c
Hunk #1 succeeded at 215 (offset 5 lines).
(Stripping trailing CRs from patch; use --binary to disable.)
patching file specs.h
Hunk #1 succeeded at 58 (offset 1 line).
Hunk #2 succeeded at 66 (offset 1 line).

2016-11-22 23:27 GMT+03:00 Georg-Johann Lay :
> Denis Chertykov schrieb:
>>
>> Do you have any objections, George ?
>
>
> No, the last delta rev3 from 2016-11-10 looks fine to me.
>
>
>>
>> 2016-11-22 8:05 GMT+03:00 Pitchumani Sivanupandi
>> :
>>>
>>> Ping!
>>>
>>> On Monday 14 November 2016 07:03 PM, Pitchumani Sivanupandi wrote:

 Ping!

 On Thursday 10 November 2016 01:53 PM, Pitchumani Sivanupandi wrote:
>
> On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote:
>>
>> On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:
>>>
>>> On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:

 On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:
>
> I have updated patch to include the flash size as well. Took that
> info from
> device headers (it was fed into crt's device information note
> section
> also).
>>
>>
>> The new option would render -mn-flash superfluous, but we should
>> keep it for
>> backward compatibility.
>
> Ok.
>>
>> Shouldn't link_pmem_wrap then be removed from link_relax, i.e.
>> from
>> LINK_RELAX_SPEC?  And what happens if relaxation is off?
>
> Yes. Removed link_pmem_wrap from link_relax.
> Disabling relaxation doesn't change -mpmem-wrap-around behavior.
> 
> flashsize-and-wrap-around.patch


> diff --git a/gcc/config/avr/avr-mcus.def
> b/gcc/config/avr/avr-mcus.def
> index 6bcc6ff..9d4aa1a 100644


>  /*

 
>>>
>>>  /* Classic, > 8K, <= 64K.  */
>>> -AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
>>> 0x0060, 0x0, 1)
>>> -AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
>>> "__AVR_AT43USB355__",0x0060, 0x0, 1)
>>> -AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
>>> "__AVR_AT76C711__",  0x0060, 0x0, 1)
>>> +AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
>>> 0x0060, 0x0, 1, 0x6000)
>>> +AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
>>> "__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000)
>>> +AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
>>> "__AVR_AT76C711__",  0x0060, 0x0, 1, 0x4000)
>>> +AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE,
>>> "__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1)
>>>  /* Classic, == 128K.  */
>>> -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
>>> 0x0060, 0x0, 2)
>>> -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
>>> "__AVR_ATmega103__", 0x0060, 0x0, 2)
>>> -AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
>>> "__AVR_AT43USB320__",   0x0060, 0x0, 2)
>>> +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
>>> 0x0060, 0x0, 2, 0x2)
>>> +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
>>> "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
>>>  /* Classic + MOVW + JMP/CALL.  */
>>
>>
>> If at43usb320 is in the wrong multilib, then this should be handled as
>> separate issue / patch together with its own PR. Sorry for the
>> confusion.  I
>> just noticed that some fields don't match...
>>
>> It is not even clear to me from the data sheet if avr3 is the correct
>> multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it
>> also has
>> MUL) as there is no reference to the exact instruction set -- Atmochip
>> will
>> know.
>>
>> Moreover, such a change should be sync'ed with avr-libc as all
>> multilib
>> stuff is hand-wired there: no use of --print-foo meta information
>> retrieval
>> by avr-libc :-((
>>
>> I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565
>> for
>> this one.
>>
> Thats better. I've attached the updated patch. If OK, could someone
> commit please?
>
> I'll try if I could find some more info for AT43USB320.
>
> Regards,
> Pitchumani
>
>>
>


one more patch for PR77541

2016-11-25 Thread Vladimir N Makarov
Uros pointed me out that I should use another target for the PR test.  
Here is the patch committed as rev. 242881.


2016-11-25  Vladimir Makarov  

PR rtl-optimization/77541
* gcc.target/i386/pr77541.c: Change target to int128.

Index: gcc.target/i386/pr77541.c
===
--- gcc.target/i386/pr77541.c   (revision 242848)
+++ gcc.target/i386/pr77541.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target int128 } */
 /* { dg-options "-O2 -Wno-psabi" } */

 #define MAGIC 0x0706050403020100



Re: [PATCH v2] aarch64: Add split-stack initial support

2016-11-25 Thread Adhemerval Zanella


On 15/11/2016 16:38, Wilco Dijkstra wrote:
> 
> On 07/11/2016 16:59, Adhemerval Zanella wrote:
>> On 14/10/2016 15:59, Wilco Dijkstra wrote:
> 
>> There is no limit afaik on gold split stack allocation handling,
>> and I think one could be added for each backend (in the method
>> override require to implement it).
>>
>> In fact it is not really required to tie the nop generation with the
>> instruction generated by 'aarch64_internal_mov_immediate', it is
>> just a matter to simplify linker code.  
> 
> If there is no easy limit and you'll still require a nop, I think it is best 
> then
> to emit mov N+movk #0. Then the scheduler won't be able to reorder
> them with the add/sub.

Good call, I have changed the patch to emit a mov N+mov #0 instead
on relying the emit_nop.

> 
>>> Is there any need to detect underflow of x10 or is there a guarantee that 
>>> stacks are
>>> never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's 
>>> safe
>>> to do a signed comparison.
>>
>> I do not think so, at least none of current backend that implements
>> split stack do so.
> 
> OK, well a signed comparison like in your new version works for underflow.
> 
> Now to the patch:
> 
> 
> @@ -3316,6 +3339,28 @@ aarch64_expand_prologue (void)
>aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
>callee_adjust != 0 || frame_pointer_needed);
>aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
> +
> +  if (split_stack_arg_pointer_used_p ())
> +{
> +  /* Setup the argument pointer (x10) for -fsplit-stack code.  If
> +  __morestack was called, it will left the arg pointer to the
> +  old stack in x28.  Otherwise, the argument pointer is the top
> +  of current frame.  */
> +  rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM);
> +  rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
> +  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> +
> +  rtx not_more = gen_label_rtx ();
> +
> +  rtx cmp = gen_rtx_fmt_ee (LT, VOIDmode, cc_reg, const0_rtx);
> +  rtx jump = emit_jump_insn (gen_condjump (cmp, cc_reg, not_more));
> +  JUMP_LABEL (jump) = not_more;
> +  LABEL_NUSES (not_more) += 1;
> +
> +  emit_move_insn (x11, x28);
> +
> +  emit_label (not_more);
> +}
> 
> If you pass the old sp in x11 when called from __morestack you can remove
> the above thunk completely.

Indeed this snippet does not make sense anymore, I removed it.

> 
> +  /* It limits total maximum stack allocation on 2G so its value can be
> + materialized using two instructions at most (movn/movk).  It might be
> + used by the linker to add some extra space for split calling non split
> + stack functions.  */
> +  allocate = cfun->machine->frame.frame_size;
> +  if (allocate > ((HOST_WIDE_INT) 1 << 31))
> +{
> +  sorry ("Stack frame larger than 2G is not supported for 
> -fsplit-stack");
> +  return;
> +}
> 
> Note a 2-instruction mov/movk can generate any immediate up to 4GB and if
> we need even large sizes, we could round up to a multiple of 64KB so that 2
> instructions are enough for a 48-bit stack size...

I think we can set a limit of 4GB (powerpc64 backend limits to 2GB and
it seems fine).  I corrected the comment.

> 
> +  int ninsn = aarch64_internal_mov_immediate (reg10, GEN_INT (-allocate),
> +   true, Pmode);
> +  gcc_assert (ninsn == 1 || ninsn == 2);
> +  if (ninsn == 1)
> +emit_insn (gen_nop ());
> 
> To avoid any issues with the nop being scheduled, it's best to emit an 
> explicit movk
> here (0x if allocate > 0, or 0 if zero) using gen_insv_immdi.

Right, I changed to that.

> 
> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> 
> Isn't very similar code used in aarch64_expand_split_stack_prologue? Any 
> possibility
> to share/reuse?

I though about it, but it would require split in two subparts (one
to load __private_ss from TCB and another to jump to __morestack)
and both are basically 4 lines.  In the end I think current approach
should be simpler.

> 
> +static void
> +aarch64_live_on_entry (bitmap regs)
> +{
> +  if (flag_split_stack)
> +bitmap_set_bit (regs, R11_REGNUM);
> +}
> 
> I'm wondering whether you need extra code in aarch64_can_eliminate to deal
> with the argument pointer? Also do we need to define a fixed register, or 
> will GCC
> automatically allocate it to a callee-save if necessary?

Now that you asked I think we can get rid of this live marking.
I used as base for initial patch the powerpc backend and modelled
the argument pointer usage for aarch64 based on it.  But after the
patch iterations and they way register is set I think it is safe
to remove it this constraint.

> 
> +++ b/libgcc/config/aarch64/morestack.S
> 
> +/* Offset from __morestack frame where the arguments size saved and
> +   passed to __generic_morestack.  */
> +#define ARGS_SIZE_SAVE   80
> 
> This define is unused.

[PATCH] Fix ICE on CONST_WIDE_INT in simplify_immed_subreg (PR rtl-optimization/78526)

2016-11-25 Thread Jakub Jelinek
Hi!

If we try to simplify a paradoxical subreg of a CONST_WIDE_INT,
we can ICE, because wi::extract_uhwi doesn't like accessing
bytes beyond the precision.
  for (i = 0; i < elem_bitsize; i += value_bit)
*vp++ = wi::extract_uhwi (val, i, value_bit);
  for (; i < elem_bitsize; i += value_bit)
*vp++ = extend;
looks wrong, because the second loop is useless.  But, we actually
do want the second loop, to handle the paradoxical bytes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-25  Jakub Jelinek  

PR rtl-optimization/78526
* simplify-rtx.c (simplify_immed_subreg): Don't use wi::extract_uhwi
beyond val's precision.

* gcc.dg/pr78526.c: New test.

--- gcc/simplify-rtx.c.jj   2016-11-18 20:04:28.0 +0100
+++ gcc/simplify-rtx.c  2016-11-25 15:07:13.872322389 +0100
@@ -5740,8 +5740,9 @@ simplify_immed_subreg (machine_mode oute
  {
rtx_mode_t val = rtx_mode_t (el, innermode);
unsigned char extend = wi::sign_mask (val);
+   int prec = wi::get_precision (val);
 
-   for (i = 0; i < elem_bitsize; i += value_bit)
+   for (i = 0; i < prec && i < elem_bitsize; i += value_bit)
  *vp++ = wi::extract_uhwi (val, i, value_bit);
for (; i < elem_bitsize; i += value_bit)
  *vp++ = extend;
--- gcc/testsuite/gcc.dg/pr78526.c.jj   2016-11-25 15:15:53.266799450 +0100
+++ gcc/testsuite/gcc.dg/pr78526.c  2016-11-25 15:16:12.004564146 +0100
@@ -0,0 +1,21 @@
+/* PR rtl-optimization/78526 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O -fno-tree-ccp -fno-tree-sra -g -w" } */
+/* { dg-additional-options "-mavx512bw" { target i?86-*-* x86_64-*-* } } */
+
+typedef unsigned U __attribute__ ((vector_size (64)));
+typedef unsigned __int128 V __attribute__ ((vector_size (64)));
+
+static inline V
+bar (U u, U x, V v)
+{
+  v = (V)(U) { 0, ~0 };
+  v[x[0]] <<= u[-63];
+  return v;
+}
+
+V
+foo (U u)
+{
+  return bar (u, (U) {}, (V) {});
+}

Jakub


[C++ PATCH] Fix ICE with scoped enums and -g (PR c++/72808)

2016-11-25 Thread Jakub Jelinek
Hi!

As mentioned in the PR, we ICE during type verification called from
dwarf2out, because a class has different TYPE_FIELDS from its variants.
In particular it contains an extra CONST_DECL.
While CONST_DECLs are ignored for C/C++ in dwarf2out.c, I think they aren't
ignored for Fortran/Ada, so ignoring them in the verifier might be too
risky.

This patch instead fixes the variants immediately.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Another possibility would be just to update TYPE_FIELDS of the variants
instead of calling whole fixup_type_variants.

2016-11-25  Jakub Jelinek  

PR c++/72808
* decl.c (build_enumerator): Call fixup_type_variants on
current_class_type after finish_member_declaration.

* g++.dg/debug/pr72808.C: New test.

--- gcc/cp/decl.c.jj2016-11-23 19:44:40.0 +0100
+++ gcc/cp/decl.c   2016-11-25 13:15:46.407116179 +0100
@@ -14518,6 +14518,7 @@ incremented enumerator value is too larg
current_access_specifier = access_public_node;
 
   finish_member_declaration (decl);
+  fixup_type_variants (current_class_type);
 
   current_access_specifier = saved_cas;
 }
--- gcc/testsuite/g++.dg/debug/pr72808.C.jj 2016-11-25 13:17:06.777091600 
+0100
+++ gcc/testsuite/g++.dg/debug/pr72808.C2016-11-25 13:16:27.0 
+0100
@@ -0,0 +1,24 @@
+// PR c++/72808
+// { dg-do compile }
+// { dg-options "-g -std=c++14" }
+
+struct A
+{
+  virtual void foo ();
+};
+
+struct B : A
+{
+  void foo ();
+  enum C : int;
+};
+
+enum B::C : int
+{
+  D
+};
+
+void
+B::foo ()
+{
+}

Jakub


[PATCH] Fix -fcompare-debug issues in IPA-ICF (PR lto/78211)

2016-11-25 Thread Jakub Jelinek
Hi!

IPA-ICF performs some code-generation visible changes from hash table
traversal, where the hash values can be different between -g and -g0
(I bet it hashes DECL_UID in somewhere, perhaps other things).
The following patch fixes it by adding a vector next to the hash table,
which tracks the groups in the order they were created.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-25  Jakub Jelinek  

PR lto/78211
* ipa-icf.h (sem_item_optimizer): Add m_classes_vec member.
* ipa-icf.c (sem_item_optimizer::sem_item_optimizer): Initialize
it.
(sem_item_optimizer::~sem_item_optimizer): Traverse m_classes_vec
vector instead of traversing m_classes hash table.  Release
m_classes_vec.
(sem_item_optimizer::read_section, sem_item_optimizer::add_class):
Formatting fixes.
(sem_item_optimizer::get_group_by_hash): When inserting a new group,
add it also to m_classes_vec vector.
(sem_item_optimizer::remove_symtab_node,
sem_item_optimizer::build_hash_based_classes,
sem_item_optimizer::parse_nonsingleton_classes): Formatting fixes.
(sem_item_optimizer::subdivide_classes_by_equality,
sem_item_optimizer::subdivide_classes_by_sensitive_refs,
sem_item_optimizer::verify_classes): Traverse m_classes_vec vector
instead of traversing m_classes hash table.  Formatting fixes.
(sem_item_optimizer::traverse_congruence_split,
sem_item_optimizer::do_congruence_step_for_index,
sem_item_optimizer::do_congruence_step): Formatting fixes.
(sem_item_optimizer::process_cong_reduction): Traverse m_classes_vec
vector instead of traversing m_classes hash table.
(sem_item_optimizer::dump_cong_classes): Likewise.  Formatting fixes.
(sem_item_optimizer::merge_classes): Traverse m_classes_vec vector
instead of traversing m_classes hash table.

* g++.dg/ipa/pr78211.C: New test.

--- gcc/ipa-icf.h.jj2016-11-22 11:05:58.0 +0100
+++ gcc/ipa-icf.h   2016-11-25 11:41:18.210144897 +0100
@@ -609,9 +609,12 @@ private:
   /* A set containing all items removed by hooks.  */
   hash_set  m_removed_items_set;
 
-  /* Hashtable of congruence classes */
+  /* Hashtable of congruence classes.  */
   hash_table  m_classes;
 
+  /* Vector of congruence classes.  */
+  vec  m_classes_vec;
+
   /* Count of congruence classes.  */
   unsigned int m_classes_count;
 
--- gcc/ipa-icf.c.jj2016-11-22 11:05:58.0 +0100
+++ gcc/ipa-icf.c   2016-11-25 12:05:31.765505438 +0100
@@ -2284,6 +2284,7 @@ sem_item_optimizer::sem_item_optimizer (
   m_varpool_node_hooks (NULL)
 {
   m_items.create (0);
+  m_classes_vec.create (0);
   bitmap_obstack_initialize (_bmstack);
 }
 
@@ -2292,17 +2293,19 @@ sem_item_optimizer::~sem_item_optimizer
   for (unsigned int i = 0; i < m_items.length (); i++)
 delete m_items[i];
 
-  for (hash_table::iterator it = m_classes.begin 
();
-   it != m_classes.end (); ++it)
+  unsigned int l;
+  congruence_class_group *it;
+  FOR_EACH_VEC_ELT (m_classes_vec, l, it)
 {
-  for (unsigned int i = 0; i < (*it)->classes.length (); i++)
-   delete (*it)->classes[i];
+  for (unsigned int i = 0; i < it->classes.length (); i++)
+   delete it->classes[i];
 
-  (*it)->classes.release ();
-  free (*it);
+  it->classes.release ();
+  free (it);
 }
 
   m_items.release ();
+  m_classes_vec.release ();
 
   bitmap_obstack_release (_bmstack);
 }
@@ -2361,8 +2364,8 @@ void
 sem_item_optimizer::read_section (lto_file_decl_data *file_data,
  const char *data, size_t len)
 {
-  const lto_function_header *header =
-(const lto_function_header *) data;
+  const lto_function_header *header
+= (const lto_function_header *) data;
   const int cfg_offset = sizeof (lto_function_header);
   const int main_offset = cfg_offset + header->cfg_size;
   const int string_offset = main_offset + header->main_size;
@@ -2373,9 +2376,9 @@ sem_item_optimizer::read_section (lto_fi
   lto_input_block ib_main ((const char *) data + main_offset, 0,
   header->main_size, file_data->mode_table);
 
-  data_in =
-lto_data_in_create (file_data, (const char *) data + string_offset,
-   header->string_size, vNULL);
+  data_in
+= lto_data_in_create (file_data, (const char *) data + string_offset,
+ header->string_size, vNULL);
 
   count = streamer_read_uhwi (_main);
 
@@ -2473,9 +2476,9 @@ sem_item_optimizer::add_class (congruenc
 {
   gcc_assert (cls->members.length ());
 
-  congruence_class_group *group = get_group_by_hash (
-   cls->members[0]->get_hash (),
-   cls->members[0]->type);
+  congruence_class_group *group
+= get_group_by_hash (cls->members[0]->get_hash (),
+

Re: [Patch, Fortran, OOP] PR 60853: Failure to disambiguate generic with unlimited polymorphic

2016-11-25 Thread Janus Weil
Hi Paul,

> This looks fine to me - OK for trunk.

thanks a lot. Committed as r242880.

Cheers,
Janus



> On 25 November 2016 at 13:40, Janus Weil  wrote:
>> Hi all,
>>
>> here is a patch that fixes a rejects-valid problem with an
>> unlimited-polymorphic variable in a generic procedure. Removing the
>> line with UNLIMITED_POLY fixes the error and the rest of the patch is
>> just slightly refactoring the for-loop.
>>
>> Regtested successfully on x86_64-linux-gnu. Ok for trunk?
>>
>> Cheers,
>> Janus
>>
>>
>> 2016-11-25  Janus Weil  
>>
>> PR fortran/60853
>> * interface.c (gfc_compare_interfaces): Remove bad special case for
>> unlimited polymorphism. Refactor for loop.
>>
>> 2016-11-25  Janus Weil  
>>
>> PR fortran/60853
>> * gfortran.dg/typebound_assignment_8.f90: New test case.
>
>
>
> --
> The difference between genius and stupidity is; genius has its limits.
>
> Albert Einstein


[PATCH] Fix a couple of issues in gimple-ssa-sprintf.c

2016-11-25 Thread Jakub Jelinek
Hi!

Here is an attempt to fix a couple of bugs in gimple-ssa-sprintf.c.
First of all, it assumes size_t is always the same as uintmax_t, which
is not necessarily the case.
Second, it uses static tree {,u}intmax_type_node; variables for caching
those types, but doesn't register them with GC; but their computation
is quite cheap, so I think it isn't worth wasting a GC root for those,
especially if we compute it only in the very rare case when somebody
uses the j modifier.
Third, the code assumes that ptrdiff_t is the signed type for size_t.
E.g. vms is one port where that isn't true, ptrdiff_t can be 64-bit,
while size_t is 32-bit.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-25  Jakub Jelinek  

* gimple-ssa-sprintf.c (build_intmax_type_nodes): Look at
UINTMAX_TYPE rather than SIZE_TYPE.  Add gcc_unreachable if
intmax_t couldn't be determined.
(format_integer): Make {,u}intmax_type_node no longer static,
initialize them only when needed.  For z and t use
signed_or_unsigned_type_for instead of assuming size_t and
ptrdiff_t have the same precision.

--- gcc/gimple-ssa-sprintf.c.jj 2016-11-25 09:49:47.0 +0100
+++ gcc/gimple-ssa-sprintf.c2016-11-25 10:26:58.763114194 +0100
@@ -733,23 +733,23 @@ format_percent (const conversion_spec &,
 }
 
 
-/* Ugh.  Compute intmax_type_node and uintmax_type_node the same way
-   lto/lto-lang.c does it.  This should be available in tree.h.  */
+/* Compute intmax_type_node and uintmax_type_node similarly to how
+   tree.c builds size_type_node.  */
 
 static void
 build_intmax_type_nodes (tree *pintmax, tree *puintmax)
 {
-  if (strcmp (SIZE_TYPE, "unsigned int") == 0)
+  if (strcmp (UINTMAX_TYPE, "unsigned int") == 0)
 {
   *pintmax = integer_type_node;
   *puintmax = unsigned_type_node;
 }
-  else if (strcmp (SIZE_TYPE, "long unsigned int") == 0)
+  else if (strcmp (UINTMAX_TYPE, "long unsigned int") == 0)
 {
   *pintmax = long_integer_type_node;
   *puintmax = long_unsigned_type_node;
 }
-  else if (strcmp (SIZE_TYPE, "long long unsigned int") == 0)
+  else if (strcmp (UINTMAX_TYPE, "long long unsigned int") == 0)
 {
   *pintmax = long_long_integer_type_node;
   *puintmax = long_long_unsigned_type_node;
@@ -762,12 +762,14 @@ build_intmax_type_nodes (tree *pintmax,
char name[50];
sprintf (name, "__int%d unsigned", int_n_data[i].bitsize);
 
-   if (strcmp (name, SIZE_TYPE) == 0)
+   if (strcmp (name, UINTMAX_TYPE) == 0)
  {
*pintmax = int_n_trees[i].signed_type;
*puintmax = int_n_trees[i].unsigned_type;
+   return;
  }
  }
+  gcc_unreachable ();
 }
 }
 
@@ -851,15 +853,8 @@ format_pointer (const conversion_spec 
 static fmtresult
 format_integer (const conversion_spec , tree arg)
 {
-  /* These are available as macros in the C and C++ front ends but,
- sadly, not here.  */
-  static tree intmax_type_node;
-  static tree uintmax_type_node;
-
-  /* Initialize the intmax nodes above the first time through here.  */
-  if (!intmax_type_node)
-build_intmax_type_nodes (_type_node, _type_node);
-
+  tree intmax_type_node;
+  tree uintmax_type_node;
   /* Set WIDTH and PRECISION to either the values in the format
  specification or to zero.  */
   int width = spec.have_width ? spec.width : 0;
@@ -909,19 +904,20 @@ format_integer (const conversion_spec 
   break;
 
 case FMT_LEN_z:
-  dirtype = sign ? ptrdiff_type_node : size_type_node;
+  dirtype = signed_or_unsigned_type_for (!sign, size_type_node);
   break;
 
 case FMT_LEN_t:
-  dirtype = sign ? ptrdiff_type_node : size_type_node;
+  dirtype = signed_or_unsigned_type_for (!sign, ptrdiff_type_node);
   break;
 
 case FMT_LEN_j:
+  build_intmax_type_nodes (_type_node, _type_node);
   dirtype = sign ? intmax_type_node : uintmax_type_node;
   break;
 
 default:
-   return fmtresult ();
+  return fmtresult ();
 }
 
   /* The type of the argument to the directive, either deduced from

Jakub


[committed] Fix combine ICE with out of bounds shift (PR rtl-optimization/78527)

2016-11-25 Thread Jakub Jelinek
Hi!

The following testcase ICEs in the combiner, because we try to optimize
HImode SUBREG of SImode LSHIFTRT with shift count 80, so width is -48,
and in the end we call smallest_mode_for_size (-48, MODE_INT) and
that function has unsigned int as first argument; most of the ports don't
have modes with 4Gbits yet.

Bootstrapped/regtested on x86_64-linux and i686-linux, pre-approved by
Segher on IRC, committed to trunk.

2016-11-25  Jakub Jelinek  

PR rtl-optimization/78527
* combine.c (make_compound_operation_int): Ignore LSHIFTRT with
out of bounds shift count.

* gcc.c-torture/compile/pr78527.c: New test.

--- gcc/combine.c.jj2016-11-25 09:49:50.0 +0100
+++ gcc/combine.c   2016-11-25 13:42:43.628487136 +0100
@@ -8091,6 +8091,8 @@ make_compound_operation_int (machine_mod
if (GET_CODE (inner) == LSHIFTRT
&& CONST_INT_P (XEXP (inner, 1))
&& GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
+   && (UINTVAL (XEXP (inner, 1))
+   < GET_MODE_PRECISION (GET_MODE (inner)))
&& subreg_lowpart_p (x))
  {
new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
--- gcc/testsuite/gcc.c-torture/compile/pr78527.c.jj2016-11-25 
13:44:53.806825477 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr78527.c   2016-11-25 
13:44:35.0 +0100
@@ -0,0 +1,21 @@
+/* PR rtl-optimization/78527 */
+
+unsigned a;
+short b, e;
+int *c;
+char d;
+
+int
+main ()
+{
+  int f = 80;
+  for (;;) {
+if (f > 432)
+  *c = a;
+while (b)
+  if (d)
+e = -(a >> f);
+c = 
+b = e;
+  }
+}

Jakub


Re: [Patch, Fortran, OOP] PR 60853: Failure to disambiguate generic with unlimited polymorphic

2016-11-25 Thread Paul Richard Thomas
Dear Janus,

This looks fine to me - OK for trunk.

Best regards and thanks

Paul

On 25 November 2016 at 13:40, Janus Weil  wrote:
> Hi all,
>
> here is a patch that fixes a rejects-valid problem with an
> unlimited-polymorphic variable in a generic procedure. Removing the
> line with UNLIMITED_POLY fixes the error and the rest of the patch is
> just slightly refactoring the for-loop.
>
> Regtested successfully on x86_64-linux-gnu. Ok for trunk?
>
> Cheers,
> Janus
>
>
> 2016-11-25  Janus Weil  
>
> PR fortran/60853
> * interface.c (gfc_compare_interfaces): Remove bad special case for
> unlimited polymorphism. Refactor for loop.
>
> 2016-11-25  Janus Weil  
>
> PR fortran/60853
> * gfortran.dg/typebound_assignment_8.f90: New test case.



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-25 Thread Dominik Vogt
On Tue, Nov 22, 2016 at 10:38:42AM -0500, David Malcolm wrote:
> On Tue, 2016-11-22 at 15:45 +0100, Jakub Jelinek wrote:
> > On Tue, Nov 22, 2016 at 03:38:04PM +0100, Bernd Schmidt wrote:
> > > On 11/22/2016 02:37 PM, Jakub Jelinek wrote:
> > > > Can't it be done only if xloc.file contains any fancy characters?
> > > 
> > > Sure, but why? Strings generally get emitted with quotes around
> > > them, I
> > > don't see a good reason for filenames to be different, especially
> > > if it
> > > makes the output easier to parse.
> > 
> > Because printing common filenames matches what we emit in
> > diagnostics,
> > what e.g. sanitizers emit at runtime diagnostics, what we emit as
> > locations
> > in gimple dumps etc.
> 
> It sounds like a distinction between human-readable vs machine
> -readable.
> 
> How about something like the following, which only adds the quotes if
> outputting the RTL FE's input format?
> 
> Does this fix the failing tests?

Yep.

> --- a/gcc/print-rtl.c
> +++ b/gcc/print-rtl.c
> @@ -371,7 +371,10 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, 
> int idx)
>if (INSN_HAS_LOCATION (in_insn))
>   {
> expanded_location xloc = insn_location (in_insn);
> -   fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
> +   if (m_compact)
> + fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
> +   else
> + fprintf (m_outfile, " %s:%i", xloc.file, xloc.line);

Looks sensible to me.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

2016-11-25 Thread Tamar Christina
 >
> > A few comments about this new version:
> > * arm-neon-ref.h: why do you create
> CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
> > Can't you just add calls to CHECK_CRYPTO in the existing
> > CHECK_RESULTS_NAMED_NO_FP16?

Yes, that should be fine, I didn't used to have CHECK_CRYPTO before and when I 
added it
I didn't remove the split. I'll do it now.

> >
> > * p64_p128:
> > From what I can see ARM and AArch64 differ on the vceq variants
> > available with poly64.
> > For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,
> > poly64x1_t __b) For AArch64, I can't see vceq_p64 in arm_neon.h? ...
> > Actually I've just noticed the other you submitted while I was writing
> > this, where you add vceq_p64 for aarch64, but it still returns
> > uint64_t.
> > Why do you change the vceq_64 test to return poly64_t instead of
> uint64_t?

This patch is slightly outdated. The correct type is `uint64_t` but when it was 
noticed
This patch was already sent. New one coming soon.

> >
> > Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64?
> >

This is wrong, remove them. It was supposed to be around the vldX_lane_p64 
tests.

> > The comment /* vget_lane_p64 tests.  */ is wrong before VLDX_LANE
> > tests
> >
> > You need to protect the new vmov, vget_high and vget_lane tests with
> > #ifdef __aarch64__.
> >

vget_lane is already in an #ifdef, vmov you're right, but I also notice that the
test calls VDUP instead of VMOV, which explains why I didn't get a test failure.

Thanks for the feedback,
I'll get these updated.

> 
> Actually, vget_high_p64 exists on arm, so no need for the #fidef for it.
> 
> 
> > Christophe
> >
> >> Kind regards,
> >> Tamar
> >> 
> >> From: Tamar Christina
> >> Sent: Tuesday, November 8, 2016 11:58:46 AM
> >> To: Christophe Lyon
> >> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard
> >> Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
> >> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing
> >> Poly64_t intrinsics to GCC
> >>
> >> Hi Christophe,
> >>
> >> Thanks for the review!
> >>
> >>>
> >>> A while ago I added p64_p128.c, to contain all the poly64/128 tests
> >>> except for vreinterpret.
> >>> Why do you need to create p64.c ?
> >>
> >> I originally created it because I had a much smaller set of
> >> intrinsics that I wanted to add initially, this grew and It hadn't 
> >> occurred to
> me that I can use the existing file now.
> >>
> >> Another reason was the effective-target arm_crypto_ok as you
> mentioned below.
> >>
> >>>
> >>> Similarly, adding tests for vcreate_p64 etc... in p64.c or
> >>> p64_p128.c might be easier to maintain than adding them to vcreate.c
> >>> etc with several #ifdef conditions.
> >>
> >> Fair enough, I'll move them to p64_p128.c.
> >>
> >>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)"
> >>> condition? These intrinsics are defined in arm/arm_neon.h, right?
> >>> They are tested in p64_p128.c
> >>
> >> I should have looked for them, they weren't being tested before so I
> >> had Mistakenly assumed that they weren't available. Now I realize I
> >> just need To add the proper test option to the file to enable crypto. I'll
> update this as well.
> >>
> >>> Looking at your patch, it seems some tests are currently missing for arm:
> >>> vget_high_p64. I'm not sure why I missed it when I removed neont-
> >>> testgen...
> >>
> >> I'll adjust the test conditions so they run for ARM as well.
> >>
> >>>
> >>> Regarding vreinterpret_p128.c, doesn't the existing effective-target
> >>> arm_crypto_ok prevent the tests from running on aarch64?
> >>
> >> Yes they do, I was comparing the output against a clean version and
> >> hasn't noticed That they weren't running. Thanks!
> >>
> >>>
> >>> Thanks,
> >>>
> >>> Christophe


Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-25 Thread Segher Boessenkool
On Fri, Nov 25, 2016 at 10:15:25AM +0100, Richard Biener wrote:
> On Thu, Nov 24, 2016 at 5:14 PM, Segher Boessenkool
>  wrote:
> > On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
> >> On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
> >> >
> >> >That we compare different kinds of costs (which really has no meaning at
> >> >all, it's a heuristic at best) in various places is a known problem, not
> >> >a regression.
> >> But the problems with the costing system exhibit themselves as a code
> >> quality regression.  In the end that's what the end-users see -- a
> >> regression in the quality of the code GCC generates.
> >
> > Yes, exactly -- and I fear this all-encompassing change will cause just
> > such a regression for many users.  Tests are running, will know more
> > later today (or tomorrow).
> >
> > The PR is about a very specific problem; the patch is not.  The patch
> > is not a bug fix.  If we allow anything that "makes things better" in
> > stage 3, what make it different from stage 1 then?
> 
> That's a good question ;)  The stage 3 definition has a loophole via
> "go file a bug about feature X, then it's a bugfix!".
> 
> I'm all open for a more sensible definition, like constraining the kind
> of non-regression fixes that we want to allow, but I fear the most
> sensible option would be to simply ditch the notion of different
> "stages" and make it "general development" and "regression fixing".
> (though if you try hard enough and go back in time you'll find that
> almost all non-enhancement bugs are regressions in some sense)

The scale goes: early stage 1, anything goes; ...; until stage 4, only
very narrow regression fixes are allowed.

Let's try to keep that spirit, and not behave like politicians following
the "rules" (or not).

> And yes, current stage3 still feels too much like stage1 ;)

Yes, very much so.  Well, at least trunk bootstraps on more targets now.

--

So IMNSHO this rtx costing change belongs in early stage 1, and should
be reverted.  If ifcvt should use full rtx cost instead of rtx_src_cost,
fix *that*, that is a much more local change.  And even then, test on
more targets please.


Segher


Re: [tree-tailcall] Check if function returns it's argument

2016-11-25 Thread Jeff Law

On 11/25/2016 01:07 AM, Richard Biener wrote:


For the tail-call, issue should we artificially create a lhs and use that
as return value (perhaps by a separate pass before tailcall) ?

__builtin_memcpy (a1, a2, a3);
return a1;

gets transformed to:
_1 = __builtin_memcpy (a1, a2, a3)
return _1;

So tail-call optimization pass would see the IL in it's expected form.


As said, a RTL expert needs to chime in here.  Iff then tail-call
itself should do this rewrite.  But if this form is required to make
things work (I suppose you checked it _does_ actually work?) then
we'd need to make sure later passes do not undo it.  So it looks
fragile to me.  OTOH I seem to remember that the flags we set on
GIMPLE are merely a hint to RTL expansion and the tailcalling is
verified again there?
So tail calling actually sits on the border between trees and RTL. 
Essentially it's an expand-time decision as we use information from 
trees as well as low level target information.


I would not expect the former sequence to tail call.  The tail calling 
code does not know that the return value from memcpy will be a1.  Thus 
the tail calling code has to assume that it'll have to copy a1 into the 
return register after returning from memcpy, which obviously can't be 
done if we tail called memcpy.


The second form is much more likely to turn into a tail call sequence 
because the return value from memcpy will be sitting in the proper 
register.  This form out to work for most calling conventions that allow 
tail calls.


We could (in theory) try and exploit the fact that memcpy returns its 
first argument as a return value, but that would only be helpful on a 
target where the first argument and return value use the same register. 
So I'd have a slight preference to rewriting per Prathamesh's suggestion 
above since it's more general.



Jeff


[patch,avr]: Improve code as of PR41076

2016-11-25 Thread Georg-Johann Lay
Mentioned PR is about composing 16-bit values out of 8-bit values which 
due to integer promotions likes to work on 16-bit values.


The patch adds 3 combiner patterns to catch such situations and then 
split after reload.  Some more splitting is performed after reload for:


and, ior, xor of reg-reg operations,

HImode reg = 0

and 3- and 4-byte registers moves (and 2-byte moved if !MOVW).

This gived better code for almost all test cases, yet some test cases 
could still be improved.


I also tried pre-reload splits, but the generated SUBREGs seem to 
disturb register allocation, hence I used post-reload splits.


There is also PR60145 which has a test case where a 4-byte value is 
composed out of 4 bytes, but the patterns to catch that are going to be 
insane... so for now, such 4-byte cases will still result in bloated code.


Patch tested against trunk without regressions.

Ok for trunk?

Johann


PR 41076
* config/avr/avr.md SPLIT34): New mode iterator.
(bitop): New code iterator.
(*iorhi3.ashift8-*). New insn-and-split patterns.
(*movhi): Post-reload split reg = 0.
(*movhi) [!MOVW]: Post-reload split reg = reg.
(*mov) [SI,SF,PSI,SQ,USQ,SA,USA]: Post-reload split reg = reg.
(andhi3, andpsi3, andsi3): Post-reload split reg-reg operations.
(iorhi3, iorpsi3, iorsi3): Same.
(xorhi3, xorpsi3, xorsi3): Same.
* config/avr/avr.c (avr_rtx_costs_1) [IOR && HImode]: Adjust rtx
costs to *iorhi3.ashift8-* patterns.

Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 242823)
+++ config/avr/avr.c	(working copy)
@@ -10645,6 +10645,19 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
   /* FALLTHRU */
 case AND:
 case IOR:
+  if (IOR == code
+  && HImode == mode
+  && ASHIFT == GET_CODE (XEXP (x, 0)))
+{
+  *total = COSTS_N_INSNS (2);
+  // Just a rough estimate.  If we see no sign- or zero-extend,
+  // then increase the cost a little bit.
+  if (REG_P (XEXP (XEXP (x, 0), 0)))
+*total += COSTS_N_INSNS (1);
+  if (REG_P (XEXP (x, 1)))
+*total += COSTS_N_INSNS (1);
+  return true;
+}
   *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
   *total += avr_operand_rtx_cost (XEXP (x, 0), mode, code, 0, speed);
   if (!CONST_INT_P (XEXP (x, 1)))
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 242823)
+++ config/avr/avr.md	(working copy)
@@ -260,6 +260,10 @@ (define_mode_iterator ORDERED234 [HI SI
   HQ UHQ HA UHA
   SQ USQ SA USA])
 
+;; Post-reload split of 3, 4 bytes wide moves.
+(define_mode_iterator SPLIT34 [SI SF PSI
+   SQ USQ SA USA])
+
 ;; Define code iterators
 ;; Define two incarnations so that we can build the cross product.
 (define_code_iterator any_extend  [sign_extend zero_extend])
@@ -267,6 +271,7 @@ (define_code_iterator any_extend2 [sign_
 (define_code_iterator any_extract [sign_extract zero_extract])
 (define_code_iterator any_shiftrt [lshiftrt ashiftrt])
 
+(define_code_iterator bitop [xor ior and])
 (define_code_iterator xior [xor ior])
 (define_code_iterator eqne [eq ne])
 
@@ -3328,6 +,66 @@ (define_insn "xorsi3"
(set_attr "adjust_len" "*,out_bitop,out_bitop")
(set_attr "cc" "set_n,clobber,clobber")])
 
+
+(define_split
+  [(set (match_operand:SPLIT34 0 "register_operand")
+(match_operand:SPLIT34 1 "register_operand"))]
+  "optimize
+   && reload_completed"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+  {
+machine_mode mode_hi = 4 == GET_MODE_SIZE (mode) ? HImode : QImode;
+bool lo_first = REGNO (operands[0]) < REGNO (operands[1]);
+rtx dst_lo = simplify_gen_subreg (HImode, operands[0], mode, 0);
+rtx src_lo = simplify_gen_subreg (HImode, operands[1], mode, 0);
+rtx dst_hi = simplify_gen_subreg (mode_hi, operands[0], mode, 2);
+rtx src_hi = simplify_gen_subreg (mode_hi, operands[1], mode, 2);
+
+operands[2] = lo_first ? dst_lo : dst_hi;
+operands[3] = lo_first ? src_lo : src_hi;
+operands[4] = lo_first ? dst_hi : dst_lo;
+operands[5] = lo_first ? src_hi : src_lo;
+  })
+  
+(define_split
+  [(set (match_operand:HI 0 "register_operand")
+(match_operand:HI 1 "reg_or_0_operand"))]
+  "optimize
+   && reload_completed
+   && (!AVR_HAVE_MOVW
+   || const0_rtx == operands[1])"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+  {
+operands[2] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
+operands[3] = simplify_gen_subreg (QImode, operands[1], HImode, 1);
+operands[4] = simplify_gen_subreg (QImode, operands[0], HImode, 0);
+operands[5] = simplify_gen_subreg (QImode, operands[1], HImode, 0);
+  })
+

Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-25 Thread Jeff Law

On 11/25/2016 02:15 AM, Richard Biener wrote:


That's a good question ;)  The stage 3 definition has a loophole via
"go file a bug about feature X, then it's a bugfix!".
Right.  That loophole has existed since we've moved to the current model 
-- we extend a level of trust to our developers not to abuse the 
loophole.  I think that level of trust is warranted and hasn't been 
significantly violated.



I'm all open for a more sensible definition, like constraining the kind
of non-regression fixes that we want to allow, but I fear the most
sensible option would be to simply ditch the notion of different
"stages" and make it "general development" and "regression fixing".
(though if you try hard enough and go back in time you'll find that
almost all non-enhancement bugs are regressions in some sense)
Similarly, I'm always open for improvements.  My worry is if we went to 
development/regression bugfixing cycle, then non-regression bugs would 
largely be ignored.


General bugfixing is, IMHO, a good period -- it gets a larger portion of 
our developers fixing bugs and gives folks with a heavy review load a 
chance to flush out their queues of stuff that came in right at the end 
of stage1.


I'm not really pushing to open a "development cycle" discussion right 
now, but I do sense that our cycles could use some refinement.




And yes, current stage3 still feels too much like stage1 ;)
Hasn't seemed that way to me, but obviously experiences will differ.  My 
biggest worry about this cycle is the higher than typical (compared to 
the last few years) regression bug counts.


That worry is somewhat mitigated by the belief that we're marking 
regressions much much more consistently this year, so we're a lot less 
likely to get a big jump in marked regressions like we saw last year.


*If* that is the case (and my light poking around seems to indicate 
that's true), then we're likely ahead of the gcc-6 cycle, but behind the 
gcc-5 cycle.


jeff


Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

2016-11-25 Thread Christophe Lyon
On 25 November 2016 at 15:53, Christophe Lyon
 wrote:
> Hi Tamar,
>
> On 24 November 2016 at 12:45, Tamar Christina  wrote:
>> Hi Christoph,
>>
>> I have combined most of the tests in p64_p128 except for the
>> vreinterpret_p128 and vreinterpret_p64 ones because I felt the number
>> of code that would be have to be added to p64_p128 vs having them in those
>> files isn't worth it. Since a lot of the test setup would have to be copied.
>>
>
> A few comments about this new version:
> * arm-neon-ref.h: why do you create CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
> Can't you just add calls to CHECK_CRYPTO in the existing
> CHECK_RESULTS_NAMED_NO_FP16?
>
> * p64_p128:
> From what I can see ARM and AArch64 differ on the vceq variants
> available with poly64.
> For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,
> poly64x1_t __b)
> For AArch64, I can't see vceq_p64 in arm_neon.h? ... Actually I've just 
> noticed
> the other you submitted while I was writing this, where you add vceq_p64 for
> aarch64, but it still returns uint64_t.
> Why do you change the vceq_64 test to return poly64_t instead of uint64_t?
>
> Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64?
>
> The comment /* vget_lane_p64 tests.  */ is wrong before VLDX_LANE tests
>
> You need to protect the new vmov, vget_high and vget_lane tests with
> #ifdef __aarch64__.
>

Actually, vget_high_p64 exists on arm, so no need for the #fidef for it.


> Christophe
>
>> Kind regards,
>> Tamar
>> 
>> From: Tamar Christina
>> Sent: Tuesday, November 8, 2016 11:58:46 AM
>> To: Christophe Lyon
>> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; 
>> James Greenhalgh; Kyrylo Tkachov; nd
>> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t 
>> intrinsics to GCC
>>
>> Hi Christophe,
>>
>> Thanks for the review!
>>
>>>
>>> A while ago I added p64_p128.c, to contain all the poly64/128 tests except 
>>> for
>>> vreinterpret.
>>> Why do you need to create p64.c ?
>>
>> I originally created it because I had a much smaller set of intrinsics that 
>> I wanted to
>> add initially, this grew and It hadn't occurred to me that I can use the 
>> existing file now.
>>
>> Another reason was the effective-target arm_crypto_ok as you mentioned below.
>>
>>>
>>> Similarly, adding tests for vcreate_p64 etc... in p64.c or p64_p128.c might 
>>> be
>>> easier to maintain than adding them to vcreate.c etc with several #ifdef
>>> conditions.
>>
>> Fair enough, I'll move them to p64_p128.c.
>>
>>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)"
>>> condition? These intrinsics are defined in arm/arm_neon.h, right?
>>> They are tested in p64_p128.c
>>
>> I should have looked for them, they weren't being tested before so I had
>> Mistakenly assumed that they weren't available. Now I realize I just need
>> To add the proper test option to the file to enable crypto. I'll update this 
>> as well.
>>
>>> Looking at your patch, it seems some tests are currently missing for arm:
>>> vget_high_p64. I'm not sure why I missed it when I removed neont-
>>> testgen...
>>
>> I'll adjust the test conditions so they run for ARM as well.
>>
>>>
>>> Regarding vreinterpret_p128.c, doesn't the existing effective-target
>>> arm_crypto_ok prevent the tests from running on aarch64?
>>
>> Yes they do, I was comparing the output against a clean version and hasn't 
>> noticed
>> That they weren't running. Thanks!
>>
>>>
>>> Thanks,
>>>
>>> Christophe


Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

2016-11-25 Thread Christophe Lyon
Hi Tamar,

On 24 November 2016 at 12:45, Tamar Christina  wrote:
> Hi Christoph,
>
> I have combined most of the tests in p64_p128 except for the
> vreinterpret_p128 and vreinterpret_p64 ones because I felt the number
> of code that would be have to be added to p64_p128 vs having them in those
> files isn't worth it. Since a lot of the test setup would have to be copied.
>

A few comments about this new version:
* arm-neon-ref.h: why do you create CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
Can't you just add calls to CHECK_CRYPTO in the existing
CHECK_RESULTS_NAMED_NO_FP16?

* p64_p128:
>From what I can see ARM and AArch64 differ on the vceq variants
available with poly64.
For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,
poly64x1_t __b)
For AArch64, I can't see vceq_p64 in arm_neon.h? ... Actually I've just noticed
the other you submitted while I was writing this, where you add vceq_p64 for
aarch64, but it still returns uint64_t.
Why do you change the vceq_64 test to return poly64_t instead of uint64_t?

Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64?

The comment /* vget_lane_p64 tests.  */ is wrong before VLDX_LANE tests

You need to protect the new vmov, vget_high and vget_lane tests with
#ifdef __aarch64__.

Christophe

> Kind regards,
> Tamar
> 
> From: Tamar Christina
> Sent: Tuesday, November 8, 2016 11:58:46 AM
> To: Christophe Lyon
> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; 
> James Greenhalgh; Kyrylo Tkachov; nd
> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t 
> intrinsics to GCC
>
> Hi Christophe,
>
> Thanks for the review!
>
>>
>> A while ago I added p64_p128.c, to contain all the poly64/128 tests except 
>> for
>> vreinterpret.
>> Why do you need to create p64.c ?
>
> I originally created it because I had a much smaller set of intrinsics that I 
> wanted to
> add initially, this grew and It hadn't occurred to me that I can use the 
> existing file now.
>
> Another reason was the effective-target arm_crypto_ok as you mentioned below.
>
>>
>> Similarly, adding tests for vcreate_p64 etc... in p64.c or p64_p128.c might 
>> be
>> easier to maintain than adding them to vcreate.c etc with several #ifdef
>> conditions.
>
> Fair enough, I'll move them to p64_p128.c.
>
>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)"
>> condition? These intrinsics are defined in arm/arm_neon.h, right?
>> They are tested in p64_p128.c
>
> I should have looked for them, they weren't being tested before so I had
> Mistakenly assumed that they weren't available. Now I realize I just need
> To add the proper test option to the file to enable crypto. I'll update this 
> as well.
>
>> Looking at your patch, it seems some tests are currently missing for arm:
>> vget_high_p64. I'm not sure why I missed it when I removed neont-
>> testgen...
>
> I'll adjust the test conditions so they run for ARM as well.
>
>>
>> Regarding vreinterpret_p128.c, doesn't the existing effective-target
>> arm_crypto_ok prevent the tests from running on aarch64?
>
> Yes they do, I was comparing the output against a clean version and hasn't 
> noticed
> That they weren't running. Thanks!
>
>>
>> Thanks,
>>
>> Christophe


[OBVIOUS] [PATCH] Fix documentation reference (PR web/71666)

2016-11-25 Thread Martin Liška
The patch is selecting the proper section where -fprofile-generate is 
documented.

Martin
>From 00a7b92ef9bc7158a3e7202deb9b18b8d95dd5d2 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 25 Nov 2016 15:11:48 +0100
Subject: [PATCH] Fix documentation reference (PR web/71666)

gcc/ChangeLog:

2016-11-25  Martin Liska  

	PR web/71666
	* doc/invoke.texi (-fprofile-use): Fix reference to a section
	where -fprofile-generate is documented.
---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 34c7187..8a0cad7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8921,8 +8921,8 @@ which are generally profitable only with profile feedback available:
 @option{-ftree-vectorize}, and @option{ftree-loop-distribute-patterns}.
 
 Before you can use this option, you must first generate profiling information.
-@xref{Optimize Options}, for information about the @option{-fprofile-generate}
-option.
+@xref{Instrumentation Options}, for information about the
+@option{-fprofile-generate} option.
 
 By default, GCC emits an error message if the feedback profiles do not
 match the source code.  This error can be turned into a warning by using
-- 
2.10.2



Re: [PATCH] Fix ICE with -Wuninitialized (PR tree-optimization/78455)

2016-11-25 Thread Andreas Schwab
On Nov 23 2016, Christophe Lyon  wrote:

> Since you committed this patch (r242733), I've noticed a regression:
>   gcc.dg/uninit-pred-6_c.c warning (test for warnings, line 43)
>
> now fails on some arm targets, for instance arm-none-linux-gnueabihf
> --with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16
>
> (or similarly on arm-none-eabi --with-cpu=cortex-m3 --with-mode=thumb)

Also fails on m68k.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH] Fix PR78515

2016-11-25 Thread Richard Biener
On Fri, 25 Nov 2016, Martin Jambor wrote:

> Hi,
> 
> On Fri, Nov 25, 2016 at 12:01:38PM +0100, Richard Biener wrote:
> > 
> > I am testing the following to beat some sanity into 
> > compute_complex_assign_jump_func.
> 
> That the function does not handle ternary operations (did we have them
> since the beginning?) is clearly my fault and the patch is fine.

For quite some time indeed ;)

> Please commit it.

Will do after testing finished.

> > There's still that odd 'stmt2'
> > hanging around that gets set to sth else than stmt with
> > 
> >   op1 = gimple_assign_rhs1 (stmt);
> > 
> >   if (TREE_CODE (op1) == SSA_NAME)
> > {
> >   if (SSA_NAME_IS_DEFAULT_DEF (op1))
> > index = ipa_get_param_decl_index (info, SSA_NAME_VAR (op1));
> >   else
> > {
> >   index = load_from_param (fbi, info->descriptors,
> >SSA_NAME_DEF_STMT (op1));
> >   stmt2 = SSA_NAME_DEF_STMT (op1);
> > 
> > I assume that the original code wanted to restrict its processing
> > to unary RHS of 'stmt' but still this "skips" arbitrary unary
> > operations in 'stmt'?  But maybe I'm not understanding jump functions
> > here.  If we have
> > 
> >   _2 = -param_1(D);
> >   _3 = ~_2;
> > 
> > and stmt is _3 then we create a unary pass through JF with - (and the ~
> > gets lost?).
> >
> 
> It definitely looks like that.  Unary arithmetic jump functions have
> been added only recently with the IPA VRP propagation and I remember
> staring at the stmt2 thingy for a while during review but then
> apparently I forgot about it.
> 
> It seems to me that the check should refer to stmt, I will do that and
> see what breaks and how the IL looks at that point and then decide
> where to go from there.

it's the only use of stmt2 though, so it must have been added for some
reason... (maybe somebody wanted to handle simple copy-propagation?!).
I'd say rip it out and thus keep stmt2 == stmt.  There must be
a testcase added for this...

Richard.


Re: [PATCH] Tree-level fix for PR 69526

2016-11-25 Thread Richard Biener
On Wed, Nov 16, 2016 at 4:54 PM, Robin Dapp  wrote:
> Found some time to look into this again.
>
>> Index: tree-ssa-propagate.c
>> ===
>> --- tree-ssa-propagate.c(revision 240133)
>> +++ tree-ssa-propagate.c(working copy)
>> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>>/* Replace real uses in the statement.  */
>>did_replace |= replace_uses_in (stmt, get_value_fn);
>>
>> -  /* If we made a replacement, fold the statement.  */
>> -  if (did_replace)
>> +  /* Fold the statement.  */
>> +  if (fold_stmt (, follow_single_use_edges))
>> {
>> - fold_stmt (, follow_single_use_edges);
>> + did_replace = true;
>>   stmt = gsi_stmt (i);
>> }
>>
>> this would need compile-time cost evaluation (and avoid the tree-vrp.c
>> folding part
>> of your patch).
>
> This causes an ICE and bootstrap errors with newer revisions. I tested
> my patch on r240691 where it still works. How can this be done properly now?

Not sure, I'd have to investigate.  It should still just work
(throwing off a bootstrap
with just that change over the weekend).

>> OTOH given that you use this to decide whether you can use a combined 
>> constant
>> that doesn't look necessary (if the constant is correct, that is) --
>> what you need
>> to make sure is that the final operation, (T)(A) +- CST, does not overflow
>> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the
>> original operation.  I think that always holds, thus the combine_ovf check 
>> looks
>> superfluous to me.
>
> Removed the check and addressed the other remarks.
>
>> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2
>> does not overflow.  But we do not really care for that, we want to know
>> whether (T)X + CST' might invoke undefined behavior when the original
>> expression did not.  This involves range information on X.  I don't
>> see how we ensure this here.
>
> I guess I'm still missing an undefined behavior case. In which case can
> (T)X + CST' trigger undefined behavior where the original statement did
> not? I see the need for checking in the second pattern ((T)(X) + CST' ->
> (T)(X + CST')), of course.

Looking at

+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+(outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+  (if (TREE_CODE (type) == INTEGER_TYPE &&
+   TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3)))

so the conversion (T) is widening or sign-changing.  (&& go to the next line)

If A + CST overflows we cannot do the transform (you check that
with extract_range_from_binary_expr setting 'range_split').

If A + CST does not overflow but is unsigned and we are just changing sign
(precision ==) then (T)A + (CST + CST) might overflow.  Consider
(int)(INT_MAX + 1) + 1 -> INT_MAX + 2.  I think here the important part
is whether A + CST fits in T for the case where we just change the type
to a type with !TYPE_OVERFLOW_WRAPS.  Certainly restricting to
widenings would avoid the issue.

>> But that's "easily fixable" by computing it in unsigned arithmetic, that is
>> doing
>>
>>   (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2))
>
> Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit
> into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is
> implementation-defined and not undefined so it should work?

Yes, implementation-defined beavior is fine.

> Revised patch version attached. One thing I'm still not sure about is
> the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0.
> In the current patch version I always do a sign-extend of the first
> constant (UINT_MAX here) which seems to cause no problems in the
> testsuite and some custom tests still worked.

+ /* Sign-extend @1 to TYPE.  */
+ w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);

not sure why you do always sign-extend.  If the inner op is unsigned
and we widen then that's certainly bogus considering your UINT_MAX
example above.  Does

   w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));

not work for some reason?

+ /* Combine in outer, larger type.  */
+ bool combine_ovf = true;
+ combined_cst = wi::add (w1, w2, SIGNED, _ovf);

as you ignore combine_ovf you can simply use

  combined_cst = wi::add (w1, w2);

+ /* Convert combined constant to tree of outer type if
+there was no value range split in the original operation.  */
+ if (!range_split)
+   {

I'd say you want to condition on range_split early, like with

bool range_split;
if (TYPE_OVERFLOW_UNDEFINED (inner_type)
|| ! (extract_range_from_binary_expr (, _split), range_split))
  {

Re: [PATCH] Fix PR78515

2016-11-25 Thread Martin Jambor
Hi,

On Fri, Nov 25, 2016 at 12:01:38PM +0100, Richard Biener wrote:
> 
> I am testing the following to beat some sanity into 
> compute_complex_assign_jump_func.

That the function does not handle ternary operations (did we have them
since the beginning?) is clearly my fault and the patch is fine.
Please commit it.

> There's still that odd 'stmt2'
> hanging around that gets set to sth else than stmt with
> 
>   op1 = gimple_assign_rhs1 (stmt);
> 
>   if (TREE_CODE (op1) == SSA_NAME)
> {
>   if (SSA_NAME_IS_DEFAULT_DEF (op1))
> index = ipa_get_param_decl_index (info, SSA_NAME_VAR (op1));
>   else
> {
>   index = load_from_param (fbi, info->descriptors,
>SSA_NAME_DEF_STMT (op1));
>   stmt2 = SSA_NAME_DEF_STMT (op1);
> 
> I assume that the original code wanted to restrict its processing
> to unary RHS of 'stmt' but still this "skips" arbitrary unary
> operations in 'stmt'?  But maybe I'm not understanding jump functions
> here.  If we have
> 
>   _2 = -param_1(D);
>   _3 = ~_2;
> 
> and stmt is _3 then we create a unary pass through JF with - (and the ~
> gets lost?).
>

It definitely looks like that.  Unary arithmetic jump functions have
been added only recently with the IPA VRP propagation and I remember
staring at the stmt2 thingy for a while during review but then
apparently I forgot about it.

It seems to me that the check should refer to stmt, I will do that and
see what breaks and how the IL looks at that point and then decide
where to go from there.

Thanks,

Martin


Re: [PR 70965] Schedule extra pass_rebuild_cgraph_edges

2016-11-25 Thread Jan Hubicka
> On Thu, Nov 24, 2016 at 5:44 PM, Martin Jambor  wrote:
> > Hi,
> >
> > after discussing this with Honza, we have decided that scheduling an
> > extra pass_rebuild_cgraph_edges after pass_fixup_cfg is the correct
> > way to keep the cgraph consistent with gimple IL when early IPA passes
> > need it, such as is the case with the testcase in PR 70965.
> >
> > While needing an extra pass is never nice, this is a consequence of
> > splitting pass_build_ssa_passes out of early optimization passes so
> > that pass_chkp can be in between.
> >
> > The patch below fices the ICE in PR 70965 and has passed bootstrap and
> > testing on x86_64-linux.  OK for trunk?
> 
> Ok.
> 
> Richard.
> 
> > Thanks,
> >
> > Martin
> >
> >
> > 2016-11-24  Martin Jambor  
> >
> > gcc/
> > * passes.def (pass_build_ssa_passes): Add pass_rebuild_cgraph_edges.
> >
> > gcc/testsuite/
> > * g++.dg/pr70965.C: New test.
> > ---
> >  gcc/passes.def |  1 +
> >  gcc/testsuite/g++.dg/pr70965.C | 21 +
> >  2 files changed, 22 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.dg/pr70965.C
> >
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 85a5af0..5faf17f 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
> >NEXT_PASS (pass_build_ssa_passes);
> >PUSH_INSERT_PASSES_WITHIN (pass_build_ssa_passes)
> >NEXT_PASS (pass_fixup_cfg);
> > +  NEXT_PASS (pass_rebuild_cgraph_edges);
> >NEXT_PASS (pass_build_ssa);
> >NEXT_PASS (pass_warn_nonnull_compare);
> >NEXT_PASS (pass_ubsan);

Actually you want to rebuild at the end of pass_build_ssa_passes passes queue.
This may still trip an ICE if one of passes bellow modify CFG (which 
pass_nothorw
does)

Path is OK with that change.
Honza
> > diff --git a/gcc/testsuite/g++.dg/pr70965.C b/gcc/testsuite/g++.dg/pr70965.C
> > new file mode 100644
> > index 000..d8a2c35
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/pr70965.C
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -std=c++11" } */
> > +
> > +struct A {};
> > +struct B {};
> > +struct C { using p = int *; template  using ra = A; };
> > +struct J : C { template  struct K { typedef C::ra o; }; };
> > +template  struct D
> > +{
> > +  struct H : J::K::o { H (J::p, A) : J::K::o () {} };
> > +  H d;
> > +  D (const char *, const A  = A ()) : d (0, x) {}
> > +};
> > +extern template class D;
> > +enum L { M };
> > +struct F { virtual char *foo (); };
> > +template  struct I : B { static int foo (int) {} };
> > +struct G { typedef I t; };
> > +void foo (int) { G::t::foo (0); }
> > +void bar (const D &, const D &, int, L);
> > +void baz () try { foo (0); } catch (F ) { bar (e.foo (), "", 0, M); }
> > --
> > 2.10.2
> >


Pointer Bounds Checker and trailing arrays (PR68270)

2016-11-25 Thread Alexander Ivchenko
Hi,

The patch below addresses PR68270. could you please take a look?

2016-11-25  Alexander Ivchenko  

   * c-family/c.opt (flag_chkp_flexible_struct_trailing_arrays):
   Add new option.
   * tree-chkp.c (chkp_parse_array_and_component_ref): Forbid
   narrowing when chkp_parse_array_and_component_ref is used and
   the ARRAY_REF points to an array in the end of the struct.



diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7d8a726..e45d6a2 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1166,6 +1166,11 @@ C ObjC C++ ObjC++ LTO RejectNegative Report
Var(flag_chkp_narrow_to_innermost_ar
 Forces Pointer Bounds Checker to use bounds of the innermost arrays in case of
 nested static arryas access.  By default outermost array is used.

+fchkp-flexible-struct-trailing-arrays
+C ObjC C++ ObjC++ LTO RejectNegative Report
Var(flag_chkp_flexible_struct_trailing_arrays)
+Allow Pointer Bounds Checker to treat all trailing arrays in structures as
+possibly flexible.
+
 fchkp-optimize
 C ObjC C++ ObjC++ LTO Report Var(flag_chkp_optimize) Init(-1)
 Allow Pointer Bounds Checker optimizations.  By default allowed
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 2769682..40f99c3 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -3425,7 +3425,9 @@ chkp_parse_array_and_component_ref (tree node, tree *ptr,
   if (flag_chkp_narrow_bounds
   && !flag_chkp_narrow_to_innermost_arrray
   && (!last_comp
-  || chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1
+  || (chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1))
+  && !(flag_chkp_flexible_struct_trailing_arrays
+   && array_at_struct_end_p (var)
 {
   comp_to_narrow = last_comp;
   break;


[Patch, Fortran, OOP] PR 60853: Failure to disambiguate generic with unlimited polymorphic

2016-11-25 Thread Janus Weil
Hi all,

here is a patch that fixes a rejects-valid problem with an
unlimited-polymorphic variable in a generic procedure. Removing the
line with UNLIMITED_POLY fixes the error and the rest of the patch is
just slightly refactoring the for-loop.

Regtested successfully on x86_64-linux-gnu. Ok for trunk?

Cheers,
Janus


2016-11-25  Janus Weil  

PR fortran/60853
* interface.c (gfc_compare_interfaces): Remove bad special case for
unlimited polymorphism. Refactor for loop.

2016-11-25  Janus Weil  

PR fortran/60853
* gfortran.dg/typebound_assignment_8.f90: New test case.
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c (revision 242818)
+++ gcc/fortran/interface.c (working copy)
@@ -1728,11 +1728,9 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol
This is also done when comparing interfaces for dummy procedures and in
procedure pointer assignments.  */
 
-for (;;)
+for (; f1 || f2; f1 = f1->next, f2 = f2->next)
   {
/* Check existence.  */
-   if (f1 == NULL && f2 == NULL)
- break;
if (f1 == NULL || f2 == NULL)
  {
if (errmsg != NULL)
@@ -1741,9 +1739,6 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol
return 0;
  }
 
-   if (UNLIMITED_POLY (f1->sym))
- goto next;
-
if (strict_flag)
  {
/* Check all characteristics.  */
@@ -1772,9 +1767,6 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol
return 0;
  }
  }
-next:
-   f1 = f1->next;
-   f2 = f2->next;
   }
 
   return 1;
! { dg-do compile }
!
! PR 60853: [OOP] Failure to disambiguate generic with unlimited polymorphic
!
! Contributed by tlcclt 

module foo_mod
   implicit none

   type Vector
   contains
  procedure :: copyFromScalar
  procedure :: copyFromArray
  generic :: assignment(=) => copyFromScalar, copyFromArray
   end type

contains

   subroutine copyFromScalar(this, scalar)
  class (Vector), intent(inout) :: this
  type  (Vector), intent(in) :: scalar
   end subroutine

   subroutine copyFromArray(this, array)
  class (Vector), intent(inout) :: this
  class (*), intent(in) :: array(:)
   end subroutine

end module


Re: [Patch, fortran] PR78293 - - [5/6/7 Regression] SIGABRT with function result used as argument

2016-11-25 Thread Paul Richard Thomas
Dear All,

Since both Andre and I have taken a good look at this rather small
patch, I decided to commit it as revision 242875.

My movements are such that I will have to hold off on 5- and
6-branches until the week after next.

Best regards

Paul

PS Andre, I wasn't trying to make you feel bad but was making fun of myself :-)


On 25 November 2016 at 13:00, Andre Vehreschild  wrote:
> Hi Thomas,
>
>> Andre put me to shame with a devastatingly simple replacement for a
>> horribly complicated and wrong patch that I was getting into.
>
> I did not mean to. I happened to work in the same area and the PR's 
> description
> rang a whole chorus of bells what might have been going wrong.
>
>> The part of the patch in trans-expr.c fixes the PR and the part in
>> trans-stmt.c fixes a memory leak in function 'tt'. This latter fixes
>
> I propose to change the comment in the second chunk (trans-stmt.c) from:
>
> - /* Deallocate any allocatable components after all the allocations
> -and assignments of expr3 have been completed.  */
>
> to
>
> +   /* Deallocate any allocatable components in expressions that use a
> +  temporary, i.e. are not of expr-type EXPR_VARIABLE or force the
> +  use of a temporary, after the assignment of expr3 is completed.  */
>
> Mind that the indentation is corrupted by my mailer. Please polish the comment
> a bit. It feels awkward.
>
>> half of the memory leaks in class_array_15.f03. I have noted the rest
>> of this problem in PR38319 with which it is associated.
>>
>> Bootstraps and regtests on FC21/x86_64 - OK for trunk and, later
>> 5-branch and 6-branch?
>
> Ok'ing a patch I participated in writing does not feel correct. So not doing 
> it.
> When committing please add yourself also to the Changelog in gcc/fortran to
> prevent confusion when this patch causes regressions. You are as much
> responsible for fixing it as me.
>
> Regards,
> Andre
>
>> Cheers
>>
>> Paul
>>
>> 2016-11-24  Andre Vehreschild  
>>
>> PR fortran/78293
>> * trans-expr.c (gfc_conv_procedure_call): Prepend deallocation
>> of alloctable components to post, rather than adding to
>> se->post.
>> * trans-stmt.c (gfc_trans_allocate): Move deallocation of expr3
>> allocatable components so that all expr3s are visited.
>>
>> 2016-11-24  Paul Thomas  
>>
>> PR fortran/78293
>> * gfortran.dg/allocatable_function_10.f90: New test.
>> * gfortran.dg/class_array_15.f03: Increase builtin_free count
>> from 11 to 12.
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2016-11-25 Thread Tamar Christina
Hi Joseph,

I have updated the patch with the changes,
w.r.t to the formatting, there are tabs there that seem to be rendered
at 4 spaces wide. In my editor setup at 8 spaces they are correct.

Kind Regards,
Tamar

From: Joseph Myers 
Sent: Thursday, November 24, 2016 6:28:18 PM
To: Tamar Christina
Cc: GCC Patches; Wilco Dijkstra; rguent...@suse.de; l...@redhat.com; Michael 
Meissner; nd
Subject: Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers 
in GIMPLE.

On Thu, 24 Nov 2016, Tamar Christina wrote:

> @@ -11499,6 +11503,53 @@ to classify.  GCC treats the last argument as 
> type-generic, which
>  means it does not do default promotion from float to double.
>  @end deftypefn
>
> +@deftypefn {Built-in Function} int __builtin_isnan (...)
> +This built-in implements the C99 isnan functionality which checks if
> +the given argument represents a NaN.  The return value of the
> +function will either be a 0 (false) or a 1 (true).
> +On most systems, when an IEEE 754 floating point is used this
> +built-in does not produce a signal when a signaling NaN is used.

"an IEEE 754 floating point" should probably be "an IEEE 754
floating-point type" or similar.

> +GCC treats the argument as type-generic, which means it does
> +not do default promotion from float to double.

I think type names such as float and double should use @code{} in the
manual.

> +the given argument represents an Infinite number.  The return

Infinite should not have a capital letter there.

> +@deftypefn {Built-in Function} int __builtin_iszero (...)
> +This built-in implements the C99 iszero functionality which checks if

This isn't C99, it's TS 18661-1:2014.

> +the given argument represents the number 0.  The return

0 or -0.

> +@deftypefn {Built-in Function} int __builtin_issubnormal (...)
> +This built-in implements the C99 issubnormal functionality which checks if

Again, TS 18661-1.

> +the given argument represents a sub-normal number.  The return

Do not hyphenate subnormal.

> + switch (DECL_FUNCTION_CODE (decl))
> + {
> + case BUILT_IN_SETJMP:
> + lower_builtin_setjmp (gsi);
> + data->cannot_fallthru = false;
> + return;

The indentation in this whole block of code (not all quoted) is wrong.

> +real_inf();

Missing space before '('.

> +  emit_tree_cond (, dest, done_label,
> +   is_normal(, arg, loc), fp_normal);
> +  emit_tree_cond (, dest, done_label,
> +   is_zero(, arg, loc), fp_zero);
> +  emit_tree_cond (, dest, done_label,
> +   is_nan(, arg, loc), fp_nan);
> +  emit_tree_cond (, dest, done_label,
> +   is_infinity(, arg, loc), fp_infinite);

Likewise.

> +   fndecl(, arg, loc), t_true);

Likewise.

--
Joseph S. Myers
jos...@codesourcery.com
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3ac2d44148440b124559ba7cd3de483b7a74b72d..2340f60edb31ebf964367699aaf33ac8401dff41 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -160,7 +160,6 @@ static tree fold_builtin_0 (location_t, tree);
 static tree fold_builtin_1 (location_t, tree, tree);
 static tree fold_builtin_2 (location_t, tree, tree, tree);
 static tree fold_builtin_3 (location_t, tree, tree, tree, tree);
-static tree fold_builtin_varargs (location_t, tree, tree*, int);
 
 static tree fold_builtin_strpbrk (location_t, tree, tree, tree);
 static tree fold_builtin_strstr (location_t, tree, tree, tree);
@@ -2202,19 +2201,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   switch (DECL_FUNCTION_CODE (fndecl))
 {
 CASE_FLT_FN (BUILT_IN_ILOGB):
-  errno_set = true; builtin_optab = ilogb_optab; break;
-CASE_FLT_FN (BUILT_IN_ISINF):
-  builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
-case BUILT_IN_ISFINITE:
-CASE_FLT_FN (BUILT_IN_FINITE):
-case BUILT_IN_FINITED32:
-case BUILT_IN_FINITED64:
-case BUILT_IN_FINITED128:
-case BUILT_IN_ISINFD32:
-case BUILT_IN_ISINFD64:
-case BUILT_IN_ISINFD128:
-  /* These builtins have no optabs (yet).  */
+  errno_set = true;
+  builtin_optab = ilogb_optab;
   break;
 default:
   gcc_unreachable ();
@@ -2233,8 +2221,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 }
 
 /* Expand a call to one of the builtin math functions that operate on
-   floating point argument and output an integer result (ilogb, isinf,
-   isnan, etc).
+   floating point argument and output an integer result (ilogb, etc).
Return 0 if a normal call should be emitted rather than expanding the
function in-line.  EXP is the expression that is a call to the builtin
function; if convenient, the result should be placed in TARGET.  */
@@ -5997,11 +5984,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 CASE_FLT_FN (BUILT_IN_ILOGB):
   if (! flag_unsafe_math_optimizations)
 	break;
-  

Re: [Patch, fortran] PR78293 - - [5/6/7 Regression] SIGABRT with function result used as argument

2016-11-25 Thread Andre Vehreschild
Hi Thomas,

> Andre put me to shame with a devastatingly simple replacement for a
> horribly complicated and wrong patch that I was getting into.

I did not mean to. I happened to work in the same area and the PR's description
rang a whole chorus of bells what might have been going wrong.

> The part of the patch in trans-expr.c fixes the PR and the part in
> trans-stmt.c fixes a memory leak in function 'tt'. This latter fixes

I propose to change the comment in the second chunk (trans-stmt.c) from:

- /* Deallocate any allocatable components after all the allocations
-and assignments of expr3 have been completed.  */

to

+   /* Deallocate any allocatable components in expressions that use a
+  temporary, i.e. are not of expr-type EXPR_VARIABLE or force the
+  use of a temporary, after the assignment of expr3 is completed.  */

Mind that the indentation is corrupted by my mailer. Please polish the comment
a bit. It feels awkward.

> half of the memory leaks in class_array_15.f03. I have noted the rest
> of this problem in PR38319 with which it is associated.
> 
> Bootstraps and regtests on FC21/x86_64 - OK for trunk and, later
> 5-branch and 6-branch?

Ok'ing a patch I participated in writing does not feel correct. So not doing it.
When committing please add yourself also to the Changelog in gcc/fortran to
prevent confusion when this patch causes regressions. You are as much
responsible for fixing it as me.

Regards,
Andre

> Cheers
> 
> Paul
> 
> 2016-11-24  Andre Vehreschild  
> 
> PR fortran/78293
> * trans-expr.c (gfc_conv_procedure_call): Prepend deallocation
> of alloctable components to post, rather than adding to
> se->post.
> * trans-stmt.c (gfc_trans_allocate): Move deallocation of expr3
> allocatable components so that all expr3s are visited.
> 
> 2016-11-24  Paul Thomas  
> 
> PR fortran/78293
> * gfortran.dg/allocatable_function_10.f90: New test.
> * gfortran.dg/class_array_15.f03: Increase builtin_free count
> from 11 to 12.


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH PR78507/PR78510]Fix two ICEs in pattern (cond (cmp (convert1? @1) @3) (convert2? @1) @2).

2016-11-25 Thread Richard Biener
On Fri, Nov 25, 2016 at 12:20 PM, Bin.Cheng  wrote:
> On Fri, Nov 25, 2016 at 8:23 AM, Richard Biener
>  wrote:
>> On Thu, Nov 24, 2016 at 4:22 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch fixes two issues in newly introduced pattern: (cond (cmp 
>>> (convert1? @1) @3) (convert2? @1) @2).
>>> For PR78507, we need to check if from_type is INTEGRAL_TYPE_P explicitly, 
>>> this patch adds check for that.
>>> Note we don't check c1_type/c2_type for that because it's guarded by 
>>> INTEGER_CST@.
>>> For PR78510, the ICE is covered by revision 242831, but underlying issue is 
>>> when the block of conditions
>>> is not satisfied, the last case simplification is applied anyway since code 
>>> is initialized to cmp at the
>>> first place.  This patch fixes that by setting code to proper value only 
>>> when transformation is valid, it
>>> also incorporates Marc's suggestion by using cmp directly.  Two tests are 
>>> added too.
>>> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures?
>>
>> Ok.
>>
>> Btw,
>>
>> -  (cond (cmp@0 (convert1? @1) ...
>> ...
>> - enum tree_code code = TREE_CODE (@0)...
>>
>> shows another misconception (I didn't notice...) in that TREE_CODE of a 
>> captured
>> expression results in the expression code.  On GENERIC that works but on 
>> GIMPLE
>> @0 will be a SSA_NAME and thus 'code' was SSA_NAME ...
> Yes, that was a mistake.
> Patch updated by including new test from duplicate PR78517.  Is it OK?

Yes.

Richard.

> Thanks,
> bin
>
> 2016-11-24  Bin Cheng  
>
> PR middle-end/78507
> PR middle-end/78510
> PR middle-end/78517
> * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use
> cmp directly, rather than cmp_code.  Initialize code to ERROR_MARK
> and set it to result code if transformation is valid.  Use code EQ
> directly in last simplification case.
>
> gcc/testsuite/ChangeLog
> 2016-11-24  Bin Cheng  
>
> PR middle-end/78507
> PR middle-end/78510
> PR middle-end/78517
> * g++.dg/torture/pr78507.C: New test.
> * gcc.dg/torture/pr78510.c: New test.
> * gcc.dg/torture/pr78517.c: New test.
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>>
>>> 2016-11-24  Bin Cheng  
>>>
>>> PR middle-end/78507
>>> PR middle-end/78510
>>> * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use
>>> cmp directly, rather than cmp_code.  Initialize code to ERROR_MARK
>>> and set it to result code if transformation is valid.  Use code EQ
>>> directly in last simplification case.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2016-11-24  Bin Cheng  
>>>
>>> PR middle-end/78507
>>> PR middle-end/78510
>>> * g++.dg/torture/pr78507.C: New test.
>>> * gcc.dg/torture/pr78510.c: New test.


Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2016-11-25 Thread Ramana Radhakrishnan
On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
 wrote:
> Hi!
>
> This improves the stack usage on the sha512 test case for the case
> without hardware fpu and without iwmmxt by splitting all di-mode
> patterns right while expanding which is similar to what the shift-pattern
> does.  It does nothing in the case iwmmxt and fpu=neon or vfp as well as
> thumb1.
>

I would go further and do this in the absence of Neon, the VFP unit
being there doesn't help with DImode operations i.e. we do not have 64
bit integer arithmetic instructions without Neon. The main reason why
we have the DImode patterns split so late is to give a chance for
folks who want to do 64 bit arithmetic in Neon a chance to make this
work as well as support some of the 64 bit Neon intrinsics which IIRC
map down to these instructions. Doing this just for soft-float doesn't
improve the default case only. I don't usually test iwmmxt and I'm not
sure who has the ability to do so, thus keeping this restriction for
iwMMX is fine.


> It reduces the stack usage from 2300 to near optimal 272 bytes (!).
>
> Note this also splits many ldrd/strd instructions and therefore I will
> post a followup-patch that mitigates this effect by enabling the ldrd/strd
> peephole optimization after the necessary reg-testing.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.

What do you mean by arm-linux-gnueabihf - when folks say that I
interpret it as --with-arch=armv7-a --with-float=hard
--with-fpu=vfpv3-d16 or (--with-fpu=neon).

If you've really bootstrapped and regtested it on armhf, doesn't this
patch as it stand have no effect there i.e. no change ?
arm-linux-gnueabihf usually means to me someone has configured with
--with-float=hard, so there are no regressions in the hard float ABI
case,


Ramana

> Is it OK for trunk?
>
>
> Thanks
> Bernd.


Re: [PATCH PR78507/PR78510]Fix two ICEs in pattern (cond (cmp (convert1? @1) @3) (convert2? @1) @2).

2016-11-25 Thread Bin.Cheng
On Fri, Nov 25, 2016 at 8:23 AM, Richard Biener
 wrote:
> On Thu, Nov 24, 2016 at 4:22 PM, Bin Cheng  wrote:
>> Hi,
>> This patch fixes two issues in newly introduced pattern: (cond (cmp 
>> (convert1? @1) @3) (convert2? @1) @2).
>> For PR78507, we need to check if from_type is INTEGRAL_TYPE_P explicitly, 
>> this patch adds check for that.
>> Note we don't check c1_type/c2_type for that because it's guarded by 
>> INTEGER_CST@.
>> For PR78510, the ICE is covered by revision 242831, but underlying issue is 
>> when the block of conditions
>> is not satisfied, the last case simplification is applied anyway since code 
>> is initialized to cmp at the
>> first place.  This patch fixes that by setting code to proper value only 
>> when transformation is valid, it
>> also incorporates Marc's suggestion by using cmp directly.  Two tests are 
>> added too.
>> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures?
>
> Ok.
>
> Btw,
>
> -  (cond (cmp@0 (convert1? @1) ...
> ...
> - enum tree_code code = TREE_CODE (@0)...
>
> shows another misconception (I didn't notice...) in that TREE_CODE of a 
> captured
> expression results in the expression code.  On GENERIC that works but on 
> GIMPLE
> @0 will be a SSA_NAME and thus 'code' was SSA_NAME ...
Yes, that was a mistake.
Patch updated by including new test from duplicate PR78517.  Is it OK?

Thanks,
bin

2016-11-24  Bin Cheng  

PR middle-end/78507
PR middle-end/78510
PR middle-end/78517
* match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use
cmp directly, rather than cmp_code.  Initialize code to ERROR_MARK
and set it to result code if transformation is valid.  Use code EQ
directly in last simplification case.

gcc/testsuite/ChangeLog
2016-11-24  Bin Cheng  

PR middle-end/78507
PR middle-end/78510
PR middle-end/78517
* g++.dg/torture/pr78507.C: New test.
* gcc.dg/torture/pr78510.c: New test.
* gcc.dg/torture/pr78517.c: New test.

>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2016-11-24  Bin Cheng  
>>
>> PR middle-end/78507
>> PR middle-end/78510
>> * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use
>> cmp directly, rather than cmp_code.  Initialize code to ERROR_MARK
>> and set it to result code if transformation is valid.  Use code EQ
>> directly in last simplification case.
>>
>> gcc/testsuite/ChangeLog
>> 2016-11-24  Bin Cheng  
>>
>> PR middle-end/78507
>> PR middle-end/78510
>> * g++.dg/torture/pr78507.C: New test.
>> * gcc.dg/torture/pr78510.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd
index 3aa27fa..2d4e019 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1971,14 +1971,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2).  */
 (for cmp (lt le gt ge eq)
  (simplify
-  (cond (cmp@0 (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2)
+  (cond (cmp (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2)
   (with
{
  tree from_type = TREE_TYPE (@1);
  tree c1_type = TREE_TYPE (@3), c2_type = TREE_TYPE (@2);
- enum tree_code code = TREE_CODE (@0), cmp_code = TREE_CODE (@0);
+ enum tree_code code = ERROR_MARK;
 
- if (int_fits_type_p (@2, from_type)
+ if (INTEGRAL_TYPE_P (from_type)
+&& int_fits_type_p (@2, from_type)
 && (types_match (c1_type, from_type)
 || (TYPE_PRECISION (c1_type) > TYPE_PRECISION (from_type)
 && (TYPE_UNSIGNED (from_type)
@@ -1988,37 +1989,38 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 && (TYPE_UNSIGNED (from_type)
 || TYPE_SIGN (c2_type) == TYPE_SIGN (from_type)
{
-if (code != EQ_EXPR)
+if (cmp != EQ_EXPR)
   {
 if (wi::to_widest (@3) == (wi::to_widest (@2) - 1))
   {
 /* X <= Y - 1 equals to X < Y.  */
-if (cmp_code == LE_EXPR)
+if (cmp == LE_EXPR)
   code = LT_EXPR;
 /* X > Y - 1 equals to X >= Y.  */
-if (cmp_code == GT_EXPR)
+if (cmp == GT_EXPR)
   code = GE_EXPR;
   }
 if (wi::to_widest (@3) == (wi::to_widest (@2) + 1))
   {
 /* X < Y + 1 equals to X <= Y.  */
-if (cmp_code == LT_EXPR)
+if (cmp == LT_EXPR)
   code = LE_EXPR;
 /* X >= Y + 1 equals to X > Y.  */
-if (cmp_code == GE_EXPR)
+if (cmp == GE_EXPR)
   code = GT_EXPR;
   }
-if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3))
+if (code != ERROR_MARK
+|| wi::to_widest (@2) == 

Re: [PATCH] combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390)

2016-11-25 Thread Paolo Bonzini


On 24/11/2016 15:01, Michael Matz wrote:
>> > (set (reg:DI 69)
>> > (zero_extract:DI (reg:DI 65 [ v_x ])   <-
>> > (const_int 32 [0x20])
>> > (const_int 20 [0x14])))
> Hmm, this transformation (from v_x>>12 to zext(v_x,32,20) is only valid 
> when the top 20 bits of v_x are known to be zero already, or alternatively 
> if it's know that the top 32bits of reg 69 won't matter in the future.  I 
> wonder where that knowledge is coming from.

s390 is BITS_BIG_ENDIAN, so zext(v_x,32,20) really means the same as
zext(v_x,32,12) with the more common convention.

Insn 4 and 5 respectively set the low and high 32-bits of (reg:DI 65),
so the transform seems fine to me.

Paolo


[PATCH] Fix PR78515

2016-11-25 Thread Richard Biener

I am testing the following to beat some sanity into 
compute_complex_assign_jump_func.  There's still that odd 'stmt2'
hanging around that gets set to sth else than stmt with

  op1 = gimple_assign_rhs1 (stmt);

  if (TREE_CODE (op1) == SSA_NAME)
{
  if (SSA_NAME_IS_DEFAULT_DEF (op1))
index = ipa_get_param_decl_index (info, SSA_NAME_VAR (op1));
  else
{
  index = load_from_param (fbi, info->descriptors,
   SSA_NAME_DEF_STMT (op1));
  stmt2 = SSA_NAME_DEF_STMT (op1);

I assume that the original code wanted to restrict its processing
to unary RHS of 'stmt' but still this "skips" arbitrary unary
operations in 'stmt'?  But maybe I'm not understanding jump functions
here.  If we have

  _2 = -param_1(D);
  _3 = ~_2;

and stmt is _3 then we create a unary pass through JF with - (and the ~
gets lost?).

Anyway, the following is a step in the right direction -- if you
want to address the above (maybe with some comments) you can
happily take the patch from here, otherwise we can followup this
patch.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk?

Thanks,
Richard.

2016-11-25  Richard Biener  

PR ipa/78515
* ipa-prop.c (compute_complex_assign_jump_func): Properly identify
unary, binary and single RHSs.
* tree.def (BIT_INSERT_EXPR): Adjust tree code name.

* gcc.dg/torture/pr78515.c: New testcase.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 90c19fc..642111d 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1177,29 +1177,37 @@ compute_complex_assign_jump_func (struct 
ipa_func_body_info *fbi,
 
   if (index >= 0)
 {
-  tree op2 = gimple_assign_rhs2 (stmt);
-
-  if (op2)
+  switch (gimple_assign_rhs_class (stmt))
{
- if (!is_gimple_ip_invariant (op2)
- || (TREE_CODE_CLASS (gimple_expr_code (stmt)) != tcc_comparison
- && !useless_type_conversion_p (TREE_TYPE (name),
-TREE_TYPE (op1
-   return;
-
- ipa_set_jf_arith_pass_through (jfunc, index, op2,
-gimple_assign_rhs_code (stmt));
-   }
-  else if (gimple_assign_single_p (stmt))
-   {
- bool agg_p = parm_ref_data_pass_through_p (fbi, index, call, tc_ssa);
- ipa_set_jf_simple_pass_through (jfunc, index, agg_p);
+   case GIMPLE_BINARY_RHS:
+ {
+   tree op2 = gimple_assign_rhs2 (stmt);
+   if (!is_gimple_ip_invariant (op2)
+   || ((TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+!= tcc_comparison)
+   && !useless_type_conversion_p (TREE_TYPE (name),
+  TREE_TYPE (op1
+ return;
+
+   ipa_set_jf_arith_pass_through (jfunc, index, op2,
+  gimple_assign_rhs_code (stmt));
+   break;
+ }
+   case GIMPLE_SINGLE_RHS:
+ {
+   bool agg_p = parm_ref_data_pass_through_p (fbi, index, call,
+  tc_ssa);
+   ipa_set_jf_simple_pass_through (jfunc, index, agg_p);
+   break;
+ }
+   case GIMPLE_UNARY_RHS:
+ if (is_gimple_assign (stmt2)
+ && gimple_assign_rhs_class (stmt2) == GIMPLE_UNARY_RHS
+ && ! CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2)))
+   ipa_set_jf_unary_pass_through (jfunc, index,
+  gimple_assign_rhs_code (stmt2));
+   default:;
}
-  else if (is_gimple_assign (stmt2)
-  && (gimple_expr_code (stmt2) != NOP_EXPR)
-  && (TREE_CODE_CLASS (gimple_expr_code (stmt2)) == tcc_unary))
-   ipa_set_jf_unary_pass_through (jfunc, index,
-  gimple_assign_rhs_code (stmt2));
   return;
 }
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr78515.c 
b/gcc/testsuite/gcc.dg/torture/pr78515.c
new file mode 100644
index 000..d700db5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr78515.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx512bw" { target x86_64-*-* i?86-*-* } } */
+
+typedef unsigned V __attribute__ ((vector_size (64)));
+
+V g;
+
+static V
+baz (V u, V v)
+{
+  g += u;
+  return v + g + 1;
+}
+
+static V
+bar (V u)
+{
+  u[0] = 0;
+  return baz(u, (V){});
+}
+
+V
+foo ()
+{
+  return (V){bar((V){})[0]};
+}
diff --git a/gcc/tree.def b/gcc/tree.def
index 2c35540..e093307 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -865,7 +865,7 @@ DEFTREECODE (FDESC_EXPR, "fdesc_expr", tcc_expression, 2)
introducing a quaternary operation.
The replaced bits shall be fully inside the container.  If the container
is of vector type, then these bits shall be aligned with its elements.  */
-DEFTREECODE (BIT_INSERT_EXPR, 

Re: [PATCH] Fix Ada bootstrap

2016-11-25 Thread Richard Biener
On Fri, 25 Nov 2016, Eric Botcazou wrote:

> > Martin reported a profiledbootstrap issue with a SSA coalescing issue
> > with a
> > 
> >   return _17(ab);
> 
> OK.
> 
> > Yeah, seeing this one as well.
> 
> It fails at -O0 with just:
> 
> eric@polaris:~/build/gcc/native> cat p.adb
> procedure P is
> 
>   subtype Char is Character range 'W' .. 'Z';
> 
>   type Arr is array (Char range <>) of Integer;
> 
>   type Rec (D : Char) is record
> A  : Arr (D .. 'W');
>   end record;
> 
> begin
>   null;
> end;
> eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet p.adb
> +===GNAT BUG DETECTED==+
> | 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) 
> GCC error:|
> | in size_binop_loc, at fold-const.c:1744  |
> | Error detected at p.adb:8:20 
> 
> so I presume this comes from the recent match.pd changes.

Could be - please open a PR.

Thanks,
Richard.


Re: [PATCH] Don't use priority {cd}tors if not supported by a target (PR, gcov-profile/78086)

2016-11-25 Thread Jan Hubicka
> Hi.
> 
> Using priority {cd}tors on a target that does not support that can cause 
> failures (see the PR).
> Apart from that, I decided to use priority 100 for both gcov_init and 
> gcov_exit functions as
> the reserved range includes priority 100. Moreover, I enhanced test-cases we 
> have.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> >From 05a0dcb13d608facdd1c85f4101cd821634d07cd Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 26 Oct 2016 12:50:35 +0200
> Subject: [PATCH] Don't use priority {cd}tors if not supported by a target (PR
>  gcov-profile/78086)
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-10-26  Martin Liska  
> 
>   * g++.dg/gcov/pr16855.C: Clean up the test case.
>   * g++.dg/gcov/pr16855-priority.C: New test.
> 
> gcc/ChangeLog:
> 
> 2016-10-26  Martin Liska  
> 
>   * coverage.c (build_init_ctor): Don't use priority {cd}tors if
>   not supported by a target.  Set priority to 100 if possible.
>   (build_gcov_exit_decl): Likewise.
OK,
thanks!
Honza


[AArch64][GCC][PATCH] Add more Poly64_t intrinsics to GCC

2016-11-25 Thread Tamar Christina
Hi all,

This adds the following NEON intrinsics
to the Aarch64 back-end of GCC:

* vsriq_n_p64
* vsri_n_p64
* vextq_p64
* vext_p64
* vceq_p64
* vbslq_p64
* vbsl_p64

Added new tests for these and ran regression tests on aarch64-none-linux-gnu
and on arm-none-linux-gnueabihf. Tests added in other patch series.

Ok for trunk?

Thanks,
Tamar

gcc/
2016-11-25  Tamar Christina  

* config/aarch64/aarch64-builtins.c
(vsriq_n_p64, vsri_n_p64): Added poly type.
(vextq_p64, vext_p64): Likewise.
(vceq_p64, vbslq_p64, vbsl_p64): Likewise.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index e1154b4b27820c0075d9a9edb4f8b48ef4f06b07..49efeea6f90cf8535aec4b9287bc9b30b7b79e60 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -429,6 +429,7 @@
 
   /* Implemented by aarch64_simd_bsl.  */
   BUILTIN_VDQQH (BSL_P, simd_bsl, 0)
+  VAR2 (BSL_P, simd_bsl,0, di, v2di)
   BUILTIN_VSDQ_I_DI (BSL_U, simd_bsl, 0)
   BUILTIN_VALLDIF (BSL_S, simd_bsl, 0)
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index c463e3b698a47b9b5c5a04e0fb7fff1f71817af1..ddaaa4f8c5615b979df8f765760c41c8e158fba1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10164,6 +10164,19 @@ vrsqrteq_u32 (uint32x4_t a)
result;  \
  })
 
+#define vsri_n_p64(a, b, c)		\
+  __extension__\
+({	\
+   poly64x1_t b_ = (b);		\
+   poly64x1_t a_ = (a);		\
+   poly64x1_t result;		\
+   __asm__ ("sri %d0,%d2,%3"	\
+		: "=w"(result)		\
+		: "0"(a_), "w"(b_), "i"(c)\
+		: /* No clobbers.  */);	\
+   result;\
+ })
+
 #define vsriq_n_p8(a, b, c) \
   __extension__ \
 ({  \
@@ -10190,6 +10203,19 @@ vrsqrteq_u32 (uint32x4_t a)
result;  \
  })
 
+#define vsriq_n_p64(a, b, c)		\
+  __extension__\
+({	\
+   poly64x2_t b_ = (b);		\
+   poly64x2_t a_ = (a);		\
+   poly64x2_t result;		\
+   __asm__ ("sri %0.2d,%2.2d,%3"	\
+		: "=w"(result)		\
+		: "0"(a_), "w"(b_), "i"(c)\
+		: /* No clobbers.  */);	\
+   result;\
+ })
+
 __extension__ extern __inline uint8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vtst_p8 (poly8x8_t a, poly8x8_t b)
@@ -11320,6 +11346,13 @@ vbsl_p16 (uint16x4_t __a, poly16x4_t __b, poly16x4_t __c)
 {
   return __builtin_aarch64_simd_bslv4hi_pupp (__a, __b, __c);
 }
+__extension__ extern __inline poly64x1_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbsl_p64 (uint64x1_t __a, poly64x1_t __b, poly64x1_t __c)
+{
+  return (poly64x1_t)
+  {__builtin_aarch64_simd_bsldi_pupp (__a[0], __b[0], __c[0])};
+}
 
 __extension__ extern __inline int8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11428,6 +11461,13 @@ vbslq_s16 (uint16x8_t __a, int16x8_t __b, int16x8_t __c)
   return __builtin_aarch64_simd_bslv8hi_suss (__a, __b, __c);
 }
 
+__extension__ extern __inline poly64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbslq_p64 (uint64x2_t __a, poly64x2_t __b, poly64x2_t __c)
+{
+  return __builtin_aarch64_simd_bslv2di_pupp (__a, __b, __c);
+}
+
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vbslq_s32 (uint32x4_t __a, int32x4_t __b, int32x4_t __c)
@@ -11959,6 +11999,13 @@ vceq_p8 (poly8x8_t __a, poly8x8_t __b)
   return (uint8x8_t) (__a == __b);
 }
 
+__extension__ extern __inline uint64x1_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vceq_p64 (poly64x1_t __a, poly64x1_t __b)
+{
+  return (uint64x1_t) (__a == __b);
+}
+
 __extension__ extern __inline uint8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vceq_s8 (int8x8_t __a, int8x8_t __b)
@@ -15620,6 +15667,15 @@ vext_p16 (poly16x4_t __a, poly16x4_t __b, __const int __c)
 #endif
 }
 
+__extension__ extern __inline poly64x1_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vext_p64 (poly64x1_t __a, poly64x1_t __b, __const int __c)
+{
+  __AARCH64_LANE_CHECK (__a, __c);
+  /* The only possible index to the assembler instruction returns element 0.  */
+  return __a;
+}
+
 __extension__ extern __inline int8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vext_s8 (int8x8_t __a, int8x8_t __b, __const int __c)
@@ -15788,6 +15844,18 @@ vextq_p16 (poly16x8_t __a, poly16x8_t __b, __const int __c)
 #endif
 }
 
+__extension__ extern __inline poly64x2_t
+__attribute__ 

Re: [PATCH] Fix Ada bootstrap

2016-11-25 Thread Eric Botcazou
> Martin reported a profiledbootstrap issue with a SSA coalescing issue
> with a
> 
>   return _17(ab);

OK.

> Yeah, seeing this one as well.

It fails at -O0 with just:

eric@polaris:~/build/gcc/native> cat p.adb
procedure P is

  subtype Char is Character range 'W' .. 'Z';

  type Arr is array (Char range <>) of Integer;

  type Rec (D : Char) is record
A  : Arr (D .. 'W');
  end record;

begin
  null;
end;
eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet p.adb
+===GNAT BUG DETECTED==+
| 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) 
GCC error:|
| in size_binop_loc, at fold-const.c:1744  |
| Error detected at p.adb:8:20 

so I presume this comes from the recent match.pd changes.

-- 
Eric Botcazou


Re: [GCC][AArch64][PATCH][Testsuite] Fix failing test vector_initialization_nostack.c

2016-11-25 Thread James Greenhalgh
On Mon, Nov 07, 2016 at 02:05:27PM +, Tamar Christina wrote:
> Hi all,
> 
> This fixes (PR78142) by turning off scheduling for the test.
> r241590 is causing more registers to be used and so
> the SP registered happens to be picked and used.
> 
> This test I believe was checking explicitly that the
> SP is not used if not needed.  

I can't say I'm convinced by the fix - turning off passes to work around
a failure suggests a problem with the test, and turning off the pass reduces
coverage of what we're actually trying to test for.

Can the test be rewritten to only require construction of one vector in
the char case?

Thanks,
James

> gcc/testsuite/
> 
> 2016-11-07  Tamar Christina  
> 
>   PR middle-end/78142
>   * gcc.target/aarch64/vector_initialization_nostack.c
>   (dg-options): Disabled scheduling.

> diff --git a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c 
> b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c
> index 
> bbad04d00263b6a91b826b4911af92bdd226c821..71699281c5ce79fb5cf37e47b8ba078721c19f3a
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model" } */
> +/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model 
> -fno-schedule-insns" } */
>  float arr_f[100][100];
>  float
>  f9 (void)



Re: [PATCH] Fix PR78343

2016-11-25 Thread Richard Biener
On Thu, 24 Nov 2016, Richard Biener wrote:

> 
> I am testing the following patch for an optimization regression where
> a loop made dead by final value replacement was made used again by
> DOM 20 passes later.  The real issue here is that we do not get rid
> of dead loops until very late so this patch makes sure to do that.
> We could schedule it later (but better no later than unrolling
> as that might expose a pretty inefficient way of removing a dead loop).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

As expected some testcases need adjustment, thus applied as follows.

Richard.

2016-11-24  Richard Biener  

PR tree-optimization/78343
* passes.def: Add CD-DCE pass after loop splitting.
* tree-ssa-dce.c (find_obviously_necessary_stmts): Move
SCEV init/finalize ...
(perform_tree_ssa_dce): ... here.  Deal with being
executed inside the loop pipeline in aggressive mode.

* gcc.dg/tree-ssa/sccp-2.c: New testcase.
* gcc.dg/autopar/uns-outer-6.c: Adjust.
* gcc.dg/tree-ssa/20030808-1.c: Likewise.
* gcc.dg/tree-ssa/20040305-1.c: Likewise.
* gcc.dg/vect/pr38529.c: Likewise.

diff --git a/gcc/passes.def b/gcc/passes.def
index 2a470a7..2fa682b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -271,6 +271,9 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_tree_unswitch);
  NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_loop_split);
+ /* All unswitching, final value replacement and splitting can expose
+empty loops.  Remove them now.  */
+ NEXT_PASS (pass_cd_dce);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c 
b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
index dc2870b..5af60b0 100644
--- a/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
@@ -25,7 +25,7 @@ parloop (int N)
   for (i = 0; i < N; i++)
 {
   for (j = 0; j < N; j++)
-   y[i]=x[i][j];
+   y[i] += x[i][j];
   sum += y[i];
 }
   g_sum = sum;
@@ -46,6 +46,10 @@ main (void)
 
 
 /* Check that outer loop is parallelized.  */
+/* This fails because we have
+ FAILED: data dependencies exist across iterations
+   
+
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops2" 
} } */
 /* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops2" 
} } */
 /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
index 7cc5404..cda86a7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
@@ -33,8 +33,8 @@ delete_dead_jumptables ()
 /* There should be no loads of ->code.  If any exist, then we failed to
optimize away all the IF statements and the statements feeding
their conditions.  */
-/* { dg-final { scan-tree-dump-times "->code" 0 "cddce2"} } */
+/* { dg-final { scan-tree-dump-times "->code" 0 "cddce3"} } */

 /* There should be no IF statements.  */
-/* { dg-final { scan-tree-dump-times "if " 0 "cddce2"} } */
+/* { dg-final { scan-tree-dump-times "if " 0 "cddce3"} } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c
index 501e28c..d1a9af8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c
@@ -27,4 +27,4 @@ void foo(int edx, int eax)
 
 /* After cddce we should have two IF statements remaining as the other
two tests can be threaded.  */
-/* { dg-final { scan-tree-dump-times "if " 2 "cddce2"} } */
+/* { dg-final { scan-tree-dump-times "if " 2 "cddce3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c
new file mode 100644
index 000..099b281
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned int
+test(unsigned int quant)
+{
+  unsigned int sum = 0;
+  for (unsigned int i = 0; i < quant; ++i)
+sum += quant;
+  return sum;
+}
+
+/* A single basic-block should remain (computing and
+   returning quant * quant).  */
+/* { dg-final { scan-tree-dump-times "bb" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr38529.c 
b/gcc/testsuite/gcc.dg/vect/pr38529.c
index 171adeb..9b5919d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr38529.c
+++ b/gcc/testsuite/gcc.dg/vect/pr38529.c
@@ -11,7 +11,3 @@ void foo()
 for (j = 0; j < 17; ++j)
   a[i] = 0;
 }
-
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect"  } } */
-
-
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 7b9814e..50b5eef 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -400,7 

Re: [PATCH] Fix Ada bootstrap

2016-11-25 Thread Richard Biener
On Fri, 25 Nov 2016, Eric Botcazou wrote:

> > 2016-11-25  Richard Biener  <rguent...@suse.de>
> > 
> > * gimple-fold.c (fold_stmt_1): Check may_propagate_copy
> > before valueizing return stmts.
> 
> What failure is it supposed to fix?  The compiler bootstraps for me at 
> r242863 
> although there is 1 ACATS regression (c41104a):

Martin reported a profiledbootstrap issue with a SSA coalescing issue
with a

  return _17(ab);

I've seen all ada tests fail with my recent testings (which included
a workaround for the VRP issue), all tests fail to link not finding
the RTS.

> +===GNAT BUG DETECTED==+
> | 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) 
> GCC error:|
> | in size_binop_loc, at fold-const.c:1744  |
> | Error detected at c41104a.adb:60:33  |

Yeah, seeing this one as well.

Richard.


Re: [PATCH] Fix Ada bootstrap

2016-11-25 Thread Martin Liška
On 11/25/2016 10:20 AM, Richard Biener wrote:
> 
> Bootstrap and testing in progress.
> 
> Richard.
> 
> 2016-11-25  Richard Biener  
> 
>   * gimple-fold.c (fold_stmt_1): Check may_propagate_copy
>   before valueizing return stmts.
> 
> Index: gcc/gimple-fold.c
> ===
> --- gcc/gimple-fold.c (revision 242864)
> +++ gcc/gimple-fold.c (working copy)
> @@ -4414,7 +4414,8 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
>   if (ret && TREE_CODE (ret) == SSA_NAME && valueize)
> {
>   tree val = valueize (ret);
> - if (val && val != ret)
> + if (val && val != ret
> + && may_propagate_copy (ret, val))
> {
>   gimple_return_set_retval (ret_stmt, val);
>   changed = true;
> 

Profiled bootstrap on x86_64 just successfully finished with patch applied.

Thanks!
Martin


Re: [PATCH] Fix Ada bootstrap

2016-11-25 Thread Eric Botcazou
> 2016-11-25  Richard Biener  <rguent...@suse.de>
> 
>   * gimple-fold.c (fold_stmt_1): Check may_propagate_copy
>   before valueizing return stmts.

What failure is it supposed to fix?  The compiler bootstraps for me at r242863 
although there is 1 ACATS regression (c41104a):

+===GNAT BUG DETECTED==+
| 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) 
GCC error:|
| in size_binop_loc, at fold-const.c:1744  |
| Error detected at c41104a.adb:60:33  |

-- 
Eric Botcazou


[RFC PATCH, i386]: Separate operations with mask registers from operations with general registers

2016-11-25 Thread Uros Bizjak
Hello!

Attached patch tries to fix the mess with operations where register
allocator is free to use either mask or general registers. There are
plenty of problems with this approach:
a) missing operations wth general registers
- kxnor operation with general register does not exist
- kandn operation with general registers exists for TARGET_BMI only

b) register allocation problems
- DImode operations with 64bit general registers do not exist on 32bit target
- register allocator is free to allocate operation with either
register set, resulting in costly moves between general and mask
register sets

Mask operations can be generated in a consistent way using
corresponding built-ins. To prevent RA problems, we have to separate
mask ops from general ops - this way we always select operation with
well defined register set.

There is special problem with 64bit mask ops on 32bit targets:
defining these operations as a named pattern, where only mask
registers are available, will result in the situation where mask
operation will be used for 64bit values living in a pair of general
registers. RA will obviously generate many inter-regset moves in this
situation.

Another problem in point a) above is dependence of general-reg
operations on various TARGET_AVX* flags. This can be seen with kxnor
pattern, which has to be artificially enabled during register
allocation just to split non-masked operation back to sequence of
general operations. Similar situation arises with kandn on non-BMI
targets.

IMO the mentioned interferences do not warrant mixing of mask and
general operations. I suspect here will be no end of "the value is
moved between integer and mask registers unnecessary"  type of bugs
(as was the situation on 32bit targets in the past, where DImode
values were moved through MMX registers). Users can and should use
relevant builtins when operating with mask registers (n.b.: generic
logic operations can still be used; RA will move values between
regsets -  but in a *consistent way*). FWIW, the few folding
opportunities with mask builtins can be implemented in a
target-folding function.

The attached patch also paves the way for implementation of new
builtins in patch [1] that are otherwise nearly to impossible to
implement.

2016-11-25  Uros Bizjak  

* config/i386/i386.md (UNSPEC_KMASKOP): New.
(UNSPEC_KMOV): Remove.
(kmovw): Expand to plain HImode move.
(k): Rename from *k. Use
register_operand predicates.  Tag pattern with UNSPEC_KMASKOP.
Remove corresponding clobber-removing splitter.
(*anddi_1): Remove mask register alternatives.
(*andsi_1): Ditto.
(*andhi_1): Ditto.
(*andqi_1): Ditto.
(*_1): Ditto.
(*qi_1): Ditto.
(kandn): Use SWI1248_AVX512BW mode iterator.  Remove
general register alternatives.  Tag pattern with UNSPEC_KMASKOP.
Remove corresponding splitter to operation with general registers.
(*andn): Rename from *bmi_andn_.
(*andn): New pattern.
(*kxnor): Remove general register alternatives.  Tag pattern
with UNSPEC_KMASKOP.  Remove corresponding splitter to operation
with general registers.
(knot): New insn pattern.
(*one_cmpl2_1): Remove mask register alternatives.
(one_cmplqi2_1): Ditto.
(*k): Rename from *k3.
Tag pattern with UNSPEC_KMASKOP. Add mode attribute.
* config/i386/predicates.md (mask_reg_operand): Remove predicate.
* config/i386/sse.md (vec_unpacks_hi_hi): Update pattern
to generate kmaskop shift.
(vec_unpacks_hi_): Ditto.
* config/i386/i386-builtin.def (__builtin_ia32_kandhi):
Use CODE_FOR_kandhi.
(__builtin_ia32_knothi): Use CODE_FOR_knothi.
(__builtin_ia32_korhi): Use CODE_FOR_kiorhi.
(__builtin_ia32_kxorhi): Use CODE_FOR_kxorhi.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32}. There are a couple fo trivial scan-asm errors due to a rename
and a missing kmov insn, where RA choose much more optimal movw
, insn instead.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01088.html

Uros.
Index: config/i386/i386-builtin.def
===
--- config/i386/i386-builtin.def(revision 242841)
+++ config/i386/i386-builtin.def(working copy)
@@ -1436,15 +1436,15 @@
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, 
"__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, 
(enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", 
IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandhi, "__builtin_ia32_kandhi", 
IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", 
IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 

[PATCH] Fix Ada bootstrap

2016-11-25 Thread Richard Biener

Bootstrap and testing in progress.

Richard.

2016-11-25  Richard Biener  

* gimple-fold.c (fold_stmt_1): Check may_propagate_copy
before valueizing return stmts.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 242864)
+++ gcc/gimple-fold.c   (working copy)
@@ -4414,7 +4414,8 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
if (ret && TREE_CODE (ret) == SSA_NAME && valueize)
  {
tree val = valueize (ret);
-   if (val && val != ret)
+   if (val && val != ret
+   && may_propagate_copy (ret, val))
  {
gimple_return_set_retval (ret_stmt, val);
changed = true;


Re: change initialization of ptrdiff_type_node

2016-11-25 Thread Richard Biener
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> On 25 November 2016 at 13:43, Richard Biener  wrote:
> > On Fri, 25 Nov 2016, Jakub Jelinek wrote:
> >
> >> On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote:
> >> > --- a/gcc/lto/lto-lang.c
> >> > +++ b/gcc/lto/lto-lang.c
> >> > @@ -1271,8 +1271,30 @@ lto_init (void)
> >> >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node)
> >> >   == const_ptr_type_node);
> >> >
> >> > -  ptrdiff_type_node = integer_type_node;
> >> > +  if (strcmp (PTRDIFF_TYPE, "int") == 0)
> >> > +ptrdiff_type_node = integer_type_node;
> >> > +  else if (strcmp (PTRDIFF_TYPE, "long int") == 0)
> >> > +ptrdiff_type_node = long_integer_type_node;
> >> > +  else if (strcmp (PTRDIFF_TYPE, "long long int") == 0)
> >> > +ptrdiff_type_node = long_long_integer_type_node;
> >> > +  else if (strcmp (PTRDIFF_TYPE, "short int") == 0)
> >> > +ptrdiff_type_node = short_integer_type_node;
> >> > +  else
> >> > +{
> >> > +  ptrdiff_type_node = NULL_TREE;
> >> > +  for (int i = 0; i < NUM_INT_N_ENTS; i++)
> >> > +   if (int_n_enabled_p[i])
> >> > + {
> >> > +   char name[50];
> >> > +   sprintf (name, "__int%d", int_n_data[i].bitsize);
> >> > +   if (strcmp (name, PTRDIFF_TYPE) == 0)
> >> > + ptrdiff_type_node = int_n_trees[i].signed_type;
> >> > + }
> >> > +  if (ptrdiff_type_node == NULL_TREE)
> >> > +   gcc_unreachable ();
> >> > +}
> >>
> >> This looks ok to me.
> >
> > But I'd like to see this in build_common_tree_nodes alongside
> > the initialization of size_type_node (and thus removed from
> > c_common_nodes_and_builtins).  This way you can simply remove
> > the lto-lang.c code as well.
> >
> > Please then also remove the ptrdiff_type_node re-set from
> > free_lang_data ().
> Hi Richard,
> Does this version look OK ?
> Validation in progress.

Yes, patch is ok if testing succeeds.

Thanks,
Richard.


Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.

2016-11-25 Thread Richard Biener
On Thu, Nov 24, 2016 at 5:14 PM, Segher Boessenkool
 wrote:
> On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote:
>> On 11/24/2016 07:53 AM, Segher Boessenkool wrote:
>> >
>> >That we compare different kinds of costs (which really has no meaning at
>> >all, it's a heuristic at best) in various places is a known problem, not
>> >a regression.
>> But the problems with the costing system exhibit themselves as a code
>> quality regression.  In the end that's what the end-users see -- a
>> regression in the quality of the code GCC generates.
>
> Yes, exactly -- and I fear this all-encompassing change will cause just
> such a regression for many users.  Tests are running, will know more
> later today (or tomorrow).
>
> The PR is about a very specific problem; the patch is not.  The patch
> is not a bug fix.  If we allow anything that "makes things better" in
> stage 3, what make it different from stage 1 then?

That's a good question ;)  The stage 3 definition has a loophole via
"go file a bug about feature X, then it's a bugfix!".

I'm all open for a more sensible definition, like constraining the kind
of non-regression fixes that we want to allow, but I fear the most
sensible option would be to simply ditch the notion of different
"stages" and make it "general development" and "regression fixing".
(though if you try hard enough and go back in time you'll find that
almost all non-enhancement bugs are regressions in some sense)

And yes, current stage3 still feels too much like stage1 ;)

Richard.


Re: change initialization of ptrdiff_type_node

2016-11-25 Thread Prathamesh Kulkarni
On 25 November 2016 at 13:43, Richard Biener  wrote:
> On Fri, 25 Nov 2016, Jakub Jelinek wrote:
>
>> On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote:
>> > --- a/gcc/lto/lto-lang.c
>> > +++ b/gcc/lto/lto-lang.c
>> > @@ -1271,8 +1271,30 @@ lto_init (void)
>> >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node)
>> >   == const_ptr_type_node);
>> >
>> > -  ptrdiff_type_node = integer_type_node;
>> > +  if (strcmp (PTRDIFF_TYPE, "int") == 0)
>> > +ptrdiff_type_node = integer_type_node;
>> > +  else if (strcmp (PTRDIFF_TYPE, "long int") == 0)
>> > +ptrdiff_type_node = long_integer_type_node;
>> > +  else if (strcmp (PTRDIFF_TYPE, "long long int") == 0)
>> > +ptrdiff_type_node = long_long_integer_type_node;
>> > +  else if (strcmp (PTRDIFF_TYPE, "short int") == 0)
>> > +ptrdiff_type_node = short_integer_type_node;
>> > +  else
>> > +{
>> > +  ptrdiff_type_node = NULL_TREE;
>> > +  for (int i = 0; i < NUM_INT_N_ENTS; i++)
>> > +   if (int_n_enabled_p[i])
>> > + {
>> > +   char name[50];
>> > +   sprintf (name, "__int%d", int_n_data[i].bitsize);
>> > +   if (strcmp (name, PTRDIFF_TYPE) == 0)
>> > + ptrdiff_type_node = int_n_trees[i].signed_type;
>> > + }
>> > +  if (ptrdiff_type_node == NULL_TREE)
>> > +   gcc_unreachable ();
>> > +}
>>
>> This looks ok to me.
>
> But I'd like to see this in build_common_tree_nodes alongside
> the initialization of size_type_node (and thus removed from
> c_common_nodes_and_builtins).  This way you can simply remove
> the lto-lang.c code as well.
>
> Please then also remove the ptrdiff_type_node re-set from
> free_lang_data ().
Hi Richard,
Does this version look OK ?
Validation in progress.

Thanks,
Prathamesh
>
>> >
>> > +  unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node);
>> >lto_build_c_type_nodes ();
>> >gcc_assert (va_list_type_node);
>>
>> But why this and the remaining hunks?  Nothing in the middle-end
>> needs it, IMHO it should be kept in c-family/.
>
> Yeah, this change looks unnecessary to me.
>
>> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
>> > index eec2d4f..6c52387 100644
>> > --- a/gcc/tree-core.h
>> > +++ b/gcc/tree-core.h
>> > @@ -617,6 +617,7 @@ enum tree_index {
>> >TI_SIZE_TYPE,
>> >TI_PID_TYPE,
>> >TI_PTRDIFF_TYPE,
>> > +  TI_UNSIGNED_PTRDIFF_TYPE,
>> >TI_VA_LIST_TYPE,
>> >TI_VA_LIST_GPR_COUNTER_FIELD,
>> >TI_VA_LIST_FPR_COUNTER_FIELD,
>> > diff --git a/gcc/tree.h b/gcc/tree.h
>> > index 62cd7bb..ae69d0d 100644
>> > --- a/gcc/tree.h
>> > +++ b/gcc/tree.h
>> > @@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum 
>> > tree_code __code, int __i,
>> >  #define size_type_node  global_trees[TI_SIZE_TYPE]
>> >  #define pid_type_node   global_trees[TI_PID_TYPE]
>> >  #define ptrdiff_type_node  global_trees[TI_PTRDIFF_TYPE]
>> > +#define unsigned_ptrdiff_type_node global_trees[TI_UNSIGNED_PTRDIFF_TYPE]
>> >  #define va_list_type_node  global_trees[TI_VA_LIST_TYPE]
>> >  #define va_list_gpr_counter_field  
>> > global_trees[TI_VA_LIST_GPR_COUNTER_FIELD]
>> >  #define va_list_fpr_counter_field  
>> > global_trees[TI_VA_LIST_FPR_COUNTER_FIELD]
>>
>>
>>   Jakub
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
2016-11-25  Prathamesh Kulkarni  

* tree.c (build_common_tree_nodes): Initialize ptrdiff_type_node.
(free_lang_data): Remove assignment to ptrdiff_type_node.
c-family/
* c-common.c (c_common_nodes_and_builtins): Remove initialization of
ptrdiff_type_node.
lto/
* lto-lang.c (lto_init): Remove initialization of ptrdiff_type_node.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 62174a9..0749361 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -4475,8 +4475,6 @@ c_common_nodes_and_builtins (void)
 
   default_function_type
 = build_varargs_function_type_list (integer_type_node, NULL_TREE);
-  ptrdiff_type_node
-= TREE_TYPE (identifier_global_value (get_identifier (PTRDIFF_TYPE)));
   unsigned_ptrdiff_type_node = c_common_unsigned_type (ptrdiff_type_node);
 
   lang_hooks.decls.pushdecl
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index a5f04ba..58f6e0c 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -1271,8 +1271,6 @@ lto_init (void)
   gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node)
  == const_ptr_type_node);
 
-  ptrdiff_type_node = integer_type_node;
-
   lto_build_c_type_nodes ();
   gcc_assert (va_list_type_node);
 
diff --git a/gcc/tree.c b/gcc/tree.c
index 11e0abc..57f2e9a 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6006,7 +6006,6 @@ free_lang_data (void)
   free_lang_data_in_cgraph ();
 
   /* Create gimple variants for common types.  */
-  ptrdiff_type_node = 

Re: [Patch ARM 17/17] Enable _Float16 for ARM.

2016-11-25 Thread Christophe Lyon
Hi James,


On 16 November 2016 at 15:15, Kyrill Tkachov
 wrote:
>
> On 11/11/16 15:42, James Greenhalgh wrote:
>>
>> Hi,
>>
>> Finally, having added support for single-step DFmode to HFmode
>> conversions,
>> this patch adds support for _Float16 to the ARM back-end.
>>
>> That means making sure that only __fp16 promotes and adding similar hooks
>> to
>> those used in the AArch64 port giving the excess precision rules, and
>> marking HFmode as supported in libgcc.
>>
>> Bootstrapped on an ARMv8-A machine, and crosstested with no issues.
>>
>> OK?
>
>
> This looks ok to me once the prerequisites are approved.
>
> Thanks,
> Kyrill
>
>
>> Thanks,
>> James
>>
>> ---
>> gcc/
>>
>> 2016-11-09  James Greenhalgh  
>>
>> PR target/63250
>> * config/arm/arm-builtins.c (arm_simd_floatHF_type_node): Rename
>> to...
>> (arm_fp16_type_node): ...This, make visibile.
>> (arm_simd_builtin_std_type): Rename arm_simd_floatHF_type_node to
>> arm_fp16_type_node.
>> (arm_init_simd_builtin_types): Likewise.
>> (arm_init_fp16_builtins): Likewise.
>> * config/arm/arm.c (arm_excess_precision): New.
>> (arm_floatn_mode): Likewise.
>> (TARGET_C_EXCESS_PRECISION): Likewise.
>> (TARGET_FLOATN_MODE): Likewise.
>> (arm_promoted_type): Only promote arm_fp16_type_node.
>> * config/arm/arm.h (arm_fp16_type_node): Declare.
>>
>> gcc/testsuite/
>>
>> 2016-11-09  James Greenhalgh  
>>
>> * lib/target-supports.exp (add_options_for_float16): Add
>> -mfp16-format=ieee when testign arm*-*-*.
>>
>

I've noticed that after this patch (r242784),
gcc.dg/torture/float16-basic.c
fail on armeb*

Christophe


Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)

2016-11-25 Thread Jakub Jelinek
On Sat, Nov 19, 2016 at 09:15:42PM +0100, Andreas Schwab wrote:
>   * gcc.c-torture/execute/comp-goto-1.c (insn_t): Change offset to
>   signed int.

The same testcase is copied to gcc.dg/tree-prof/, just with extra dg-
directives.  Committed as obvious:

2016-11-25  Jakub Jelinek  
Andreas Schwab  

PR gcov-profile/78467
* gcc.dg/tree-prof/comp-goto-1.c (insn_t): Change offset to
signed int.

--- gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c.jj 2013-06-07 
13:17:15.0 +0200
+++ gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c2016-11-25 
09:46:58.734509541 +0100
@@ -16,7 +16,7 @@ typedef union
 {
   struct
 {
-  unsigned int offset:18;
+  signed int   offset:18;
   unsigned int ignore:4;
   unsigned int s1:8;
   int  :2;


Jakub


Re: [PR 70965] Schedule extra pass_rebuild_cgraph_edges

2016-11-25 Thread Richard Biener
On Thu, Nov 24, 2016 at 5:44 PM, Martin Jambor  wrote:
> Hi,
>
> after discussing this with Honza, we have decided that scheduling an
> extra pass_rebuild_cgraph_edges after pass_fixup_cfg is the correct
> way to keep the cgraph consistent with gimple IL when early IPA passes
> need it, such as is the case with the testcase in PR 70965.
>
> While needing an extra pass is never nice, this is a consequence of
> splitting pass_build_ssa_passes out of early optimization passes so
> that pass_chkp can be in between.
>
> The patch below fices the ICE in PR 70965 and has passed bootstrap and
> testing on x86_64-linux.  OK for trunk?

Ok.

Richard.

> Thanks,
>
> Martin
>
>
> 2016-11-24  Martin Jambor  
>
> gcc/
> * passes.def (pass_build_ssa_passes): Add pass_rebuild_cgraph_edges.
>
> gcc/testsuite/
> * g++.dg/pr70965.C: New test.
> ---
>  gcc/passes.def |  1 +
>  gcc/testsuite/g++.dg/pr70965.C | 21 +
>  2 files changed, 22 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/pr70965.C
>
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 85a5af0..5faf17f 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_build_ssa_passes);
>PUSH_INSERT_PASSES_WITHIN (pass_build_ssa_passes)
>NEXT_PASS (pass_fixup_cfg);
> +  NEXT_PASS (pass_rebuild_cgraph_edges);
>NEXT_PASS (pass_build_ssa);
>NEXT_PASS (pass_warn_nonnull_compare);
>NEXT_PASS (pass_ubsan);
> diff --git a/gcc/testsuite/g++.dg/pr70965.C b/gcc/testsuite/g++.dg/pr70965.C
> new file mode 100644
> index 000..d8a2c35
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr70965.C
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -std=c++11" } */
> +
> +struct A {};
> +struct B {};
> +struct C { using p = int *; template  using ra = A; };
> +struct J : C { template  struct K { typedef C::ra o; }; };
> +template  struct D
> +{
> +  struct H : J::K::o { H (J::p, A) : J::K::o () {} };
> +  H d;
> +  D (const char *, const A  = A ()) : d (0, x) {}
> +};
> +extern template class D;
> +enum L { M };
> +struct F { virtual char *foo (); };
> +template  struct I : B { static int foo (int) {} };
> +struct G { typedef I t; };
> +void foo (int) { G::t::foo (0); }
> +void bar (const D &, const D &, int, L);
> +void baz () try { foo (0); } catch (F ) { bar (e.foo (), "", 0, M); }
> --
> 2.10.2
>


Re: [tree-tailcall] Check if function returns it's argument

2016-11-25 Thread Prathamesh Kulkarni
On 25 November 2016 at 13:55, Richard Biener  wrote:
> On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 25 November 2016 at 13:37, Richard Biener  wrote:
>> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 November 2016 at 18:08, Richard Biener  wrote:
>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 November 2016 at 17:48, Richard Biener  wrote:
>> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 24 November 2016 at 14:07, Richard Biener  
>> >> >> >> wrote:
>> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >> >
>> >> >> >> >> Hi,
>> >> >> >> >> Consider following test-case:
>> >> >> >> >>
>> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> >> >> {
>> >> >> >> >>   __builtin_memcpy (a1, a2, a3);
>> >> >> >> >>   return a1;
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >> return a1 can be considered equivalent to return value of memcpy,
>> >> >> >> >> and the call could be emitted as a tail-call.
>> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> >> >> >> but if it is changed to:
>> >> >> >> >>
>> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> >> return t1;
>> >> >> >> >>
>> >> >> >> >> Then memcpy is emitted as a tail-call.
>> >> >> >> >> The attached patch tries to handle the former case.
>> >> >> >> >>
>> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> >> >> >> Does this patch look OK ?
>> >> >> >> >
>> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it 
>> >> >> >> > doesn't.
>> >> >> >> > */
>> >> >> >> > +tree
>> >> >> >> > +gimple_call_return_arg (gcall *call_stmt)
>> >> >> >> > +{
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > Please just inline it at the single use - the name is not terribly
>> >> >> >> > informative.
>> >> >> >> >
>> >> >> >> > I'm not sure you can rely on code-generation working if you not
>> >> >> >> > effectively change the IL to
>> >> >> >> >
>> >> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> >   return a1;
>> >> >> >> >
>> >> >> >> > someone more familiar with RTL expansion plus tail call emission 
>> >> >> >> > on
>> >> >> >> > RTL needs to chime in.
>> >> >> >> Well I was trying to copy-propagate function's argument into uses of
>> >> >> >> it's return value if
>> >> >> >> function returned that argument, so the assignment to lhs of call
>> >> >> >> could be made redundant.
>> >> >> >>
>> >> >> >> eg:
>> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> >> {
>> >> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >>   return t1;
>> >> >> >> }
>> >> >> >>
>> >> >> >> After patch, copyprop transformed it into:
>> >> >> >> t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> return a1;
>> >> >> >
>> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's
>> >> >> > better to use t1 as that's readily available in the return register
>> >> >> > while the register for a1 might have been clobbered and thus we
>> >> >> > need to spill it for the later return.
>> >> >> Oh I didn't realize this could possibly pessimize RA.
>> >> >> For test-case:
>> >> >>
>> >> >> void *t1 = memcpy (dest, src, n);
>> >> >> if (t1 != dest)
>> >> >>   __builtin_abort ();
>> >> >>
>> >> >> we could copy-propagate t1 into cond_expr and make the condition 
>> >> >> redundant.
>> >> >> However I suppose this particular case could be handled with VRP 
>> >> >> instead
>> >> >> (t1 and dest should be marked equivalent) ?
>> >> >
>> >> > Yeah, exposing this to value-numbering in general can enable some
>> >> > optimizations (but I wouldn't put it in copyprop).  Note it's then
>> >> > difficult to avoid copy-propgating things...
>> >> >
>> >> > The user can also write
>> >> >
>> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> > {
>> >> >   __builtin_memcpy (a1, a2, a3);
>> >> >   return a1;
>> >> > }
>> >> >
>> >> > so it's good to improve code-gen for that (for the tailcall issue).
>> >> For the tail-call, issue should we artificially create a lhs and use that
>> >> as return value (perhaps by a separate pass before tailcall) ?
>> >>
>> >> __builtin_memcpy (a1, a2, a3);
>> >> return a1;
>> >>
>> >> gets transformed to:
>> >> _1 = __builtin_memcpy (a1, a2, a3)
>> >> return _1;
>> >>
>> >> So tail-call optimization pass would see the IL in it's expected form.
>> >
>> > As said, a RTL expert needs to chime in here.  Iff then tail-call
>> > itself should do this rewrite.  But if this form is required to make
>> > things work (I suppose you checked it _does_ actually work?) then
>> > we'd need to make sure later passes do not undo it.  So it looks
>> > fragile to me.  OTOH I seem to remember that the flags we set on
>> > GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> > verified again 

Re: [PATCH 1/2 v3] PR77822

2016-11-25 Thread Dominik Vogt
On Thu, Nov 24, 2016 at 08:33:32AM -0700, Jeff Law wrote:
> On 11/24/2016 02:59 AM, Dominik Vogt wrote:
> >On Wed, Nov 23, 2016 at 01:22:31PM -0700, Jeff Law wrote:
> >>>   PR target/77822
> >>>   * system.h (SIZE_POS_IN_RANGE): New.
> >>OK.  Though system.h seems like an unfortunate place.  Would rtl.h
> >>work better since this is really about verifying the arguments to
> >>things like {zero,sign}_extract which are RTL concepts.
> >
> >I was unsure whether it's better to give the macro a name not
> >related to *_extract and put it into system.h or make it specific
> >to extracts and put it in rtl.h.
> Yea.  What tends to tip the balance for me is that it's likely not
> reusable outside extraction argument testing.
> 
> >
> >It's probably not really reuseable elsewhere, so when moving it to
> >rtl.h I'll also rename it to EXTRACT_ARGS_IN_RANGE.  Okay?
> Yea, that sounds perfect.

Version 4 of the patch attached.  Bootstrapped and regression
tested on s390x biarch and s390.  Changes since the last version:

v4:
Rename SIZE_POS_IN_RANGE to EXTRACT_ARGS_IN_RANGE and move
it to rtl.h.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

PR target/77822
* rtl.h (EXTRACT_ARGS_IN_RANGE): New.
>From 8e5bcea2983b50ba133a83086fcc21c31d8dbabe Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Thu, 17 Nov 2016 14:49:18 +0100
Subject: [PATCH 1/2] PR target/77822: Add helper macro EXTRACT_ARGS_IN_RANGE
 to system.h.

The macro can be used to validate the arguments of zero_extract and
sign_extract to fix this problem:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822
---
 gcc/rtl.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/rtl.h b/gcc/rtl.h
index 660d381..1847bce 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2694,6 +2694,16 @@ get_full_set_src_cost (rtx x, machine_mode mode, struct 
full_rtx_costs *c)
 }
 #endif
 
+/* A convenience macro to validate the arguments of a zero_extract
+   expression.  It determines whether SIZE lies inclusively within
+   [1, RANGE], POS lies inclusively within between [0, RANGE - 1]
+   and the sum lies inclusively within [1, RANGE].  RANGE must be
+   >= 1, but SIZE and POS may be negative.  */
+#define EXTRACT_ARGS_IN_RANGE(SIZE, POS, RANGE) \
+  (IN_RANGE ((POS), 0, (unsigned HOST_WIDE_INT) (RANGE) - 1) \
+   && IN_RANGE ((SIZE), 1, (unsigned HOST_WIDE_INT) (RANGE) \
+  - (unsigned HOST_WIDE_INT)(POS)))
+
 /* In explow.c */
 extern HOST_WIDE_INT trunc_int_for_mode(HOST_WIDE_INT, machine_mode);
 extern rtx plus_constant (machine_mode, rtx, HOST_WIDE_INT, bool = false);
-- 
2.3.0



Re: [PATCH 2/2 v3] PR77822

2016-11-25 Thread Dominik Vogt
On Mon, Nov 21, 2016 at 12:05:42PM +0100, Dominik Vogt wrote:
> On Thu, Nov 17, 2016 at 04:54:17PM +0100, Dominik Vogt wrote:
> > On Thu, Nov 17, 2016 at 04:53:03PM +0100, Dominik Vogt wrote:
> > > The following two patches fix PR 77822 on s390x for gcc-7.  As the
> > > macro doing the argument range checks can be used on other targets
> > > as well, I've put it in system.h (couldn't think of a better
> > > place; maybe rtl.h?).
> > > 
> > > Bootstrapped on s390x biarch, regression tested on s390x biarch
> > > and s390, all on a zEC12 with -march=zEC12.
> > > 
> > > Please check the commit messages for details.
> > 
> > S390 backend patch.

Version 4 of the patchset.  Bootstrapped and regression tested on
s390 biarch and s390.

Changes since the first version of the patchset:

v2:
Add s390 test cases.
Support .cxx tests in s390.exp.
Put all arguments of SIZE_POS_IN_RANGE in parentheses.
Rewrite SIZE_POS_IN_RANGE macro to handle wrapping SIZE +
POS.

v3:
Rename macro argument from UPPER back to RANGE.

v4:
Rename SIZE_POS_IN_RANGE to EXTRACT_ARGS_IN_RANGE and move it to rtl.h.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLogb

PR target/77822
* config/s390/s390.md ("extzv")
("*extzv")
("*extzvdi_lshiftrt")
("*_ior_and_sr_ze")
("*extract1bitdi")
("*insv", "*insv_rnsbg_noshift")
("*insv_rnsbg_srl", "*insv_mem_reg")
("*insvdi_mem_reghigh", "*insvdi_reg_imm"): Use EXTRACT_ARGS_IN_RANGE
to validate the arguments of zero_extract and sign_extract.
gcc/testsuite/ChangeLogb

PR target/77822
* gcc.target/s390/s390.exp: Support .cxx tests.
* gcc.target/s390/pr77822-2.c: New test.
* gcc.target/s390/pr77822-1.cxx: New test.
>From bcf508a7e2f90c7e67702c437780cc12a3f74860 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Thu, 17 Nov 2016 14:49:48 +0100
Subject: [PATCH 2/2] PR target/77822: S390: Validate argument range of
 {zero,sign}_extract.

With some undefined code, combine generates patterns where the arguments to
*_extract are out of range, e.b. a negative bit position.  If the s390 backend
accepts these, they lead to not just undefined behaviour but invalid assembly
instructions (argument out of the allowed range).  So this patch makes sure
that the rtl expressions with out of range arguments are rejected.
---
 gcc/config/s390/s390.md |  20 +-
 gcc/testsuite/gcc.target/s390/pr77822-1.cxx |  21 ++
 gcc/testsuite/gcc.target/s390/pr77822-2.c   | 307 
 gcc/testsuite/gcc.target/s390/s390.exp  |   8 +-
 4 files changed, 349 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr77822-1.cxx
 create mode 100644 gcc/testsuite/gcc.target/s390/pr77822-2.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index a449b03..aaf8427 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3741,6 +3741,8 @@
  (clobber (reg:CC CC_REGNUM))])]
   "TARGET_Z10"
 {
+  if (! EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64))
+FAIL;
   /* Starting with zEC12 there is risbgn not clobbering CC.  */
   if (TARGET_ZEC12)
 {
@@ -3760,7 +3762,9 @@
 (match_operand 2 "const_int_operand" "")   ; size
 (match_operand 3 "const_int_operand" ""))) ; start
   ]
-  ""
+  "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]),
+GET_MODE_BITSIZE (mode))"
   "\t%0,%1,64-%2,128+63,%3+%2" ; dst, src, start, end, 
shift
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
@@ -3773,6 +3777,7 @@
(lshiftrt:DI (match_operand:DI 3 "register_operand" "d")
 (match_operand:DI 4 "nonzero_shift_count_operand" "")))]
   "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64)
&& 64 - UINTVAL (operands[4]) >= UINTVAL (operands[1])"
   "\t%0,%3,%2,%2+%1-1,128-%2-%1-%4"
   [(set_attr "op_type" "RIE")
@@ -3791,6 +3796,7 @@
  (match_operand 5 "const_int_operand" "")) ; start
 4)))]
   "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[4]), INTVAL (operands[5]), 64)
&& UINTVAL (operands[2]) == (~(0ULL) << UINTVAL (operands[4]))"
   "\t%0,%3,64-%4,63,%4+%5"
   [(set_attr "op_type" "RIE")
@@ -3804,7 +3810,8 @@
(const_int 1)  ; size
(match_operand 2 "const_int_operand" "")) ; start
   (const_int 0)))]
-  ""
+  "
+   && EXTRACT_ARGS_IN_RANGE (1, INTVAL (operands[2]), 64)"
   "\t%0,%1,64-1,128+63,%2+1" ; dst, src, start, end, shift
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
@@ -3919,6 +3926,8 @@
  (match_operand 2 "const_int_operand""I")) ; pos
(match_operand:GPR 3 "nonimmediate_operand" "d"))]
   "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[1]), INTVAL 

Re: [tree-tailcall] Check if function returns it's argument

2016-11-25 Thread Richard Biener
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> On 25 November 2016 at 13:37, Richard Biener  wrote:
> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 November 2016 at 18:08, Richard Biener  wrote:
> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 November 2016 at 17:48, Richard Biener  wrote:
> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 24 November 2016 at 14:07, Richard Biener  
> >> >> >> wrote:
> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >> >
> >> >> >> >> Hi,
> >> >> >> >> Consider following test-case:
> >> >> >> >>
> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> >> >> {
> >> >> >> >>   __builtin_memcpy (a1, a2, a3);
> >> >> >> >>   return a1;
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> return a1 can be considered equivalent to return value of memcpy,
> >> >> >> >> and the call could be emitted as a tail-call.
> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
> >> >> >> >> but if it is changed to:
> >> >> >> >>
> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> >> return t1;
> >> >> >> >>
> >> >> >> >> Then memcpy is emitted as a tail-call.
> >> >> >> >> The attached patch tries to handle the former case.
> >> >> >> >>
> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
> >> >> >> >> Does this patch look OK ?
> >> >> >> >
> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it 
> >> >> >> > doesn't.
> >> >> >> > */
> >> >> >> > +tree
> >> >> >> > +gimple_call_return_arg (gcall *call_stmt)
> >> >> >> > +{
> >> >> >> >
> >> >> >> >
> >> >> >> > Please just inline it at the single use - the name is not terribly
> >> >> >> > informative.
> >> >> >> >
> >> >> >> > I'm not sure you can rely on code-generation working if you not
> >> >> >> > effectively change the IL to
> >> >> >> >
> >> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> >   return a1;
> >> >> >> >
> >> >> >> > someone more familiar with RTL expansion plus tail call emission on
> >> >> >> > RTL needs to chime in.
> >> >> >> Well I was trying to copy-propagate function's argument into uses of
> >> >> >> it's return value if
> >> >> >> function returned that argument, so the assignment to lhs of call
> >> >> >> could be made redundant.
> >> >> >>
> >> >> >> eg:
> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> >> {
> >> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >>   return t1;
> >> >> >> }
> >> >> >>
> >> >> >> After patch, copyprop transformed it into:
> >> >> >> t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> return a1;
> >> >> >
> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's
> >> >> > better to use t1 as that's readily available in the return register
> >> >> > while the register for a1 might have been clobbered and thus we
> >> >> > need to spill it for the later return.
> >> >> Oh I didn't realize this could possibly pessimize RA.
> >> >> For test-case:
> >> >>
> >> >> void *t1 = memcpy (dest, src, n);
> >> >> if (t1 != dest)
> >> >>   __builtin_abort ();
> >> >>
> >> >> we could copy-propagate t1 into cond_expr and make the condition 
> >> >> redundant.
> >> >> However I suppose this particular case could be handled with VRP instead
> >> >> (t1 and dest should be marked equivalent) ?
> >> >
> >> > Yeah, exposing this to value-numbering in general can enable some
> >> > optimizations (but I wouldn't put it in copyprop).  Note it's then
> >> > difficult to avoid copy-propgating things...
> >> >
> >> > The user can also write
> >> >
> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> > {
> >> >   __builtin_memcpy (a1, a2, a3);
> >> >   return a1;
> >> > }
> >> >
> >> > so it's good to improve code-gen for that (for the tailcall issue).
> >> For the tail-call, issue should we artificially create a lhs and use that
> >> as return value (perhaps by a separate pass before tailcall) ?
> >>
> >> __builtin_memcpy (a1, a2, a3);
> >> return a1;
> >>
> >> gets transformed to:
> >> _1 = __builtin_memcpy (a1, a2, a3)
> >> return _1;
> >>
> >> So tail-call optimization pass would see the IL in it's expected form.
> >
> > As said, a RTL expert needs to chime in here.  Iff then tail-call
> > itself should do this rewrite.  But if this form is required to make
> > things work (I suppose you checked it _does_ actually work?) then
> > we'd need to make sure later passes do not undo it.  So it looks
> > fragile to me.  OTOH I seem to remember that the flags we set on
> > GIMPLE are merely a hint to RTL expansion and the tailcalling is
> > verified again there?
> 
> Yeah, I verified the form works:
> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> {
>   void *t1 = __builtin_memcpy (a1, a2, a3);
>   return t1;
> }
> 
> assembly:
> f:
> .LFB0:
> .cfi_startproc
>   

Re: [PATCH PR78507/PR78510]Fix two ICEs in pattern (cond (cmp (convert1? @1) @3) (convert2? @1) @2).

2016-11-25 Thread Richard Biener
On Thu, Nov 24, 2016 at 4:22 PM, Bin Cheng  wrote:
> Hi,
> This patch fixes two issues in newly introduced pattern: (cond (cmp 
> (convert1? @1) @3) (convert2? @1) @2).
> For PR78507, we need to check if from_type is INTEGRAL_TYPE_P explicitly, 
> this patch adds check for that.
> Note we don't check c1_type/c2_type for that because it's guarded by 
> INTEGER_CST@.
> For PR78510, the ICE is covered by revision 242831, but underlying issue is 
> when the block of conditions
> is not satisfied, the last case simplification is applied anyway since code 
> is initialized to cmp at the
> first place.  This patch fixes that by setting code to proper value only when 
> transformation is valid, it
> also incorporates Marc's suggestion by using cmp directly.  Two tests are 
> added too.
> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures?

Ok.

Btw,

-  (cond (cmp@0 (convert1? @1) ...
...
- enum tree_code code = TREE_CODE (@0)...

shows another misconception (I didn't notice...) in that TREE_CODE of a captured
expression results in the expression code.  On GENERIC that works but on GIMPLE
@0 will be a SSA_NAME and thus 'code' was SSA_NAME ...

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-11-24  Bin Cheng  
>
> PR middle-end/78507
> PR middle-end/78510
> * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use
> cmp directly, rather than cmp_code.  Initialize code to ERROR_MARK
> and set it to result code if transformation is valid.  Use code EQ
> directly in last simplification case.
>
> gcc/testsuite/ChangeLog
> 2016-11-24  Bin Cheng  
>
> PR middle-end/78507
> PR middle-end/78510
> * g++.dg/torture/pr78507.C: New test.
> * gcc.dg/torture/pr78510.c: New test.


[Patch, testsuite] Fix bogus pr64427.c failure for avr

2016-11-25 Thread Senthil Kumar Selvaraj

  The smaller int size for the avr target breaks the test's
  expectation on the number of iterations. The failure goes
  away if 32 bit ints are used in place of a plain int.

  Fix by conditionally typedef int32_t to __INT32_TYPE__ for targets
  with int size < 4,  and then use int32_t everywhere.

Regards
Senthil

2016-11-25  Senthil Kumar Selvaraj  

* gcc.dg/pr64277.c: Use 32 bit int for targets
with sizeof(int) < 4. 

Index: gcc/testsuite/gcc.dg/pr64277.c
===
--- gcc/testsuite/gcc.dg/pr64277.c  (revision 242857)
+++ gcc/testsuite/gcc.dg/pr64277.c  (working copy)
@@ -4,10 +4,16 @@
 /* { dg-final { scan-tree-dump "loop with 5 iterations completely unrolled" 
"cunroll" } } */
 /* { dg-final { scan-tree-dump "loop with 6 iterations completely unrolled" 
"cunroll" } } */
 
-int f1[10];
+#if __SIZEOF_INT__ < 4
+  __extension__ typedef __INT32_TYPE__ int32_t;
+#else
+  typedef int int32_t;
+#endif
+
+int32_t f1[10];
 void test1 (short a[], short m, unsigned short l)
 {
-  int i = l;
+  int32_t i = l;
   for (i = i + 5; i < m; i++)
 f1[i] = a[i]++;
 }
@@ -14,7 +20,7 @@
 
 void test2 (short a[], short m, short l)
 {
-  int i;
+  int32_t i;
   if (m > 5)
 m = 5;
   for (i = m; i > l; i--)


Re: [tree-tailcall] Check if function returns it's argument

2016-11-25 Thread Prathamesh Kulkarni
On 25 November 2016 at 13:37, Richard Biener  wrote:
> On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 24 November 2016 at 18:08, Richard Biener  wrote:
>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 November 2016 at 17:48, Richard Biener  wrote:
>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 November 2016 at 14:07, Richard Biener  wrote:
>> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >> Consider following test-case:
>> >> >> >>
>> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> >> {
>> >> >> >>   __builtin_memcpy (a1, a2, a3);
>> >> >> >>   return a1;
>> >> >> >> }
>> >> >> >>
>> >> >> >> return a1 can be considered equivalent to return value of memcpy,
>> >> >> >> and the call could be emitted as a tail-call.
>> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> >> >> but if it is changed to:
>> >> >> >>
>> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >> return t1;
>> >> >> >>
>> >> >> >> Then memcpy is emitted as a tail-call.
>> >> >> >> The attached patch tries to handle the former case.
>> >> >> >>
>> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> >> >> Does this patch look OK ?
>> >> >> >
>> >> >> > +/* Return arg, if function returns it's argument or NULL if it 
>> >> >> > doesn't.
>> >> >> > */
>> >> >> > +tree
>> >> >> > +gimple_call_return_arg (gcall *call_stmt)
>> >> >> > +{
>> >> >> >
>> >> >> >
>> >> >> > Please just inline it at the single use - the name is not terribly
>> >> >> > informative.
>> >> >> >
>> >> >> > I'm not sure you can rely on code-generation working if you not
>> >> >> > effectively change the IL to
>> >> >> >
>> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >> >> >   return a1;
>> >> >> >
>> >> >> > someone more familiar with RTL expansion plus tail call emission on
>> >> >> > RTL needs to chime in.
>> >> >> Well I was trying to copy-propagate function's argument into uses of
>> >> >> it's return value if
>> >> >> function returned that argument, so the assignment to lhs of call
>> >> >> could be made redundant.
>> >> >>
>> >> >> eg:
>> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> {
>> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >>   return t1;
>> >> >> }
>> >> >>
>> >> >> After patch, copyprop transformed it into:
>> >> >> t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> return a1;
>> >> >
>> >> > But that's a bad transform -- if we know that t1 == a1 then it's
>> >> > better to use t1 as that's readily available in the return register
>> >> > while the register for a1 might have been clobbered and thus we
>> >> > need to spill it for the later return.
>> >> Oh I didn't realize this could possibly pessimize RA.
>> >> For test-case:
>> >>
>> >> void *t1 = memcpy (dest, src, n);
>> >> if (t1 != dest)
>> >>   __builtin_abort ();
>> >>
>> >> we could copy-propagate t1 into cond_expr and make the condition 
>> >> redundant.
>> >> However I suppose this particular case could be handled with VRP instead
>> >> (t1 and dest should be marked equivalent) ?
>> >
>> > Yeah, exposing this to value-numbering in general can enable some
>> > optimizations (but I wouldn't put it in copyprop).  Note it's then
>> > difficult to avoid copy-propgating things...
>> >
>> > The user can also write
>> >
>> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> > {
>> >   __builtin_memcpy (a1, a2, a3);
>> >   return a1;
>> > }
>> >
>> > so it's good to improve code-gen for that (for the tailcall issue).
>> For the tail-call, issue should we artificially create a lhs and use that
>> as return value (perhaps by a separate pass before tailcall) ?
>>
>> __builtin_memcpy (a1, a2, a3);
>> return a1;
>>
>> gets transformed to:
>> _1 = __builtin_memcpy (a1, a2, a3)
>> return _1;
>>
>> So tail-call optimization pass would see the IL in it's expected form.
>
> As said, a RTL expert needs to chime in here.  Iff then tail-call
> itself should do this rewrite.  But if this form is required to make
> things work (I suppose you checked it _does_ actually work?) then
> we'd need to make sure later passes do not undo it.  So it looks
> fragile to me.  OTOH I seem to remember that the flags we set on
> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> verified again there?

Yeah, I verified the form works:
void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  void *t1 = __builtin_memcpy (a1, a2, a3);
  return t1;
}

assembly:
f:
.LFB0:
.cfi_startproc
jmp memcpy
.cfi_endproc

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >> But this now interferes with tail-call optimization, because it is not
>> >> >> able to emit memcpy
>> >> >> as tail-call anymore due to 

Re: [PATCH] eliminate calls to snprintf(0, 0, ...) with known return value (pr78476)

2016-11-25 Thread Markus Trippelsdorf
On 2016.11.22 at 20:02 -0700, Martin Sebor wrote:
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/78476
>   * gimple-ssa-sprintf.c (struct pass_sprintf_length::call_info):
>   Add a member.
>   (handle_gimple_call): Adjust signature.
>   (try_substitute_return_value): Remove calls to bounded functions
>   with zero buffer size whose result is known.
>   (pass_sprintf_length::execute): Adjust call to handle_gimple_call.

You left conflict markers in gcc/ChangeLog in your r242854 commit.

-- 
Markus


Re: change initialization of ptrdiff_type_node

2016-11-25 Thread Richard Biener
On Fri, 25 Nov 2016, Jakub Jelinek wrote:

> On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote:
> > --- a/gcc/lto/lto-lang.c
> > +++ b/gcc/lto/lto-lang.c
> > @@ -1271,8 +1271,30 @@ lto_init (void)
> >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node)
> >   == const_ptr_type_node);
> >  
> > -  ptrdiff_type_node = integer_type_node;
> > +  if (strcmp (PTRDIFF_TYPE, "int") == 0)
> > +ptrdiff_type_node = integer_type_node;
> > +  else if (strcmp (PTRDIFF_TYPE, "long int") == 0)
> > +ptrdiff_type_node = long_integer_type_node;
> > +  else if (strcmp (PTRDIFF_TYPE, "long long int") == 0)
> > +ptrdiff_type_node = long_long_integer_type_node;
> > +  else if (strcmp (PTRDIFF_TYPE, "short int") == 0)
> > +ptrdiff_type_node = short_integer_type_node;
> > +  else
> > +{
> > +  ptrdiff_type_node = NULL_TREE;
> > +  for (int i = 0; i < NUM_INT_N_ENTS; i++)
> > +   if (int_n_enabled_p[i])
> > + {
> > +   char name[50];
> > +   sprintf (name, "__int%d", int_n_data[i].bitsize);
> > +   if (strcmp (name, PTRDIFF_TYPE) == 0)
> > + ptrdiff_type_node = int_n_trees[i].signed_type;
> > + }
> > +  if (ptrdiff_type_node == NULL_TREE)
> > +   gcc_unreachable ();
> > +}
> 
> This looks ok to me.

But I'd like to see this in build_common_tree_nodes alongside
the initialization of size_type_node (and thus removed from
c_common_nodes_and_builtins).  This way you can simply remove
the lto-lang.c code as well.

Please then also remove the ptrdiff_type_node re-set from
free_lang_data ().

> >  
> > +  unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node);
> >lto_build_c_type_nodes ();
> >gcc_assert (va_list_type_node);
> 
> But why this and the remaining hunks?  Nothing in the middle-end
> needs it, IMHO it should be kept in c-family/.

Yeah, this change looks unnecessary to me.

> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index eec2d4f..6c52387 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -617,6 +617,7 @@ enum tree_index {
> >TI_SIZE_TYPE,
> >TI_PID_TYPE,
> >TI_PTRDIFF_TYPE,
> > +  TI_UNSIGNED_PTRDIFF_TYPE,
> >TI_VA_LIST_TYPE,
> >TI_VA_LIST_GPR_COUNTER_FIELD,
> >TI_VA_LIST_FPR_COUNTER_FIELD,
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 62cd7bb..ae69d0d 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum 
> > tree_code __code, int __i,
> >  #define size_type_node  global_trees[TI_SIZE_TYPE]
> >  #define pid_type_node   global_trees[TI_PID_TYPE]
> >  #define ptrdiff_type_node  global_trees[TI_PTRDIFF_TYPE]
> > +#define unsigned_ptrdiff_type_node global_trees[TI_UNSIGNED_PTRDIFF_TYPE]
> >  #define va_list_type_node  global_trees[TI_VA_LIST_TYPE]
> >  #define va_list_gpr_counter_field  
> > global_trees[TI_VA_LIST_GPR_COUNTER_FIELD]
> >  #define va_list_fpr_counter_field  
> > global_trees[TI_VA_LIST_FPR_COUNTER_FIELD]
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] remove invalid "tail +16c" in config/acx.m4

2016-11-25 Thread ma . jiang
Hi all,
  In "config/acx.m4", there are still some "tail +16c"  which are invalid 
on POSIX systems. 
  In my opinion, all "tail +16c" should be changed to "tail -c +16" 
directly, as most systems has accept the latter.
  And, to skip first 16 bytes, we should use "tail -c +17" instead of 
"tail -c +16".

 * config/acx.m4:Change "tail +16c" to "tail -c +17".
 * configure: Regenerate.
--- gcc-6.2.0/config/acx.m4 2011-12-18 17:58:37.0 +0800
+++ gcc-6.2.0-bak/config/acx.m4 2016-11-23 10:56:21.065817691 +0800
@@ -404,7 +404,7 @@ AC_DEFUN([ACX_PROG_CMP_IGNORE_INITIAL],
 [AC_CACHE_CHECK([how to compare bootstrapped objects], 
gcc_cv_prog_cmp_skip,
 [ echo abfoo >t1
   echo cdfoo >t2
-  gcc_cv_prog_cmp_skip='tail +16c $$f1 > tmp-foo1; tail +16c $$f2 > 
tmp-foo2; cmp tmp-foo1 tmp-foo2'
+  gcc_cv_prog_cmp_skip='tail -c +17 $$f1 > tmp-foo1; tail -c +17 $$f2 > 
tmp-foo2; cmp tmp-foo1 tmp-foo2'
   if cmp t1 t2 2 2 > /dev/null 2>&1; then
 if cmp t1 t2 1 1 > /dev/null 2>&1; then
   :


Re: [tree-tailcall] Check if function returns it's argument

2016-11-25 Thread Richard Biener
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> On 24 November 2016 at 18:08, Richard Biener  wrote:
> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 November 2016 at 17:48, Richard Biener  wrote:
> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 November 2016 at 14:07, Richard Biener  wrote:
> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >> Consider following test-case:
> >> >> >>
> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> >> {
> >> >> >>   __builtin_memcpy (a1, a2, a3);
> >> >> >>   return a1;
> >> >> >> }
> >> >> >>
> >> >> >> return a1 can be considered equivalent to return value of memcpy,
> >> >> >> and the call could be emitted as a tail-call.
> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
> >> >> >> but if it is changed to:
> >> >> >>
> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >> >> return t1;
> >> >> >>
> >> >> >> Then memcpy is emitted as a tail-call.
> >> >> >> The attached patch tries to handle the former case.
> >> >> >>
> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*
> >> >> >> Does this patch look OK ?
> >> >> >
> >> >> > +/* Return arg, if function returns it's argument or NULL if it 
> >> >> > doesn't.
> >> >> > */
> >> >> > +tree
> >> >> > +gimple_call_return_arg (gcall *call_stmt)
> >> >> > +{
> >> >> >
> >> >> >
> >> >> > Please just inline it at the single use - the name is not terribly
> >> >> > informative.
> >> >> >
> >> >> > I'm not sure you can rely on code-generation working if you not
> >> >> > effectively change the IL to
> >> >> >
> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);
> >> >> >   return a1;
> >> >> >
> >> >> > someone more familiar with RTL expansion plus tail call emission on
> >> >> > RTL needs to chime in.
> >> >> Well I was trying to copy-propagate function's argument into uses of
> >> >> it's return value if
> >> >> function returned that argument, so the assignment to lhs of call
> >> >> could be made redundant.
> >> >>
> >> >> eg:
> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> >> >> {
> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
> >> >>   return t1;
> >> >> }
> >> >>
> >> >> After patch, copyprop transformed it into:
> >> >> t1 = __builtin_memcpy (a1, a2, a3);
> >> >> return a1;
> >> >
> >> > But that's a bad transform -- if we know that t1 == a1 then it's
> >> > better to use t1 as that's readily available in the return register
> >> > while the register for a1 might have been clobbered and thus we
> >> > need to spill it for the later return.
> >> Oh I didn't realize this could possibly pessimize RA.
> >> For test-case:
> >>
> >> void *t1 = memcpy (dest, src, n);
> >> if (t1 != dest)
> >>   __builtin_abort ();
> >>
> >> we could copy-propagate t1 into cond_expr and make the condition redundant.
> >> However I suppose this particular case could be handled with VRP instead
> >> (t1 and dest should be marked equivalent) ?
> >
> > Yeah, exposing this to value-numbering in general can enable some
> > optimizations (but I wouldn't put it in copyprop).  Note it's then
> > difficult to avoid copy-propgating things...
> >
> > The user can also write
> >
> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> > {
> >   __builtin_memcpy (a1, a2, a3);
> >   return a1;
> > }
> >
> > so it's good to improve code-gen for that (for the tailcall issue).
> For the tail-call, issue should we artificially create a lhs and use that
> as return value (perhaps by a separate pass before tailcall) ?
> 
> __builtin_memcpy (a1, a2, a3);
> return a1;
> 
> gets transformed to:
> _1 = __builtin_memcpy (a1, a2, a3)
> return _1;
> 
> So tail-call optimization pass would see the IL in it's expected form.

As said, a RTL expert needs to chime in here.  Iff then tail-call
itself should do this rewrite.  But if this form is required to make
things work (I suppose you checked it _does_ actually work?) then
we'd need to make sure later passes do not undo it.  So it looks
fragile to me.  OTOH I seem to remember that the flags we set on
GIMPLE are merely a hint to RTL expansion and the tailcalling is
verified again there?

Thanks,
Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >> Thanks,
> >> Prathamesh
> >> >
> >> >> But this now interferes with tail-call optimization, because it is not
> >> >> able to emit memcpy
> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
> >> >> I am not sure how to workaround this.
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> > Richard.
> >> >>
> >> >
> >> > --
> >> > Richard Biener 
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, 
> >> > HRB 21284 (AG Nuernberg)
> >>
> >>
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, 

Re: change initialization of ptrdiff_type_node

2016-11-25 Thread Jakub Jelinek
On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote:
> --- a/gcc/lto/lto-lang.c
> +++ b/gcc/lto/lto-lang.c
> @@ -1271,8 +1271,30 @@ lto_init (void)
>gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node)
> == const_ptr_type_node);
>  
> -  ptrdiff_type_node = integer_type_node;
> +  if (strcmp (PTRDIFF_TYPE, "int") == 0)
> +ptrdiff_type_node = integer_type_node;
> +  else if (strcmp (PTRDIFF_TYPE, "long int") == 0)
> +ptrdiff_type_node = long_integer_type_node;
> +  else if (strcmp (PTRDIFF_TYPE, "long long int") == 0)
> +ptrdiff_type_node = long_long_integer_type_node;
> +  else if (strcmp (PTRDIFF_TYPE, "short int") == 0)
> +ptrdiff_type_node = short_integer_type_node;
> +  else
> +{
> +  ptrdiff_type_node = NULL_TREE;
> +  for (int i = 0; i < NUM_INT_N_ENTS; i++)
> + if (int_n_enabled_p[i])
> +   {
> + char name[50];
> + sprintf (name, "__int%d", int_n_data[i].bitsize);
> + if (strcmp (name, PTRDIFF_TYPE) == 0)
> +   ptrdiff_type_node = int_n_trees[i].signed_type;
> +   }
> +  if (ptrdiff_type_node == NULL_TREE)
> + gcc_unreachable ();
> +}

This looks ok to me.

>  
> +  unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node);
>lto_build_c_type_nodes ();
>gcc_assert (va_list_type_node);

But why this and the remaining hunks?  Nothing in the middle-end
needs it, IMHO it should be kept in c-family/.

> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index eec2d4f..6c52387 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -617,6 +617,7 @@ enum tree_index {
>TI_SIZE_TYPE,
>TI_PID_TYPE,
>TI_PTRDIFF_TYPE,
> +  TI_UNSIGNED_PTRDIFF_TYPE,
>TI_VA_LIST_TYPE,
>TI_VA_LIST_GPR_COUNTER_FIELD,
>TI_VA_LIST_FPR_COUNTER_FIELD,
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 62cd7bb..ae69d0d 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum tree_code 
> __code, int __i,
>  #define size_type_node  global_trees[TI_SIZE_TYPE]
>  #define pid_type_node   global_trees[TI_PID_TYPE]
>  #define ptrdiff_type_nodeglobal_trees[TI_PTRDIFF_TYPE]
> +#define unsigned_ptrdiff_type_node   global_trees[TI_UNSIGNED_PTRDIFF_TYPE]
>  #define va_list_type_nodeglobal_trees[TI_VA_LIST_TYPE]
>  #define va_list_gpr_counter_field
> global_trees[TI_VA_LIST_GPR_COUNTER_FIELD]
>  #define va_list_fpr_counter_field
> global_trees[TI_VA_LIST_FPR_COUNTER_FIELD]


Jakub


Re: [Patch, tentative] Use MOVE_MAX_PIECES instead of MOVE_MAX in gimple_fold_builtin_memory_op

2016-11-25 Thread Richard Biener
On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote:

> 
> Richard Biener writes:
> 
> > On Thu, 24 Nov 2016, Richard Biener wrote:
> >
> >> On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote:
> >> 
> >> > Hi,
> >> > 
> >> >   I've been analyzing a failing regtest (gcc.dg/strlenopt-8.c) for the 
> >> > avr
> >> >   target. I found that the (dump) failure is because there are 4
> >> >   instances of memcpy, while the testcase expects only 2 for a
> >> >   non-strict align target like the avr.
> >> > 
> >> >   Comparing that with a dump generated by x64_64-pc-linux, I found that
> >> >   the extra memcpy's come from the forwprop pass, when it replaces
> >> >   strcat with strlen and memcpy. For x86_64, the memcpy generated gets
> >> >   folded into a load/store in gimple_fold_builtin_memory_op. That
> >> >   doesn't happen for the avr because len (2) happens to be bigger than
> >> >   MOVE_MAX (1).
> >> > 
> >> >   The avr can only move 1 byte efficiently from reg <-> memory, but it's
> >> >   more efficient to load and store 2 bytes than to call memcpy, so
> >> >   MOVE_MAX_PIECES is set to 2.
> >> > 
> >> >   Given that gimple_fold_builtin_memory_op gets to choose between
> >> >   leaving the memcpy call as is, or breaking it down to a by-pieces
> >> >   move, shouldn't it use MOVE_MAX_PIECES instead of
> >> >   MOV_MAX?
> >> > 
> >> >   That is what the below patch does, and that makes the test
> >> >   pass. Does this sound right?
> >> 
> >> No, as we handle both memcpy and memmove this way we rely on
> >> the whole storage fit in a single register so we do the
> >> right thing for overlapping memory.
> >
> > So actually your patch doesn't chnage that, the ordering is ensured
> > by emitting a single GIMPLE load/store pair.  There are only
> > four targets defining MOVE_MAX_PIECES, and one (s390) even has
> > a smaller MOVE_MAX_PIECES than MOVE_MAX (huh).  AVR has larger
> > MOVE_MAX_PIECES than MOVE_MAX, but that seems to not make much
> > sense to me given their very similar description plus the
> > fact that AVR can only load a single byte at a time...
> >
> > The x86 comment says
> >
> > /* MOVE_MAX_PIECES is the number of bytes at a time which we can
> >move efficiently, as opposed to  MOVE_MAX which is the maximum
> >number of bytes we can move with a single instruction.
> >
> > which doesn't match up with
> >
> > @defmac MOVE_MAX
> > The maximum number of bytes that a single instruction can move quickly
> > between memory and registers or between two memory locations.
> > @end defmac
> >
> > note "quickly" here.  But OTOH
> >
> > @defmac MOVE_MAX_PIECES
> > A C expression used by @code{move_by_pieces} to determine the largest unit
> > a load or store used to copy memory is.  Defaults to @code{MOVE_MAX}.
> > @end defmac
> >
> > here the only difference is "copy memory".  But we try to special
> > case the one load - one store case, not generally "copy memory".
> >
> > So I think MOVE_MAX matches my intent when writing the code.
> 
> Ok, but isn't that inconsistent with tree-inline.c:estimate_move_cost, which
> considers MOVE_MAX_PIECES and MOVE_RATIO to decide between a libcall and
> by-pieces move?

Well, I don't understand why we have both MOVE_MAX and MOVE_MAX_PIECES.
There are exactly two uses of MOVE_MAX in GCC AFAICS, the gimple-fold.c
one and caller-save.c which derives MOVE_MAX_WORDS from it.  
MOVE_MAX_PIECES has the only use in block move expansion plus the
single use in tree-inline.c.

So I can't give a reason why one or the other should be more valid
but the tree-inline.c one tries to match memcpy expansion (obviously),
while the gimple-fold.c one tries to get at the maximum possible
single-insn move amount (and AVR is odd here having that lower than
MOVE_MAX_PIECES, compared to say s390 which has it the opposite way
around).

Richard.

> Regards
> Senthil
> 
> >
> > Richard.
> >
> >> Richard.
> >> 
> >> > Regards
> >> > Senthil
> >> > 
> >> > Index: gcc/gimple-fold.c
> >> > ===
> >> > --- gcc/gimple-fold.c(revision 242741)
> >> > +++ gcc/gimple-fold.c(working copy)
> >> > @@ -703,7 +703,7 @@
> >> >src_align = get_pointer_alignment (src);
> >> >dest_align = get_pointer_alignment (dest);
> >> >if (tree_fits_uhwi_p (len)
> >> > -  && compare_tree_int (len, MOVE_MAX) <= 0
> >> > +  && compare_tree_int (len, MOVE_MAX_PIECES) <= 0
> >> >/* ???  Don't transform copies from strings with known length 
> >> > this
> >> >   confuses the tree-ssa-strlen.c.  This doesn't handle
> >> >   the case in gcc.dg/strlenopt-8.c which is XFAILed for that
> >> > 
> >> > 
> >> 
> >> 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)