Re: [PATCH] rs6000: Testcases for rl*i*
On Fri, Nov 25, 2016 at 8:51 PM, Segher Boessenkoolwrote: > These testcases test that we generate the expected code for all of the > rl*i* instructions, that is, rotate-and-mask and rotate-and-mask-insert > for immediate rotation counts. All the testcases do rotate, shift left, > as well as shift right; if that results in an instruction that does not > exist the testcases generate a multiplication instead, so that we can > detect if this is handled properly. > > Many 32-bit instructions zero-extend their result properly in 64-bit > mode, but the rs6000 port does not yet know. These testcases test the > status quo, so they will need updating when ever we handle this. > > The rlwimi-*.c tests all generate suboptimal code in 64-bit mode (some > end up as xor-and-xor, some as separate rotate and insert). A patch to > fix this will hit gcc-patches soon; these testcases will fail until it > is fixed. > > Is this okay for trunk? > > > Segher > > > 2016-11-26 Segher Boessenkool > > gcc/testsuite/ > * gcc.target/powerpc/rldic-0.c: New testcase. > * gcc.target/powerpc/rldic-1.c: New testcase. > * gcc.target/powerpc/rldic-2.c: New testcase. > * gcc.target/powerpc/rldicl-0.c: New testcase. > * gcc.target/powerpc/rldicl-1.c: New testcase. > * gcc.target/powerpc/rldicl-2.c: New testcase. > * gcc.target/powerpc/rldicr-0.c: New testcase. > * gcc.target/powerpc/rldicr-1.c: New testcase. > * gcc.target/powerpc/rldicr-2.c: New testcase. > * gcc.target/powerpc/rldicx.h: New file. > * gcc.target/powerpc/rldimi-0.c: New testcase. > * gcc.target/powerpc/rldimi-1.c: New testcase. > * gcc.target/powerpc/rldimi-2.c: New testcase. > * gcc.target/powerpc/rldimi.h: New file. > * gcc.target/powerpc/rlwimi-0.c: New testcase. > * gcc.target/powerpc/rlwimi-1.c: New testcase. > * gcc.target/powerpc/rlwimi-2.c: New testcase. > * gcc.target/powerpc/rlwimi.h: New file. > * gcc.target/powerpc/rlwinm-0.c: New testcase. > * gcc.target/powerpc/rlwinm-1.c: New testcase. > * gcc.target/powerpc/rlwinm-2.c: New testcase. > * gcc.target/powerpc/rlwinm.h: New file. Okay. I assume that you want to add the testcases after the fix for rlwimi. Thanks, David
[PATCH] rs6000: Testcases for rl*i*
These testcases test that we generate the expected code for all of the rl*i* instructions, that is, rotate-and-mask and rotate-and-mask-insert for immediate rotation counts. All the testcases do rotate, shift left, as well as shift right; if that results in an instruction that does not exist the testcases generate a multiplication instead, so that we can detect if this is handled properly. Many 32-bit instructions zero-extend their result properly in 64-bit mode, but the rs6000 port does not yet know. These testcases test the status quo, so they will need updating when ever we handle this. The rlwimi-*.c tests all generate suboptimal code in 64-bit mode (some end up as xor-and-xor, some as separate rotate and insert). A patch to fix this will hit gcc-patches soon; these testcases will fail until it is fixed. Is this okay for trunk? Segher 2016-11-26 Segher Boessenkoolgcc/testsuite/ * gcc.target/powerpc/rldic-0.c: New testcase. * gcc.target/powerpc/rldic-1.c: New testcase. * gcc.target/powerpc/rldic-2.c: New testcase. * gcc.target/powerpc/rldicl-0.c: New testcase. * gcc.target/powerpc/rldicl-1.c: New testcase. * gcc.target/powerpc/rldicl-2.c: New testcase. * gcc.target/powerpc/rldicr-0.c: New testcase. * gcc.target/powerpc/rldicr-1.c: New testcase. * gcc.target/powerpc/rldicr-2.c: New testcase. * gcc.target/powerpc/rldicx.h: New file. * gcc.target/powerpc/rldimi-0.c: New testcase. * gcc.target/powerpc/rldimi-1.c: New testcase. * gcc.target/powerpc/rldimi-2.c: New testcase. * gcc.target/powerpc/rldimi.h: New file. * gcc.target/powerpc/rlwimi-0.c: New testcase. * gcc.target/powerpc/rlwimi-1.c: New testcase. * gcc.target/powerpc/rlwimi-2.c: New testcase. * gcc.target/powerpc/rlwimi.h: New file. * gcc.target/powerpc/rlwinm-0.c: New testcase. * gcc.target/powerpc/rlwinm-1.c: New testcase. * gcc.target/powerpc/rlwinm-2.c: New testcase. * gcc.target/powerpc/rlwinm.h: New file. --- gcc/testsuite/gcc.target/powerpc/rldic-0.c | 16 gcc/testsuite/gcc.target/powerpc/rldic-1.c | 17 gcc/testsuite/gcc.target/powerpc/rldic-2.c | 16 gcc/testsuite/gcc.target/powerpc/rldicl-0.c | 17 gcc/testsuite/gcc.target/powerpc/rldicl-1.c | 16 gcc/testsuite/gcc.target/powerpc/rldicl-2.c | 17 gcc/testsuite/gcc.target/powerpc/rldicr-0.c | 15 gcc/testsuite/gcc.target/powerpc/rldicr-1.c | 16 gcc/testsuite/gcc.target/powerpc/rldicr-2.c | 15 gcc/testsuite/gcc.target/powerpc/rldicx.h | 117 gcc/testsuite/gcc.target/powerpc/rldimi-0.c | 15 gcc/testsuite/gcc.target/powerpc/rldimi-1.c | 15 gcc/testsuite/gcc.target/powerpc/rldimi-2.c | 15 gcc/testsuite/gcc.target/powerpc/rldimi.h | 106 + gcc/testsuite/gcc.target/powerpc/rlwimi-0.c | 20 + gcc/testsuite/gcc.target/powerpc/rlwimi-1.c | 20 + gcc/testsuite/gcc.target/powerpc/rlwimi-2.c | 18 + gcc/testsuite/gcc.target/powerpc/rlwimi.h | 91 ++ gcc/testsuite/gcc.target/powerpc/rlwinm-0.c | 19 + gcc/testsuite/gcc.target/powerpc/rlwinm-1.c | 19 + gcc/testsuite/gcc.target/powerpc/rlwinm-2.c | 19 + gcc/testsuite/gcc.target/powerpc/rlwinm.h | 92 ++ 22 files changed, 711 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/rldic-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldic-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldic-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicl-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicl-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicl-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicr-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicr-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicr-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldicx.h create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rldimi.h create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwimi.h create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm.h diff --git a/gcc/testsuite/gcc.target/powerpc/rldic-0.c b/gcc/testsuite/gcc.target/powerpc/rldic-0.c new file mode
Re: [PATCH 3/9] Add option -moutline-msabi-xlogues
Thank you very much for your review! On 11/25/2016 05:51 PM, Sandra Loosemore wrote: On 11/22/2016 10:19 PM, Daniel Santos wrote: diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 9eef558..f556978 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -528,6 +528,11 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI) EnumValue Enum(calling_abi) String(ms) Value(MS_ABI) +moutline-msabi-xlogues +Target Report Mask(OUTLINE_MSABI_XLOGUES) Save +Reduces function size by using out-of-line stubs to save & restore registers +clobberd by differences in Microsoft and System V ABIs. + Just as a suggestion (I'm not an i386 maintainer), I'd recommend spelling the name of this option -mno-inline-msabi-xlogues instead of -moutline-msabi-xlogues, and making the default -minline-msabi-xlogues. That is certainly more intuitive ("outline" or "out-of-line" vs "inline"). I guess I'm not sure what concept is more important, that it's not inline or that it's using stubs. Probably the former. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 8e2f466..4706085 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -25004,6 +25004,15 @@ You can control this behavior for specific functions by using the function attributes @code{ms_abi} and @code{sysv_abi}. @xref{Function Attributes}. +@item -moutline-msabi-xlogues +@itemx -mno-outline-msabi-xlogues By convention, we only list the option form that is not the default OK. This is my first time to work on gcc, so still learning all of the conventions. +@opindex moutline-msabi-xlogues ...but we should have index entries for both. So basically I got that backwards. :) +Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a +SysV ABI function must consider RSI, RDI and XMM6-15 as clobbered, emitting +fairly lengthy prologues & epilogues. This option generates prologues & +epilogues that instead call stubs in libgcc to perform these saves & restores, +thus reducing function size at the cost of a few extra instructions. Please use the word "and" in all three locations here, instead of "&". -Sandra Now that I think of it, would "System V" be better than just "SysV"? This is very helpful, thanks again. I'm working on a meaningful testsuite, so I suppose I should just submit a v3 with these changes when I'm done with the tests and hopefully I'll have a little more feedback on the back end & assembly as well. Daniel
Re: [PATCH 3/9] Add option -moutline-msabi-xlogues
On 11/22/2016 10:19 PM, Daniel Santos wrote: diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 9eef558..f556978 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -528,6 +528,11 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI) EnumValue Enum(calling_abi) String(ms) Value(MS_ABI) +moutline-msabi-xlogues +Target Report Mask(OUTLINE_MSABI_XLOGUES) Save +Reduces function size by using out-of-line stubs to save & restore registers +clobberd by differences in Microsoft and System V ABIs. + Just as a suggestion (I'm not an i386 maintainer), I'd recommend spelling the name of this option -mno-inline-msabi-xlogues instead of -moutline-msabi-xlogues, and making the default -minline-msabi-xlogues. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 8e2f466..4706085 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -25004,6 +25004,15 @@ You can control this behavior for specific functions by using the function attributes @code{ms_abi} and @code{sysv_abi}. @xref{Function Attributes}. +@item -moutline-msabi-xlogues +@itemx -mno-outline-msabi-xlogues By convention, we only list the option form that is not the default +@opindex moutline-msabi-xlogues ...but we should have index entries for both. +Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a +SysV ABI function must consider RSI, RDI and XMM6-15 as clobbered, emitting +fairly lengthy prologues & epilogues. This option generates prologues & +epilogues that instead call stubs in libgcc to perform these saves & restores, +thus reducing function size at the cost of a few extra instructions. Please use the word "and" in all three locations here, instead of "&". -Sandra
Re: [PATCH] avoid calling alloca(0)
On 11/23/2016 06:15 PM, Martin Sebor wrote: gcc_assert works only in some instances (e.g., in c-ada-spec.c:191) but not in others because some actually do make the alloca(0) call at runtime: at a minimum, lto.c:3285, reg-stack.c:2008, and tree-ssa-threadedge.c:344 assert during bootstrap. You might have the wrong line number of reg-stack.c and lto. You've pointed to the start of subst_asm_stack_regs and lto_main respectively. It'd probably be better if you posted the line with a bit of context. I can trivially fix the threadedge issue. Jeff
Re: [PATCH] avoid calling alloca(0)
On 11/24/2016 12:42 AM, Jakub Jelinek wrote: After reviewing a few more of the XALLOCAVEC calls in the affected files I suspect that those to alloca(0) pointed out by the warning may be just a subset that GCC happens to see thanks to constant propagation. If that's so then changing this subset to alloca(N + !N) or some such is probably not the right approach because it will miss all the other calls where GCC doesn't see that N is zero. I think the most robust solution is to do as Bernd suggests: change XALLOCAVEC as shown above. That will let us prevent any and all unsafe assumptions about the result of alloca(0) being either non-null or distinct. The other approach would be to change XALLOCAVEC to add 1 to the byte count. That would be in line with what XMALLOC does. I still fail to see why you want to change anything at least for hosts where you know XALLOCAVEC is __builtin_alloca. Show me one place which assumes the result of alloca (0) in gcc sources is distinct, or non-NULL. For 0 elements the pointer simply isn't used. And whether for the malloc based alloca it GCs or not really doesn't matter for us. I think for host/build, we ought to assume that alloca is __builtin_alloca. I think we stopped supporting the alloca-on-top-of-malloc host/build systems long ago. But I still think we ought to be "clean" in regard to zero sized allocations. It sounds like an assert may not be sufficient, so we need to look at another approach. Jeff
Re: Pointer Bounds Checker and trailing arrays (PR68270)
2016-11-25 15:47 GMT+03:00 Alexander Ivchenko: > Hi, > > The patch below addresses PR68270. could you please take a look? > > 2016-11-25 Alexander Ivchenko > >* c-family/c.opt (flag_chkp_flexible_struct_trailing_arrays): >Add new option. >* tree-chkp.c (chkp_parse_array_and_component_ref): Forbid >narrowing when chkp_parse_array_and_component_ref is used and >the ARRAY_REF points to an array in the end of the struct. > > > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index 7d8a726..e45d6a2 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -1166,6 +1166,11 @@ C ObjC C++ ObjC++ LTO RejectNegative Report > Var(flag_chkp_narrow_to_innermost_ar > Forces Pointer Bounds Checker to use bounds of the innermost arrays in case > of > nested static arryas access. By default outermost array is used. > > +fchkp-flexible-struct-trailing-arrays > +C ObjC C++ ObjC++ LTO RejectNegative Report > Var(flag_chkp_flexible_struct_trailing_arrays) > +Allow Pointer Bounds Checker to treat all trailing arrays in structures as > +possibly flexible. Words 'allow' and 'possibly' are confusing here. This option is about to force checker to do something, not to give him a choice. New option has to be documented in invoke.texi. It would also be nice to reflect changes on GCC MPX wiki page. We have an attribute to change compiler behavior when this option is not set. But we have no way to make exceptions when this option is used. Should we add one? > + > fchkp-optimize > C ObjC C++ ObjC++ LTO Report Var(flag_chkp_optimize) Init(-1) > Allow Pointer Bounds Checker optimizations. By default allowed > diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c > index 2769682..40f99c3 100644 > --- a/gcc/tree-chkp.c > +++ b/gcc/tree-chkp.c > @@ -3425,7 +3425,9 @@ chkp_parse_array_and_component_ref (tree node, tree > *ptr, >if (flag_chkp_narrow_bounds >&& !flag_chkp_narrow_to_innermost_arrray >&& (!last_comp > - || chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1 > + || (chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1)) > + && !(flag_chkp_flexible_struct_trailing_arrays > + && array_at_struct_end_p (var) This is incorrect place for fix. Consider code struct S { int a; int b[10]; }; struct S s; int *p = s.b; Here you need to compute bounds for p and you want your option to take effect but in this case you won't event reach your new check because there is no ARRAY_REF. And even if we change it to int *p = s.b[5]; then it still would be narrowed because s.b would still be written into 'comp_to_narrow' variable. Correct place for fix is in chkp_may_narrow_to_field. Also you should consider fchkp-narrow-to-innermost-arrray option. Should it be more powerfull or not? I think fchkp-narrow-to-innermost-arrray shouldn't narrow to variable sized fields. BTW looks like right now bnd_variable_size attribute is ignored by fchkp-narrow-to-innermost-arrray. This is another problem and may be fixed in another patch though. Also patch lacks tests for various situations (with option and without, with ARRAY_REF and without). In case of new attribute and fix for fchkp-narrow-to-innermost-arrray behavior additional tests are required. -- Ilya > { >comp_to_narrow = last_comp; >break;
[PATCH, fortran, committed] Fix documentation typo
Hi, committed the following as obvious: Index: ChangeLog === --- ChangeLog (revision 242883) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2016-11-25 Janne Blomqvist+ + * intrinsic.texi: Fix ptrdiff_t typo in ISO_C_BINDING constants + table. + 2016-11-25 Janus Weil PR fortran/60853 Index: intrinsic.texi === --- intrinsic.texi (revision 242883) +++ intrinsic.texi (working copy) @@ -14697,7 +14697,7 @@ Furthermore, if @code{__float128} is sup @item @code{INTEGER}@tab @code{C_INT_FAST128_T} @tab @code{int_fast128_t} @tab Ext. @item @code{INTEGER}@tab @code{C_INTMAX_T} @tab @code{intmax_t} @item @code{INTEGER}@tab @code{C_INTPTR_T} @tab @code{intptr_t} -@item @code{INTEGER}@tab @code{C_PTRDIFF_T} @tab @code{intptr_t} @tab TS 29113 +@item @code{INTEGER}@tab @code{C_PTRDIFF_T} @tab @code{ptrdiff_t} @tab TS 29113 @item @code{REAL} @tab @code{C_FLOAT} @tab @code{float} @item @code{REAL} @tab @code{C_DOUBLE}@tab @code{double} @item @code{REAL} @tab @code{C_LONG_DOUBLE} @tab @code{long double} -- Janne Blomqvist
Hurd port for gcc go PATCH 19-23(23)
* src_libgo_runtime_netpoll.goc.diff: Fix restricted word bug. Rename variable errno to errno1. * src_libgo_go_os_os_test.go.diff: Allow EFBIG return code to big seeks. * src_libgo_go_syscall_syscall_gnu_test.go: New file: Define Type and Whence as 32bit in syscall.Flock_t * src_libgo_testsuite_gotest.diff: Remove comm option for ps -o. * add-gnu-to-libgo-headers.diff: Add gnu to +build entry in file headers included in the build. FIXME: Index: gcc-6-6.2.1-4.1/src/libgo/go/archive/tar/stat_atim.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/archive/tar/stat_atim.go +++ gcc-6-6.2.1-4.1/src/libgo/go/archive/tar/stat_atim.go @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build linux dragonfly openbsd solaris +// +build linux dragonfly openbsd solaris gnu package tar Index: gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_resnew.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/cgo_resnew.go +++ gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_resnew.go @@ -3,7 +3,7 @@ // license that can be found in the LICENSE file. // +build cgo,!netgo -// +build darwin linux,!android netbsd solaris +// +build darwin linux,!android netbsd solaris gnu package net Index: gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_socknew.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/cgo_socknew.go +++ gcc-6-6.2.1-4.1/src/libgo/go/net/cgo_socknew.go @@ -3,7 +3,7 @@ // license that can be found in the LICENSE file. // +build cgo,!netgo -// +build android linux solaris +// +build android linux solaris gnu package net Index: gcc-6-6.2.1-4.1/src/libgo/go/net/hook_cloexec.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/hook_cloexec.go +++ gcc-6-6.2.1-4.1/src/libgo/go/net/hook_cloexec.go @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build freebsd linux +// +build freebsd linux gnu package net Index: gcc-6-6.2.1-4.1/src/libgo/go/net/main_unix_test.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/main_unix_test.go +++ gcc-6-6.2.1-4.1/src/libgo/go/net/main_unix_test.go @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build darwin dragonfly freebsd linux nacl netbsd openbsd solaris +// +build darwin dragonfly freebsd linux nacl netbsd openbsd solaris gnu package net Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sock_cloexec.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/sock_cloexec.go +++ gcc-6-6.2.1-4.1/src/libgo/go/net/sock_cloexec.go @@ -5,7 +5,7 @@ // This file implements sysSocket and accept for platforms that // provide a fast path for setting SetNonblock and CloseOnExec. -// +build freebsd linux +// +build freebsd linux gnu package net Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sockoptip_posix.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/net/sockoptip_posix.go +++ gcc-6-6.2.1-4.1/src/libgo/go/net/sockoptip_posix.go @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build darwin dragonfly freebsd linux netbsd openbsd windows +// +build darwin dragonfly freebsd linux netbsd openbsd windows gnu package net Index: gcc-6-6.2.1-4.1/src/libgo/go/syscall/exec_unix.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/syscall/exec_unix.go +++ gcc-6-6.2.1-4.1/src/libgo/go/syscall/exec_unix.go @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build darwin dragonfly freebsd linux netbsd openbsd solaris +// +build darwin dragonfly freebsd linux netbsd openbsd solaris gnu // Fork, exec, wait, etc. Index: gcc-6-6.2.1-4.1/src/libgo/runtime/netpoll_select.c === --- gcc-6-6.2.1-4.1.orig/src/libgo/runtime/netpoll_select.c +++ gcc-6-6.2.1-4.1/src/libgo/runtime/netpoll_select.c @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build solaris +// +build solaris gnu #include "config.h" Index: gcc-6-6.2.1-4.1/src/libgo/go/os/exec/lp_unix.go === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/os/exec/lp_unix.go +++ gcc-6-6.2.1-4.1/src/libgo/go/os/exec/lp_unix.go @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be
Hurd port for gcc go PATCH 11-18(23)
* src_libgo_go_net_sendfile_gnu.go.diff: New file * src_libgo_go_net_sock_gnu.go.diff: New file * src_libgo_go_net_sockopt_gnu.go.diff: New file * src_libgo_go_net_sockoptip_gnu.go.diff: New file * src_libgo_go_syscall_libcall_gnu_386.go.diff: New file * src_libgo_go_syscall_libcall_gnu.go.diff: New file * src_libgo_go_syscall_libcall_posix-1.go.diff: New file derived from libcall_posix.go Removed the mount call for GNU/Hurd, it exists but use translators. Removed the mlockall/munlockall calls for GNU/Hurd, not yet implemented. Removed the madvise call for GNU/Hurd, not yet implemented. * src_libgo_go_syscall_socket_gnu.go.diff: New fileIndex: gcc-6-6.2.1-4.1/src/libgo/go/net/sendfile_gnu.go === --- /dev/null +++ gcc-6-6.2.1-4.1/src/libgo/go/net/sendfile_gnu.go @@ -0,0 +1,79 @@ +// Copyright 2011 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package net + +import ( + "io" + "os" + "syscall" +) + +// maxSendfileSize is the largest chunk size we ask the kernel to copy +// at a time. +const maxSendfileSize int = 4 << 20 + +// sendFile copies the contents of r to c using the sendfile +// system call to minimize copies. +// +// if handled == true, sendFile returns the number of bytes copied and any +// non-EOF error. +// +// if handled == false, sendFile performed no work. +func sendFile(c *netFD, r io.Reader) (written int64, err error, handled bool) { + var remain int64 = 1 << 62 // by default, copy until EOF + + lr, ok := r.(*io.LimitedReader) + if ok { + remain, r = lr.N, lr.R + if remain <= 0 { + return 0, nil, true + } + } + f, ok := r.(*os.File) + if !ok { + return 0, nil, false + } + + if err := c.writeLock(); err != nil { + return 0, err, true + } + defer c.writeUnlock() + + dst := c.sysfd + src := int(f.Fd()) + for remain > 0 { + n := maxSendfileSize + if int64(n) > remain { + n = int(remain) + } + n, err1 := syscall.Sendfile(dst, src, nil, n) + if n > 0 { + written += int64(n) + remain -= int64(n) + } + if n == 0 && err1 == nil { + break + } + if err1 == syscall.EAGAIN { + if err1 = c.pd.WaitWrite(); err1 == nil { +continue + } + } + if err1 != nil { + // This includes syscall.ENOSYS (no kernel + // support) and syscall.EINVAL (fd types which + // don't implement sendfile) + err = err1 + break + } + } + if lr != nil { + lr.N = remain + } + if err != nil { + err = os.NewSyscallError("sendfile", err) + } + return written, err, written > 0 +} Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sock_gnu.go === --- /dev/null +++ gcc-6-6.2.1-4.1/src/libgo/go/net/sock_gnu.go @@ -0,0 +1,14 @@ +// Copyright 2014 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// +build gnu + +package net + +import "syscall" + +func maxListenerBacklog() int { + // From /usr/include/i386-gnu/bits/socket.h + return syscall.SOMAXCONN +} Index: gcc-6-6.2.1-4.1/src/libgo/go/net/sockopt_gnu.go === --- /dev/null +++ gcc-6-6.2.1-4.1/src/libgo/go/net/sockopt_gnu.go @@ -0,0 +1,45 @@ +// Copyright 2011 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// +build gnu + +package net + +import ( + "os" + "syscall" +) + +func setDefaultSockopts(s, family, sotype int, ipv6only bool) error { + if family == syscall.AF_INET6 && sotype != syscall.SOCK_RAW { + // Allow both IP versions even if the OS default + // is otherwise. Note that some operating systems + // never admit this option. + syscall.SetsockoptInt(s, syscall.IPPROTO_IPV6, syscall.IPV6_V6ONLY, boolint(ipv6only)) + } + // Allow broadcast. + return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_BROADCAST, 1)) +} + +func setDefaultListenerSockopts(s int) error { + // Allow reuse of recently-used addresses. + return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)) +} + +func setDefaultMulticastSockopts(s int) error { + // Allow multicast UDP and raw IP datagram sockets to listen + // concurrently across multiple listeners. + if err := syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1); err != nil { + return os.NewSyscallError("setsockopt", err) + } + // Allow reuse of recently-used ports. + // This option is supported only in descendants of 4.4BSD, + // to make an effective multicast application that requires + // quick draw possible. + // Not supported on GNU/Hurd + //if syscall.SO_REUSEPORT != 0 { + // return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEPORT, 1)) + //} + return nil +}
Hurd port for gcc go PATCH 1-4(23)
Hi, Attached are patches to enable gccgo to build properly on Debian GNU/Hurd on gcc-6 (6-6.2.1-5). The first three patches are Debian-specific: * debian_rules.defs.diff: Enables build of gccgo for GNU/Hurd Define patches for the generated series file: * debian_rules.patch.diff: Enables split-stack support. * debian_rules.patch.diff: Does not enable split-stack support. * src_gcc_config_i386_gnu.h.diff: Enable split-stack support The test suite results are as follows: Without split-stack support: === go Summary === # of expected passes7359 # of unexpected failures7 # of expected failures 1 # of untested testcases 10 # of unsupported tests 2 === libgo Summary === # of expected passes121 # of unexpected failures13 With split-stack support: === go Summary === # of expected passes7366 # of unexpected failures8 # of expected failures 1 # of untested testcases 6 # of unsupported tests 2 === libgo Summary === # of expected passes120 # of unexpected failures14 All failing go tests and more than half of the libgo tests are runtime errors due to exception handling not working as expected: Aborted runtime_sighandler ... fatal error: unexpected signal during runtime execution panic: runtime error: invalid memory address or nil pointer dereference The above problems are probably due to some remaining issues in gnumach/hurd/glibc to be solved. According to Samuel Thibault these problems can be handled late when the patches are accepted upstream or in Debian gcc. Another more annoying gnumch/hurd/glibc bug is that the built program go (go-6 in Debian) gets killed when executed from the shell vi path, but not when issued directly: /usr/bin/go-6 works fine. go-6 Segmentation fault (core dumped) gdb /usr/bin/go-6 -c ./core warning: Unexpected size of section `.reg2/16883' in core file. Core was generated by `go-6'. Program terminated with signal SIGSEGV, Segmentation fault. warning: Unexpected size of section `.reg2/16883' in core file. #0 0x01e854ae in ?? () (gdb) bt #0 0x01e854ae in ?? () #1 0x in ?? () Nevertheless, it seems like not so much is left for gccgo working properly on GNU/Hurd. Thanks! --- a/debian/rules.defs 2014-01-07 11:10:44.0 +0100 +++ b/debian/rules.defs 2014-01-07 11:23:47.0 +0100 @@ -933,7 +933,7 @@ ifeq (,$(filter $(distrelease),lenny etch squeeze dapper hardy jaunty karmic lucid maverick natty oneiric)) go_no_cpus := $(filter-out arm, $(go_no_cpus)) endif -go_no_systems := gnu kfreebsd-gnu +go_no_systems := kfreebsd ifneq ($(with_base_only),yes) ifneq ($(separate_lang),yes) @@ -943,7 +943,7 @@ ifneq (,$(filter $(DEB_TARGET_ARCH_CPU),$(go_no_cpus))) with_go := disabled for cpu $(DEB_TARGET_ARCH_CPU) endif -ifneq (,$(findstring $(DEB_TARGET_GNU_SYSTEM),$(go_no_systems))) +ifneq (,$(findstring $(DEB_TARGET_ARCH_OS),$(go_no_systems))) with_go := disabled for system $(DEB_TARGET_GNU_SYSTEM) endif ifeq ($(go_no_cross)-$(DEB_CROSS),yes-yes) --- a/debian/rules.patch.orig 2016-11-12 16:50:46.0 +0100 +++ b/debian/rules.patch 2016-11-12 16:55:24.0 +0100 @@ -276,6 +276,27 @@ ifeq ($(DEB_TARGET_ARCH_OS),hurd) debian_patches += hurd-changes + debian_patches += \ + src_gcc_config_i386_gnu.h \ + src_libgo_configure.ac \ + src_libgo_go_net_sendfile_gnu.go \ + src_libgo_go_net_sock_gnu.go \ + src_libgo_go_net_sockopt_gnu.go \ + src_libgo_go_net_sockoptip_gnu.go \ + src_libgo_go_syscall_libcall_gnu_386.go \ + src_libgo_go_syscall_libcall_gnu.go \ + src_libgo_go_syscall_libcall_posix-1.go \ + src_libgo_go_syscall_socket_gnu.go \ + src_libgo_go_syscall_wait.c \ + src_libgo_Makefile.am \ + src_libgo_Makefile.in \ + src_libgo_mksysinfo.sh \ + src_libgo_runtime_getncpu-gnu.c \ + src_libgo_runtime_netpoll.goc \ + src_libgo_go_os_os_test.go \ + src_libgo_go_syscall_syscall_gnu_test.go \ + src_libgo_testsuite_gotest \ + add-gnu-to-libgo-headers endif debian_patches += gcc-ice-dump --- a/debian/rules.patch.orig 2016-11-12 16:50:46.0 +0100 +++ b/debian/rules.patch 2016-11-12 16:55:24.0 +0100 @@ -276,6 +276,26 @@ ifeq ($(DEB_TARGET_ARCH_OS),hurd) debian_patches += hurd-changes + debian_patches += \ + src_libgo_configure.ac \ + src_libgo_go_net_sendfile_gnu.go \ + src_libgo_go_net_sock_gnu.go \ + src_libgo_go_net_sockopt_gnu.go \ + src_libgo_go_net_sockoptip_gnu.go \ + src_libgo_go_syscall_libcall_gnu_386.go \ +
Hurd port for gcc go PATCH 5-10(23)
* src_libgo_configure.ac.diff: Define GOOS=gnu and LIBGO_IS_GNU * src_libgo_Makefile.am.diff: Add support for GNU/Hurd. * src_libgo_Makefile.in.diff: Update accordingly with autreconf2.64 * src_libgo_mksysinfo.sh.diff: Define SYS_IOCTL to 0 if not defined, Fix: #define EWOULDBLOCK EAGAIN in Fix: #define st_dev st_fsid in Index: gcc-6-6.2.1-4.1/src/libgo/configure.ac === --- gcc-6-6.2.1-4.1.orig/src/libgo/configure.ac +++ gcc-6-6.2.1-4.1/src/libgo/configure.ac @@ -151,6 +151,7 @@ is_openbsd=no is_dragonfly=no is_rtems=no is_solaris=no +is_gnu=no GOOS=unknown case ${host} in *-*-darwin*) is_darwin=yes; GOOS=darwin ;; @@ -162,6 +163,7 @@ case ${host} in *-*-dragonfly*) is_dragonfly=yes; GOOS=dragonfly ;; *-*-rtems*)is_rtems=yes; GOOS=rtems ;; *-*-solaris2*) is_solaris=yes; GOOS=solaris ;; + *-*-gnu*) is_gnu=yes; GOOS=gnu ;; esac AM_CONDITIONAL(LIBGO_IS_DARWIN, test $is_darwin = yes) AM_CONDITIONAL(LIBGO_IS_FREEBSD, test $is_freebsd = yes) @@ -172,6 +174,7 @@ AM_CONDITIONAL(LIBGO_IS_OPENBSD, test $i AM_CONDITIONAL(LIBGO_IS_DRAGONFLY, test $is_dragonfly = yes) AM_CONDITIONAL(LIBGO_IS_RTEMS, test $is_rtems = yes) AM_CONDITIONAL(LIBGO_IS_SOLARIS, test $is_solaris = yes) +AM_CONDITIONAL(LIBGO_IS_GNU, test $is_gnu = yes) AC_SUBST(GOOS) dnl Test whether we need to use DejaGNU or whether we can use the Index: gcc-6-6.2.1-4.1/src/libgo/go/syscall/wait.c === --- gcc-6-6.2.1-4.1.orig/src/libgo/go/syscall/wait.c +++ gcc-6-6.2.1-4.1/src/libgo/go/syscall/wait.c @@ -8,6 +8,9 @@ OS-independent. */ #include +#ifndef WCONTINUED +#define WCONTINUED 0 +#endif #include #include "runtime.h" Index: gcc-6-6.2.1-4.1/src/libgo/Makefile.am === --- gcc-6-6.2.1-4.1.orig/src/libgo/Makefile.am +++ gcc-6-6.2.1-4.1/src/libgo/Makefile.am @@ -419,6 +419,9 @@ else if LIBGO_IS_NETBSD runtime_getncpu_file = runtime/getncpu-bsd.c else +if LIBGO_IS_GNU +runtime_getncpu_file = runtime/getncpu-gnu.c +else runtime_getncpu_file = runtime/getncpu-none.c endif endif @@ -426,6 +429,7 @@ endif endif endif endif +endif if LIBGO_IS_LINUX runtime_netpoll_files = runtime/netpoll_epoll.c @@ -433,9 +437,13 @@ else if LIBGO_IS_SOLARIS runtime_netpoll_files = runtime/netpoll_select.c else +if LIBGO_IS_GNU +runtime_netpoll_files = runtime/netpoll_select.c +else runtime_netpoll_files = runtime/netpoll_kqueue.c endif endif +endif runtime_files = \ runtime/go-append.c \ @@ -744,6 +752,14 @@ go_net_sockoptip_file = go/net/sockoptip go_net_cgo_sock_file = go/net/cgo_sockold.go go_net_cgo_res_file = go/net/cgo_resnew.go else +if LIBGO_IS_GNU +go_net_cgo_file = go/net/cgo_linux.go +go_net_sock_file = go/net/sock_gnu.go +go_net_sockopt_file = go/net/sockopt_gnu.go +go_net_sockoptip_file = go/net/sockoptip_gnu.go go/net/sockoptip_posix.go +go_net_cgo_sock_file = go/net/cgo_socknew.go +go_net_cgo_res_file = go/net/cgo_resnew.go +else go_net_cgo_file = go/net/cgo_bsd.go go_net_sock_file = go/net/sock_bsd.go go_net_sockopt_file = go/net/sockopt_bsd.go @@ -755,6 +771,7 @@ endif endif endif endif +endif if LIBGO_IS_LINUX go_net_sendfile_file = go/net/sendfile_linux.go @@ -768,11 +785,15 @@ else if LIBGO_IS_SOLARIS go_net_sendfile_file = go/net/sendfile_solaris.go else +if LIBGO_IS_GNU +go_net_sendfile_file = go/net/sendfile_gnu.go +else go_net_sendfile_file = go/net/sendfile_stub.go endif endif endif endif +endif if LIBGO_IS_LINUX go_net_interface_file = go/net/interface_linux.go @@ -794,9 +815,13 @@ else if LIBGO_IS_FREEBSD go_net_cloexec_file = go/net/sock_cloexec.go go/net/hook_cloexec.go else +if LIBGO_IS_GNU +go_net_cloexec_file = go/net/sock_cloexec.go go/net/hook_cloexec.go +else go_net_cloexec_file = go/net/sys_cloexec.go endif endif +endif if LIBGO_IS_OPENBSD go_net_tcpsockopt_file = go/net/tcpsockopt_openbsd.go @@ -889,9 +914,13 @@ else if LIBGO_IS_LINUX go_os_dir_file = go/os/dir_largefile.go else +if LIBGO_IS_GNU +go_os_dir_file = go/os/dir_largefile.go +else go_os_dir_file = go/os/dir_regfile.go endif endif +endif if LIBGO_IS_DARWIN go_os_getwd_file = go/os/getwd_darwin.go @@ -911,11 +940,15 @@ else if LIBGO_IS_RTEMS go_os_sys_file = go/os/sys_uname.go else +if LIBGO_IS_GNU +go_os_sys_file = go/os/sys_uname.go +else go_os_sys_file = go/os/sys_bsd.go endif endif endif endif +endif if LIBGO_IS_FREEBSD go_os_cloexec_file = go/os/sys_freebsd.go @@ -937,6 +970,9 @@ else if LIBGO_IS_LINUX go_os_stat_file = go/os/stat_atim.go else +if LIBGO_IS_GNU +go_os_stat_file = go/os/stat_atim.go +else if LIBGO_IS_OPENBSD go_os_stat_file = go/os/stat_atim.go else @@ -960,12 +996,17 @@ endif endif endif endif +endif if LIBGO_IS_LINUX go_os_pipe_file = go/os/pipe_linux.go else +if LIBGO_IS_GNU +go_os_pipe_file = go/os/pipe_linux.go +else
Re: [PATCH] avoid calling alloca(0)
On 11/24/2016 12:47 AM, Jakub Jelinek wrote: On Wed, Nov 23, 2016 at 02:27:05PM -0700, Jeff Law wrote: I believe we should be warning on trying to allocation 0 bytes of memory via malloc, realloc or alloca, with the exception of a non-builtin alloca with no return value, but I think we've covered that elsewhere and Martin's code will DTRT more by accident than design. But we aren't going to warn on all places which might call alloca at runtime with 0 arguments, you would need runtime instrumentation for that (like ubsan). You are going to warn about explicit cases and random subset of the other ones where the user is just unlucky enough that the compiler threaded some jumps etc. For the explicit ptr = alloca (0) it is easy to get rid of it, for the implicit one one has to pessimize code if they want to avoid the warning. No, it's a compile-time warning when the 0 size argument can be exposed. It's not unlike many other warnings in GCC. jeff
[PATCH] Optimiza aggregate a = b = c = {} (PR c/78408)
Hi! This patch optimizes a = {}; b = a; into a = {}; b = {}; and similarly for memset instead of the first stmt and/or memcpy instead of the second one. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-25 Jakub JelinekPR c/78408 * tree-ssa-ccp.c: Include tree-dfa.h. (optimize_memcpy): New function. (pass_fold_builtins::execute): Use it. Remove useless conditional break after BUILT_IN_VA_*. * gcc.dg/pr78408.c: New test. --- gcc/tree-ssa-ccp.c.jj 2016-11-18 20:04:27.0 +0100 +++ gcc/tree-ssa-ccp.c 2016-11-25 17:54:26.862166658 +0100 @@ -143,6 +143,7 @@ along with GCC; see the file COPYING3. #include "stor-layout.h" #include "optabs-query.h" #include "tree-ssa-ccp.h" +#include "tree-dfa.h" /* Possible lattice values. */ typedef enum @@ -2928,6 +2929,113 @@ optimize_atomic_bit_test_and (gimple_stm release_ssa_name (lhs); } +/* Optimize + a = {}; + b = a; + into + a = {}; + b = {}; + Similarly for memset (, ..., sizeof (a)); instead of a = {}; + and/or memcpy (, , sizeof (a)); instead of b = a; */ + +static void +optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src) +{ + gimple *stmt = gsi_stmt (*gsip); + if (gimple_has_volatile_ops (stmt) + || TREE_THIS_VOLATILE (dest) + || TREE_THIS_VOLATILE (src)) +return; + + tree vuse = gimple_vuse (stmt); + if (vuse == NULL) +return; + + gimple *defstmt = SSA_NAME_DEF_STMT (vuse); + tree src2 = NULL_TREE; + tree val = integer_zero_node; + if (gimple_store_p (defstmt) + && gimple_assign_single_p (defstmt) + && TREE_CODE (gimple_assign_rhs1 (defstmt)) == CONSTRUCTOR + && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (defstmt)) == 0 + && !gimple_clobber_p (defstmt)) +src2 = gimple_assign_lhs (defstmt); + else if (gimple_call_builtin_p (defstmt, BUILT_IN_MEMSET) + && TREE_CODE (gimple_call_arg (defstmt, 0)) == ADDR_EXPR + && TREE_CODE (gimple_call_arg (defstmt, 1)) == INTEGER_CST + && TREE_CODE (gimple_call_arg (defstmt, 2)) == INTEGER_CST) +{ + HOST_WIDE_INT ssize, max_size, off; + bool reverse; + src2 = TREE_OPERAND (gimple_call_arg (defstmt, 0), 0); + get_ref_base_and_extent (src2, , , _size, ); + if (ssize != max_size + || (ssize % BITS_PER_UNIT) != 0 + || !wi::eq_p (gimple_call_arg (defstmt, 2), ssize / BITS_PER_UNIT)) + src2 = NULL_TREE; + else + { + val = gimple_call_arg (defstmt, 1); + if (!integer_zerop (val) && is_gimple_assign (stmt)) + src2 = NULL_TREE; + } +} + + if (src2 == NULL_TREE) +return; + + if (!operand_equal_p (src, src2, 0)) +{ + /* Handle also +a = {}; +MEM[(char * {ref-all})] = MEM[(char * {ref-all})]; */ + if (is_gimple_assign (stmt) + && TREE_CODE (src) == MEM_REF + && integer_zerop (TREE_OPERAND (src, 1)) + && TREE_CODE (TREE_OPERAND (src, 0)) == ADDR_EXPR + && DECL_P (src2) + && operand_equal_p (TREE_OPERAND (TREE_OPERAND (src, 0), 0), + src2, 0) + && tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (src)), +DECL_SIZE (src2))) + src = TREE_OPERAND (TREE_OPERAND (src, 0), 0); + else + return; +} + if (refs_may_alias_p (dest, src)) +return; + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, "Simplified\n "); + print_gimple_stmt (dump_file, stmt, 0, dump_flags); + fprintf (dump_file, "after previous\n "); + print_gimple_stmt (dump_file, defstmt, 0, dump_flags); +} + + if (is_gimple_assign (stmt)) +{ + tree ctor = build_constructor (TREE_TYPE (dest), NULL); + gimple_assign_set_rhs_from_tree (gsip, ctor); + update_stmt (stmt); +} + else +{ + gcall *call = as_a (stmt); + tree fndecl = builtin_decl_implicit (BUILT_IN_MEMSET); + gimple_call_set_fndecl (call, fndecl); + gimple_call_set_fntype (call, TREE_TYPE (fndecl)); + gimple_call_set_arg (call, 1, val); + update_stmt (stmt); +} + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, "into\n "); + print_gimple_stmt (dump_file, stmt, 0, dump_flags); +} +} + /* A simple pass that attempts to fold all builtin functions. This pass is run after we've propagated as many constants as we can. */ @@ -2994,6 +3102,9 @@ pass_fold_builtins::execute (function *f continue; } } + else if (gimple_assign_load_p (stmt) && gimple_store_p (stmt)) + optimize_memcpy (, gimple_assign_lhs (stmt), +gimple_assign_rhs1 (stmt)); gsi_next (); continue; } @@ -3109,14 +3220,39 @@ pass_fold_builtins::execute (function *f
[committed] Add 2 tests
Hi! In order to close these 2 PRs fixed recently by Segher, I've added the corresponding tests and am going to close them as fixed. 2016-11-25 Jakub JelinekPR rtl-optimization/78438 PR rtl-optimization/78477 * gcc.c-torture/execute/pr78438.c: New test. * gcc.c-torture/execute/pr78477.c: New test. --- gcc/testsuite/gcc.c-torture/execute/pr78438.c.jj2016-11-25 20:17:29.448019008 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr78438.c 2016-11-25 20:17:05.0 +0100 @@ -0,0 +1,22 @@ +/* PR rtl-optimization/78438 */ + +char a = 0; +int b = 197412621; + +__attribute__ ((noinline, noclone)) +void foo () +{ + a = 0 > (short) (b >> 11); +} + +int +main () +{ + asm volatile ("" : : : "memory"); + if (__CHAR_BIT__ != 8 || sizeof (short) != 2 || sizeof (int) < 4) +return 0; + foo (); + if (a != 0) +__builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.c-torture/execute/pr78477.c.jj2016-11-25 20:20:26.593705268 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr78477.c 2016-11-25 20:20:14.0 +0100 @@ -0,0 +1,27 @@ +/* PR rtl-optimization/78477 */ + +unsigned a; +unsigned short b; + +unsigned +foo (unsigned x) +{ + b = x; + a >>= (b & 1); + b = 1 | (b << 5); + b >>= 15; + x = (unsigned char) b > ((2 - (unsigned char) b) & 1); + b = 0; + return x; +} + +int +main () +{ + if (__CHAR_BIT__ != 8 || sizeof (short) != 2 || sizeof (int) < 4) +return 0; + unsigned x = foo (12345); + if (x != 0) +__builtin_abort (); + return 0; +} Jakub
Re: [PATCH] Fix ICE on CONST_WIDE_INT in simplify_immed_subreg (PR rtl-optimization/78526)
On Fri, Nov 25, 2016 at 06:50:17PM +, Richard Sandiford wrote: > I think this is still wrong for modes that are wide enough to use > CONST_WIDE_INT but aren't a multiple of a HWI in size. In that The loop iterates by value_bit, which is 8 bits. Thus, indeed, it will misbehave if there is a CONST_WIDE_INT with a precision not multiple of 8. > Does that sound like an improvement? If so I'll file a PR. > I might have time before GCC 7 if it qualifies. Yeah, it might be an improvement. Jakub
Re: [PATCH] Fix ICE on CONST_WIDE_INT in simplify_immed_subreg (PR rtl-optimization/78526)
Jakub Jelinekwrites: > Hi! > > If we try to simplify a paradoxical subreg of a CONST_WIDE_INT, > we can ICE, because wi::extract_uhwi doesn't like accessing > bytes beyond the precision. > for (i = 0; i < elem_bitsize; i += value_bit) > *vp++ = wi::extract_uhwi (val, i, value_bit); > for (; i < elem_bitsize; i += value_bit) > *vp++ = extend; > looks wrong, because the second loop is useless. But, we actually > do want the second loop, to handle the paradoxical bytes. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2016-11-25 Jakub Jelinek > > PR rtl-optimization/78526 > * simplify-rtx.c (simplify_immed_subreg): Don't use wi::extract_uhwi > beyond val's precision. > > * gcc.dg/pr78526.c: New test. OK, thanks. I think this is still wrong for modes that are wide enough to use CONST_WIDE_INT but aren't a multiple of a HWI in size. In that case we'll still try to access the integer in a wider precision for the final iteration of the first loop. I'm not sure we have any targets like that though, and either way, I agree this is a strict improvement. Perhaps we should have extract_uhwi that works for arbitrary bitpos and width and zero-extends beyond the precision, and an extract_shwi that does the same but sign-extending beyond the precision? Then code like this could use a single loop and could safely access any wide-int representation (regardless of whether it's stored in sign-extended form). The extract_*hwi routines would become a bit more expensive though. Does that sound like an improvement? If so I'll file a PR. I might have time before GCC 7 if it qualifies. Thanks, Richard
Re: [patch,avr]: Improve code as of PR41076
2016-11-25 18:34 GMT+03:00 Georg-Johann Lay: > Mentioned PR is about composing 16-bit values out of 8-bit values which due > to integer promotions likes to work on 16-bit values. > > The patch adds 3 combiner patterns to catch such situations and then split > after reload. Some more splitting is performed after reload for: > > and, ior, xor of reg-reg operations, > > HImode reg = 0 > > and 3- and 4-byte registers moves (and 2-byte moved if !MOVW). > > This gived better code for almost all test cases, yet some test cases could > still be improved. > > I also tried pre-reload splits, but the generated SUBREGs seem to disturb > register allocation, hence I used post-reload splits. > > There is also PR60145 which has a test case where a 4-byte value is composed > out of 4 bytes, but the patterns to catch that are going to be insane... so > for now, such 4-byte cases will still result in bloated code. > > Patch tested against trunk without regressions. > > Ok for trunk? Ok. Please apply. > > Johann > > > PR 41076 > * config/avr/avr.md SPLIT34): New mode iterator. > (bitop): New code iterator. > (*iorhi3.ashift8-*). New insn-and-split patterns. > (*movhi): Post-reload split reg = 0. > (*movhi) [!MOVW]: Post-reload split reg = reg. > (*mov) [SI,SF,PSI,SQ,USQ,SA,USA]: Post-reload split reg = reg. > (andhi3, andpsi3, andsi3): Post-reload split reg-reg operations. > (iorhi3, iorpsi3, iorsi3): Same. > (xorhi3, xorpsi3, xorsi3): Same. > * config/avr/avr.c (avr_rtx_costs_1) [IOR && HImode]: Adjust rtx > costs to *iorhi3.ashift8-* patterns. >
Re: Ping: Re: [patch, avr] Add flash size to device info and make wrap around default
I'm sorry for delay. I have a problem with the patch: (Stripping trailing CRs from patch; use --binary to disable.) patching file avr-arch.h (Stripping trailing CRs from patch; use --binary to disable.) patching file avr-devices.c (Stripping trailing CRs from patch; use --binary to disable.) patching file avr-mcus.def Hunk #1 FAILED at 62. 1 out of 1 hunk FAILED -- saving rejects to file avr-mcus.def.rej (Stripping trailing CRs from patch; use --binary to disable.) patching file gen-avr-mmcu-specs.c Hunk #1 succeeded at 215 (offset 5 lines). (Stripping trailing CRs from patch; use --binary to disable.) patching file specs.h Hunk #1 succeeded at 58 (offset 1 line). Hunk #2 succeeded at 66 (offset 1 line). 2016-11-22 23:27 GMT+03:00 Georg-Johann Lay: > Denis Chertykov schrieb: >> >> Do you have any objections, George ? > > > No, the last delta rev3 from 2016-11-10 looks fine to me. > > >> >> 2016-11-22 8:05 GMT+03:00 Pitchumani Sivanupandi >> : >>> >>> Ping! >>> >>> On Monday 14 November 2016 07:03 PM, Pitchumani Sivanupandi wrote: Ping! On Thursday 10 November 2016 01:53 PM, Pitchumani Sivanupandi wrote: > > On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote: >> >> On 09.11.2016 10:14, Pitchumani Sivanupandi wrote: >>> >>> On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote: On 08.11.2016 08:08, Pitchumani Sivanupandi wrote: > > I have updated patch to include the flash size as well. Took that > info from > device headers (it was fed into crt's device information note > section > also). >> >> >> The new option would render -mn-flash superfluous, but we should >> keep it for >> backward compatibility. > > Ok. >> >> Shouldn't link_pmem_wrap then be removed from link_relax, i.e. >> from >> LINK_RELAX_SPEC? And what happens if relaxation is off? > > Yes. Removed link_pmem_wrap from link_relax. > Disabling relaxation doesn't change -mpmem-wrap-around behavior. > > flashsize-and-wrap-around.patch > diff --git a/gcc/config/avr/avr-mcus.def > b/gcc/config/avr/avr-mcus.def > index 6bcc6ff..9d4aa1a 100644 > /* >>> >>> /* Classic, > 8K, <= 64K. */ >>> -AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL, >>> 0x0060, 0x0, 1) >>> -AVR_MCU ("at43usb355", ARCH_AVR3, AVR_ISA_NONE, >>> "__AVR_AT43USB355__",0x0060, 0x0, 1) >>> -AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, >>> "__AVR_AT76C711__", 0x0060, 0x0, 1) >>> +AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL, >>> 0x0060, 0x0, 1, 0x6000) >>> +AVR_MCU ("at43usb355", ARCH_AVR3, AVR_ISA_NONE, >>> "__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000) >>> +AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, >>> "__AVR_AT76C711__", 0x0060, 0x0, 1, 0x4000) >>> +AVR_MCU ("at43usb320", ARCH_AVR3, AVR_ISA_NONE, >>> "__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1) >>> /* Classic, == 128K. */ >>> -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL, >>> 0x0060, 0x0, 2) >>> -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, >>> "__AVR_ATmega103__", 0x0060, 0x0, 2) >>> -AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, >>> "__AVR_AT43USB320__", 0x0060, 0x0, 2) >>> +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL, >>> 0x0060, 0x0, 2, 0x2) >>> +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, >>> "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2) >>> /* Classic + MOVW + JMP/CALL. */ >> >> >> If at43usb320 is in the wrong multilib, then this should be handled as >> separate issue / patch together with its own PR. Sorry for the >> confusion. I >> just noticed that some fields don't match... >> >> It is not even clear to me from the data sheet if avr3 is the correct >> multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it >> also has >> MUL) as there is no reference to the exact instruction set -- Atmochip >> will >> know. >> >> Moreover, such a change should be sync'ed with avr-libc as all >> multilib >> stuff is hand-wired there: no use of --print-foo meta information >> retrieval >> by avr-libc :-(( >> >> I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565 >> for >> this one. >> > Thats better. I've attached the updated patch. If OK, could someone > commit please? > > I'll try if I could find some more info for AT43USB320. > > Regards, > Pitchumani > >> >
one more patch for PR77541
Uros pointed me out that I should use another target for the PR test. Here is the patch committed as rev. 242881. 2016-11-25 Vladimir MakarovPR rtl-optimization/77541 * gcc.target/i386/pr77541.c: Change target to int128. Index: gcc.target/i386/pr77541.c === --- gcc.target/i386/pr77541.c (revision 242848) +++ gcc.target/i386/pr77541.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target int128 } */ /* { dg-options "-O2 -Wno-psabi" } */ #define MAGIC 0x0706050403020100
Re: [PATCH v2] aarch64: Add split-stack initial support
On 15/11/2016 16:38, Wilco Dijkstra wrote: > > On 07/11/2016 16:59, Adhemerval Zanella wrote: >> On 14/10/2016 15:59, Wilco Dijkstra wrote: > >> There is no limit afaik on gold split stack allocation handling, >> and I think one could be added for each backend (in the method >> override require to implement it). >> >> In fact it is not really required to tie the nop generation with the >> instruction generated by 'aarch64_internal_mov_immediate', it is >> just a matter to simplify linker code. > > If there is no easy limit and you'll still require a nop, I think it is best > then > to emit mov N+movk #0. Then the scheduler won't be able to reorder > them with the add/sub. Good call, I have changed the patch to emit a mov N+mov #0 instead on relying the emit_nop. > >>> Is there any need to detect underflow of x10 or is there a guarantee that >>> stacks are >>> never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's >>> safe >>> to do a signed comparison. >> >> I do not think so, at least none of current backend that implements >> split stack do so. > > OK, well a signed comparison like in your new version works for underflow. > > Now to the patch: > > > @@ -3316,6 +3339,28 @@ aarch64_expand_prologue (void) >aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM, >callee_adjust != 0 || frame_pointer_needed); >aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed); > + > + if (split_stack_arg_pointer_used_p ()) > +{ > + /* Setup the argument pointer (x10) for -fsplit-stack code. If > + __morestack was called, it will left the arg pointer to the > + old stack in x28. Otherwise, the argument pointer is the top > + of current frame. */ > + rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM); > + rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM); > + rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM); > + > + rtx not_more = gen_label_rtx (); > + > + rtx cmp = gen_rtx_fmt_ee (LT, VOIDmode, cc_reg, const0_rtx); > + rtx jump = emit_jump_insn (gen_condjump (cmp, cc_reg, not_more)); > + JUMP_LABEL (jump) = not_more; > + LABEL_NUSES (not_more) += 1; > + > + emit_move_insn (x11, x28); > + > + emit_label (not_more); > +} > > If you pass the old sp in x11 when called from __morestack you can remove > the above thunk completely. Indeed this snippet does not make sense anymore, I removed it. > > + /* It limits total maximum stack allocation on 2G so its value can be > + materialized using two instructions at most (movn/movk). It might be > + used by the linker to add some extra space for split calling non split > + stack functions. */ > + allocate = cfun->machine->frame.frame_size; > + if (allocate > ((HOST_WIDE_INT) 1 << 31)) > +{ > + sorry ("Stack frame larger than 2G is not supported for > -fsplit-stack"); > + return; > +} > > Note a 2-instruction mov/movk can generate any immediate up to 4GB and if > we need even large sizes, we could round up to a multiple of 64KB so that 2 > instructions are enough for a 48-bit stack size... I think we can set a limit of 4GB (powerpc64 backend limits to 2GB and it seems fine). I corrected the comment. > > + int ninsn = aarch64_internal_mov_immediate (reg10, GEN_INT (-allocate), > + true, Pmode); > + gcc_assert (ninsn == 1 || ninsn == 2); > + if (ninsn == 1) > +emit_insn (gen_nop ()); > > To avoid any issues with the nop being scheduled, it's best to emit an > explicit movk > here (0x if allocate > 0, or 0 if zero) using gen_insv_immdi. Right, I changed to that. > > +void > +aarch64_split_stack_space_check (rtx size, rtx label) > > Isn't very similar code used in aarch64_expand_split_stack_prologue? Any > possibility > to share/reuse? I though about it, but it would require split in two subparts (one to load __private_ss from TCB and another to jump to __morestack) and both are basically 4 lines. In the end I think current approach should be simpler. > > +static void > +aarch64_live_on_entry (bitmap regs) > +{ > + if (flag_split_stack) > +bitmap_set_bit (regs, R11_REGNUM); > +} > > I'm wondering whether you need extra code in aarch64_can_eliminate to deal > with the argument pointer? Also do we need to define a fixed register, or > will GCC > automatically allocate it to a callee-save if necessary? Now that you asked I think we can get rid of this live marking. I used as base for initial patch the powerpc backend and modelled the argument pointer usage for aarch64 based on it. But after the patch iterations and they way register is set I think it is safe to remove it this constraint. > > +++ b/libgcc/config/aarch64/morestack.S > > +/* Offset from __morestack frame where the arguments size saved and > + passed to __generic_morestack. */ > +#define ARGS_SIZE_SAVE 80 > > This define is unused.
[PATCH] Fix ICE on CONST_WIDE_INT in simplify_immed_subreg (PR rtl-optimization/78526)
Hi! If we try to simplify a paradoxical subreg of a CONST_WIDE_INT, we can ICE, because wi::extract_uhwi doesn't like accessing bytes beyond the precision. for (i = 0; i < elem_bitsize; i += value_bit) *vp++ = wi::extract_uhwi (val, i, value_bit); for (; i < elem_bitsize; i += value_bit) *vp++ = extend; looks wrong, because the second loop is useless. But, we actually do want the second loop, to handle the paradoxical bytes. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-25 Jakub JelinekPR rtl-optimization/78526 * simplify-rtx.c (simplify_immed_subreg): Don't use wi::extract_uhwi beyond val's precision. * gcc.dg/pr78526.c: New test. --- gcc/simplify-rtx.c.jj 2016-11-18 20:04:28.0 +0100 +++ gcc/simplify-rtx.c 2016-11-25 15:07:13.872322389 +0100 @@ -5740,8 +5740,9 @@ simplify_immed_subreg (machine_mode oute { rtx_mode_t val = rtx_mode_t (el, innermode); unsigned char extend = wi::sign_mask (val); + int prec = wi::get_precision (val); - for (i = 0; i < elem_bitsize; i += value_bit) + for (i = 0; i < prec && i < elem_bitsize; i += value_bit) *vp++ = wi::extract_uhwi (val, i, value_bit); for (; i < elem_bitsize; i += value_bit) *vp++ = extend; --- gcc/testsuite/gcc.dg/pr78526.c.jj 2016-11-25 15:15:53.266799450 +0100 +++ gcc/testsuite/gcc.dg/pr78526.c 2016-11-25 15:16:12.004564146 +0100 @@ -0,0 +1,21 @@ +/* PR rtl-optimization/78526 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O -fno-tree-ccp -fno-tree-sra -g -w" } */ +/* { dg-additional-options "-mavx512bw" { target i?86-*-* x86_64-*-* } } */ + +typedef unsigned U __attribute__ ((vector_size (64))); +typedef unsigned __int128 V __attribute__ ((vector_size (64))); + +static inline V +bar (U u, U x, V v) +{ + v = (V)(U) { 0, ~0 }; + v[x[0]] <<= u[-63]; + return v; +} + +V +foo (U u) +{ + return bar (u, (U) {}, (V) {}); +} Jakub
[C++ PATCH] Fix ICE with scoped enums and -g (PR c++/72808)
Hi! As mentioned in the PR, we ICE during type verification called from dwarf2out, because a class has different TYPE_FIELDS from its variants. In particular it contains an extra CONST_DECL. While CONST_DECLs are ignored for C/C++ in dwarf2out.c, I think they aren't ignored for Fortran/Ada, so ignoring them in the verifier might be too risky. This patch instead fixes the variants immediately. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Another possibility would be just to update TYPE_FIELDS of the variants instead of calling whole fixup_type_variants. 2016-11-25 Jakub JelinekPR c++/72808 * decl.c (build_enumerator): Call fixup_type_variants on current_class_type after finish_member_declaration. * g++.dg/debug/pr72808.C: New test. --- gcc/cp/decl.c.jj2016-11-23 19:44:40.0 +0100 +++ gcc/cp/decl.c 2016-11-25 13:15:46.407116179 +0100 @@ -14518,6 +14518,7 @@ incremented enumerator value is too larg current_access_specifier = access_public_node; finish_member_declaration (decl); + fixup_type_variants (current_class_type); current_access_specifier = saved_cas; } --- gcc/testsuite/g++.dg/debug/pr72808.C.jj 2016-11-25 13:17:06.777091600 +0100 +++ gcc/testsuite/g++.dg/debug/pr72808.C2016-11-25 13:16:27.0 +0100 @@ -0,0 +1,24 @@ +// PR c++/72808 +// { dg-do compile } +// { dg-options "-g -std=c++14" } + +struct A +{ + virtual void foo (); +}; + +struct B : A +{ + void foo (); + enum C : int; +}; + +enum B::C : int +{ + D +}; + +void +B::foo () +{ +} Jakub
[PATCH] Fix -fcompare-debug issues in IPA-ICF (PR lto/78211)
Hi! IPA-ICF performs some code-generation visible changes from hash table traversal, where the hash values can be different between -g and -g0 (I bet it hashes DECL_UID in somewhere, perhaps other things). The following patch fixes it by adding a vector next to the hash table, which tracks the groups in the order they were created. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-25 Jakub JelinekPR lto/78211 * ipa-icf.h (sem_item_optimizer): Add m_classes_vec member. * ipa-icf.c (sem_item_optimizer::sem_item_optimizer): Initialize it. (sem_item_optimizer::~sem_item_optimizer): Traverse m_classes_vec vector instead of traversing m_classes hash table. Release m_classes_vec. (sem_item_optimizer::read_section, sem_item_optimizer::add_class): Formatting fixes. (sem_item_optimizer::get_group_by_hash): When inserting a new group, add it also to m_classes_vec vector. (sem_item_optimizer::remove_symtab_node, sem_item_optimizer::build_hash_based_classes, sem_item_optimizer::parse_nonsingleton_classes): Formatting fixes. (sem_item_optimizer::subdivide_classes_by_equality, sem_item_optimizer::subdivide_classes_by_sensitive_refs, sem_item_optimizer::verify_classes): Traverse m_classes_vec vector instead of traversing m_classes hash table. Formatting fixes. (sem_item_optimizer::traverse_congruence_split, sem_item_optimizer::do_congruence_step_for_index, sem_item_optimizer::do_congruence_step): Formatting fixes. (sem_item_optimizer::process_cong_reduction): Traverse m_classes_vec vector instead of traversing m_classes hash table. (sem_item_optimizer::dump_cong_classes): Likewise. Formatting fixes. (sem_item_optimizer::merge_classes): Traverse m_classes_vec vector instead of traversing m_classes hash table. * g++.dg/ipa/pr78211.C: New test. --- gcc/ipa-icf.h.jj2016-11-22 11:05:58.0 +0100 +++ gcc/ipa-icf.h 2016-11-25 11:41:18.210144897 +0100 @@ -609,9 +609,12 @@ private: /* A set containing all items removed by hooks. */ hash_set m_removed_items_set; - /* Hashtable of congruence classes */ + /* Hashtable of congruence classes. */ hash_table m_classes; + /* Vector of congruence classes. */ + vec m_classes_vec; + /* Count of congruence classes. */ unsigned int m_classes_count; --- gcc/ipa-icf.c.jj2016-11-22 11:05:58.0 +0100 +++ gcc/ipa-icf.c 2016-11-25 12:05:31.765505438 +0100 @@ -2284,6 +2284,7 @@ sem_item_optimizer::sem_item_optimizer ( m_varpool_node_hooks (NULL) { m_items.create (0); + m_classes_vec.create (0); bitmap_obstack_initialize (_bmstack); } @@ -2292,17 +2293,19 @@ sem_item_optimizer::~sem_item_optimizer for (unsigned int i = 0; i < m_items.length (); i++) delete m_items[i]; - for (hash_table::iterator it = m_classes.begin (); - it != m_classes.end (); ++it) + unsigned int l; + congruence_class_group *it; + FOR_EACH_VEC_ELT (m_classes_vec, l, it) { - for (unsigned int i = 0; i < (*it)->classes.length (); i++) - delete (*it)->classes[i]; + for (unsigned int i = 0; i < it->classes.length (); i++) + delete it->classes[i]; - (*it)->classes.release (); - free (*it); + it->classes.release (); + free (it); } m_items.release (); + m_classes_vec.release (); bitmap_obstack_release (_bmstack); } @@ -2361,8 +2364,8 @@ void sem_item_optimizer::read_section (lto_file_decl_data *file_data, const char *data, size_t len) { - const lto_function_header *header = -(const lto_function_header *) data; + const lto_function_header *header += (const lto_function_header *) data; const int cfg_offset = sizeof (lto_function_header); const int main_offset = cfg_offset + header->cfg_size; const int string_offset = main_offset + header->main_size; @@ -2373,9 +2376,9 @@ sem_item_optimizer::read_section (lto_fi lto_input_block ib_main ((const char *) data + main_offset, 0, header->main_size, file_data->mode_table); - data_in = -lto_data_in_create (file_data, (const char *) data + string_offset, - header->string_size, vNULL); + data_in += lto_data_in_create (file_data, (const char *) data + string_offset, + header->string_size, vNULL); count = streamer_read_uhwi (_main); @@ -2473,9 +2476,9 @@ sem_item_optimizer::add_class (congruenc { gcc_assert (cls->members.length ()); - congruence_class_group *group = get_group_by_hash ( - cls->members[0]->get_hash (), - cls->members[0]->type); + congruence_class_group *group += get_group_by_hash (cls->members[0]->get_hash (), +
Re: [Patch, Fortran, OOP] PR 60853: Failure to disambiguate generic with unlimited polymorphic
Hi Paul, > This looks fine to me - OK for trunk. thanks a lot. Committed as r242880. Cheers, Janus > On 25 November 2016 at 13:40, Janus Weilwrote: >> Hi all, >> >> here is a patch that fixes a rejects-valid problem with an >> unlimited-polymorphic variable in a generic procedure. Removing the >> line with UNLIMITED_POLY fixes the error and the rest of the patch is >> just slightly refactoring the for-loop. >> >> Regtested successfully on x86_64-linux-gnu. Ok for trunk? >> >> Cheers, >> Janus >> >> >> 2016-11-25 Janus Weil >> >> PR fortran/60853 >> * interface.c (gfc_compare_interfaces): Remove bad special case for >> unlimited polymorphism. Refactor for loop. >> >> 2016-11-25 Janus Weil >> >> PR fortran/60853 >> * gfortran.dg/typebound_assignment_8.f90: New test case. > > > > -- > The difference between genius and stupidity is; genius has its limits. > > Albert Einstein
[PATCH] Fix a couple of issues in gimple-ssa-sprintf.c
Hi! Here is an attempt to fix a couple of bugs in gimple-ssa-sprintf.c. First of all, it assumes size_t is always the same as uintmax_t, which is not necessarily the case. Second, it uses static tree {,u}intmax_type_node; variables for caching those types, but doesn't register them with GC; but their computation is quite cheap, so I think it isn't worth wasting a GC root for those, especially if we compute it only in the very rare case when somebody uses the j modifier. Third, the code assumes that ptrdiff_t is the signed type for size_t. E.g. vms is one port where that isn't true, ptrdiff_t can be 64-bit, while size_t is 32-bit. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-25 Jakub Jelinek* gimple-ssa-sprintf.c (build_intmax_type_nodes): Look at UINTMAX_TYPE rather than SIZE_TYPE. Add gcc_unreachable if intmax_t couldn't be determined. (format_integer): Make {,u}intmax_type_node no longer static, initialize them only when needed. For z and t use signed_or_unsigned_type_for instead of assuming size_t and ptrdiff_t have the same precision. --- gcc/gimple-ssa-sprintf.c.jj 2016-11-25 09:49:47.0 +0100 +++ gcc/gimple-ssa-sprintf.c2016-11-25 10:26:58.763114194 +0100 @@ -733,23 +733,23 @@ format_percent (const conversion_spec &, } -/* Ugh. Compute intmax_type_node and uintmax_type_node the same way - lto/lto-lang.c does it. This should be available in tree.h. */ +/* Compute intmax_type_node and uintmax_type_node similarly to how + tree.c builds size_type_node. */ static void build_intmax_type_nodes (tree *pintmax, tree *puintmax) { - if (strcmp (SIZE_TYPE, "unsigned int") == 0) + if (strcmp (UINTMAX_TYPE, "unsigned int") == 0) { *pintmax = integer_type_node; *puintmax = unsigned_type_node; } - else if (strcmp (SIZE_TYPE, "long unsigned int") == 0) + else if (strcmp (UINTMAX_TYPE, "long unsigned int") == 0) { *pintmax = long_integer_type_node; *puintmax = long_unsigned_type_node; } - else if (strcmp (SIZE_TYPE, "long long unsigned int") == 0) + else if (strcmp (UINTMAX_TYPE, "long long unsigned int") == 0) { *pintmax = long_long_integer_type_node; *puintmax = long_long_unsigned_type_node; @@ -762,12 +762,14 @@ build_intmax_type_nodes (tree *pintmax, char name[50]; sprintf (name, "__int%d unsigned", int_n_data[i].bitsize); - if (strcmp (name, SIZE_TYPE) == 0) + if (strcmp (name, UINTMAX_TYPE) == 0) { *pintmax = int_n_trees[i].signed_type; *puintmax = int_n_trees[i].unsigned_type; + return; } } + gcc_unreachable (); } } @@ -851,15 +853,8 @@ format_pointer (const conversion_spec static fmtresult format_integer (const conversion_spec , tree arg) { - /* These are available as macros in the C and C++ front ends but, - sadly, not here. */ - static tree intmax_type_node; - static tree uintmax_type_node; - - /* Initialize the intmax nodes above the first time through here. */ - if (!intmax_type_node) -build_intmax_type_nodes (_type_node, _type_node); - + tree intmax_type_node; + tree uintmax_type_node; /* Set WIDTH and PRECISION to either the values in the format specification or to zero. */ int width = spec.have_width ? spec.width : 0; @@ -909,19 +904,20 @@ format_integer (const conversion_spec break; case FMT_LEN_z: - dirtype = sign ? ptrdiff_type_node : size_type_node; + dirtype = signed_or_unsigned_type_for (!sign, size_type_node); break; case FMT_LEN_t: - dirtype = sign ? ptrdiff_type_node : size_type_node; + dirtype = signed_or_unsigned_type_for (!sign, ptrdiff_type_node); break; case FMT_LEN_j: + build_intmax_type_nodes (_type_node, _type_node); dirtype = sign ? intmax_type_node : uintmax_type_node; break; default: - return fmtresult (); + return fmtresult (); } /* The type of the argument to the directive, either deduced from Jakub
[committed] Fix combine ICE with out of bounds shift (PR rtl-optimization/78527)
Hi! The following testcase ICEs in the combiner, because we try to optimize HImode SUBREG of SImode LSHIFTRT with shift count 80, so width is -48, and in the end we call smallest_mode_for_size (-48, MODE_INT) and that function has unsigned int as first argument; most of the ports don't have modes with 4Gbits yet. Bootstrapped/regtested on x86_64-linux and i686-linux, pre-approved by Segher on IRC, committed to trunk. 2016-11-25 Jakub JelinekPR rtl-optimization/78527 * combine.c (make_compound_operation_int): Ignore LSHIFTRT with out of bounds shift count. * gcc.c-torture/compile/pr78527.c: New test. --- gcc/combine.c.jj2016-11-25 09:49:50.0 +0100 +++ gcc/combine.c 2016-11-25 13:42:43.628487136 +0100 @@ -8091,6 +8091,8 @@ make_compound_operation_int (machine_mod if (GET_CODE (inner) == LSHIFTRT && CONST_INT_P (XEXP (inner, 1)) && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner)) + && (UINTVAL (XEXP (inner, 1)) + < GET_MODE_PRECISION (GET_MODE (inner))) && subreg_lowpart_p (x)) { new_rtx = make_compound_operation (XEXP (inner, 0), next_code); --- gcc/testsuite/gcc.c-torture/compile/pr78527.c.jj2016-11-25 13:44:53.806825477 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr78527.c 2016-11-25 13:44:35.0 +0100 @@ -0,0 +1,21 @@ +/* PR rtl-optimization/78527 */ + +unsigned a; +short b, e; +int *c; +char d; + +int +main () +{ + int f = 80; + for (;;) { +if (f > 432) + *c = a; +while (b) + if (d) +e = -(a >> f); +c = +b = e; + } +} Jakub
Re: [Patch, Fortran, OOP] PR 60853: Failure to disambiguate generic with unlimited polymorphic
Dear Janus, This looks fine to me - OK for trunk. Best regards and thanks Paul On 25 November 2016 at 13:40, Janus Weilwrote: > Hi all, > > here is a patch that fixes a rejects-valid problem with an > unlimited-polymorphic variable in a generic procedure. Removing the > line with UNLIMITED_POLY fixes the error and the rest of the patch is > just slightly refactoring the for-loop. > > Regtested successfully on x86_64-linux-gnu. Ok for trunk? > > Cheers, > Janus > > > 2016-11-25 Janus Weil > > PR fortran/60853 > * interface.c (gfc_compare_interfaces): Remove bad special case for > unlimited polymorphism. Refactor for loop. > > 2016-11-25 Janus Weil > > PR fortran/60853 > * gfortran.dg/typebound_assignment_8.f90: New test case. -- The difference between genius and stupidity is; genius has its limits. Albert Einstein
Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function
On Tue, Nov 22, 2016 at 10:38:42AM -0500, David Malcolm wrote: > On Tue, 2016-11-22 at 15:45 +0100, Jakub Jelinek wrote: > > On Tue, Nov 22, 2016 at 03:38:04PM +0100, Bernd Schmidt wrote: > > > On 11/22/2016 02:37 PM, Jakub Jelinek wrote: > > > > Can't it be done only if xloc.file contains any fancy characters? > > > > > > Sure, but why? Strings generally get emitted with quotes around > > > them, I > > > don't see a good reason for filenames to be different, especially > > > if it > > > makes the output easier to parse. > > > > Because printing common filenames matches what we emit in > > diagnostics, > > what e.g. sanitizers emit at runtime diagnostics, what we emit as > > locations > > in gimple dumps etc. > > It sounds like a distinction between human-readable vs machine > -readable. > > How about something like the following, which only adds the quotes if > outputting the RTL FE's input format? > > Does this fix the failing tests? Yep. > --- a/gcc/print-rtl.c > +++ b/gcc/print-rtl.c > @@ -371,7 +371,10 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, > int idx) >if (INSN_HAS_LOCATION (in_insn)) > { > expanded_location xloc = insn_location (in_insn); > - fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line); > + if (m_compact) > + fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line); > + else > + fprintf (m_outfile, " %s:%i", xloc.file, xloc.line); Looks sensible to me. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC
> > > A few comments about this new version: > > * arm-neon-ref.h: why do you create > CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64? > > Can't you just add calls to CHECK_CRYPTO in the existing > > CHECK_RESULTS_NAMED_NO_FP16? Yes, that should be fine, I didn't used to have CHECK_CRYPTO before and when I added it I didn't remove the split. I'll do it now. > > > > * p64_p128: > > From what I can see ARM and AArch64 differ on the vceq variants > > available with poly64. > > For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a, > > poly64x1_t __b) For AArch64, I can't see vceq_p64 in arm_neon.h? ... > > Actually I've just noticed the other you submitted while I was writing > > this, where you add vceq_p64 for aarch64, but it still returns > > uint64_t. > > Why do you change the vceq_64 test to return poly64_t instead of > uint64_t? This patch is slightly outdated. The correct type is `uint64_t` but when it was noticed This patch was already sent. New one coming soon. > > > > Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64? > > This is wrong, remove them. It was supposed to be around the vldX_lane_p64 tests. > > The comment /* vget_lane_p64 tests. */ is wrong before VLDX_LANE > > tests > > > > You need to protect the new vmov, vget_high and vget_lane tests with > > #ifdef __aarch64__. > > vget_lane is already in an #ifdef, vmov you're right, but I also notice that the test calls VDUP instead of VMOV, which explains why I didn't get a test failure. Thanks for the feedback, I'll get these updated. > > Actually, vget_high_p64 exists on arm, so no need for the #fidef for it. > > > > Christophe > > > >> Kind regards, > >> Tamar > >> > >> From: Tamar Christina > >> Sent: Tuesday, November 8, 2016 11:58:46 AM > >> To: Christophe Lyon > >> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard > >> Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd > >> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing > >> Poly64_t intrinsics to GCC > >> > >> Hi Christophe, > >> > >> Thanks for the review! > >> > >>> > >>> A while ago I added p64_p128.c, to contain all the poly64/128 tests > >>> except for vreinterpret. > >>> Why do you need to create p64.c ? > >> > >> I originally created it because I had a much smaller set of > >> intrinsics that I wanted to add initially, this grew and It hadn't > >> occurred to > me that I can use the existing file now. > >> > >> Another reason was the effective-target arm_crypto_ok as you > mentioned below. > >> > >>> > >>> Similarly, adding tests for vcreate_p64 etc... in p64.c or > >>> p64_p128.c might be easier to maintain than adding them to vcreate.c > >>> etc with several #ifdef conditions. > >> > >> Fair enough, I'll move them to p64_p128.c. > >> > >>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)" > >>> condition? These intrinsics are defined in arm/arm_neon.h, right? > >>> They are tested in p64_p128.c > >> > >> I should have looked for them, they weren't being tested before so I > >> had Mistakenly assumed that they weren't available. Now I realize I > >> just need To add the proper test option to the file to enable crypto. I'll > update this as well. > >> > >>> Looking at your patch, it seems some tests are currently missing for arm: > >>> vget_high_p64. I'm not sure why I missed it when I removed neont- > >>> testgen... > >> > >> I'll adjust the test conditions so they run for ARM as well. > >> > >>> > >>> Regarding vreinterpret_p128.c, doesn't the existing effective-target > >>> arm_crypto_ok prevent the tests from running on aarch64? > >> > >> Yes they do, I was comparing the output against a clean version and > >> hasn't noticed That they weren't running. Thanks! > >> > >>> > >>> Thanks, > >>> > >>> Christophe
Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.
On Fri, Nov 25, 2016 at 10:15:25AM +0100, Richard Biener wrote: > On Thu, Nov 24, 2016 at 5:14 PM, Segher Boessenkool >wrote: > > On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote: > >> On 11/24/2016 07:53 AM, Segher Boessenkool wrote: > >> > > >> >That we compare different kinds of costs (which really has no meaning at > >> >all, it's a heuristic at best) in various places is a known problem, not > >> >a regression. > >> But the problems with the costing system exhibit themselves as a code > >> quality regression. In the end that's what the end-users see -- a > >> regression in the quality of the code GCC generates. > > > > Yes, exactly -- and I fear this all-encompassing change will cause just > > such a regression for many users. Tests are running, will know more > > later today (or tomorrow). > > > > The PR is about a very specific problem; the patch is not. The patch > > is not a bug fix. If we allow anything that "makes things better" in > > stage 3, what make it different from stage 1 then? > > That's a good question ;) The stage 3 definition has a loophole via > "go file a bug about feature X, then it's a bugfix!". > > I'm all open for a more sensible definition, like constraining the kind > of non-regression fixes that we want to allow, but I fear the most > sensible option would be to simply ditch the notion of different > "stages" and make it "general development" and "regression fixing". > (though if you try hard enough and go back in time you'll find that > almost all non-enhancement bugs are regressions in some sense) The scale goes: early stage 1, anything goes; ...; until stage 4, only very narrow regression fixes are allowed. Let's try to keep that spirit, and not behave like politicians following the "rules" (or not). > And yes, current stage3 still feels too much like stage1 ;) Yes, very much so. Well, at least trunk bootstraps on more targets now. -- So IMNSHO this rtx costing change belongs in early stage 1, and should be reverted. If ifcvt should use full rtx cost instead of rtx_src_cost, fix *that*, that is a much more local change. And even then, test on more targets please. Segher
Re: [tree-tailcall] Check if function returns it's argument
On 11/25/2016 01:07 AM, Richard Biener wrote: For the tail-call, issue should we artificially create a lhs and use that as return value (perhaps by a separate pass before tailcall) ? __builtin_memcpy (a1, a2, a3); return a1; gets transformed to: _1 = __builtin_memcpy (a1, a2, a3) return _1; So tail-call optimization pass would see the IL in it's expected form. As said, a RTL expert needs to chime in here. Iff then tail-call itself should do this rewrite. But if this form is required to make things work (I suppose you checked it _does_ actually work?) then we'd need to make sure later passes do not undo it. So it looks fragile to me. OTOH I seem to remember that the flags we set on GIMPLE are merely a hint to RTL expansion and the tailcalling is verified again there? So tail calling actually sits on the border between trees and RTL. Essentially it's an expand-time decision as we use information from trees as well as low level target information. I would not expect the former sequence to tail call. The tail calling code does not know that the return value from memcpy will be a1. Thus the tail calling code has to assume that it'll have to copy a1 into the return register after returning from memcpy, which obviously can't be done if we tail called memcpy. The second form is much more likely to turn into a tail call sequence because the return value from memcpy will be sitting in the proper register. This form out to work for most calling conventions that allow tail calls. We could (in theory) try and exploit the fact that memcpy returns its first argument as a return value, but that would only be helpful on a target where the first argument and return value use the same register. So I'd have a slight preference to rewriting per Prathamesh's suggestion above since it's more general. Jeff
[patch,avr]: Improve code as of PR41076
Mentioned PR is about composing 16-bit values out of 8-bit values which due to integer promotions likes to work on 16-bit values. The patch adds 3 combiner patterns to catch such situations and then split after reload. Some more splitting is performed after reload for: and, ior, xor of reg-reg operations, HImode reg = 0 and 3- and 4-byte registers moves (and 2-byte moved if !MOVW). This gived better code for almost all test cases, yet some test cases could still be improved. I also tried pre-reload splits, but the generated SUBREGs seem to disturb register allocation, hence I used post-reload splits. There is also PR60145 which has a test case where a 4-byte value is composed out of 4 bytes, but the patterns to catch that are going to be insane... so for now, such 4-byte cases will still result in bloated code. Patch tested against trunk without regressions. Ok for trunk? Johann PR 41076 * config/avr/avr.md SPLIT34): New mode iterator. (bitop): New code iterator. (*iorhi3.ashift8-*). New insn-and-split patterns. (*movhi): Post-reload split reg = 0. (*movhi) [!MOVW]: Post-reload split reg = reg. (*mov) [SI,SF,PSI,SQ,USQ,SA,USA]: Post-reload split reg = reg. (andhi3, andpsi3, andsi3): Post-reload split reg-reg operations. (iorhi3, iorpsi3, iorsi3): Same. (xorhi3, xorpsi3, xorsi3): Same. * config/avr/avr.c (avr_rtx_costs_1) [IOR && HImode]: Adjust rtx costs to *iorhi3.ashift8-* patterns. Index: config/avr/avr.c === --- config/avr/avr.c (revision 242823) +++ config/avr/avr.c (working copy) @@ -10645,6 +10645,19 @@ avr_rtx_costs_1 (rtx x, machine_mode mod /* FALLTHRU */ case AND: case IOR: + if (IOR == code + && HImode == mode + && ASHIFT == GET_CODE (XEXP (x, 0))) +{ + *total = COSTS_N_INSNS (2); + // Just a rough estimate. If we see no sign- or zero-extend, + // then increase the cost a little bit. + if (REG_P (XEXP (XEXP (x, 0), 0))) +*total += COSTS_N_INSNS (1); + if (REG_P (XEXP (x, 1))) +*total += COSTS_N_INSNS (1); + return true; +} *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)); *total += avr_operand_rtx_cost (XEXP (x, 0), mode, code, 0, speed); if (!CONST_INT_P (XEXP (x, 1))) Index: config/avr/avr.md === --- config/avr/avr.md (revision 242823) +++ config/avr/avr.md (working copy) @@ -260,6 +260,10 @@ (define_mode_iterator ORDERED234 [HI SI HQ UHQ HA UHA SQ USQ SA USA]) +;; Post-reload split of 3, 4 bytes wide moves. +(define_mode_iterator SPLIT34 [SI SF PSI + SQ USQ SA USA]) + ;; Define code iterators ;; Define two incarnations so that we can build the cross product. (define_code_iterator any_extend [sign_extend zero_extend]) @@ -267,6 +271,7 @@ (define_code_iterator any_extend2 [sign_ (define_code_iterator any_extract [sign_extract zero_extract]) (define_code_iterator any_shiftrt [lshiftrt ashiftrt]) +(define_code_iterator bitop [xor ior and]) (define_code_iterator xior [xor ior]) (define_code_iterator eqne [eq ne]) @@ -3328,6 +,66 @@ (define_insn "xorsi3" (set_attr "adjust_len" "*,out_bitop,out_bitop") (set_attr "cc" "set_n,clobber,clobber")]) + +(define_split + [(set (match_operand:SPLIT34 0 "register_operand") +(match_operand:SPLIT34 1 "register_operand"))] + "optimize + && reload_completed" + [(set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 5))] + { +machine_mode mode_hi = 4 == GET_MODE_SIZE (mode) ? HImode : QImode; +bool lo_first = REGNO (operands[0]) < REGNO (operands[1]); +rtx dst_lo = simplify_gen_subreg (HImode, operands[0], mode, 0); +rtx src_lo = simplify_gen_subreg (HImode, operands[1], mode, 0); +rtx dst_hi = simplify_gen_subreg (mode_hi, operands[0], mode, 2); +rtx src_hi = simplify_gen_subreg (mode_hi, operands[1], mode, 2); + +operands[2] = lo_first ? dst_lo : dst_hi; +operands[3] = lo_first ? src_lo : src_hi; +operands[4] = lo_first ? dst_hi : dst_lo; +operands[5] = lo_first ? src_hi : src_lo; + }) + +(define_split + [(set (match_operand:HI 0 "register_operand") +(match_operand:HI 1 "reg_or_0_operand"))] + "optimize + && reload_completed + && (!AVR_HAVE_MOVW + || const0_rtx == operands[1])" + [(set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 5))] + { +operands[2] = simplify_gen_subreg (QImode, operands[0], HImode, 1); +operands[3] = simplify_gen_subreg (QImode, operands[1], HImode, 1); +operands[4] = simplify_gen_subreg (QImode, operands[0], HImode, 0); +operands[5] = simplify_gen_subreg (QImode, operands[1], HImode, 0); + }) +
Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.
On 11/25/2016 02:15 AM, Richard Biener wrote: That's a good question ;) The stage 3 definition has a loophole via "go file a bug about feature X, then it's a bugfix!". Right. That loophole has existed since we've moved to the current model -- we extend a level of trust to our developers not to abuse the loophole. I think that level of trust is warranted and hasn't been significantly violated. I'm all open for a more sensible definition, like constraining the kind of non-regression fixes that we want to allow, but I fear the most sensible option would be to simply ditch the notion of different "stages" and make it "general development" and "regression fixing". (though if you try hard enough and go back in time you'll find that almost all non-enhancement bugs are regressions in some sense) Similarly, I'm always open for improvements. My worry is if we went to development/regression bugfixing cycle, then non-regression bugs would largely be ignored. General bugfixing is, IMHO, a good period -- it gets a larger portion of our developers fixing bugs and gives folks with a heavy review load a chance to flush out their queues of stuff that came in right at the end of stage1. I'm not really pushing to open a "development cycle" discussion right now, but I do sense that our cycles could use some refinement. And yes, current stage3 still feels too much like stage1 ;) Hasn't seemed that way to me, but obviously experiences will differ. My biggest worry about this cycle is the higher than typical (compared to the last few years) regression bug counts. That worry is somewhat mitigated by the belief that we're marking regressions much much more consistently this year, so we're a lot less likely to get a big jump in marked regressions like we saw last year. *If* that is the case (and my light poking around seems to indicate that's true), then we're likely ahead of the gcc-6 cycle, but behind the gcc-5 cycle. jeff
Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC
On 25 November 2016 at 15:53, Christophe Lyonwrote: > Hi Tamar, > > On 24 November 2016 at 12:45, Tamar Christina wrote: >> Hi Christoph, >> >> I have combined most of the tests in p64_p128 except for the >> vreinterpret_p128 and vreinterpret_p64 ones because I felt the number >> of code that would be have to be added to p64_p128 vs having them in those >> files isn't worth it. Since a lot of the test setup would have to be copied. >> > > A few comments about this new version: > * arm-neon-ref.h: why do you create CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64? > Can't you just add calls to CHECK_CRYPTO in the existing > CHECK_RESULTS_NAMED_NO_FP16? > > * p64_p128: > From what I can see ARM and AArch64 differ on the vceq variants > available with poly64. > For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a, > poly64x1_t __b) > For AArch64, I can't see vceq_p64 in arm_neon.h? ... Actually I've just > noticed > the other you submitted while I was writing this, where you add vceq_p64 for > aarch64, but it still returns uint64_t. > Why do you change the vceq_64 test to return poly64_t instead of uint64_t? > > Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64? > > The comment /* vget_lane_p64 tests. */ is wrong before VLDX_LANE tests > > You need to protect the new vmov, vget_high and vget_lane tests with > #ifdef __aarch64__. > Actually, vget_high_p64 exists on arm, so no need for the #fidef for it. > Christophe > >> Kind regards, >> Tamar >> >> From: Tamar Christina >> Sent: Tuesday, November 8, 2016 11:58:46 AM >> To: Christophe Lyon >> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; >> James Greenhalgh; Kyrylo Tkachov; nd >> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t >> intrinsics to GCC >> >> Hi Christophe, >> >> Thanks for the review! >> >>> >>> A while ago I added p64_p128.c, to contain all the poly64/128 tests except >>> for >>> vreinterpret. >>> Why do you need to create p64.c ? >> >> I originally created it because I had a much smaller set of intrinsics that >> I wanted to >> add initially, this grew and It hadn't occurred to me that I can use the >> existing file now. >> >> Another reason was the effective-target arm_crypto_ok as you mentioned below. >> >>> >>> Similarly, adding tests for vcreate_p64 etc... in p64.c or p64_p128.c might >>> be >>> easier to maintain than adding them to vcreate.c etc with several #ifdef >>> conditions. >> >> Fair enough, I'll move them to p64_p128.c. >> >>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)" >>> condition? These intrinsics are defined in arm/arm_neon.h, right? >>> They are tested in p64_p128.c >> >> I should have looked for them, they weren't being tested before so I had >> Mistakenly assumed that they weren't available. Now I realize I just need >> To add the proper test option to the file to enable crypto. I'll update this >> as well. >> >>> Looking at your patch, it seems some tests are currently missing for arm: >>> vget_high_p64. I'm not sure why I missed it when I removed neont- >>> testgen... >> >> I'll adjust the test conditions so they run for ARM as well. >> >>> >>> Regarding vreinterpret_p128.c, doesn't the existing effective-target >>> arm_crypto_ok prevent the tests from running on aarch64? >> >> Yes they do, I was comparing the output against a clean version and hasn't >> noticed >> That they weren't running. Thanks! >> >>> >>> Thanks, >>> >>> Christophe
Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC
Hi Tamar, On 24 November 2016 at 12:45, Tamar Christinawrote: > Hi Christoph, > > I have combined most of the tests in p64_p128 except for the > vreinterpret_p128 and vreinterpret_p64 ones because I felt the number > of code that would be have to be added to p64_p128 vs having them in those > files isn't worth it. Since a lot of the test setup would have to be copied. > A few comments about this new version: * arm-neon-ref.h: why do you create CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64? Can't you just add calls to CHECK_CRYPTO in the existing CHECK_RESULTS_NAMED_NO_FP16? * p64_p128: >From what I can see ARM and AArch64 differ on the vceq variants available with poly64. For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a, poly64x1_t __b) For AArch64, I can't see vceq_p64 in arm_neon.h? ... Actually I've just noticed the other you submitted while I was writing this, where you add vceq_p64 for aarch64, but it still returns uint64_t. Why do you change the vceq_64 test to return poly64_t instead of uint64_t? Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64? The comment /* vget_lane_p64 tests. */ is wrong before VLDX_LANE tests You need to protect the new vmov, vget_high and vget_lane tests with #ifdef __aarch64__. Christophe > Kind regards, > Tamar > > From: Tamar Christina > Sent: Tuesday, November 8, 2016 11:58:46 AM > To: Christophe Lyon > Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; > James Greenhalgh; Kyrylo Tkachov; nd > Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t > intrinsics to GCC > > Hi Christophe, > > Thanks for the review! > >> >> A while ago I added p64_p128.c, to contain all the poly64/128 tests except >> for >> vreinterpret. >> Why do you need to create p64.c ? > > I originally created it because I had a much smaller set of intrinsics that I > wanted to > add initially, this grew and It hadn't occurred to me that I can use the > existing file now. > > Another reason was the effective-target arm_crypto_ok as you mentioned below. > >> >> Similarly, adding tests for vcreate_p64 etc... in p64.c or p64_p128.c might >> be >> easier to maintain than adding them to vcreate.c etc with several #ifdef >> conditions. > > Fair enough, I'll move them to p64_p128.c. > >> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)" >> condition? These intrinsics are defined in arm/arm_neon.h, right? >> They are tested in p64_p128.c > > I should have looked for them, they weren't being tested before so I had > Mistakenly assumed that they weren't available. Now I realize I just need > To add the proper test option to the file to enable crypto. I'll update this > as well. > >> Looking at your patch, it seems some tests are currently missing for arm: >> vget_high_p64. I'm not sure why I missed it when I removed neont- >> testgen... > > I'll adjust the test conditions so they run for ARM as well. > >> >> Regarding vreinterpret_p128.c, doesn't the existing effective-target >> arm_crypto_ok prevent the tests from running on aarch64? > > Yes they do, I was comparing the output against a clean version and hasn't > noticed > That they weren't running. Thanks! > >> >> Thanks, >> >> Christophe
[OBVIOUS] [PATCH] Fix documentation reference (PR web/71666)
The patch is selecting the proper section where -fprofile-generate is documented. Martin >From 00a7b92ef9bc7158a3e7202deb9b18b8d95dd5d2 Mon Sep 17 00:00:00 2001 From: marxinDate: Fri, 25 Nov 2016 15:11:48 +0100 Subject: [PATCH] Fix documentation reference (PR web/71666) gcc/ChangeLog: 2016-11-25 Martin Liska PR web/71666 * doc/invoke.texi (-fprofile-use): Fix reference to a section where -fprofile-generate is documented. --- gcc/doc/invoke.texi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 34c7187..8a0cad7 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8921,8 +8921,8 @@ which are generally profitable only with profile feedback available: @option{-ftree-vectorize}, and @option{ftree-loop-distribute-patterns}. Before you can use this option, you must first generate profiling information. -@xref{Optimize Options}, for information about the @option{-fprofile-generate} -option. +@xref{Instrumentation Options}, for information about the +@option{-fprofile-generate} option. By default, GCC emits an error message if the feedback profiles do not match the source code. This error can be turned into a warning by using -- 2.10.2
Re: [PATCH] Fix ICE with -Wuninitialized (PR tree-optimization/78455)
On Nov 23 2016, Christophe Lyonwrote: > Since you committed this patch (r242733), I've noticed a regression: > gcc.dg/uninit-pred-6_c.c warning (test for warnings, line 43) > > now fails on some arm targets, for instance arm-none-linux-gnueabihf > --with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16 > > (or similarly on arm-none-eabi --with-cpu=cortex-m3 --with-mode=thumb) Also fails on m68k. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [PATCH] Fix PR78515
On Fri, 25 Nov 2016, Martin Jambor wrote: > Hi, > > On Fri, Nov 25, 2016 at 12:01:38PM +0100, Richard Biener wrote: > > > > I am testing the following to beat some sanity into > > compute_complex_assign_jump_func. > > That the function does not handle ternary operations (did we have them > since the beginning?) is clearly my fault and the patch is fine. For quite some time indeed ;) > Please commit it. Will do after testing finished. > > There's still that odd 'stmt2' > > hanging around that gets set to sth else than stmt with > > > > op1 = gimple_assign_rhs1 (stmt); > > > > if (TREE_CODE (op1) == SSA_NAME) > > { > > if (SSA_NAME_IS_DEFAULT_DEF (op1)) > > index = ipa_get_param_decl_index (info, SSA_NAME_VAR (op1)); > > else > > { > > index = load_from_param (fbi, info->descriptors, > >SSA_NAME_DEF_STMT (op1)); > > stmt2 = SSA_NAME_DEF_STMT (op1); > > > > I assume that the original code wanted to restrict its processing > > to unary RHS of 'stmt' but still this "skips" arbitrary unary > > operations in 'stmt'? But maybe I'm not understanding jump functions > > here. If we have > > > > _2 = -param_1(D); > > _3 = ~_2; > > > > and stmt is _3 then we create a unary pass through JF with - (and the ~ > > gets lost?). > > > > It definitely looks like that. Unary arithmetic jump functions have > been added only recently with the IPA VRP propagation and I remember > staring at the stmt2 thingy for a while during review but then > apparently I forgot about it. > > It seems to me that the check should refer to stmt, I will do that and > see what breaks and how the IL looks at that point and then decide > where to go from there. it's the only use of stmt2 though, so it must have been added for some reason... (maybe somebody wanted to handle simple copy-propagation?!). I'd say rip it out and thus keep stmt2 == stmt. There must be a testcase added for this... Richard.
Re: [PATCH] Tree-level fix for PR 69526
On Wed, Nov 16, 2016 at 4:54 PM, Robin Dappwrote: > Found some time to look into this again. > >> Index: tree-ssa-propagate.c >> === >> --- tree-ssa-propagate.c(revision 240133) >> +++ tree-ssa-propagate.c(working copy) >> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d >>/* Replace real uses in the statement. */ >>did_replace |= replace_uses_in (stmt, get_value_fn); >> >> - /* If we made a replacement, fold the statement. */ >> - if (did_replace) >> + /* Fold the statement. */ >> + if (fold_stmt (, follow_single_use_edges)) >> { >> - fold_stmt (, follow_single_use_edges); >> + did_replace = true; >> stmt = gsi_stmt (i); >> } >> >> this would need compile-time cost evaluation (and avoid the tree-vrp.c >> folding part >> of your patch). > > This causes an ICE and bootstrap errors with newer revisions. I tested > my patch on r240691 where it still works. How can this be done properly now? Not sure, I'd have to investigate. It should still just work (throwing off a bootstrap with just that change over the weekend). >> OTOH given that you use this to decide whether you can use a combined >> constant >> that doesn't look necessary (if the constant is correct, that is) -- >> what you need >> to make sure is that the final operation, (T)(A) +- CST, does not overflow >> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the >> original operation. I think that always holds, thus the combine_ovf check >> looks >> superfluous to me. > > Removed the check and addressed the other remarks. > >> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2 >> does not overflow. But we do not really care for that, we want to know >> whether (T)X + CST' might invoke undefined behavior when the original >> expression did not. This involves range information on X. I don't >> see how we ensure this here. > > I guess I'm still missing an undefined behavior case. In which case can > (T)X + CST' trigger undefined behavior where the original statement did > not? I see the need for checking in the second pattern ((T)(X) + CST' -> > (T)(X + CST')), of course. Looking at + /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST) */ +#if GIMPLE + (for outer_op (plus minus) + (for inner_op (plus minus) + (simplify +(outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (TREE_CODE (type) == INTEGER_TYPE && + TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3))) so the conversion (T) is widening or sign-changing. (&& go to the next line) If A + CST overflows we cannot do the transform (you check that with extract_range_from_binary_expr setting 'range_split'). If A + CST does not overflow but is unsigned and we are just changing sign (precision ==) then (T)A + (CST + CST) might overflow. Consider (int)(INT_MAX + 1) + 1 -> INT_MAX + 2. I think here the important part is whether A + CST fits in T for the case where we just change the type to a type with !TYPE_OVERFLOW_WRAPS. Certainly restricting to widenings would avoid the issue. >> But that's "easily fixable" by computing it in unsigned arithmetic, that is >> doing >> >> (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2)) > > Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit > into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is > implementation-defined and not undefined so it should work? Yes, implementation-defined beavior is fine. > Revised patch version attached. One thing I'm still not sure about is > the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0. > In the current patch version I always do a sign-extend of the first > constant (UINT_MAX here) which seems to cause no problems in the > testsuite and some custom tests still worked. + /* Sign-extend @1 to TYPE. */ + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED); not sure why you do always sign-extend. If the inner op is unsigned and we widen then that's certainly bogus considering your UINT_MAX example above. Does w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type)); not work for some reason? + /* Combine in outer, larger type. */ + bool combine_ovf = true; + combined_cst = wi::add (w1, w2, SIGNED, _ovf); as you ignore combine_ovf you can simply use combined_cst = wi::add (w1, w2); + /* Convert combined constant to tree of outer type if +there was no value range split in the original operation. */ + if (!range_split) + { I'd say you want to condition on range_split early, like with bool range_split; if (TYPE_OVERFLOW_UNDEFINED (inner_type) || ! (extract_range_from_binary_expr (, _split), range_split)) {
Re: [PATCH] Fix PR78515
Hi, On Fri, Nov 25, 2016 at 12:01:38PM +0100, Richard Biener wrote: > > I am testing the following to beat some sanity into > compute_complex_assign_jump_func. That the function does not handle ternary operations (did we have them since the beginning?) is clearly my fault and the patch is fine. Please commit it. > There's still that odd 'stmt2' > hanging around that gets set to sth else than stmt with > > op1 = gimple_assign_rhs1 (stmt); > > if (TREE_CODE (op1) == SSA_NAME) > { > if (SSA_NAME_IS_DEFAULT_DEF (op1)) > index = ipa_get_param_decl_index (info, SSA_NAME_VAR (op1)); > else > { > index = load_from_param (fbi, info->descriptors, >SSA_NAME_DEF_STMT (op1)); > stmt2 = SSA_NAME_DEF_STMT (op1); > > I assume that the original code wanted to restrict its processing > to unary RHS of 'stmt' but still this "skips" arbitrary unary > operations in 'stmt'? But maybe I'm not understanding jump functions > here. If we have > > _2 = -param_1(D); > _3 = ~_2; > > and stmt is _3 then we create a unary pass through JF with - (and the ~ > gets lost?). > It definitely looks like that. Unary arithmetic jump functions have been added only recently with the IPA VRP propagation and I remember staring at the stmt2 thingy for a while during review but then apparently I forgot about it. It seems to me that the check should refer to stmt, I will do that and see what breaks and how the IL looks at that point and then decide where to go from there. Thanks, Martin
Re: [PR 70965] Schedule extra pass_rebuild_cgraph_edges
> On Thu, Nov 24, 2016 at 5:44 PM, Martin Jamborwrote: > > Hi, > > > > after discussing this with Honza, we have decided that scheduling an > > extra pass_rebuild_cgraph_edges after pass_fixup_cfg is the correct > > way to keep the cgraph consistent with gimple IL when early IPA passes > > need it, such as is the case with the testcase in PR 70965. > > > > While needing an extra pass is never nice, this is a consequence of > > splitting pass_build_ssa_passes out of early optimization passes so > > that pass_chkp can be in between. > > > > The patch below fices the ICE in PR 70965 and has passed bootstrap and > > testing on x86_64-linux. OK for trunk? > > Ok. > > Richard. > > > Thanks, > > > > Martin > > > > > > 2016-11-24 Martin Jambor > > > > gcc/ > > * passes.def (pass_build_ssa_passes): Add pass_rebuild_cgraph_edges. > > > > gcc/testsuite/ > > * g++.dg/pr70965.C: New test. > > --- > > gcc/passes.def | 1 + > > gcc/testsuite/g++.dg/pr70965.C | 21 + > > 2 files changed, 22 insertions(+) > > create mode 100644 gcc/testsuite/g++.dg/pr70965.C > > > > diff --git a/gcc/passes.def b/gcc/passes.def > > index 85a5af0..5faf17f 100644 > > --- a/gcc/passes.def > > +++ b/gcc/passes.def > > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see > >NEXT_PASS (pass_build_ssa_passes); > >PUSH_INSERT_PASSES_WITHIN (pass_build_ssa_passes) > >NEXT_PASS (pass_fixup_cfg); > > + NEXT_PASS (pass_rebuild_cgraph_edges); > >NEXT_PASS (pass_build_ssa); > >NEXT_PASS (pass_warn_nonnull_compare); > >NEXT_PASS (pass_ubsan); Actually you want to rebuild at the end of pass_build_ssa_passes passes queue. This may still trip an ICE if one of passes bellow modify CFG (which pass_nothorw does) Path is OK with that change. Honza > > diff --git a/gcc/testsuite/g++.dg/pr70965.C b/gcc/testsuite/g++.dg/pr70965.C > > new file mode 100644 > > index 000..d8a2c35 > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/pr70965.C > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -std=c++11" } */ > > + > > +struct A {}; > > +struct B {}; > > +struct C { using p = int *; template using ra = A; }; > > +struct J : C { template struct K { typedef C::ra o; }; }; > > +template struct D > > +{ > > + struct H : J::K::o { H (J::p, A) : J::K::o () {} }; > > + H d; > > + D (const char *, const A = A ()) : d (0, x) {} > > +}; > > +extern template class D; > > +enum L { M }; > > +struct F { virtual char *foo (); }; > > +template struct I : B { static int foo (int) {} }; > > +struct G { typedef I t; }; > > +void foo (int) { G::t::foo (0); } > > +void bar (const D &, const D &, int, L); > > +void baz () try { foo (0); } catch (F ) { bar (e.foo (), "", 0, M); } > > -- > > 2.10.2 > >
Pointer Bounds Checker and trailing arrays (PR68270)
Hi, The patch below addresses PR68270. could you please take a look? 2016-11-25 Alexander Ivchenko* c-family/c.opt (flag_chkp_flexible_struct_trailing_arrays): Add new option. * tree-chkp.c (chkp_parse_array_and_component_ref): Forbid narrowing when chkp_parse_array_and_component_ref is used and the ARRAY_REF points to an array in the end of the struct. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 7d8a726..e45d6a2 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1166,6 +1166,11 @@ C ObjC C++ ObjC++ LTO RejectNegative Report Var(flag_chkp_narrow_to_innermost_ar Forces Pointer Bounds Checker to use bounds of the innermost arrays in case of nested static arryas access. By default outermost array is used. +fchkp-flexible-struct-trailing-arrays +C ObjC C++ ObjC++ LTO RejectNegative Report Var(flag_chkp_flexible_struct_trailing_arrays) +Allow Pointer Bounds Checker to treat all trailing arrays in structures as +possibly flexible. + fchkp-optimize C ObjC C++ ObjC++ LTO Report Var(flag_chkp_optimize) Init(-1) Allow Pointer Bounds Checker optimizations. By default allowed diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 2769682..40f99c3 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -3425,7 +3425,9 @@ chkp_parse_array_and_component_ref (tree node, tree *ptr, if (flag_chkp_narrow_bounds && !flag_chkp_narrow_to_innermost_arrray && (!last_comp - || chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1 + || (chkp_may_narrow_to_field (TREE_OPERAND (last_comp, 1)) + && !(flag_chkp_flexible_struct_trailing_arrays + && array_at_struct_end_p (var) { comp_to_narrow = last_comp; break;
[Patch, Fortran, OOP] PR 60853: Failure to disambiguate generic with unlimited polymorphic
Hi all, here is a patch that fixes a rejects-valid problem with an unlimited-polymorphic variable in a generic procedure. Removing the line with UNLIMITED_POLY fixes the error and the rest of the patch is just slightly refactoring the for-loop. Regtested successfully on x86_64-linux-gnu. Ok for trunk? Cheers, Janus 2016-11-25 Janus WeilPR fortran/60853 * interface.c (gfc_compare_interfaces): Remove bad special case for unlimited polymorphism. Refactor for loop. 2016-11-25 Janus Weil PR fortran/60853 * gfortran.dg/typebound_assignment_8.f90: New test case. Index: gcc/fortran/interface.c === --- gcc/fortran/interface.c (revision 242818) +++ gcc/fortran/interface.c (working copy) @@ -1728,11 +1728,9 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol This is also done when comparing interfaces for dummy procedures and in procedure pointer assignments. */ -for (;;) +for (; f1 || f2; f1 = f1->next, f2 = f2->next) { /* Check existence. */ - if (f1 == NULL && f2 == NULL) - break; if (f1 == NULL || f2 == NULL) { if (errmsg != NULL) @@ -1741,9 +1739,6 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol return 0; } - if (UNLIMITED_POLY (f1->sym)) - goto next; - if (strict_flag) { /* Check all characteristics. */ @@ -1772,9 +1767,6 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol return 0; } } -next: - f1 = f1->next; - f2 = f2->next; } return 1; ! { dg-do compile } ! ! PR 60853: [OOP] Failure to disambiguate generic with unlimited polymorphic ! ! Contributed by tlcclt module foo_mod implicit none type Vector contains procedure :: copyFromScalar procedure :: copyFromArray generic :: assignment(=) => copyFromScalar, copyFromArray end type contains subroutine copyFromScalar(this, scalar) class (Vector), intent(inout) :: this type (Vector), intent(in) :: scalar end subroutine subroutine copyFromArray(this, array) class (Vector), intent(inout) :: this class (*), intent(in) :: array(:) end subroutine end module
Re: [Patch, fortran] PR78293 - - [5/6/7 Regression] SIGABRT with function result used as argument
Dear All, Since both Andre and I have taken a good look at this rather small patch, I decided to commit it as revision 242875. My movements are such that I will have to hold off on 5- and 6-branches until the week after next. Best regards Paul PS Andre, I wasn't trying to make you feel bad but was making fun of myself :-) On 25 November 2016 at 13:00, Andre Vehreschildwrote: > Hi Thomas, > >> Andre put me to shame with a devastatingly simple replacement for a >> horribly complicated and wrong patch that I was getting into. > > I did not mean to. I happened to work in the same area and the PR's > description > rang a whole chorus of bells what might have been going wrong. > >> The part of the patch in trans-expr.c fixes the PR and the part in >> trans-stmt.c fixes a memory leak in function 'tt'. This latter fixes > > I propose to change the comment in the second chunk (trans-stmt.c) from: > > - /* Deallocate any allocatable components after all the allocations > -and assignments of expr3 have been completed. */ > > to > > + /* Deallocate any allocatable components in expressions that use a > + temporary, i.e. are not of expr-type EXPR_VARIABLE or force the > + use of a temporary, after the assignment of expr3 is completed. */ > > Mind that the indentation is corrupted by my mailer. Please polish the comment > a bit. It feels awkward. > >> half of the memory leaks in class_array_15.f03. I have noted the rest >> of this problem in PR38319 with which it is associated. >> >> Bootstraps and regtests on FC21/x86_64 - OK for trunk and, later >> 5-branch and 6-branch? > > Ok'ing a patch I participated in writing does not feel correct. So not doing > it. > When committing please add yourself also to the Changelog in gcc/fortran to > prevent confusion when this patch causes regressions. You are as much > responsible for fixing it as me. > > Regards, > Andre > >> Cheers >> >> Paul >> >> 2016-11-24 Andre Vehreschild >> >> PR fortran/78293 >> * trans-expr.c (gfc_conv_procedure_call): Prepend deallocation >> of alloctable components to post, rather than adding to >> se->post. >> * trans-stmt.c (gfc_trans_allocate): Move deallocation of expr3 >> allocatable components so that all expr3s are visited. >> >> 2016-11-24 Paul Thomas >> >> PR fortran/78293 >> * gfortran.dg/allocatable_function_10.f90: New test. >> * gfortran.dg/class_array_15.f03: Increase builtin_free count >> from 11 to 12. > > > -- > Andre Vehreschild * Email: vehre ad gmx dot de -- The difference between genius and stupidity is; genius has its limits. Albert Einstein
Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.
Hi Joseph, I have updated the patch with the changes, w.r.t to the formatting, there are tabs there that seem to be rendered at 4 spaces wide. In my editor setup at 8 spaces they are correct. Kind Regards, Tamar From: Joseph MyersSent: Thursday, November 24, 2016 6:28:18 PM To: Tamar Christina Cc: GCC Patches; Wilco Dijkstra; rguent...@suse.de; l...@redhat.com; Michael Meissner; nd Subject: Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE. On Thu, 24 Nov 2016, Tamar Christina wrote: > @@ -11499,6 +11503,53 @@ to classify. GCC treats the last argument as > type-generic, which > means it does not do default promotion from float to double. > @end deftypefn > > +@deftypefn {Built-in Function} int __builtin_isnan (...) > +This built-in implements the C99 isnan functionality which checks if > +the given argument represents a NaN. The return value of the > +function will either be a 0 (false) or a 1 (true). > +On most systems, when an IEEE 754 floating point is used this > +built-in does not produce a signal when a signaling NaN is used. "an IEEE 754 floating point" should probably be "an IEEE 754 floating-point type" or similar. > +GCC treats the argument as type-generic, which means it does > +not do default promotion from float to double. I think type names such as float and double should use @code{} in the manual. > +the given argument represents an Infinite number. The return Infinite should not have a capital letter there. > +@deftypefn {Built-in Function} int __builtin_iszero (...) > +This built-in implements the C99 iszero functionality which checks if This isn't C99, it's TS 18661-1:2014. > +the given argument represents the number 0. The return 0 or -0. > +@deftypefn {Built-in Function} int __builtin_issubnormal (...) > +This built-in implements the C99 issubnormal functionality which checks if Again, TS 18661-1. > +the given argument represents a sub-normal number. The return Do not hyphenate subnormal. > + switch (DECL_FUNCTION_CODE (decl)) > + { > + case BUILT_IN_SETJMP: > + lower_builtin_setjmp (gsi); > + data->cannot_fallthru = false; > + return; The indentation in this whole block of code (not all quoted) is wrong. > +real_inf(); Missing space before '('. > + emit_tree_cond (, dest, done_label, > + is_normal(, arg, loc), fp_normal); > + emit_tree_cond (, dest, done_label, > + is_zero(, arg, loc), fp_zero); > + emit_tree_cond (, dest, done_label, > + is_nan(, arg, loc), fp_nan); > + emit_tree_cond (, dest, done_label, > + is_infinity(, arg, loc), fp_infinite); Likewise. > + fndecl(, arg, loc), t_true); Likewise. -- Joseph S. Myers jos...@codesourcery.com diff --git a/gcc/builtins.c b/gcc/builtins.c index 3ac2d44148440b124559ba7cd3de483b7a74b72d..2340f60edb31ebf964367699aaf33ac8401dff41 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -160,7 +160,6 @@ static tree fold_builtin_0 (location_t, tree); static tree fold_builtin_1 (location_t, tree, tree); static tree fold_builtin_2 (location_t, tree, tree, tree); static tree fold_builtin_3 (location_t, tree, tree, tree, tree); -static tree fold_builtin_varargs (location_t, tree, tree*, int); static tree fold_builtin_strpbrk (location_t, tree, tree, tree); static tree fold_builtin_strstr (location_t, tree, tree, tree); @@ -2202,19 +2201,8 @@ interclass_mathfn_icode (tree arg, tree fndecl) switch (DECL_FUNCTION_CODE (fndecl)) { CASE_FLT_FN (BUILT_IN_ILOGB): - errno_set = true; builtin_optab = ilogb_optab; break; -CASE_FLT_FN (BUILT_IN_ISINF): - builtin_optab = isinf_optab; break; -case BUILT_IN_ISNORMAL: -case BUILT_IN_ISFINITE: -CASE_FLT_FN (BUILT_IN_FINITE): -case BUILT_IN_FINITED32: -case BUILT_IN_FINITED64: -case BUILT_IN_FINITED128: -case BUILT_IN_ISINFD32: -case BUILT_IN_ISINFD64: -case BUILT_IN_ISINFD128: - /* These builtins have no optabs (yet). */ + errno_set = true; + builtin_optab = ilogb_optab; break; default: gcc_unreachable (); @@ -2233,8 +2221,7 @@ interclass_mathfn_icode (tree arg, tree fndecl) } /* Expand a call to one of the builtin math functions that operate on - floating point argument and output an integer result (ilogb, isinf, - isnan, etc). + floating point argument and output an integer result (ilogb, etc). Return 0 if a normal call should be emitted rather than expanding the function in-line. EXP is the expression that is a call to the builtin function; if convenient, the result should be placed in TARGET. */ @@ -5997,11 +5984,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, CASE_FLT_FN (BUILT_IN_ILOGB): if (! flag_unsafe_math_optimizations) break; -
Re: [Patch, fortran] PR78293 - - [5/6/7 Regression] SIGABRT with function result used as argument
Hi Thomas, > Andre put me to shame with a devastatingly simple replacement for a > horribly complicated and wrong patch that I was getting into. I did not mean to. I happened to work in the same area and the PR's description rang a whole chorus of bells what might have been going wrong. > The part of the patch in trans-expr.c fixes the PR and the part in > trans-stmt.c fixes a memory leak in function 'tt'. This latter fixes I propose to change the comment in the second chunk (trans-stmt.c) from: - /* Deallocate any allocatable components after all the allocations -and assignments of expr3 have been completed. */ to + /* Deallocate any allocatable components in expressions that use a + temporary, i.e. are not of expr-type EXPR_VARIABLE or force the + use of a temporary, after the assignment of expr3 is completed. */ Mind that the indentation is corrupted by my mailer. Please polish the comment a bit. It feels awkward. > half of the memory leaks in class_array_15.f03. I have noted the rest > of this problem in PR38319 with which it is associated. > > Bootstraps and regtests on FC21/x86_64 - OK for trunk and, later > 5-branch and 6-branch? Ok'ing a patch I participated in writing does not feel correct. So not doing it. When committing please add yourself also to the Changelog in gcc/fortran to prevent confusion when this patch causes regressions. You are as much responsible for fixing it as me. Regards, Andre > Cheers > > Paul > > 2016-11-24 Andre Vehreschild> > PR fortran/78293 > * trans-expr.c (gfc_conv_procedure_call): Prepend deallocation > of alloctable components to post, rather than adding to > se->post. > * trans-stmt.c (gfc_trans_allocate): Move deallocation of expr3 > allocatable components so that all expr3s are visited. > > 2016-11-24 Paul Thomas > > PR fortran/78293 > * gfortran.dg/allocatable_function_10.f90: New test. > * gfortran.dg/class_array_15.f03: Increase builtin_free count > from 11 to 12. -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH PR78507/PR78510]Fix two ICEs in pattern (cond (cmp (convert1? @1) @3) (convert2? @1) @2).
On Fri, Nov 25, 2016 at 12:20 PM, Bin.Chengwrote: > On Fri, Nov 25, 2016 at 8:23 AM, Richard Biener > wrote: >> On Thu, Nov 24, 2016 at 4:22 PM, Bin Cheng wrote: >>> Hi, >>> This patch fixes two issues in newly introduced pattern: (cond (cmp >>> (convert1? @1) @3) (convert2? @1) @2). >>> For PR78507, we need to check if from_type is INTEGRAL_TYPE_P explicitly, >>> this patch adds check for that. >>> Note we don't check c1_type/c2_type for that because it's guarded by >>> INTEGER_CST@. >>> For PR78510, the ICE is covered by revision 242831, but underlying issue is >>> when the block of conditions >>> is not satisfied, the last case simplification is applied anyway since code >>> is initialized to cmp at the >>> first place. This patch fixes that by setting code to proper value only >>> when transformation is valid, it >>> also incorporates Marc's suggestion by using cmp directly. Two tests are >>> added too. >>> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures? >> >> Ok. >> >> Btw, >> >> - (cond (cmp@0 (convert1? @1) ... >> ... >> - enum tree_code code = TREE_CODE (@0)... >> >> shows another misconception (I didn't notice...) in that TREE_CODE of a >> captured >> expression results in the expression code. On GENERIC that works but on >> GIMPLE >> @0 will be a SSA_NAME and thus 'code' was SSA_NAME ... > Yes, that was a mistake. > Patch updated by including new test from duplicate PR78517. Is it OK? Yes. Richard. > Thanks, > bin > > 2016-11-24 Bin Cheng > > PR middle-end/78507 > PR middle-end/78510 > PR middle-end/78517 > * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use > cmp directly, rather than cmp_code. Initialize code to ERROR_MARK > and set it to result code if transformation is valid. Use code EQ > directly in last simplification case. > > gcc/testsuite/ChangeLog > 2016-11-24 Bin Cheng > > PR middle-end/78507 > PR middle-end/78510 > PR middle-end/78517 > * g++.dg/torture/pr78507.C: New test. > * gcc.dg/torture/pr78510.c: New test. > * gcc.dg/torture/pr78517.c: New test. > >> >> Thanks, >> Richard. >> >>> Thanks, >>> bin >>> >>> 2016-11-24 Bin Cheng >>> >>> PR middle-end/78507 >>> PR middle-end/78510 >>> * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use >>> cmp directly, rather than cmp_code. Initialize code to ERROR_MARK >>> and set it to result code if transformation is valid. Use code EQ >>> directly in last simplification case. >>> >>> gcc/testsuite/ChangeLog >>> 2016-11-24 Bin Cheng >>> >>> PR middle-end/78507 >>> PR middle-end/78510 >>> * g++.dg/torture/pr78507.C: New test. >>> * gcc.dg/torture/pr78510.c: New test.
Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlingerwrote: > Hi! > > This improves the stack usage on the sha512 test case for the case > without hardware fpu and without iwmmxt by splitting all di-mode > patterns right while expanding which is similar to what the shift-pattern > does. It does nothing in the case iwmmxt and fpu=neon or vfp as well as > thumb1. > I would go further and do this in the absence of Neon, the VFP unit being there doesn't help with DImode operations i.e. we do not have 64 bit integer arithmetic instructions without Neon. The main reason why we have the DImode patterns split so late is to give a chance for folks who want to do 64 bit arithmetic in Neon a chance to make this work as well as support some of the 64 bit Neon intrinsics which IIRC map down to these instructions. Doing this just for soft-float doesn't improve the default case only. I don't usually test iwmmxt and I'm not sure who has the ability to do so, thus keeping this restriction for iwMMX is fine. > It reduces the stack usage from 2300 to near optimal 272 bytes (!). > > Note this also splits many ldrd/strd instructions and therefore I will > post a followup-patch that mitigates this effect by enabling the ldrd/strd > peephole optimization after the necessary reg-testing. > > > Bootstrapped and reg-tested on arm-linux-gnueabihf. What do you mean by arm-linux-gnueabihf - when folks say that I interpret it as --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 or (--with-fpu=neon). If you've really bootstrapped and regtested it on armhf, doesn't this patch as it stand have no effect there i.e. no change ? arm-linux-gnueabihf usually means to me someone has configured with --with-float=hard, so there are no regressions in the hard float ABI case, Ramana > Is it OK for trunk? > > > Thanks > Bernd.
Re: [PATCH PR78507/PR78510]Fix two ICEs in pattern (cond (cmp (convert1? @1) @3) (convert2? @1) @2).
On Fri, Nov 25, 2016 at 8:23 AM, Richard Bienerwrote: > On Thu, Nov 24, 2016 at 4:22 PM, Bin Cheng wrote: >> Hi, >> This patch fixes two issues in newly introduced pattern: (cond (cmp >> (convert1? @1) @3) (convert2? @1) @2). >> For PR78507, we need to check if from_type is INTEGRAL_TYPE_P explicitly, >> this patch adds check for that. >> Note we don't check c1_type/c2_type for that because it's guarded by >> INTEGER_CST@. >> For PR78510, the ICE is covered by revision 242831, but underlying issue is >> when the block of conditions >> is not satisfied, the last case simplification is applied anyway since code >> is initialized to cmp at the >> first place. This patch fixes that by setting code to proper value only >> when transformation is valid, it >> also incorporates Marc's suggestion by using cmp directly. Two tests are >> added too. >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures? > > Ok. > > Btw, > > - (cond (cmp@0 (convert1? @1) ... > ... > - enum tree_code code = TREE_CODE (@0)... > > shows another misconception (I didn't notice...) in that TREE_CODE of a > captured > expression results in the expression code. On GENERIC that works but on > GIMPLE > @0 will be a SSA_NAME and thus 'code' was SSA_NAME ... Yes, that was a mistake. Patch updated by including new test from duplicate PR78517. Is it OK? Thanks, bin 2016-11-24 Bin Cheng PR middle-end/78507 PR middle-end/78510 PR middle-end/78517 * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use cmp directly, rather than cmp_code. Initialize code to ERROR_MARK and set it to result code if transformation is valid. Use code EQ directly in last simplification case. gcc/testsuite/ChangeLog 2016-11-24 Bin Cheng PR middle-end/78507 PR middle-end/78510 PR middle-end/78517 * g++.dg/torture/pr78507.C: New test. * gcc.dg/torture/pr78510.c: New test. * gcc.dg/torture/pr78517.c: New test. > > Thanks, > Richard. > >> Thanks, >> bin >> >> 2016-11-24 Bin Cheng >> >> PR middle-end/78507 >> PR middle-end/78510 >> * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use >> cmp directly, rather than cmp_code. Initialize code to ERROR_MARK >> and set it to result code if transformation is valid. Use code EQ >> directly in last simplification case. >> >> gcc/testsuite/ChangeLog >> 2016-11-24 Bin Cheng >> >> PR middle-end/78507 >> PR middle-end/78510 >> * g++.dg/torture/pr78507.C: New test. >> * gcc.dg/torture/pr78510.c: New test. diff --git a/gcc/match.pd b/gcc/match.pd index 3aa27fa..2d4e019 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1971,14 +1971,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2). */ (for cmp (lt le gt ge eq) (simplify - (cond (cmp@0 (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2) + (cond (cmp (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2) (with { tree from_type = TREE_TYPE (@1); tree c1_type = TREE_TYPE (@3), c2_type = TREE_TYPE (@2); - enum tree_code code = TREE_CODE (@0), cmp_code = TREE_CODE (@0); + enum tree_code code = ERROR_MARK; - if (int_fits_type_p (@2, from_type) + if (INTEGRAL_TYPE_P (from_type) +&& int_fits_type_p (@2, from_type) && (types_match (c1_type, from_type) || (TYPE_PRECISION (c1_type) > TYPE_PRECISION (from_type) && (TYPE_UNSIGNED (from_type) @@ -1988,37 +1989,38 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (TYPE_UNSIGNED (from_type) || TYPE_SIGN (c2_type) == TYPE_SIGN (from_type) { -if (code != EQ_EXPR) +if (cmp != EQ_EXPR) { if (wi::to_widest (@3) == (wi::to_widest (@2) - 1)) { /* X <= Y - 1 equals to X < Y. */ -if (cmp_code == LE_EXPR) +if (cmp == LE_EXPR) code = LT_EXPR; /* X > Y - 1 equals to X >= Y. */ -if (cmp_code == GT_EXPR) +if (cmp == GT_EXPR) code = GE_EXPR; } if (wi::to_widest (@3) == (wi::to_widest (@2) + 1)) { /* X < Y + 1 equals to X <= Y. */ -if (cmp_code == LT_EXPR) +if (cmp == LT_EXPR) code = LE_EXPR; /* X >= Y + 1 equals to X > Y. */ -if (cmp_code == GE_EXPR) +if (cmp == GE_EXPR) code = GT_EXPR; } -if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3)) +if (code != ERROR_MARK +|| wi::to_widest (@2) ==
Re: [PATCH] combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390)
On 24/11/2016 15:01, Michael Matz wrote: >> > (set (reg:DI 69) >> > (zero_extract:DI (reg:DI 65 [ v_x ]) <- >> > (const_int 32 [0x20]) >> > (const_int 20 [0x14]))) > Hmm, this transformation (from v_x>>12 to zext(v_x,32,20) is only valid > when the top 20 bits of v_x are known to be zero already, or alternatively > if it's know that the top 32bits of reg 69 won't matter in the future. I > wonder where that knowledge is coming from. s390 is BITS_BIG_ENDIAN, so zext(v_x,32,20) really means the same as zext(v_x,32,12) with the more common convention. Insn 4 and 5 respectively set the low and high 32-bits of (reg:DI 65), so the transform seems fine to me. Paolo
[PATCH] Fix PR78515
I am testing the following to beat some sanity into compute_complex_assign_jump_func. There's still that odd 'stmt2' hanging around that gets set to sth else than stmt with op1 = gimple_assign_rhs1 (stmt); if (TREE_CODE (op1) == SSA_NAME) { if (SSA_NAME_IS_DEFAULT_DEF (op1)) index = ipa_get_param_decl_index (info, SSA_NAME_VAR (op1)); else { index = load_from_param (fbi, info->descriptors, SSA_NAME_DEF_STMT (op1)); stmt2 = SSA_NAME_DEF_STMT (op1); I assume that the original code wanted to restrict its processing to unary RHS of 'stmt' but still this "skips" arbitrary unary operations in 'stmt'? But maybe I'm not understanding jump functions here. If we have _2 = -param_1(D); _3 = ~_2; and stmt is _3 then we create a unary pass through JF with - (and the ~ gets lost?). Anyway, the following is a step in the right direction -- if you want to address the above (maybe with some comments) you can happily take the patch from here, otherwise we can followup this patch. Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk? Thanks, Richard. 2016-11-25 Richard BienerPR ipa/78515 * ipa-prop.c (compute_complex_assign_jump_func): Properly identify unary, binary and single RHSs. * tree.def (BIT_INSERT_EXPR): Adjust tree code name. * gcc.dg/torture/pr78515.c: New testcase. diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index 90c19fc..642111d 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -1177,29 +1177,37 @@ compute_complex_assign_jump_func (struct ipa_func_body_info *fbi, if (index >= 0) { - tree op2 = gimple_assign_rhs2 (stmt); - - if (op2) + switch (gimple_assign_rhs_class (stmt)) { - if (!is_gimple_ip_invariant (op2) - || (TREE_CODE_CLASS (gimple_expr_code (stmt)) != tcc_comparison - && !useless_type_conversion_p (TREE_TYPE (name), -TREE_TYPE (op1 - return; - - ipa_set_jf_arith_pass_through (jfunc, index, op2, -gimple_assign_rhs_code (stmt)); - } - else if (gimple_assign_single_p (stmt)) - { - bool agg_p = parm_ref_data_pass_through_p (fbi, index, call, tc_ssa); - ipa_set_jf_simple_pass_through (jfunc, index, agg_p); + case GIMPLE_BINARY_RHS: + { + tree op2 = gimple_assign_rhs2 (stmt); + if (!is_gimple_ip_invariant (op2) + || ((TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) +!= tcc_comparison) + && !useless_type_conversion_p (TREE_TYPE (name), + TREE_TYPE (op1 + return; + + ipa_set_jf_arith_pass_through (jfunc, index, op2, + gimple_assign_rhs_code (stmt)); + break; + } + case GIMPLE_SINGLE_RHS: + { + bool agg_p = parm_ref_data_pass_through_p (fbi, index, call, + tc_ssa); + ipa_set_jf_simple_pass_through (jfunc, index, agg_p); + break; + } + case GIMPLE_UNARY_RHS: + if (is_gimple_assign (stmt2) + && gimple_assign_rhs_class (stmt2) == GIMPLE_UNARY_RHS + && ! CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))) + ipa_set_jf_unary_pass_through (jfunc, index, + gimple_assign_rhs_code (stmt2)); + default:; } - else if (is_gimple_assign (stmt2) - && (gimple_expr_code (stmt2) != NOP_EXPR) - && (TREE_CODE_CLASS (gimple_expr_code (stmt2)) == tcc_unary)) - ipa_set_jf_unary_pass_through (jfunc, index, - gimple_assign_rhs_code (stmt2)); return; } diff --git a/gcc/testsuite/gcc.dg/torture/pr78515.c b/gcc/testsuite/gcc.dg/torture/pr78515.c new file mode 100644 index 000..d700db5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr78515.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-mavx512bw" { target x86_64-*-* i?86-*-* } } */ + +typedef unsigned V __attribute__ ((vector_size (64))); + +V g; + +static V +baz (V u, V v) +{ + g += u; + return v + g + 1; +} + +static V +bar (V u) +{ + u[0] = 0; + return baz(u, (V){}); +} + +V +foo () +{ + return (V){bar((V){})[0]}; +} diff --git a/gcc/tree.def b/gcc/tree.def index 2c35540..e093307 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -865,7 +865,7 @@ DEFTREECODE (FDESC_EXPR, "fdesc_expr", tcc_expression, 2) introducing a quaternary operation. The replaced bits shall be fully inside the container. If the container is of vector type, then these bits shall be aligned with its elements. */ -DEFTREECODE (BIT_INSERT_EXPR,
Re: [PATCH] Fix Ada bootstrap
On Fri, 25 Nov 2016, Eric Botcazou wrote: > > Martin reported a profiledbootstrap issue with a SSA coalescing issue > > with a > > > > return _17(ab); > > OK. > > > Yeah, seeing this one as well. > > It fails at -O0 with just: > > eric@polaris:~/build/gcc/native> cat p.adb > procedure P is > > subtype Char is Character range 'W' .. 'Z'; > > type Arr is array (Char range <>) of Integer; > > type Rec (D : Char) is record > A : Arr (D .. 'W'); > end record; > > begin > null; > end; > eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet p.adb > +===GNAT BUG DETECTED==+ > | 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) > GCC error:| > | in size_binop_loc, at fold-const.c:1744 | > | Error detected at p.adb:8:20 > > so I presume this comes from the recent match.pd changes. Could be - please open a PR. Thanks, Richard.
Re: [PATCH] Don't use priority {cd}tors if not supported by a target (PR, gcov-profile/78086)
> Hi. > > Using priority {cd}tors on a target that does not support that can cause > failures (see the PR). > Apart from that, I decided to use priority 100 for both gcov_init and > gcov_exit functions as > the reserved range includes priority 100. Moreover, I enhanced test-cases we > have. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? > Martin > >From 05a0dcb13d608facdd1c85f4101cd821634d07cd Mon Sep 17 00:00:00 2001 > From: marxin> Date: Wed, 26 Oct 2016 12:50:35 +0200 > Subject: [PATCH] Don't use priority {cd}tors if not supported by a target (PR > gcov-profile/78086) > > gcc/testsuite/ChangeLog: > > 2016-10-26 Martin Liska > > * g++.dg/gcov/pr16855.C: Clean up the test case. > * g++.dg/gcov/pr16855-priority.C: New test. > > gcc/ChangeLog: > > 2016-10-26 Martin Liska > > * coverage.c (build_init_ctor): Don't use priority {cd}tors if > not supported by a target. Set priority to 100 if possible. > (build_gcov_exit_decl): Likewise. OK, thanks! Honza
[AArch64][GCC][PATCH] Add more Poly64_t intrinsics to GCC
Hi all, This adds the following NEON intrinsics to the Aarch64 back-end of GCC: * vsriq_n_p64 * vsri_n_p64 * vextq_p64 * vext_p64 * vceq_p64 * vbslq_p64 * vbsl_p64 Added new tests for these and ran regression tests on aarch64-none-linux-gnu and on arm-none-linux-gnueabihf. Tests added in other patch series. Ok for trunk? Thanks, Tamar gcc/ 2016-11-25 Tamar Christina* config/aarch64/aarch64-builtins.c (vsriq_n_p64, vsri_n_p64): Added poly type. (vextq_p64, vext_p64): Likewise. (vceq_p64, vbslq_p64, vbsl_p64): Likewise.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index e1154b4b27820c0075d9a9edb4f8b48ef4f06b07..49efeea6f90cf8535aec4b9287bc9b30b7b79e60 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -429,6 +429,7 @@ /* Implemented by aarch64_simd_bsl. */ BUILTIN_VDQQH (BSL_P, simd_bsl, 0) + VAR2 (BSL_P, simd_bsl,0, di, v2di) BUILTIN_VSDQ_I_DI (BSL_U, simd_bsl, 0) BUILTIN_VALLDIF (BSL_S, simd_bsl, 0) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index c463e3b698a47b9b5c5a04e0fb7fff1f71817af1..ddaaa4f8c5615b979df8f765760c41c8e158fba1 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -10164,6 +10164,19 @@ vrsqrteq_u32 (uint32x4_t a) result; \ }) +#define vsri_n_p64(a, b, c) \ + __extension__\ +({ \ + poly64x1_t b_ = (b); \ + poly64x1_t a_ = (a); \ + poly64x1_t result; \ + __asm__ ("sri %d0,%d2,%3" \ + : "=w"(result) \ + : "0"(a_), "w"(b_), "i"(c)\ + : /* No clobbers. */); \ + result;\ + }) + #define vsriq_n_p8(a, b, c) \ __extension__ \ ({ \ @@ -10190,6 +10203,19 @@ vrsqrteq_u32 (uint32x4_t a) result; \ }) +#define vsriq_n_p64(a, b, c) \ + __extension__\ +({ \ + poly64x2_t b_ = (b); \ + poly64x2_t a_ = (a); \ + poly64x2_t result; \ + __asm__ ("sri %0.2d,%2.2d,%3" \ + : "=w"(result) \ + : "0"(a_), "w"(b_), "i"(c)\ + : /* No clobbers. */); \ + result;\ + }) + __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vtst_p8 (poly8x8_t a, poly8x8_t b) @@ -11320,6 +11346,13 @@ vbsl_p16 (uint16x4_t __a, poly16x4_t __b, poly16x4_t __c) { return __builtin_aarch64_simd_bslv4hi_pupp (__a, __b, __c); } +__extension__ extern __inline poly64x1_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbsl_p64 (uint64x1_t __a, poly64x1_t __b, poly64x1_t __c) +{ + return (poly64x1_t) + {__builtin_aarch64_simd_bsldi_pupp (__a[0], __b[0], __c[0])}; +} __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) @@ -11428,6 +11461,13 @@ vbslq_s16 (uint16x8_t __a, int16x8_t __b, int16x8_t __c) return __builtin_aarch64_simd_bslv8hi_suss (__a, __b, __c); } +__extension__ extern __inline poly64x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbslq_p64 (uint64x2_t __a, poly64x2_t __b, poly64x2_t __c) +{ + return __builtin_aarch64_simd_bslv2di_pupp (__a, __b, __c); +} + __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vbslq_s32 (uint32x4_t __a, int32x4_t __b, int32x4_t __c) @@ -11959,6 +11999,13 @@ vceq_p8 (poly8x8_t __a, poly8x8_t __b) return (uint8x8_t) (__a == __b); } +__extension__ extern __inline uint64x1_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vceq_p64 (poly64x1_t __a, poly64x1_t __b) +{ + return (uint64x1_t) (__a == __b); +} + __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vceq_s8 (int8x8_t __a, int8x8_t __b) @@ -15620,6 +15667,15 @@ vext_p16 (poly16x4_t __a, poly16x4_t __b, __const int __c) #endif } +__extension__ extern __inline poly64x1_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vext_p64 (poly64x1_t __a, poly64x1_t __b, __const int __c) +{ + __AARCH64_LANE_CHECK (__a, __c); + /* The only possible index to the assembler instruction returns element 0. */ + return __a; +} + __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vext_s8 (int8x8_t __a, int8x8_t __b, __const int __c) @@ -15788,6 +15844,18 @@ vextq_p16 (poly16x8_t __a, poly16x8_t __b, __const int __c) #endif } +__extension__ extern __inline poly64x2_t +__attribute__
Re: [PATCH] Fix Ada bootstrap
> Martin reported a profiledbootstrap issue with a SSA coalescing issue > with a > > return _17(ab); OK. > Yeah, seeing this one as well. It fails at -O0 with just: eric@polaris:~/build/gcc/native> cat p.adb procedure P is subtype Char is Character range 'W' .. 'Z'; type Arr is array (Char range <>) of Integer; type Rec (D : Char) is record A : Arr (D .. 'W'); end record; begin null; end; eric@polaris:~/build/gcc/native> gcc/gnat1 -quiet p.adb +===GNAT BUG DETECTED==+ | 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) GCC error:| | in size_binop_loc, at fold-const.c:1744 | | Error detected at p.adb:8:20 so I presume this comes from the recent match.pd changes. -- Eric Botcazou
Re: [GCC][AArch64][PATCH][Testsuite] Fix failing test vector_initialization_nostack.c
On Mon, Nov 07, 2016 at 02:05:27PM +, Tamar Christina wrote: > Hi all, > > This fixes (PR78142) by turning off scheduling for the test. > r241590 is causing more registers to be used and so > the SP registered happens to be picked and used. > > This test I believe was checking explicitly that the > SP is not used if not needed. I can't say I'm convinced by the fix - turning off passes to work around a failure suggests a problem with the test, and turning off the pass reduces coverage of what we're actually trying to test for. Can the test be rewritten to only require construction of one vector in the char case? Thanks, James > gcc/testsuite/ > > 2016-11-07 Tamar Christina> > PR middle-end/78142 > * gcc.target/aarch64/vector_initialization_nostack.c > (dg-options): Disabled scheduling. > diff --git a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c > b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c > index > bbad04d00263b6a91b826b4911af92bdd226c821..71699281c5ce79fb5cf37e47b8ba078721c19f3a > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c > +++ b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model" } */ > +/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model > -fno-schedule-insns" } */ > float arr_f[100][100]; > float > f9 (void)
Re: [PATCH] Fix PR78343
On Thu, 24 Nov 2016, Richard Biener wrote: > > I am testing the following patch for an optimization regression where > a loop made dead by final value replacement was made used again by > DOM 20 passes later. The real issue here is that we do not get rid > of dead loops until very late so this patch makes sure to do that. > We could schedule it later (but better no later than unrolling > as that might expose a pretty inefficient way of removing a dead loop). > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. As expected some testcases need adjustment, thus applied as follows. Richard. 2016-11-24 Richard BienerPR tree-optimization/78343 * passes.def: Add CD-DCE pass after loop splitting. * tree-ssa-dce.c (find_obviously_necessary_stmts): Move SCEV init/finalize ... (perform_tree_ssa_dce): ... here. Deal with being executed inside the loop pipeline in aggressive mode. * gcc.dg/tree-ssa/sccp-2.c: New testcase. * gcc.dg/autopar/uns-outer-6.c: Adjust. * gcc.dg/tree-ssa/20030808-1.c: Likewise. * gcc.dg/tree-ssa/20040305-1.c: Likewise. * gcc.dg/vect/pr38529.c: Likewise. diff --git a/gcc/passes.def b/gcc/passes.def index 2a470a7..2fa682b 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -271,6 +271,9 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_loop_split); + /* All unswitching, final value replacement and splitting can expose +empty loops. Remove them now. */ + NEXT_PASS (pass_cd_dce); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c index dc2870b..5af60b0 100644 --- a/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c +++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c @@ -25,7 +25,7 @@ parloop (int N) for (i = 0; i < N; i++) { for (j = 0; j < N; j++) - y[i]=x[i][j]; + y[i] += x[i][j]; sum += y[i]; } g_sum = sum; @@ -46,6 +46,10 @@ main (void) /* Check that outer loop is parallelized. */ +/* This fails because we have + FAILED: data dependencies exist across iterations + + /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops2" } } */ /* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops2" } } */ /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c index 7cc5404..cda86a7 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c @@ -33,8 +33,8 @@ delete_dead_jumptables () /* There should be no loads of ->code. If any exist, then we failed to optimize away all the IF statements and the statements feeding their conditions. */ -/* { dg-final { scan-tree-dump-times "->code" 0 "cddce2"} } */ +/* { dg-final { scan-tree-dump-times "->code" 0 "cddce3"} } */ /* There should be no IF statements. */ -/* { dg-final { scan-tree-dump-times "if " 0 "cddce2"} } */ +/* { dg-final { scan-tree-dump-times "if " 0 "cddce3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c index 501e28c..d1a9af8 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/20040305-1.c @@ -27,4 +27,4 @@ void foo(int edx, int eax) /* After cddce we should have two IF statements remaining as the other two tests can be threaded. */ -/* { dg-final { scan-tree-dump-times "if " 2 "cddce2"} } */ +/* { dg-final { scan-tree-dump-times "if " 2 "cddce3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c b/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c new file mode 100644 index 000..099b281 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sccp-2.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +unsigned int +test(unsigned int quant) +{ + unsigned int sum = 0; + for (unsigned int i = 0; i < quant; ++i) +sum += quant; + return sum; +} + +/* A single basic-block should remain (computing and + returning quant * quant). */ +/* { dg-final { scan-tree-dump-times "bb" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/pr38529.c b/gcc/testsuite/gcc.dg/vect/pr38529.c index 171adeb..9b5919d 100644 --- a/gcc/testsuite/gcc.dg/vect/pr38529.c +++ b/gcc/testsuite/gcc.dg/vect/pr38529.c @@ -11,7 +11,3 @@ void foo() for (j = 0; j < 17; ++j) a[i] = 0; } - -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */ - - diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c index 7b9814e..50b5eef 100644 --- a/gcc/tree-ssa-dce.c +++ b/gcc/tree-ssa-dce.c @@ -400,7
Re: [PATCH] Fix Ada bootstrap
On Fri, 25 Nov 2016, Eric Botcazou wrote: > > 2016-11-25 Richard Biener <rguent...@suse.de> > > > > * gimple-fold.c (fold_stmt_1): Check may_propagate_copy > > before valueizing return stmts. > > What failure is it supposed to fix? The compiler bootstraps for me at > r242863 > although there is 1 ACATS regression (c41104a): Martin reported a profiledbootstrap issue with a SSA coalescing issue with a return _17(ab); I've seen all ada tests fail with my recent testings (which included a workaround for the VRP issue), all tests fail to link not finding the RTS. > +===GNAT BUG DETECTED==+ > | 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) > GCC error:| > | in size_binop_loc, at fold-const.c:1744 | > | Error detected at c41104a.adb:60:33 | Yeah, seeing this one as well. Richard.
Re: [PATCH] Fix Ada bootstrap
On 11/25/2016 10:20 AM, Richard Biener wrote: > > Bootstrap and testing in progress. > > Richard. > > 2016-11-25 Richard Biener> > * gimple-fold.c (fold_stmt_1): Check may_propagate_copy > before valueizing return stmts. > > Index: gcc/gimple-fold.c > === > --- gcc/gimple-fold.c (revision 242864) > +++ gcc/gimple-fold.c (working copy) > @@ -4414,7 +4414,8 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, > if (ret && TREE_CODE (ret) == SSA_NAME && valueize) > { > tree val = valueize (ret); > - if (val && val != ret) > + if (val && val != ret > + && may_propagate_copy (ret, val)) > { > gimple_return_set_retval (ret_stmt, val); > changed = true; > Profiled bootstrap on x86_64 just successfully finished with patch applied. Thanks! Martin
Re: [PATCH] Fix Ada bootstrap
> 2016-11-25 Richard Biener <rguent...@suse.de> > > * gimple-fold.c (fold_stmt_1): Check may_propagate_copy > before valueizing return stmts. What failure is it supposed to fix? The compiler bootstraps for me at r242863 although there is 1 ACATS regression (c41104a): +===GNAT BUG DETECTED==+ | 7.0.0 20161125 (experimental) [trunk revision 242863] (x86_64-suse-linux) GCC error:| | in size_binop_loc, at fold-const.c:1744 | | Error detected at c41104a.adb:60:33 | -- Eric Botcazou
[RFC PATCH, i386]: Separate operations with mask registers from operations with general registers
Hello! Attached patch tries to fix the mess with operations where register allocator is free to use either mask or general registers. There are plenty of problems with this approach: a) missing operations wth general registers - kxnor operation with general register does not exist - kandn operation with general registers exists for TARGET_BMI only b) register allocation problems - DImode operations with 64bit general registers do not exist on 32bit target - register allocator is free to allocate operation with either register set, resulting in costly moves between general and mask register sets Mask operations can be generated in a consistent way using corresponding built-ins. To prevent RA problems, we have to separate mask ops from general ops - this way we always select operation with well defined register set. There is special problem with 64bit mask ops on 32bit targets: defining these operations as a named pattern, where only mask registers are available, will result in the situation where mask operation will be used for 64bit values living in a pair of general registers. RA will obviously generate many inter-regset moves in this situation. Another problem in point a) above is dependence of general-reg operations on various TARGET_AVX* flags. This can be seen with kxnor pattern, which has to be artificially enabled during register allocation just to split non-masked operation back to sequence of general operations. Similar situation arises with kandn on non-BMI targets. IMO the mentioned interferences do not warrant mixing of mask and general operations. I suspect here will be no end of "the value is moved between integer and mask registers unnecessary" type of bugs (as was the situation on 32bit targets in the past, where DImode values were moved through MMX registers). Users can and should use relevant builtins when operating with mask registers (n.b.: generic logic operations can still be used; RA will move values between regsets - but in a *consistent way*). FWIW, the few folding opportunities with mask builtins can be implemented in a target-folding function. The attached patch also paves the way for implementation of new builtins in patch [1] that are otherwise nearly to impossible to implement. 2016-11-25 Uros Bizjak* config/i386/i386.md (UNSPEC_KMASKOP): New. (UNSPEC_KMOV): Remove. (kmovw): Expand to plain HImode move. (k): Rename from *k. Use register_operand predicates. Tag pattern with UNSPEC_KMASKOP. Remove corresponding clobber-removing splitter. (*anddi_1): Remove mask register alternatives. (*andsi_1): Ditto. (*andhi_1): Ditto. (*andqi_1): Ditto. (*_1): Ditto. (*qi_1): Ditto. (kandn): Use SWI1248_AVX512BW mode iterator. Remove general register alternatives. Tag pattern with UNSPEC_KMASKOP. Remove corresponding splitter to operation with general registers. (*andn): Rename from *bmi_andn_. (*andn): New pattern. (*kxnor): Remove general register alternatives. Tag pattern with UNSPEC_KMASKOP. Remove corresponding splitter to operation with general registers. (knot): New insn pattern. (*one_cmpl2_1): Remove mask register alternatives. (one_cmplqi2_1): Ditto. (*k): Rename from *k3. Tag pattern with UNSPEC_KMASKOP. Add mode attribute. * config/i386/predicates.md (mask_reg_operand): Remove predicate. * config/i386/sse.md (vec_unpacks_hi_hi): Update pattern to generate kmaskop shift. (vec_unpacks_hi_): Ditto. * config/i386/i386-builtin.def (__builtin_ia32_kandhi): Use CODE_FOR_kandhi. (__builtin_ia32_knothi): Use CODE_FOR_knothi. (__builtin_ia32_korhi): Use CODE_FOR_kiorhi. (__builtin_ia32_kxorhi): Use CODE_FOR_kxorhi. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. There are a couple fo trivial scan-asm errors due to a rename and a missing kmov insn, where RA choose much more optimal movw , insn instead. [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01088.html Uros. Index: config/i386/i386-builtin.def === --- config/i386/i386-builtin.def(revision 242841) +++ config/i386/i386-builtin.def(working copy) @@ -1436,15 +1436,15 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND) /* Mask arithmetic operations */ -BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI) -BDESC (OPTION_MASK_ISA_AVX512F,
[PATCH] Fix Ada bootstrap
Bootstrap and testing in progress. Richard. 2016-11-25 Richard Biener* gimple-fold.c (fold_stmt_1): Check may_propagate_copy before valueizing return stmts. Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c (revision 242864) +++ gcc/gimple-fold.c (working copy) @@ -4414,7 +4414,8 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, if (ret && TREE_CODE (ret) == SSA_NAME && valueize) { tree val = valueize (ret); - if (val && val != ret) + if (val && val != ret + && may_propagate_copy (ret, val)) { gimple_return_set_retval (ret_stmt, val); changed = true;
Re: change initialization of ptrdiff_type_node
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > On 25 November 2016 at 13:43, Richard Bienerwrote: > > On Fri, 25 Nov 2016, Jakub Jelinek wrote: > > > >> On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote: > >> > --- a/gcc/lto/lto-lang.c > >> > +++ b/gcc/lto/lto-lang.c > >> > @@ -1271,8 +1271,30 @@ lto_init (void) > >> >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node) > >> > == const_ptr_type_node); > >> > > >> > - ptrdiff_type_node = integer_type_node; > >> > + if (strcmp (PTRDIFF_TYPE, "int") == 0) > >> > +ptrdiff_type_node = integer_type_node; > >> > + else if (strcmp (PTRDIFF_TYPE, "long int") == 0) > >> > +ptrdiff_type_node = long_integer_type_node; > >> > + else if (strcmp (PTRDIFF_TYPE, "long long int") == 0) > >> > +ptrdiff_type_node = long_long_integer_type_node; > >> > + else if (strcmp (PTRDIFF_TYPE, "short int") == 0) > >> > +ptrdiff_type_node = short_integer_type_node; > >> > + else > >> > +{ > >> > + ptrdiff_type_node = NULL_TREE; > >> > + for (int i = 0; i < NUM_INT_N_ENTS; i++) > >> > + if (int_n_enabled_p[i]) > >> > + { > >> > + char name[50]; > >> > + sprintf (name, "__int%d", int_n_data[i].bitsize); > >> > + if (strcmp (name, PTRDIFF_TYPE) == 0) > >> > + ptrdiff_type_node = int_n_trees[i].signed_type; > >> > + } > >> > + if (ptrdiff_type_node == NULL_TREE) > >> > + gcc_unreachable (); > >> > +} > >> > >> This looks ok to me. > > > > But I'd like to see this in build_common_tree_nodes alongside > > the initialization of size_type_node (and thus removed from > > c_common_nodes_and_builtins). This way you can simply remove > > the lto-lang.c code as well. > > > > Please then also remove the ptrdiff_type_node re-set from > > free_lang_data (). > Hi Richard, > Does this version look OK ? > Validation in progress. Yes, patch is ok if testing succeeds. Thanks, Richard.
Re: [0/3] Fix PR78120, in ifcvt/rtlanal/i386.
On Thu, Nov 24, 2016 at 5:14 PM, Segher Boessenkoolwrote: > On Thu, Nov 24, 2016 at 08:48:04AM -0700, Jeff Law wrote: >> On 11/24/2016 07:53 AM, Segher Boessenkool wrote: >> > >> >That we compare different kinds of costs (which really has no meaning at >> >all, it's a heuristic at best) in various places is a known problem, not >> >a regression. >> But the problems with the costing system exhibit themselves as a code >> quality regression. In the end that's what the end-users see -- a >> regression in the quality of the code GCC generates. > > Yes, exactly -- and I fear this all-encompassing change will cause just > such a regression for many users. Tests are running, will know more > later today (or tomorrow). > > The PR is about a very specific problem; the patch is not. The patch > is not a bug fix. If we allow anything that "makes things better" in > stage 3, what make it different from stage 1 then? That's a good question ;) The stage 3 definition has a loophole via "go file a bug about feature X, then it's a bugfix!". I'm all open for a more sensible definition, like constraining the kind of non-regression fixes that we want to allow, but I fear the most sensible option would be to simply ditch the notion of different "stages" and make it "general development" and "regression fixing". (though if you try hard enough and go back in time you'll find that almost all non-enhancement bugs are regressions in some sense) And yes, current stage3 still feels too much like stage1 ;) Richard.
Re: change initialization of ptrdiff_type_node
On 25 November 2016 at 13:43, Richard Bienerwrote: > On Fri, 25 Nov 2016, Jakub Jelinek wrote: > >> On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote: >> > --- a/gcc/lto/lto-lang.c >> > +++ b/gcc/lto/lto-lang.c >> > @@ -1271,8 +1271,30 @@ lto_init (void) >> >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node) >> > == const_ptr_type_node); >> > >> > - ptrdiff_type_node = integer_type_node; >> > + if (strcmp (PTRDIFF_TYPE, "int") == 0) >> > +ptrdiff_type_node = integer_type_node; >> > + else if (strcmp (PTRDIFF_TYPE, "long int") == 0) >> > +ptrdiff_type_node = long_integer_type_node; >> > + else if (strcmp (PTRDIFF_TYPE, "long long int") == 0) >> > +ptrdiff_type_node = long_long_integer_type_node; >> > + else if (strcmp (PTRDIFF_TYPE, "short int") == 0) >> > +ptrdiff_type_node = short_integer_type_node; >> > + else >> > +{ >> > + ptrdiff_type_node = NULL_TREE; >> > + for (int i = 0; i < NUM_INT_N_ENTS; i++) >> > + if (int_n_enabled_p[i]) >> > + { >> > + char name[50]; >> > + sprintf (name, "__int%d", int_n_data[i].bitsize); >> > + if (strcmp (name, PTRDIFF_TYPE) == 0) >> > + ptrdiff_type_node = int_n_trees[i].signed_type; >> > + } >> > + if (ptrdiff_type_node == NULL_TREE) >> > + gcc_unreachable (); >> > +} >> >> This looks ok to me. > > But I'd like to see this in build_common_tree_nodes alongside > the initialization of size_type_node (and thus removed from > c_common_nodes_and_builtins). This way you can simply remove > the lto-lang.c code as well. > > Please then also remove the ptrdiff_type_node re-set from > free_lang_data (). Hi Richard, Does this version look OK ? Validation in progress. Thanks, Prathamesh > >> > >> > + unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node); >> >lto_build_c_type_nodes (); >> >gcc_assert (va_list_type_node); >> >> But why this and the remaining hunks? Nothing in the middle-end >> needs it, IMHO it should be kept in c-family/. > > Yeah, this change looks unnecessary to me. > >> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h >> > index eec2d4f..6c52387 100644 >> > --- a/gcc/tree-core.h >> > +++ b/gcc/tree-core.h >> > @@ -617,6 +617,7 @@ enum tree_index { >> >TI_SIZE_TYPE, >> >TI_PID_TYPE, >> >TI_PTRDIFF_TYPE, >> > + TI_UNSIGNED_PTRDIFF_TYPE, >> >TI_VA_LIST_TYPE, >> >TI_VA_LIST_GPR_COUNTER_FIELD, >> >TI_VA_LIST_FPR_COUNTER_FIELD, >> > diff --git a/gcc/tree.h b/gcc/tree.h >> > index 62cd7bb..ae69d0d 100644 >> > --- a/gcc/tree.h >> > +++ b/gcc/tree.h >> > @@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum >> > tree_code __code, int __i, >> > #define size_type_node global_trees[TI_SIZE_TYPE] >> > #define pid_type_node global_trees[TI_PID_TYPE] >> > #define ptrdiff_type_node global_trees[TI_PTRDIFF_TYPE] >> > +#define unsigned_ptrdiff_type_node global_trees[TI_UNSIGNED_PTRDIFF_TYPE] >> > #define va_list_type_node global_trees[TI_VA_LIST_TYPE] >> > #define va_list_gpr_counter_field >> > global_trees[TI_VA_LIST_GPR_COUNTER_FIELD] >> > #define va_list_fpr_counter_field >> > global_trees[TI_VA_LIST_FPR_COUNTER_FIELD] >> >> >> Jakub >> >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg) 2016-11-25 Prathamesh Kulkarni * tree.c (build_common_tree_nodes): Initialize ptrdiff_type_node. (free_lang_data): Remove assignment to ptrdiff_type_node. c-family/ * c-common.c (c_common_nodes_and_builtins): Remove initialization of ptrdiff_type_node. lto/ * lto-lang.c (lto_init): Remove initialization of ptrdiff_type_node. diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 62174a9..0749361 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -4475,8 +4475,6 @@ c_common_nodes_and_builtins (void) default_function_type = build_varargs_function_type_list (integer_type_node, NULL_TREE); - ptrdiff_type_node -= TREE_TYPE (identifier_global_value (get_identifier (PTRDIFF_TYPE))); unsigned_ptrdiff_type_node = c_common_unsigned_type (ptrdiff_type_node); lang_hooks.decls.pushdecl diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c index a5f04ba..58f6e0c 100644 --- a/gcc/lto/lto-lang.c +++ b/gcc/lto/lto-lang.c @@ -1271,8 +1271,6 @@ lto_init (void) gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node) == const_ptr_type_node); - ptrdiff_type_node = integer_type_node; - lto_build_c_type_nodes (); gcc_assert (va_list_type_node); diff --git a/gcc/tree.c b/gcc/tree.c index 11e0abc..57f2e9a 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -6006,7 +6006,6 @@ free_lang_data (void) free_lang_data_in_cgraph (); /* Create gimple variants for common types. */ - ptrdiff_type_node =
Re: [Patch ARM 17/17] Enable _Float16 for ARM.
Hi James, On 16 November 2016 at 15:15, Kyrill Tkachovwrote: > > On 11/11/16 15:42, James Greenhalgh wrote: >> >> Hi, >> >> Finally, having added support for single-step DFmode to HFmode >> conversions, >> this patch adds support for _Float16 to the ARM back-end. >> >> That means making sure that only __fp16 promotes and adding similar hooks >> to >> those used in the AArch64 port giving the excess precision rules, and >> marking HFmode as supported in libgcc. >> >> Bootstrapped on an ARMv8-A machine, and crosstested with no issues. >> >> OK? > > > This looks ok to me once the prerequisites are approved. > > Thanks, > Kyrill > > >> Thanks, >> James >> >> --- >> gcc/ >> >> 2016-11-09 James Greenhalgh >> >> PR target/63250 >> * config/arm/arm-builtins.c (arm_simd_floatHF_type_node): Rename >> to... >> (arm_fp16_type_node): ...This, make visibile. >> (arm_simd_builtin_std_type): Rename arm_simd_floatHF_type_node to >> arm_fp16_type_node. >> (arm_init_simd_builtin_types): Likewise. >> (arm_init_fp16_builtins): Likewise. >> * config/arm/arm.c (arm_excess_precision): New. >> (arm_floatn_mode): Likewise. >> (TARGET_C_EXCESS_PRECISION): Likewise. >> (TARGET_FLOATN_MODE): Likewise. >> (arm_promoted_type): Only promote arm_fp16_type_node. >> * config/arm/arm.h (arm_fp16_type_node): Declare. >> >> gcc/testsuite/ >> >> 2016-11-09 James Greenhalgh >> >> * lib/target-supports.exp (add_options_for_float16): Add >> -mfp16-format=ieee when testign arm*-*-*. >> > I've noticed that after this patch (r242784), gcc.dg/torture/float16-basic.c fail on armeb* Christophe
Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
On Sat, Nov 19, 2016 at 09:15:42PM +0100, Andreas Schwab wrote: > * gcc.c-torture/execute/comp-goto-1.c (insn_t): Change offset to > signed int. The same testcase is copied to gcc.dg/tree-prof/, just with extra dg- directives. Committed as obvious: 2016-11-25 Jakub JelinekAndreas Schwab PR gcov-profile/78467 * gcc.dg/tree-prof/comp-goto-1.c (insn_t): Change offset to signed int. --- gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c.jj 2013-06-07 13:17:15.0 +0200 +++ gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c2016-11-25 09:46:58.734509541 +0100 @@ -16,7 +16,7 @@ typedef union { struct { - unsigned int offset:18; + signed int offset:18; unsigned int ignore:4; unsigned int s1:8; int :2; Jakub
Re: [PR 70965] Schedule extra pass_rebuild_cgraph_edges
On Thu, Nov 24, 2016 at 5:44 PM, Martin Jamborwrote: > Hi, > > after discussing this with Honza, we have decided that scheduling an > extra pass_rebuild_cgraph_edges after pass_fixup_cfg is the correct > way to keep the cgraph consistent with gimple IL when early IPA passes > need it, such as is the case with the testcase in PR 70965. > > While needing an extra pass is never nice, this is a consequence of > splitting pass_build_ssa_passes out of early optimization passes so > that pass_chkp can be in between. > > The patch below fices the ICE in PR 70965 and has passed bootstrap and > testing on x86_64-linux. OK for trunk? Ok. Richard. > Thanks, > > Martin > > > 2016-11-24 Martin Jambor > > gcc/ > * passes.def (pass_build_ssa_passes): Add pass_rebuild_cgraph_edges. > > gcc/testsuite/ > * g++.dg/pr70965.C: New test. > --- > gcc/passes.def | 1 + > gcc/testsuite/g++.dg/pr70965.C | 21 + > 2 files changed, 22 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/pr70965.C > > diff --git a/gcc/passes.def b/gcc/passes.def > index 85a5af0..5faf17f 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see >NEXT_PASS (pass_build_ssa_passes); >PUSH_INSERT_PASSES_WITHIN (pass_build_ssa_passes) >NEXT_PASS (pass_fixup_cfg); > + NEXT_PASS (pass_rebuild_cgraph_edges); >NEXT_PASS (pass_build_ssa); >NEXT_PASS (pass_warn_nonnull_compare); >NEXT_PASS (pass_ubsan); > diff --git a/gcc/testsuite/g++.dg/pr70965.C b/gcc/testsuite/g++.dg/pr70965.C > new file mode 100644 > index 000..d8a2c35 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/pr70965.C > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -std=c++11" } */ > + > +struct A {}; > +struct B {}; > +struct C { using p = int *; template using ra = A; }; > +struct J : C { template struct K { typedef C::ra o; }; }; > +template struct D > +{ > + struct H : J::K::o { H (J::p, A) : J::K::o () {} }; > + H d; > + D (const char *, const A = A ()) : d (0, x) {} > +}; > +extern template class D; > +enum L { M }; > +struct F { virtual char *foo (); }; > +template struct I : B { static int foo (int) {} }; > +struct G { typedef I t; }; > +void foo (int) { G::t::foo (0); } > +void bar (const D &, const D &, int, L); > +void baz () try { foo (0); } catch (F ) { bar (e.foo (), "", 0, M); } > -- > 2.10.2 >
Re: [tree-tailcall] Check if function returns it's argument
On 25 November 2016 at 13:55, Richard Bienerwrote: > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > >> On 25 November 2016 at 13:37, Richard Biener wrote: >> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 24 November 2016 at 18:08, Richard Biener wrote: >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> On 24 November 2016 at 17:48, Richard Biener wrote: >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> > >> >> >> >> On 24 November 2016 at 14:07, Richard Biener >> >> >> >> wrote: >> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> >> > >> >> >> >> >> Hi, >> >> >> >> >> Consider following test-case: >> >> >> >> >> >> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> >> >> { >> >> >> >> >> __builtin_memcpy (a1, a2, a3); >> >> >> >> >> return a1; >> >> >> >> >> } >> >> >> >> >> >> >> >> >> >> return a1 can be considered equivalent to return value of memcpy, >> >> >> >> >> and the call could be emitted as a tail-call. >> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, >> >> >> >> >> but if it is changed to: >> >> >> >> >> >> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> >> return t1; >> >> >> >> >> >> >> >> >> >> Then memcpy is emitted as a tail-call. >> >> >> >> >> The attached patch tries to handle the former case. >> >> >> >> >> >> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-* >> >> >> >> >> Does this patch look OK ? >> >> >> >> > >> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it >> >> >> >> > doesn't. >> >> >> >> > */ >> >> >> >> > +tree >> >> >> >> > +gimple_call_return_arg (gcall *call_stmt) >> >> >> >> > +{ >> >> >> >> > >> >> >> >> > >> >> >> >> > Please just inline it at the single use - the name is not terribly >> >> >> >> > informative. >> >> >> >> > >> >> >> >> > I'm not sure you can rely on code-generation working if you not >> >> >> >> > effectively change the IL to >> >> >> >> > >> >> >> >> > a1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> > return a1; >> >> >> >> > >> >> >> >> > someone more familiar with RTL expansion plus tail call emission >> >> >> >> > on >> >> >> >> > RTL needs to chime in. >> >> >> >> Well I was trying to copy-propagate function's argument into uses of >> >> >> >> it's return value if >> >> >> >> function returned that argument, so the assignment to lhs of call >> >> >> >> could be made redundant. >> >> >> >> >> >> >> >> eg: >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> >> { >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> return t1; >> >> >> >> } >> >> >> >> >> >> >> >> After patch, copyprop transformed it into: >> >> >> >> t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> return a1; >> >> >> > >> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's >> >> >> > better to use t1 as that's readily available in the return register >> >> >> > while the register for a1 might have been clobbered and thus we >> >> >> > need to spill it for the later return. >> >> >> Oh I didn't realize this could possibly pessimize RA. >> >> >> For test-case: >> >> >> >> >> >> void *t1 = memcpy (dest, src, n); >> >> >> if (t1 != dest) >> >> >> __builtin_abort (); >> >> >> >> >> >> we could copy-propagate t1 into cond_expr and make the condition >> >> >> redundant. >> >> >> However I suppose this particular case could be handled with VRP >> >> >> instead >> >> >> (t1 and dest should be marked equivalent) ? >> >> > >> >> > Yeah, exposing this to value-numbering in general can enable some >> >> > optimizations (but I wouldn't put it in copyprop). Note it's then >> >> > difficult to avoid copy-propgating things... >> >> > >> >> > The user can also write >> >> > >> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> > { >> >> > __builtin_memcpy (a1, a2, a3); >> >> > return a1; >> >> > } >> >> > >> >> > so it's good to improve code-gen for that (for the tailcall issue). >> >> For the tail-call, issue should we artificially create a lhs and use that >> >> as return value (perhaps by a separate pass before tailcall) ? >> >> >> >> __builtin_memcpy (a1, a2, a3); >> >> return a1; >> >> >> >> gets transformed to: >> >> _1 = __builtin_memcpy (a1, a2, a3) >> >> return _1; >> >> >> >> So tail-call optimization pass would see the IL in it's expected form. >> > >> > As said, a RTL expert needs to chime in here. Iff then tail-call >> > itself should do this rewrite. But if this form is required to make >> > things work (I suppose you checked it _does_ actually work?) then >> > we'd need to make sure later passes do not undo it. So it looks >> > fragile to me. OTOH I seem to remember that the flags we set on >> > GIMPLE are merely a hint to RTL expansion and the tailcalling is >> > verified again
Re: [PATCH 1/2 v3] PR77822
On Thu, Nov 24, 2016 at 08:33:32AM -0700, Jeff Law wrote: > On 11/24/2016 02:59 AM, Dominik Vogt wrote: > >On Wed, Nov 23, 2016 at 01:22:31PM -0700, Jeff Law wrote: > >>> PR target/77822 > >>> * system.h (SIZE_POS_IN_RANGE): New. > >>OK. Though system.h seems like an unfortunate place. Would rtl.h > >>work better since this is really about verifying the arguments to > >>things like {zero,sign}_extract which are RTL concepts. > > > >I was unsure whether it's better to give the macro a name not > >related to *_extract and put it into system.h or make it specific > >to extracts and put it in rtl.h. > Yea. What tends to tip the balance for me is that it's likely not > reusable outside extraction argument testing. > > > > >It's probably not really reuseable elsewhere, so when moving it to > >rtl.h I'll also rename it to EXTRACT_ARGS_IN_RANGE. Okay? > Yea, that sounds perfect. Version 4 of the patch attached. Bootstrapped and regression tested on s390x biarch and s390. Changes since the last version: v4: Rename SIZE_POS_IN_RANGE to EXTRACT_ARGS_IN_RANGE and move it to rtl.h. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog PR target/77822 * rtl.h (EXTRACT_ARGS_IN_RANGE): New. >From 8e5bcea2983b50ba133a83086fcc21c31d8dbabe Mon Sep 17 00:00:00 2001 From: Dominik VogtDate: Thu, 17 Nov 2016 14:49:18 +0100 Subject: [PATCH 1/2] PR target/77822: Add helper macro EXTRACT_ARGS_IN_RANGE to system.h. The macro can be used to validate the arguments of zero_extract and sign_extract to fix this problem: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822 --- gcc/rtl.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/rtl.h b/gcc/rtl.h index 660d381..1847bce 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2694,6 +2694,16 @@ get_full_set_src_cost (rtx x, machine_mode mode, struct full_rtx_costs *c) } #endif +/* A convenience macro to validate the arguments of a zero_extract + expression. It determines whether SIZE lies inclusively within + [1, RANGE], POS lies inclusively within between [0, RANGE - 1] + and the sum lies inclusively within [1, RANGE]. RANGE must be + >= 1, but SIZE and POS may be negative. */ +#define EXTRACT_ARGS_IN_RANGE(SIZE, POS, RANGE) \ + (IN_RANGE ((POS), 0, (unsigned HOST_WIDE_INT) (RANGE) - 1) \ + && IN_RANGE ((SIZE), 1, (unsigned HOST_WIDE_INT) (RANGE) \ + - (unsigned HOST_WIDE_INT)(POS))) + /* In explow.c */ extern HOST_WIDE_INT trunc_int_for_mode(HOST_WIDE_INT, machine_mode); extern rtx plus_constant (machine_mode, rtx, HOST_WIDE_INT, bool = false); -- 2.3.0
Re: [PATCH 2/2 v3] PR77822
On Mon, Nov 21, 2016 at 12:05:42PM +0100, Dominik Vogt wrote: > On Thu, Nov 17, 2016 at 04:54:17PM +0100, Dominik Vogt wrote: > > On Thu, Nov 17, 2016 at 04:53:03PM +0100, Dominik Vogt wrote: > > > The following two patches fix PR 77822 on s390x for gcc-7. As the > > > macro doing the argument range checks can be used on other targets > > > as well, I've put it in system.h (couldn't think of a better > > > place; maybe rtl.h?). > > > > > > Bootstrapped on s390x biarch, regression tested on s390x biarch > > > and s390, all on a zEC12 with -march=zEC12. > > > > > > Please check the commit messages for details. > > > > S390 backend patch. Version 4 of the patchset. Bootstrapped and regression tested on s390 biarch and s390. Changes since the first version of the patchset: v2: Add s390 test cases. Support .cxx tests in s390.exp. Put all arguments of SIZE_POS_IN_RANGE in parentheses. Rewrite SIZE_POS_IN_RANGE macro to handle wrapping SIZE + POS. v3: Rename macro argument from UPPER back to RANGE. v4: Rename SIZE_POS_IN_RANGE to EXTRACT_ARGS_IN_RANGE and move it to rtl.h. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLogb PR target/77822 * config/s390/s390.md ("extzv") ("*extzv") ("*extzvdi_lshiftrt") ("*_ior_and_sr_ze") ("*extract1bitdi") ("*insv", "*insv_rnsbg_noshift") ("*insv_rnsbg_srl", "*insv_mem_reg") ("*insvdi_mem_reghigh", "*insvdi_reg_imm"): Use EXTRACT_ARGS_IN_RANGE to validate the arguments of zero_extract and sign_extract. gcc/testsuite/ChangeLogb PR target/77822 * gcc.target/s390/s390.exp: Support .cxx tests. * gcc.target/s390/pr77822-2.c: New test. * gcc.target/s390/pr77822-1.cxx: New test. >From bcf508a7e2f90c7e67702c437780cc12a3f74860 Mon Sep 17 00:00:00 2001 From: Dominik VogtDate: Thu, 17 Nov 2016 14:49:48 +0100 Subject: [PATCH 2/2] PR target/77822: S390: Validate argument range of {zero,sign}_extract. With some undefined code, combine generates patterns where the arguments to *_extract are out of range, e.b. a negative bit position. If the s390 backend accepts these, they lead to not just undefined behaviour but invalid assembly instructions (argument out of the allowed range). So this patch makes sure that the rtl expressions with out of range arguments are rejected. --- gcc/config/s390/s390.md | 20 +- gcc/testsuite/gcc.target/s390/pr77822-1.cxx | 21 ++ gcc/testsuite/gcc.target/s390/pr77822-2.c | 307 gcc/testsuite/gcc.target/s390/s390.exp | 8 +- 4 files changed, 349 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/pr77822-1.cxx create mode 100644 gcc/testsuite/gcc.target/s390/pr77822-2.c diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index a449b03..aaf8427 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3741,6 +3741,8 @@ (clobber (reg:CC CC_REGNUM))])] "TARGET_Z10" { + if (! EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64)) +FAIL; /* Starting with zEC12 there is risbgn not clobbering CC. */ if (TARGET_ZEC12) { @@ -3760,7 +3762,9 @@ (match_operand 2 "const_int_operand" "") ; size (match_operand 3 "const_int_operand" ""))) ; start ] - "" + " + && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), +GET_MODE_BITSIZE (mode))" "\t%0,%1,64-%2,128+63,%3+%2" ; dst, src, start, end, shift [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) @@ -3773,6 +3777,7 @@ (lshiftrt:DI (match_operand:DI 3 "register_operand" "d") (match_operand:DI 4 "nonzero_shift_count_operand" "")))] " + && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64) && 64 - UINTVAL (operands[4]) >= UINTVAL (operands[1])" "\t%0,%3,%2,%2+%1-1,128-%2-%1-%4" [(set_attr "op_type" "RIE") @@ -3791,6 +3796,7 @@ (match_operand 5 "const_int_operand" "")) ; start 4)))] " + && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[4]), INTVAL (operands[5]), 64) && UINTVAL (operands[2]) == (~(0ULL) << UINTVAL (operands[4]))" "\t%0,%3,64-%4,63,%4+%5" [(set_attr "op_type" "RIE") @@ -3804,7 +3810,8 @@ (const_int 1) ; size (match_operand 2 "const_int_operand" "")) ; start (const_int 0)))] - "" + " + && EXTRACT_ARGS_IN_RANGE (1, INTVAL (operands[2]), 64)" "\t%0,%1,64-1,128+63,%2+1" ; dst, src, start, end, shift [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) @@ -3919,6 +3926,8 @@ (match_operand 2 "const_int_operand""I")) ; pos (match_operand:GPR 3 "nonimmediate_operand" "d"))] " + && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[1]), INTVAL
Re: [tree-tailcall] Check if function returns it's argument
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > On 25 November 2016 at 13:37, Richard Bienerwrote: > > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > > > >> On 24 November 2016 at 18:08, Richard Biener wrote: > >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 24 November 2016 at 17:48, Richard Biener wrote: > >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> >> > > >> >> >> On 24 November 2016 at 14:07, Richard Biener > >> >> >> wrote: > >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> >> >> > > >> >> >> >> Hi, > >> >> >> >> Consider following test-case: > >> >> >> >> > >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> >> >> { > >> >> >> >> __builtin_memcpy (a1, a2, a3); > >> >> >> >> return a1; > >> >> >> >> } > >> >> >> >> > >> >> >> >> return a1 can be considered equivalent to return value of memcpy, > >> >> >> >> and the call could be emitted as a tail-call. > >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, > >> >> >> >> but if it is changed to: > >> >> >> >> > >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> >> return t1; > >> >> >> >> > >> >> >> >> Then memcpy is emitted as a tail-call. > >> >> >> >> The attached patch tries to handle the former case. > >> >> >> >> > >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. > >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-* > >> >> >> >> Does this patch look OK ? > >> >> >> > > >> >> >> > +/* Return arg, if function returns it's argument or NULL if it > >> >> >> > doesn't. > >> >> >> > */ > >> >> >> > +tree > >> >> >> > +gimple_call_return_arg (gcall *call_stmt) > >> >> >> > +{ > >> >> >> > > >> >> >> > > >> >> >> > Please just inline it at the single use - the name is not terribly > >> >> >> > informative. > >> >> >> > > >> >> >> > I'm not sure you can rely on code-generation working if you not > >> >> >> > effectively change the IL to > >> >> >> > > >> >> >> > a1 = __builtin_memcpy (a1, a2, a3); > >> >> >> > return a1; > >> >> >> > > >> >> >> > someone more familiar with RTL expansion plus tail call emission on > >> >> >> > RTL needs to chime in. > >> >> >> Well I was trying to copy-propagate function's argument into uses of > >> >> >> it's return value if > >> >> >> function returned that argument, so the assignment to lhs of call > >> >> >> could be made redundant. > >> >> >> > >> >> >> eg: > >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> >> { > >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> return t1; > >> >> >> } > >> >> >> > >> >> >> After patch, copyprop transformed it into: > >> >> >> t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> return a1; > >> >> > > >> >> > But that's a bad transform -- if we know that t1 == a1 then it's > >> >> > better to use t1 as that's readily available in the return register > >> >> > while the register for a1 might have been clobbered and thus we > >> >> > need to spill it for the later return. > >> >> Oh I didn't realize this could possibly pessimize RA. > >> >> For test-case: > >> >> > >> >> void *t1 = memcpy (dest, src, n); > >> >> if (t1 != dest) > >> >> __builtin_abort (); > >> >> > >> >> we could copy-propagate t1 into cond_expr and make the condition > >> >> redundant. > >> >> However I suppose this particular case could be handled with VRP instead > >> >> (t1 and dest should be marked equivalent) ? > >> > > >> > Yeah, exposing this to value-numbering in general can enable some > >> > optimizations (but I wouldn't put it in copyprop). Note it's then > >> > difficult to avoid copy-propgating things... > >> > > >> > The user can also write > >> > > >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> > { > >> > __builtin_memcpy (a1, a2, a3); > >> > return a1; > >> > } > >> > > >> > so it's good to improve code-gen for that (for the tailcall issue). > >> For the tail-call, issue should we artificially create a lhs and use that > >> as return value (perhaps by a separate pass before tailcall) ? > >> > >> __builtin_memcpy (a1, a2, a3); > >> return a1; > >> > >> gets transformed to: > >> _1 = __builtin_memcpy (a1, a2, a3) > >> return _1; > >> > >> So tail-call optimization pass would see the IL in it's expected form. > > > > As said, a RTL expert needs to chime in here. Iff then tail-call > > itself should do this rewrite. But if this form is required to make > > things work (I suppose you checked it _does_ actually work?) then > > we'd need to make sure later passes do not undo it. So it looks > > fragile to me. OTOH I seem to remember that the flags we set on > > GIMPLE are merely a hint to RTL expansion and the tailcalling is > > verified again there? > > Yeah, I verified the form works: > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > { > void *t1 = __builtin_memcpy (a1, a2, a3); > return t1; > } > > assembly: > f: > .LFB0: > .cfi_startproc >
Re: [PATCH PR78507/PR78510]Fix two ICEs in pattern (cond (cmp (convert1? @1) @3) (convert2? @1) @2).
On Thu, Nov 24, 2016 at 4:22 PM, Bin Chengwrote: > Hi, > This patch fixes two issues in newly introduced pattern: (cond (cmp > (convert1? @1) @3) (convert2? @1) @2). > For PR78507, we need to check if from_type is INTEGRAL_TYPE_P explicitly, > this patch adds check for that. > Note we don't check c1_type/c2_type for that because it's guarded by > INTEGER_CST@. > For PR78510, the ICE is covered by revision 242831, but underlying issue is > when the block of conditions > is not satisfied, the last case simplification is applied anyway since code > is initialized to cmp at the > first place. This patch fixes that by setting code to proper value only when > transformation is valid, it > also incorporates Marc's suggestion by using cmp directly. Two tests are > added too. > Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures? Ok. Btw, - (cond (cmp@0 (convert1? @1) ... ... - enum tree_code code = TREE_CODE (@0)... shows another misconception (I didn't notice...) in that TREE_CODE of a captured expression results in the expression code. On GENERIC that works but on GIMPLE @0 will be a SSA_NAME and thus 'code' was SSA_NAME ... Thanks, Richard. > Thanks, > bin > > 2016-11-24 Bin Cheng > > PR middle-end/78507 > PR middle-end/78510 > * match.pd ((cond (cmp (convert1? @1) @3) (convert2? @1) @2)): Use > cmp directly, rather than cmp_code. Initialize code to ERROR_MARK > and set it to result code if transformation is valid. Use code EQ > directly in last simplification case. > > gcc/testsuite/ChangeLog > 2016-11-24 Bin Cheng > > PR middle-end/78507 > PR middle-end/78510 > * g++.dg/torture/pr78507.C: New test. > * gcc.dg/torture/pr78510.c: New test.
[Patch, testsuite] Fix bogus pr64427.c failure for avr
The smaller int size for the avr target breaks the test's expectation on the number of iterations. The failure goes away if 32 bit ints are used in place of a plain int. Fix by conditionally typedef int32_t to __INT32_TYPE__ for targets with int size < 4, and then use int32_t everywhere. Regards Senthil 2016-11-25 Senthil Kumar Selvaraj* gcc.dg/pr64277.c: Use 32 bit int for targets with sizeof(int) < 4. Index: gcc/testsuite/gcc.dg/pr64277.c === --- gcc/testsuite/gcc.dg/pr64277.c (revision 242857) +++ gcc/testsuite/gcc.dg/pr64277.c (working copy) @@ -4,10 +4,16 @@ /* { dg-final { scan-tree-dump "loop with 5 iterations completely unrolled" "cunroll" } } */ /* { dg-final { scan-tree-dump "loop with 6 iterations completely unrolled" "cunroll" } } */ -int f1[10]; +#if __SIZEOF_INT__ < 4 + __extension__ typedef __INT32_TYPE__ int32_t; +#else + typedef int int32_t; +#endif + +int32_t f1[10]; void test1 (short a[], short m, unsigned short l) { - int i = l; + int32_t i = l; for (i = i + 5; i < m; i++) f1[i] = a[i]++; } @@ -14,7 +20,7 @@ void test2 (short a[], short m, short l) { - int i; + int32_t i; if (m > 5) m = 5; for (i = m; i > l; i--)
Re: [tree-tailcall] Check if function returns it's argument
On 25 November 2016 at 13:37, Richard Bienerwrote: > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > >> On 24 November 2016 at 18:08, Richard Biener wrote: >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 24 November 2016 at 17:48, Richard Biener wrote: >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> On 24 November 2016 at 14:07, Richard Biener wrote: >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> > >> >> >> >> Hi, >> >> >> >> Consider following test-case: >> >> >> >> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> >> { >> >> >> >> __builtin_memcpy (a1, a2, a3); >> >> >> >> return a1; >> >> >> >> } >> >> >> >> >> >> >> >> return a1 can be considered equivalent to return value of memcpy, >> >> >> >> and the call could be emitted as a tail-call. >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, >> >> >> >> but if it is changed to: >> >> >> >> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> >> return t1; >> >> >> >> >> >> >> >> Then memcpy is emitted as a tail-call. >> >> >> >> The attached patch tries to handle the former case. >> >> >> >> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-* >> >> >> >> Does this patch look OK ? >> >> >> > >> >> >> > +/* Return arg, if function returns it's argument or NULL if it >> >> >> > doesn't. >> >> >> > */ >> >> >> > +tree >> >> >> > +gimple_call_return_arg (gcall *call_stmt) >> >> >> > +{ >> >> >> > >> >> >> > >> >> >> > Please just inline it at the single use - the name is not terribly >> >> >> > informative. >> >> >> > >> >> >> > I'm not sure you can rely on code-generation working if you not >> >> >> > effectively change the IL to >> >> >> > >> >> >> > a1 = __builtin_memcpy (a1, a2, a3); >> >> >> > return a1; >> >> >> > >> >> >> > someone more familiar with RTL expansion plus tail call emission on >> >> >> > RTL needs to chime in. >> >> >> Well I was trying to copy-propagate function's argument into uses of >> >> >> it's return value if >> >> >> function returned that argument, so the assignment to lhs of call >> >> >> could be made redundant. >> >> >> >> >> >> eg: >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> >> >> { >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); >> >> >> return t1; >> >> >> } >> >> >> >> >> >> After patch, copyprop transformed it into: >> >> >> t1 = __builtin_memcpy (a1, a2, a3); >> >> >> return a1; >> >> > >> >> > But that's a bad transform -- if we know that t1 == a1 then it's >> >> > better to use t1 as that's readily available in the return register >> >> > while the register for a1 might have been clobbered and thus we >> >> > need to spill it for the later return. >> >> Oh I didn't realize this could possibly pessimize RA. >> >> For test-case: >> >> >> >> void *t1 = memcpy (dest, src, n); >> >> if (t1 != dest) >> >> __builtin_abort (); >> >> >> >> we could copy-propagate t1 into cond_expr and make the condition >> >> redundant. >> >> However I suppose this particular case could be handled with VRP instead >> >> (t1 and dest should be marked equivalent) ? >> > >> > Yeah, exposing this to value-numbering in general can enable some >> > optimizations (but I wouldn't put it in copyprop). Note it's then >> > difficult to avoid copy-propgating things... >> > >> > The user can also write >> > >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) >> > { >> > __builtin_memcpy (a1, a2, a3); >> > return a1; >> > } >> > >> > so it's good to improve code-gen for that (for the tailcall issue). >> For the tail-call, issue should we artificially create a lhs and use that >> as return value (perhaps by a separate pass before tailcall) ? >> >> __builtin_memcpy (a1, a2, a3); >> return a1; >> >> gets transformed to: >> _1 = __builtin_memcpy (a1, a2, a3) >> return _1; >> >> So tail-call optimization pass would see the IL in it's expected form. > > As said, a RTL expert needs to chime in here. Iff then tail-call > itself should do this rewrite. But if this form is required to make > things work (I suppose you checked it _does_ actually work?) then > we'd need to make sure later passes do not undo it. So it looks > fragile to me. OTOH I seem to remember that the flags we set on > GIMPLE are merely a hint to RTL expansion and the tailcalling is > verified again there? Yeah, I verified the form works: void *f(void *a1, void *a2, __SIZE_TYPE__ a3) { void *t1 = __builtin_memcpy (a1, a2, a3); return t1; } assembly: f: .LFB0: .cfi_startproc jmp memcpy .cfi_endproc Thanks, Prathamesh > > Thanks, > Richard. > >> Thanks, >> Prathamesh >> > >> > Richard. >> > >> >> Thanks, >> >> Prathamesh >> >> > >> >> >> But this now interferes with tail-call optimization, because it is not >> >> >> able to emit memcpy >> >> >> as tail-call anymore due to
Re: [PATCH] eliminate calls to snprintf(0, 0, ...) with known return value (pr78476)
On 2016.11.22 at 20:02 -0700, Martin Sebor wrote: > > gcc/ChangeLog: > > PR tree-optimization/78476 > * gimple-ssa-sprintf.c (struct pass_sprintf_length::call_info): > Add a member. > (handle_gimple_call): Adjust signature. > (try_substitute_return_value): Remove calls to bounded functions > with zero buffer size whose result is known. > (pass_sprintf_length::execute): Adjust call to handle_gimple_call. You left conflict markers in gcc/ChangeLog in your r242854 commit. -- Markus
Re: change initialization of ptrdiff_type_node
On Fri, 25 Nov 2016, Jakub Jelinek wrote: > On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote: > > --- a/gcc/lto/lto-lang.c > > +++ b/gcc/lto/lto-lang.c > > @@ -1271,8 +1271,30 @@ lto_init (void) > >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node) > > == const_ptr_type_node); > > > > - ptrdiff_type_node = integer_type_node; > > + if (strcmp (PTRDIFF_TYPE, "int") == 0) > > +ptrdiff_type_node = integer_type_node; > > + else if (strcmp (PTRDIFF_TYPE, "long int") == 0) > > +ptrdiff_type_node = long_integer_type_node; > > + else if (strcmp (PTRDIFF_TYPE, "long long int") == 0) > > +ptrdiff_type_node = long_long_integer_type_node; > > + else if (strcmp (PTRDIFF_TYPE, "short int") == 0) > > +ptrdiff_type_node = short_integer_type_node; > > + else > > +{ > > + ptrdiff_type_node = NULL_TREE; > > + for (int i = 0; i < NUM_INT_N_ENTS; i++) > > + if (int_n_enabled_p[i]) > > + { > > + char name[50]; > > + sprintf (name, "__int%d", int_n_data[i].bitsize); > > + if (strcmp (name, PTRDIFF_TYPE) == 0) > > + ptrdiff_type_node = int_n_trees[i].signed_type; > > + } > > + if (ptrdiff_type_node == NULL_TREE) > > + gcc_unreachable (); > > +} > > This looks ok to me. But I'd like to see this in build_common_tree_nodes alongside the initialization of size_type_node (and thus removed from c_common_nodes_and_builtins). This way you can simply remove the lto-lang.c code as well. Please then also remove the ptrdiff_type_node re-set from free_lang_data (). > > > > + unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node); > >lto_build_c_type_nodes (); > >gcc_assert (va_list_type_node); > > But why this and the remaining hunks? Nothing in the middle-end > needs it, IMHO it should be kept in c-family/. Yeah, this change looks unnecessary to me. > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > index eec2d4f..6c52387 100644 > > --- a/gcc/tree-core.h > > +++ b/gcc/tree-core.h > > @@ -617,6 +617,7 @@ enum tree_index { > >TI_SIZE_TYPE, > >TI_PID_TYPE, > >TI_PTRDIFF_TYPE, > > + TI_UNSIGNED_PTRDIFF_TYPE, > >TI_VA_LIST_TYPE, > >TI_VA_LIST_GPR_COUNTER_FIELD, > >TI_VA_LIST_FPR_COUNTER_FIELD, > > diff --git a/gcc/tree.h b/gcc/tree.h > > index 62cd7bb..ae69d0d 100644 > > --- a/gcc/tree.h > > +++ b/gcc/tree.h > > @@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum > > tree_code __code, int __i, > > #define size_type_node global_trees[TI_SIZE_TYPE] > > #define pid_type_node global_trees[TI_PID_TYPE] > > #define ptrdiff_type_node global_trees[TI_PTRDIFF_TYPE] > > +#define unsigned_ptrdiff_type_node global_trees[TI_UNSIGNED_PTRDIFF_TYPE] > > #define va_list_type_node global_trees[TI_VA_LIST_TYPE] > > #define va_list_gpr_counter_field > > global_trees[TI_VA_LIST_GPR_COUNTER_FIELD] > > #define va_list_fpr_counter_field > > global_trees[TI_VA_LIST_FPR_COUNTER_FIELD] > > > Jakub > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH] remove invalid "tail +16c" in config/acx.m4
Hi all, In "config/acx.m4", there are still some "tail +16c" which are invalid on POSIX systems. In my opinion, all "tail +16c" should be changed to "tail -c +16" directly, as most systems has accept the latter. And, to skip first 16 bytes, we should use "tail -c +17" instead of "tail -c +16". * config/acx.m4:Change "tail +16c" to "tail -c +17". * configure: Regenerate. --- gcc-6.2.0/config/acx.m4 2011-12-18 17:58:37.0 +0800 +++ gcc-6.2.0-bak/config/acx.m4 2016-11-23 10:56:21.065817691 +0800 @@ -404,7 +404,7 @@ AC_DEFUN([ACX_PROG_CMP_IGNORE_INITIAL], [AC_CACHE_CHECK([how to compare bootstrapped objects], gcc_cv_prog_cmp_skip, [ echo abfoo >t1 echo cdfoo >t2 - gcc_cv_prog_cmp_skip='tail +16c $$f1 > tmp-foo1; tail +16c $$f2 > tmp-foo2; cmp tmp-foo1 tmp-foo2' + gcc_cv_prog_cmp_skip='tail -c +17 $$f1 > tmp-foo1; tail -c +17 $$f2 > tmp-foo2; cmp tmp-foo1 tmp-foo2' if cmp t1 t2 2 2 > /dev/null 2>&1; then if cmp t1 t2 1 1 > /dev/null 2>&1; then :
Re: [tree-tailcall] Check if function returns it's argument
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > On 24 November 2016 at 18:08, Richard Bienerwrote: > > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > > > >> On 24 November 2016 at 17:48, Richard Biener wrote: > >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 24 November 2016 at 14:07, Richard Biener wrote: > >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote: > >> >> > > >> >> >> Hi, > >> >> >> Consider following test-case: > >> >> >> > >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> >> { > >> >> >> __builtin_memcpy (a1, a2, a3); > >> >> >> return a1; > >> >> >> } > >> >> >> > >> >> >> return a1 can be considered equivalent to return value of memcpy, > >> >> >> and the call could be emitted as a tail-call. > >> >> >> gcc doesn't emit the above call to memcpy as a tail-call, > >> >> >> but if it is changed to: > >> >> >> > >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> >> return t1; > >> >> >> > >> >> >> Then memcpy is emitted as a tail-call. > >> >> >> The attached patch tries to handle the former case. > >> >> >> > >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. > >> >> >> Cross tested on arm*-*-*, aarch64*-*-* > >> >> >> Does this patch look OK ? > >> >> > > >> >> > +/* Return arg, if function returns it's argument or NULL if it > >> >> > doesn't. > >> >> > */ > >> >> > +tree > >> >> > +gimple_call_return_arg (gcall *call_stmt) > >> >> > +{ > >> >> > > >> >> > > >> >> > Please just inline it at the single use - the name is not terribly > >> >> > informative. > >> >> > > >> >> > I'm not sure you can rely on code-generation working if you not > >> >> > effectively change the IL to > >> >> > > >> >> > a1 = __builtin_memcpy (a1, a2, a3); > >> >> > return a1; > >> >> > > >> >> > someone more familiar with RTL expansion plus tail call emission on > >> >> > RTL needs to chime in. > >> >> Well I was trying to copy-propagate function's argument into uses of > >> >> it's return value if > >> >> function returned that argument, so the assignment to lhs of call > >> >> could be made redundant. > >> >> > >> >> eg: > >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > >> >> { > >> >> void *t1 = __builtin_memcpy (a1, a2, a3); > >> >> return t1; > >> >> } > >> >> > >> >> After patch, copyprop transformed it into: > >> >> t1 = __builtin_memcpy (a1, a2, a3); > >> >> return a1; > >> > > >> > But that's a bad transform -- if we know that t1 == a1 then it's > >> > better to use t1 as that's readily available in the return register > >> > while the register for a1 might have been clobbered and thus we > >> > need to spill it for the later return. > >> Oh I didn't realize this could possibly pessimize RA. > >> For test-case: > >> > >> void *t1 = memcpy (dest, src, n); > >> if (t1 != dest) > >> __builtin_abort (); > >> > >> we could copy-propagate t1 into cond_expr and make the condition redundant. > >> However I suppose this particular case could be handled with VRP instead > >> (t1 and dest should be marked equivalent) ? > > > > Yeah, exposing this to value-numbering in general can enable some > > optimizations (but I wouldn't put it in copyprop). Note it's then > > difficult to avoid copy-propgating things... > > > > The user can also write > > > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3) > > { > > __builtin_memcpy (a1, a2, a3); > > return a1; > > } > > > > so it's good to improve code-gen for that (for the tailcall issue). > For the tail-call, issue should we artificially create a lhs and use that > as return value (perhaps by a separate pass before tailcall) ? > > __builtin_memcpy (a1, a2, a3); > return a1; > > gets transformed to: > _1 = __builtin_memcpy (a1, a2, a3) > return _1; > > So tail-call optimization pass would see the IL in it's expected form. As said, a RTL expert needs to chime in here. Iff then tail-call itself should do this rewrite. But if this form is required to make things work (I suppose you checked it _does_ actually work?) then we'd need to make sure later passes do not undo it. So it looks fragile to me. OTOH I seem to remember that the flags we set on GIMPLE are merely a hint to RTL expansion and the tailcalling is verified again there? Thanks, Richard. > Thanks, > Prathamesh > > > > Richard. > > > >> Thanks, > >> Prathamesh > >> > > >> >> But this now interferes with tail-call optimization, because it is not > >> >> able to emit memcpy > >> >> as tail-call anymore due to which the patch regressed 20050503-1.c. > >> >> I am not sure how to workaround this. > >> >> > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> > Richard. > >> >> > >> > > >> > -- > >> > Richard Biener > >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, > >> > HRB 21284 (AG Nuernberg) > >> > >> > > > > -- > > Richard Biener > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
Re: change initialization of ptrdiff_type_node
On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote: > --- a/gcc/lto/lto-lang.c > +++ b/gcc/lto/lto-lang.c > @@ -1271,8 +1271,30 @@ lto_init (void) >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node) > == const_ptr_type_node); > > - ptrdiff_type_node = integer_type_node; > + if (strcmp (PTRDIFF_TYPE, "int") == 0) > +ptrdiff_type_node = integer_type_node; > + else if (strcmp (PTRDIFF_TYPE, "long int") == 0) > +ptrdiff_type_node = long_integer_type_node; > + else if (strcmp (PTRDIFF_TYPE, "long long int") == 0) > +ptrdiff_type_node = long_long_integer_type_node; > + else if (strcmp (PTRDIFF_TYPE, "short int") == 0) > +ptrdiff_type_node = short_integer_type_node; > + else > +{ > + ptrdiff_type_node = NULL_TREE; > + for (int i = 0; i < NUM_INT_N_ENTS; i++) > + if (int_n_enabled_p[i]) > + { > + char name[50]; > + sprintf (name, "__int%d", int_n_data[i].bitsize); > + if (strcmp (name, PTRDIFF_TYPE) == 0) > + ptrdiff_type_node = int_n_trees[i].signed_type; > + } > + if (ptrdiff_type_node == NULL_TREE) > + gcc_unreachable (); > +} This looks ok to me. > > + unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node); >lto_build_c_type_nodes (); >gcc_assert (va_list_type_node); But why this and the remaining hunks? Nothing in the middle-end needs it, IMHO it should be kept in c-family/. > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > index eec2d4f..6c52387 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -617,6 +617,7 @@ enum tree_index { >TI_SIZE_TYPE, >TI_PID_TYPE, >TI_PTRDIFF_TYPE, > + TI_UNSIGNED_PTRDIFF_TYPE, >TI_VA_LIST_TYPE, >TI_VA_LIST_GPR_COUNTER_FIELD, >TI_VA_LIST_FPR_COUNTER_FIELD, > diff --git a/gcc/tree.h b/gcc/tree.h > index 62cd7bb..ae69d0d 100644 > --- a/gcc/tree.h > +++ b/gcc/tree.h > @@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum tree_code > __code, int __i, > #define size_type_node global_trees[TI_SIZE_TYPE] > #define pid_type_node global_trees[TI_PID_TYPE] > #define ptrdiff_type_nodeglobal_trees[TI_PTRDIFF_TYPE] > +#define unsigned_ptrdiff_type_node global_trees[TI_UNSIGNED_PTRDIFF_TYPE] > #define va_list_type_nodeglobal_trees[TI_VA_LIST_TYPE] > #define va_list_gpr_counter_field > global_trees[TI_VA_LIST_GPR_COUNTER_FIELD] > #define va_list_fpr_counter_field > global_trees[TI_VA_LIST_FPR_COUNTER_FIELD] Jakub
Re: [Patch, tentative] Use MOVE_MAX_PIECES instead of MOVE_MAX in gimple_fold_builtin_memory_op
On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote: > > Richard Biener writes: > > > On Thu, 24 Nov 2016, Richard Biener wrote: > > > >> On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote: > >> > >> > Hi, > >> > > >> > I've been analyzing a failing regtest (gcc.dg/strlenopt-8.c) for the > >> > avr > >> > target. I found that the (dump) failure is because there are 4 > >> > instances of memcpy, while the testcase expects only 2 for a > >> > non-strict align target like the avr. > >> > > >> > Comparing that with a dump generated by x64_64-pc-linux, I found that > >> > the extra memcpy's come from the forwprop pass, when it replaces > >> > strcat with strlen and memcpy. For x86_64, the memcpy generated gets > >> > folded into a load/store in gimple_fold_builtin_memory_op. That > >> > doesn't happen for the avr because len (2) happens to be bigger than > >> > MOVE_MAX (1). > >> > > >> > The avr can only move 1 byte efficiently from reg <-> memory, but it's > >> > more efficient to load and store 2 bytes than to call memcpy, so > >> > MOVE_MAX_PIECES is set to 2. > >> > > >> > Given that gimple_fold_builtin_memory_op gets to choose between > >> > leaving the memcpy call as is, or breaking it down to a by-pieces > >> > move, shouldn't it use MOVE_MAX_PIECES instead of > >> > MOV_MAX? > >> > > >> > That is what the below patch does, and that makes the test > >> > pass. Does this sound right? > >> > >> No, as we handle both memcpy and memmove this way we rely on > >> the whole storage fit in a single register so we do the > >> right thing for overlapping memory. > > > > So actually your patch doesn't chnage that, the ordering is ensured > > by emitting a single GIMPLE load/store pair. There are only > > four targets defining MOVE_MAX_PIECES, and one (s390) even has > > a smaller MOVE_MAX_PIECES than MOVE_MAX (huh). AVR has larger > > MOVE_MAX_PIECES than MOVE_MAX, but that seems to not make much > > sense to me given their very similar description plus the > > fact that AVR can only load a single byte at a time... > > > > The x86 comment says > > > > /* MOVE_MAX_PIECES is the number of bytes at a time which we can > >move efficiently, as opposed to MOVE_MAX which is the maximum > >number of bytes we can move with a single instruction. > > > > which doesn't match up with > > > > @defmac MOVE_MAX > > The maximum number of bytes that a single instruction can move quickly > > between memory and registers or between two memory locations. > > @end defmac > > > > note "quickly" here. But OTOH > > > > @defmac MOVE_MAX_PIECES > > A C expression used by @code{move_by_pieces} to determine the largest unit > > a load or store used to copy memory is. Defaults to @code{MOVE_MAX}. > > @end defmac > > > > here the only difference is "copy memory". But we try to special > > case the one load - one store case, not generally "copy memory". > > > > So I think MOVE_MAX matches my intent when writing the code. > > Ok, but isn't that inconsistent with tree-inline.c:estimate_move_cost, which > considers MOVE_MAX_PIECES and MOVE_RATIO to decide between a libcall and > by-pieces move? Well, I don't understand why we have both MOVE_MAX and MOVE_MAX_PIECES. There are exactly two uses of MOVE_MAX in GCC AFAICS, the gimple-fold.c one and caller-save.c which derives MOVE_MAX_WORDS from it. MOVE_MAX_PIECES has the only use in block move expansion plus the single use in tree-inline.c. So I can't give a reason why one or the other should be more valid but the tree-inline.c one tries to match memcpy expansion (obviously), while the gimple-fold.c one tries to get at the maximum possible single-insn move amount (and AVR is odd here having that lower than MOVE_MAX_PIECES, compared to say s390 which has it the opposite way around). Richard. > Regards > Senthil > > > > > Richard. > > > >> Richard. > >> > >> > Regards > >> > Senthil > >> > > >> > Index: gcc/gimple-fold.c > >> > === > >> > --- gcc/gimple-fold.c(revision 242741) > >> > +++ gcc/gimple-fold.c(working copy) > >> > @@ -703,7 +703,7 @@ > >> >src_align = get_pointer_alignment (src); > >> >dest_align = get_pointer_alignment (dest); > >> >if (tree_fits_uhwi_p (len) > >> > - && compare_tree_int (len, MOVE_MAX) <= 0 > >> > + && compare_tree_int (len, MOVE_MAX_PIECES) <= 0 > >> >/* ??? Don't transform copies from strings with known length > >> > this > >> > confuses the tree-ssa-strlen.c. This doesn't handle > >> > the case in gcc.dg/strlenopt-8.c which is XFAILed for that > >> > > >> > > >> > >> > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)