Re: [PATCH, rs6000] Fix PR target/71698, ICE in reload copying TDmode values to GPR regs
On Thu, Jun 30, 2016 at 05:55:04PM -0500, Peter Bergner wrote:
> We currently don't allow TDmode values to use direct moves, since they
> live in register pairs and the most significant word is always in the
> even-numbered register which does not match subreg ordering in little
> endian mode.  The following patch fixes PR71698 by disallowing reload
> from using direct moves for TDmode values.
>
> This passed bootstrap and regtesting with no regressions.  Ok for trunk?

Yes, this is okay.  Thanks!

> This is also broken on the FSF 6 branch, so is this ok there too after
> bootstrap and regtesting there?

Okay.


Segher
[PATCH], Fix PowerPC PR target/71720 (bad vec_init of V4SF on power9)
Aaron Sawdey was running tests on the Power9 simulator, and he noticed that the
XXSPLTW to splat the SFmode value into the vector accessed the wrong element.
When I added the more general code for vector splat for power9, I called an
insn that reversed the element numbers on little endian, when I should have
called the direct function, since the XSCVDPSPN instruction leaves the
converted value in real element 0 in the hardware.

I did a bootstrap and make check with no regressions, can I check this into
the trunk?  And can I back port the patch to GCC 6.2 after a burn-in period?

[gcc]
2016-06-30  Michael Meissner

	PR target/71720
	* config/rs6000/vsx.md (vsx_splat_v4sf_internal): When splitting
	the insns, use vsx_xxspltw_v4sf_direct which does not check for
	little endian.

[gcc/testsuite]
2016-06-30  Michael Meissner

	PR target/71720
	* gcc.target/powerpc/pr71720.c: New test.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md	(revision 237895)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -2512,9 +2512,8 @@ (define_insn_and_split "*vsx_splat_v4sf_
   [(set (match_dup 0)
	(unspec:V4SF [(match_dup 1)] UNSPEC_VSX_CVDPSPN))
    (set (match_dup 0)
-	(vec_duplicate:V4SF
-	 (vec_select:SF (match_dup 0)
-			(parallel [(const_int 0)]]
+	(unspec:V4SF [(match_dup 0)
+		      (const_int 0)] UNSPEC_VSX_XXSPLTW))]
  ""
  [(set_attr "type" "vecload,vecperm,mftgpr")
   (set_attr "length" "4,8,4")])

Index: gcc/testsuite/gcc.target/powerpc/pr71720.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71720.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71720.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+/* Verify that we generate xxspltw ,,0 for V4SFmode splat.  */
+
+vector float
+splat_v4sf (float f)
+{
+  return (vector float) { f, f, f, f };
+}
+
+/* { dg-final { scan-assembler "xscvdpspn " } } */
+/* { dg-final { scan-assembler "xxspltw .*,.*,0" } } */
Re: RFC: pass to warn on questionable uses of alloca().
On 06/30/2016 06:01 AM, Aldy Hernandez wrote:
> On 06/18/2016 07:55 PM, Martin Sebor wrote:
>> On 06/16/2016 02:32 AM, Aldy Hernandez wrote:
>>> p.s. The pass currently warns on all uses of VLAs.  I'm not completely
>>> sold on this idea, so perhaps we could remove it, or gate it with a flag.
>>
>> I also think that VLA diagnostics would be better controlled by
>> a separate option, and emit a different diagnostic (one that mentions
>> VLA rather than alloca).  Although again, and for VLAs even more so
>> than for alloca, providing an option to have GCC use dynamic
>> allocation would be an even more robust solution than issuing
>> warnings.  IIRC, this was the early implementation of VLAs in GCC,
>> so there is a precedent for it.  (Though this seems complementary
>> to the warnings.)
>>
>> In addition, I'm of the opinion that potentially unbounded VLA
>> allocation should be checked at runtime and made to trap on size
>> overflow in C and throw an exception in C++ (e.g., for int a[A][B],
>> when A * B * sizeof (int) exceeds SIZE_MAX / 2 or some
>> runtime-configurable limit).  My C++ patch for bug 69517 does just
>> that (it needs to be resubmitted with the runtime configuration
>> limit added).
>>
>> For a choice of VLA warning options, there already is -Wvla, though
>> it simply points out every VLA definition regardless of its size.
>> It also issues a diagnostic that's questionable in C99 and later
>> modes (that "ISO C90 forbids variable length array" is only relevant
>> in C90; what matters when -Wvla is specified in C99 and C11 modes is
>> that a VLA has been seen).  To keep things consistent with -Walloca,
>> perhaps -Wvla could be extended to take an optional argument to
>> specify the VLA threshold.  (So that -Wvla=4096 would only diagnose
>> VLA definitions of 4k bytes or more, or unknown size.)
>
> Hmmm...I kinda wanted a sane default for -Walloca, and keeping -Walloca
> and -Wvla consistent would mean I don't get that.
>
> Currently, -Walloca would mean "warn on unbounded uses of alloca where
> n > DEFAULT", whereas -Walloca=0 would mean "warn on all uses of
> alloca".  The problem is that -Wvla means "warn on ALL uses".
>
> One consistent option would be to change the definition to:
>
> -Walloca: warn on all uses of alloca (strict mode).
> -Walloca=N: warn on unbounded uses of alloca or bounded uses of n > N.
> -Wvla: warn on all uses of VLAs (as currently implemented).
> -Wvla=N: warn on unbounded uses of VLA or bounded uses of n > N.
>
> This means we get no defaults, and the user must make the limit
> explicit, but at least we're consistent with the current use of -Wvla.
>
> How about we just use -Walloca (and no separate variants for -Wvla=??),
> but we document that -Walloca will also flag VLAs (and explain why).
> Also, we make sure the error messages say "variable-length array" not
> "alloca" in the VLA case.  This way we can have a default, and perhaps
> a more consistent flag:
>
> -Walloca: warn on all unbounded uses of alloca/vlas and bounded uses
>   where n > DEFAULT.
> -Walloca=0: warn on all uses of alloca/vlas (strict mode).
> -Walloca=N: warn on all unbounded uses of alloca/vlas and bounded uses
>   where n > N.
> -Wvla: untouched; as is currently implemented.
>
> Would this be acceptable, or are y'all really against one -Walloca
> flag to rule it all?

Since they are distinct features my preference is to keep alloca and
VLAs under separate warnings, even if the options that control each
aren't completely consistent with one another.  If you think it's
preferable to leave -Wvla unchanged then adding a new warning for the
VLA size checking might be a compromise to consider.  Alternatively,
if you would prefer to have one option control both, then using a
different name for the option might work.

Incidentally, while checking the manual for an inspiration for
a suitable name I came across -Wlarger-than= that sounds like it
should do exactly what -Walloca/-Wvla-too-big do, except it only
looks at constant sizes.  Maybe it should be made smarter?

Martin
Re: [PATCH], Fix PowerPC PR target/71720 (bad vec_init of V4SF on power9)
On Thu, Jun 30, 2016 at 08:10:41PM -0400, Michael Meissner wrote:
> Aaron Sawdey was running tests on the Power9 simulator, and he noticed
> that the XXSPLTW to splat the SFmode value into the vector accessed the
> wrong element.  When I added the more general code for vector splat for
> power9, I called an insn that reversed the element numbers on little
> endian, when I should have called the direct function, since the
> XSCVDPSPN instruction leaves the converted value in real element 0 in
> the hardware.

The changelog could be clearer (I first thought you attached the wrong
patch :-) )  Instead of "use vsx_xxspltw_v4sf_direct" you could say e.g.
"emit code matching vsx_xxspltw_v4sf_direct".

Okay for trunk with the comment clarified, and for 6 later.


Segher

> 	* config/rs6000/vsx.md (vsx_splat_v4sf_internal): When splitting
> 	the insns, use vsx_xxspltw_v4sf_direct which does not check for
> 	little endian.

> Index: gcc/config/rs6000/vsx.md
> ===
> --- gcc/config/rs6000/vsx.md	(revision 237895)
> +++ gcc/config/rs6000/vsx.md	(working copy)
> @@ -2512,9 +2512,8 @@ (define_insn_and_split "*vsx_splat_v4sf_
>    [(set (match_dup 0)
> 	(unspec:V4SF [(match_dup 1)] UNSPEC_VSX_CVDPSPN))
>     (set (match_dup 0)
> -	(vec_duplicate:V4SF
> -	 (vec_select:SF (match_dup 0)
> -			(parallel [(const_int 0)]]
> +	(unspec:V4SF [(match_dup 0)
> +		      (const_int 0)] UNSPEC_VSX_XXSPLTW))]
>    ""
>    [(set_attr "type" "vecload,vecperm,mftgpr")
>     (set_attr "length" "4,8,4")])
[PATCH, rs6000] Fix PR target/71698, ICE in reload copying TDmode values to GPR regs
We currently don't allow TDmode values to use direct moves, since they
live in register pairs and the most significant word is always in the
even-numbered register which does not match subreg ordering in little
endian mode.  The following patch fixes PR71698 by disallowing reload
from using direct moves for TDmode values.

This passed bootstrap and regtesting with no regressions.  Ok for trunk?

This is also broken on the FSF 6 branch, so is this ok there too after
bootstrap and regtesting there?

Peter

gcc/
	PR target/71698
	* config/rs6000/rs6000.c (rs6000_secondary_reload_simple_move):
	Disallow TDmode values.

gcc/testsuite/
	PR target/71698
	* gcc.target/powerpc/pr71698.c: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 237893)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19194,7 +19194,8 @@ rs6000_secondary_reload_simple_move (enu
    simple move insns are issued.  At present, 32-bit integers are not allowed
    in FPR/VSX registers.  Single precision binary floating is not a simple
    move because we need to convert to the single precision memory layout.
-   The 4-byte SDmode can be moved.  */
+   The 4-byte SDmode can be moved.  TDmode values are disallowed since they
+   need special direct move handling, which we do not support yet.  */

  size = GET_MODE_SIZE (mode);
  if (TARGET_DIRECT_MOVE
      && ((mode == SDmode) || (TARGET_POWERPC64 && size == 8))
@@ -19202,7 +19203,7 @@ rs6000_secondary_reload_simple_move (enu
	  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
    return true;

-  else if (TARGET_DIRECT_MOVE_128 && size == 16
+  else if (TARGET_DIRECT_MOVE_128 && size == 16 && mode != TDmode
	   && ((to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
	       || (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)))
    return true;

Index: gcc/testsuite/gcc.target/powerpc/pr71698.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71698.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71698.c	(working copy)
@@ -0,0 +1,13 @@
+/* Test for a reload ICE arising from trying to direct move a TDmode value.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target dfp } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-O1 -mcpu=power9 -mno-lra" } */
+
+extern void testvad128 (int n, ...);
+void
+testitd128 (_Decimal128 g01d128)
+{
+  testvad128 (1, g01d128);
+}
Re: [PATCH] c/71552 - Confusing error for incorrect struct initialization
On 06/20/2016 08:52 AM, Joseph Myers wrote:
> On Sat, 18 Jun 2016, Martin Sebor wrote:
>> The attached patch slightly changes the order in which initializers
>> are checked for type compatibility to issue the same error for static
>> initializers of incompatible types as for automatic objects, rather
>> than rejecting the former for their lack of constness first.
>
> OK, presuming the patch has passed the usual testing.

Thanks.  I committed it in r237829.

The reporter wants to know if the patch can also be backported to 5
and/or 6.  Should I go ahead?

Martin
Re: [PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
On Thu, 30 Jun 2016, Bill Schmidt wrote:
> # Return 1 if the target supports any special run-time requirements
> # for __float128 or _Float128,
> # 0 otherwise.
>
> proc check_effective_target_base_quadfloat_support { } {
>     if { [istarget powerpc*-*-*] } {
>         return [check_vsx_hw_available]
>     }
>     return 1
> }

Yes, that seems about right - can be combined with either tests on
target names for __float128, or with the compile test for _Float128
(provided that dg-add-options float128 - which is preferred to
hardcoding the particular options in the individual tests - can be
made to take effect before the compile test for _Float128 gets run).

--
Joseph S. Myers
jos...@codesourcery.com
Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"
Hi!

On Thu, 21 Jan 2016 22:54:26 +0100, I wrote:
> Committed to gomp-4_0-branch in r232709:
>
> commit 41a76d233e714fd7b79dc1f40823f607c38306ba
> Author: tschwinge
> Date:   Thu Jan 21 21:52:50 2016 +
>
>     Un-parallelized OpenACC kernels constructs with nvptx offloading:
>     "avoid offloading"

In gomp-4_0-branch r237895, I committed the following patch to 'only
trigger the "avoid offloading" mechanism for -O2 and higher', resolving
the confusing case that for -O0 and -O1 we'd not emit the diagnostic
but still trigger the "avoid offloading" mechanism.

commit 68ce05b476b68b50c2ed341ae6a77279850edbb1
Author: tschwinge
Date:   Thu Jun 30 20:46:07 2016 +

    Only trigger the "avoid offloading" mechanism for -O2 and higher

    gcc/
    * config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Only trigger
    the "avoid offloading" mechanism for -O2 and higher.
    libgomp/
    * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: Update.
    * testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Update.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@237895 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                    |  5 +
 gcc/config/nvptx/nvptx.c                              | 19 ++-
 libgomp/ChangeLog.gomp                                |  6 ++
 .../libgomp.oacc-c-c++-common/avoid-offloading-1.c    | 10 +-
 .../libgomp.oacc-fortran/avoid-offloading-1.f         | 12 +++-
 5 files changed, 41 insertions(+), 11 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 9bc1fbe..8c88119 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2016-06-30  Thomas Schwinge
+
+	* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Only trigger
+	the "avoid offloading" mechanism for -O2 and higher.
+
 2016-06-10  James Norris

 	Backport from mainline r236098.
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 6d787b0..09a5a62 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -4152,8 +4152,13 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
   /* Detect if a function is unsuitable for offloading.  */
   if (!flag_offload_force && decl)
     {
+      /* Trigger the "avoid offloading" mechanism if a OpenACC kernels
+	 construct could not be parallelized, but only do that for -O2 and
+	 higher, as otherwise we're not expecting any parallelization to
+	 happen.  */
       tree oacc_function_attr = get_oacc_fn_attrib (decl);
-      if (oacc_function_attr
+      if (optimize >= 2
+	  && oacc_function_attr
 	  && oacc_fn_attrib_kernels_p (oacc_function_attr))
 	{
 	  bool avoid_offloading_p = true;
@@ -4167,14 +4172,10 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 	    }
 	  if (avoid_offloading_p)
 	    {
-	      /* OpenACC kernels constructs will never be parallelized for
-		 optimization levels smaller than -O2; avoid the diagnostic in
-		 this case.  */
-	      if (optimize >= 2)
-		warning_at (DECL_SOURCE_LOCATION (decl), 0,
-			    "OpenACC kernels construct will be executed "
-			    "sequentially; will by default avoid offloading "
-			    "to prevent data copy penalty");
+	      warning_at (DECL_SOURCE_LOCATION (decl), 0,
+			  "OpenACC kernels construct will be executed"
+			  " sequentially; will by default avoid offloading to"
+			  " prevent data copy penalty");
 	      DECL_ATTRIBUTES (decl)
 		= tree_cons (get_identifier ("omp avoid offloading"),
 			     NULL_TREE, DECL_ATTRIBUTES (decl));
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index af4e0d5..07fe8b7 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2016-06-30  Thomas Schwinge
+
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c:
+	Update.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Update.
+
 2016-06-10  Thomas Schwinge

 	PR middle-end/71373
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
index 8f50ba3..d5fff2d 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
@@ -15,7 +15,15 @@ int main(void)
   if (x != 33)
     __builtin_abort();

-#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
+#if defined
Re: [PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
On Thu, 2016-06-30 at 17:55 +, Joseph Myers wrote:
> On Thu, 30 Jun 2016, Bill Schmidt wrote:
>
>> +# Return 1 if the target supports __float128 at run time,
>> +# 0 otherwise.
>> +
>> +proc check_effective_target___float128_runnable { } {
>
> I'd think you should have an effective-target for this that's shared
> between _Float128 and __float128, possibly one that gets anded with
> the one for the relevant type being available at all.

Well, as is it can be used for either _Float128 or __float128, so I
suppose we could just change the name.
check_effective_target_quadfloat_runnable?

To provide something minimal to and with the effective-target
__float128 check would get the somewhat silly:

# Return 1 if the target supports any special run-time requirements
# for __float128 or _Float128,
# 0 otherwise.

proc check_effective_target_base_quadfloat_support { } {
    if { [istarget powerpc*-*-*] } {
        return [check_vsx_hw_available]
    }
    return 1
}

so I think what I have (with a new name) is probably better.

>> +# Return 1 if the *-*-*gnu* target supports __float128 at run time,
>> +# 0 otherwise.
>
> *-*-*gnu* here is actually a proxy for feenableexcept trap enablement
> being available; if you're adding an effective-target, I'd think it
> should be one for feenableexcept with an actual test of function
> availability, which gets anded with the others.

Hm, ok.  Perhaps for now I will just ditch changing this test and let
that be a follow-on opportunity for me or someone else when we aren't
under quite such a time crunch.

Thanks,
Bill
Re: [RFC] __atomic_compare_exchange* optimizations (PR middle-end/66867, alternative)
On Fri, Jun 24, 2016 at 6:35 AM, Jakub Jelinek wrote:
> On Thu, Jun 23, 2016 at 05:23:21PM +0200, Jakub Jelinek wrote:
>> Thinking about this again, there could be another option - keep
>> __atomic_compare_exchange_N in the IL, but under certain conditions
>> (similar to what the patch uses in
>> fold_builtin_atomic_compare_exchange) for these builtins ignore
>> &var on the second argument, and if we actually turn var into
>> non-addressable, convert the builtin call similarly to what
>> fold_builtin_atomic_compare_exchange does in the patch (except the
>> store would be non-conditional then; the gimple-fold.c part wouldn't
>> be needed then).
>
> Here is this second approach implemented.  Generally it gives slight
> code size savings over the other patch on the testcases I've posted
> (and even the other patch used to produce generally smaller code than
> vanilla).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-06-24  Jakub Jelinek
>
> 	PR middle-end/66867
> 	* builtins.c (expand_ifn_atomic_compare_exchange_into_call,
> 	expand_ifn_atomic_compare_exchange): New functions.
> 	* internal-fn.c (expand_ATOMIC_COMPARE_EXCHANGE): New function.
> 	* tree.h (build_call_expr_internal_loc): Rename to ...
> 	(build_call_expr_internal_loc_array): ... this.  Fix up type of
> 	last argument.
> 	* internal-fn.def (ATOMIC_COMPARE_EXCHANGE): New internal fn.
> 	* predict.c (expr_expected_value_1): Handle IMAGPART_EXPR of
> 	ATOMIC_COMPARE_EXCHANGE result.
> 	* builtins.h (expand_ifn_atomic_compare_exchange): New prototype.
> 	* gimple-fold.h (optimize_atomic_compare_exchange_p,
> 	fold_builtin_atomic_compare_exchange): New prototypes.
> 	* gimple-fold.c (optimize_atomic_compare_exchange_p,
> 	fold_builtin_atomic_compare_exchange): New functions.
> 	* tree-ssa.c (execute_update_addresses_taken): If
> 	optimize_atomic_compare_exchange_p, ignore &var in 2nd argument
> 	of call when finding addressable vars, and if such var becomes
> 	non-addressable, call fold_builtin_atomic_compare_exchange.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71716

H.J.
Re: [BUILDROBOT] Selftest failed for i686-wrs-vxworks
On Thu, 2016-06-30 at 08:38 -0400, Nathan Sidwell wrote:
> Jan-Benedict,
>
>> I haven't given it any additional manual testing so far.  It's
>> pre-installation though.  Maybe I'd just set WIND_BASE to some
>> arbitrary value, just to make xgcc pass it's initial start-up test
>> so that it can continue with self-testing?  Or shall we set some
>> value in gcc/Makefile.in for running the self-test?
>
> As I recall, WIND_BASE is expected to point at a vxworks install to
> pick up header files.  It is an error not to have it set (silently
> skipping it leads to user confusion).
>
> If that's irrelevant for this testing environment, then setting it to
> something (probably just "", but safer might be
> "/These.are.not.the.dirs.you.are.looking.for") should be fine.

Sorry about the breakage.

The error message appears to affect a few other targets within
gcc/Makefile.in.  For example:

$ make s-macro_list
echo | ./xgcc -B./ -B/usr/local/i686-wrs-vxworks/bin/ -isystem /usr/local/i686-wrs-vxworks/include -isystem /usr/local/i686-wrs-vxworks/sys-include -L/home/david/archive/huge/gcc-git-all-configs/selftest-multi-mk/i686-wrs-vxworks/gcc/../ld -E -dM - | \
  sed -n -e 's/^#define \([^_][a-zA-Z0-9_]*\).*/\1/p' \
	 -e 's/^#define \(_[^_A-Z][a-zA-Z0-9_]*\).*/\1/p' | \
  sort -u > tmp-macro_list
xgcc: fatal error: environment variable ‘WIND_BASE’ not defined
compilation terminated.
/bin/sh /home/david/coding-3/gcc-git-unittests/src/gcc/../move-if-change tmp-macro_list macro_list
cmp: EOF on tmp-macro_list
echo timestamp > s-macro_list

However the above issue doesn't fail the build:

$ echo $?
0

It can be trivially reproduced like this:

$ ./xgcc -B. -E -
xgcc: fatal error: environment variable ‘WIND_BASE’ not defined
compilation terminated.

It happens due to evaluating this spec function:

  "getenv(WIND_BASE /target/h)"

from this within gcc/config/vxworks.h:

/* Since we provide a default -isystem, expand -isystem on the command
   line early.  */
#undef VXWORKS_ADDITIONAL_CPP_SPEC
#define VXWORKS_ADDITIONAL_CPP_SPEC		\
 "%{!nostdinc:					\
    %{isystem*} -idirafter			\
    %{mrtp: %:getenv(WIND_USR /h)		\
      ;:    %:getenv(WIND_BASE /target/h)}}"

Hence it appears that passing "-nostdinc" as a param will avoid the
error:

$ echo | ./xgcc -B. -E - -nostdinc
# 1 ""
# 1 ""
# 1 ""
# 1 ""
$ echo $?
0

Presumably if you're explicitly building for vxworks you have a
vxworks install, so there is a meaningful value to set WIND_BASE to,
whereas if you don't have a vxworks install (and are merely building
everything as a smoketest), you presumably only want to build the
"gcc" subdir, since AFAIK you can't run the driver.

So there are at least 2 ways of fixing this:

(a) add "-nostdinc" when running the selftests, i.e. to the
    invocations of GCC_FOR_TARGET in the "s-selftest" and
    "selftest-gdb" clauses of gcc/Makefile.in.  I've verified that
    this fixes the issue for --target=i686-wrs-vxworks.

(b) set WIND_BASE to a dummy value in contrib/config-list.mk (if not
    already set) so that the vxworks targets are able to at least
    build all of "gcc" without needing a vxworks install.

Thoughts?
Re: Improve insert/emplace robustness to self insertion
On 29/06/2016 23:30, Jonathan Wakely wrote:
> On 29/06/16 21:36 +0100, Jonathan Wakely wrote:
>> On 29/06/16 21:43 +0200, François Dumont wrote:
>>> As asked here is now a patch to only fix the robustness issue.  The
>>> consequence is that it is reverting the latest optimization as,
>>> without smart algo, we always need to do a copy to protect against
>>> insertion of values contained in the vector as shown by new tests.
>>
>> I don't understand.  There is no problem with insert(), only with
>> emplace(), so why do both get worse?
>>
>> Also, the problem is with emplacing from an lvalue, so why do the
>> number of operations change for test02 and test03, which are for
>> xvalues and rvalues?  I haven't analyzed your patch, but the results
>> seem wrong.  We should not have to do any more work to insert
>> rvalues.  What am I missing?
>>
>> It seems to me that the minimal fix would be:
>>
>> --- a/libstdc++-v3/include/bits/vector.tcc
>> +++ b/libstdc++-v3/include/bits/vector.tcc
>> @@ -312,7 +312,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>>        else
>>          _M_insert_aux(begin() + (__position - cbegin()),
>> -                      std::forward<_Args>(__args)...);
>> +                      _Tp(std::forward<_Args>(__args)...));
>>        return iterator(this->_M_impl._M_start + __n);
>>      }
>
> This causes regressions in the insert_vs_emplace.cc test because we
> insert rvalues using emplace:

This is also why my patch is impacting insert from xvalue or rvalue.

>     iterator insert(const_iterator __position, value_type&& __x)
>     { return emplace(__position, std::move(__x)); }
>
> That's suboptimal, since in the general case we need an extra
> construction for emplacing, but we know that we don't need to do that
> when inserting rvalues.

Why not?  I realized with your remarks that I was missing some tests
in the new self_insert.cc.  The ones to insert an rvalue coming from
the vector itself.  In the attached patch there is those 2 tests, do
you agree with expected behavior?  For the moment it doesn't check
that the source value has been indeed moved cause it doesn't, I will
update it once it does.
My patch also reorganize a little bit code to avoid redundant checks to find out if reallocation is necessary or not. I was also thinking about reusing _M_realloc_insert_aux within _M_fill_insert when reallocation is needed. So the correct fix would be to implement inserting rvalues without using emplace. The attached patch is a smaller change, and doesn't change the number of operations for insertions, only for emplacing lvalues. Ok, go ahead, I will try to rebase mine from yours. François diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h index eaafa22..c37880a 100644 --- a/libstdc++-v3/include/bits/stl_vector.h +++ b/libstdc++-v3/include/bits/stl_vector.h @@ -949,7 +949,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus >= 201103L _M_emplace_back_aux(__x); #else - _M_insert_aux(end(), __x); + _M_realloc_insert_aux(end(), __x); #endif } @@ -1435,21 +1435,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus < 201103L void _M_insert_aux(iterator __position, const value_type& __x); -#else - template - static void - _S_insert_aux_assign(iterator __pos, _Args&&... __args) - { *__pos = _Tp(std::forward<_Args>(__args)...); } - - static void - _S_insert_aux_assign(iterator __pos, _Tp&& __arg) - { *__pos = std::move(__arg); } + void + _M_realloc_insert_aux(iterator __position, const value_type& __x); +#else template void _M_insert_aux(iterator __position, _Args&&... __args); template +void +_M_realloc_insert_aux(iterator __position, _Args&&... __args); + + template void _M_emplace_back_aux(_Args&&... 
__args); #endif diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc index 604be7b..9061593 100644 --- a/libstdc++-v3/include/bits/vector.tcc +++ b/libstdc++-v3/include/bits/vector.tcc @@ -112,27 +112,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #endif { const size_type __n = __position - begin(); - if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage - && __position == end()) - { - _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x); - ++this->_M_impl._M_finish; - } + if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) + if (__position == end()) + { + _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x); + ++this->_M_impl._M_finish; + } + else +#if __cplusplus >= 201103L + _M_insert_aux(begin() + (__position - cbegin()), __x); +#else + _M_insert_aux(__position, __x); +#endif else - { #if __cplusplus >= 201103L - const auto __pos = begin() + (__position - cbegin()); - if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) - { - _Tp __x_copy = __x; - _M_insert_aux(__pos, std::move(__x_copy)); - } - else - _M_insert_aux(__pos, __x); +
Re: [PATCH] Fix tree-ssa-strlen.c ICE (PR tree-optimization/71707)
On June 30, 2016 7:38:05 PM GMT+02:00, Jakub Jelinek wrote:
> Hi!
>
> I thought for ADDR_EXPRs we couldn't reach this, but we actually can,
> when there is an SSA_NAME with POINTER_PLUS_EXPR of ADDR_EXPR and
> constant.
>
> The following patch handles it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2016-06-30  Jakub Jelinek
>
> 	PR tree-optimization/71707
> 	* tree-ssa-strlen.c (get_stridx_plus_constant): Handle already
> 	present strinfo even for ADDR_EXPR ptr.
>
> 	* gcc.dg/strlenopt-29.c: New test.
>
> --- gcc/tree-ssa-strlen.c.jj	2016-06-29 16:10:27.0 +0200
> +++ gcc/tree-ssa-strlen.c	2016-06-30 10:28:30.605430361 +0200
> @@ -677,8 +677,14 @@ get_stridx_plus_constant (strinfo *bases
>      {
>        if (r == 0)
>  	{
> -	  gcc_assert (TREE_CODE (ptr) == SSA_NAME);
> -	  ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
> +	  if (TREE_CODE (ptr) == SSA_NAME)
> +	    ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
> +	  else
> +	    {
> +	      int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0));
> +	      if (pidx != NULL && *pidx == 0)
> +		*pidx = si->idx;
> +	    }
>  	  return si->idx;
>  	}
>        break;
> --- gcc/testsuite/gcc.dg/strlenopt-29.c.jj	2016-06-30 10:36:34.673371348 +0200
> +++ gcc/testsuite/gcc.dg/strlenopt-29.c	2016-06-30 10:38:21.731032244 +0200
> @@ -0,0 +1,27 @@
> +/* PR tree-optimization/71707 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-strlen" } */
> +
> +#include "strlenopt.h"
> +
> +char a[32];
> +size_t b;
> +
> +__attribute__((noinline, noclone)) char *
> +foo (void)
> +{
> +  char *p = memcpy (a, "a", 2) + 1;
> +  memcpy (&a[1], "b", 2);
> +  b = strlen (a) + strlen (&a[1]) + strlen (p);
> +  return p;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo () != &a[1] || b != 4)
> +    abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */

	Jakub
[PATCH 1/2] C: convert return type of lookup_name_fuzzy from tree to const char *
The followup patch implements lookup_name_fuzzy for the C++ frontend. It's cleaner for that implementation to return a const char *, so this patch updates the implementation in the C frontend to have the same return type. Successfully bootstrapped on x86_64-pc-linux-gnu (in combination with the followup patch). OK for trunk? gcc/c-family/ChangeLog: * c-common.h (lookup_name_fuzzy): Convert return type from tree to const char *. gcc/c/ChangeLog: * c-decl.c (implicit_decl_warning): Update for conversion of return type of lookup_name_fuzzy to const char *. (undeclared_variable): Likewise. (lookup_name_fuzzy): Convert return type from tree to const char *. * c-parser.c (c_parser_declaration_or_fndef): Update for conversion of return type of lookup_name_fuzzy to const char *. (c_parser_parameter_declaration): Likewise. gcc/ChangeLog: * gcc-rich-location.c (gcc_rich_location::add_fixit_misspelled_id): New overload, taking a const char *. * gcc-rich-location.h (gcc_rich_location::add_fixit_misspelled_id): Likewise. --- gcc/c-family/c-common.h | 2 +- gcc/c/c-decl.c | 22 +- gcc/c/c-parser.c| 10 +- gcc/gcc-rich-location.c | 13 + gcc/gcc-rich-location.h | 2 ++ 5 files changed, 34 insertions(+), 15 deletions(-) diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index 3ad5400..44d98d1 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -997,7 +997,7 @@ enum lookup_name_fuzzy_kind { /* Any name. 
 */
  FUZZY_LOOKUP_NAME
};
-extern tree lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
+extern const char *lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
 extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
 extern bool vector_types_convertible_p (const_tree t1, const_tree t2,
					bool emit_lax_note);
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 8b966fe..09f7f79 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -3088,7 +3088,7 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
   if (warn_implicit_function_declaration)
     {
       bool warned;
-      tree hint = NULL_TREE;
+      const char *hint = NULL;
       if (!olddecl)
	hint = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
@@ -3099,7 +3099,7 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
	  richloc.add_fixit_misspelled_id (loc, hint);
	  warned = pedwarn_at_rich_loc
	    (&richloc, OPT_Wimplicit_function_declaration,
-	     "implicit declaration of function %qE; did you mean %qE?",
+	     "implicit declaration of function %qE; did you mean %qs?",
	     id, hint);
	}
      else
@@ -3112,7 +3112,7 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
	  richloc.add_fixit_misspelled_id (loc, hint);
	  warned = warning_at_rich_loc
	    (&richloc, OPT_Wimplicit_function_declaration,
-	     G_("implicit declaration of function %qE; did you mean %qE?"),
+	     G_("implicit declaration of function %qE; did you mean %qs?"),
	     id, hint);
	}
      else
@@ -3433,14 +3433,14 @@ undeclared_variable (location_t loc, tree id)
   if (current_function_decl == 0)
     {
-      tree guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
+      const char *guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
       if (guessed_id)
	{
	  gcc_rich_location richloc (loc);
	  richloc.add_fixit_misspelled_id (loc, guessed_id);
	  error_at_rich_loc (&richloc,
			     "%qE undeclared here (not in a function);"
-			     " did you mean %qE?",
+			     " did you mean %qs?",
			     id, guessed_id);
	}
      else
@@ -3451,7 +3451,7 @@ undeclared_variable (location_t loc, tree id)
     {
       if (!objc_diagnose_private_ivar (id))
	{
-	  tree guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
+	  const char *guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
	  if (guessed_id)
	    {
	      gcc_rich_location richloc (loc);
@@ -3459,7 +3459,7 @@ undeclared_variable (location_t loc, tree id)
	      error_at_rich_loc (&richloc,
				 "%qE undeclared (first use in this function);"
-				 " did you mean %qE?",
+				 " did you mean %qs?",
				 id, guessed_id);
	    }
	  else
@@ -4010,7 +4010,7 @@ find_closest_macro_cpp_cb (cpp_reader *, cpp_hashnode *hashnode,
    It also looks for start_typename keywords, to detect "singed" vs "signed"
    typos.  */
-tree
+const char *
 lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind)
 {
   gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
@@ -4079,7 +4079,11 @@ lookup_name_fuzzy (tree name,
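The machinery behind these %qs hints is, in essence, a best-match search over the known identifiers by edit distance, with a cutoff so that wildly different names are never offered. A minimal language-neutral sketch of that idea (not the actual GCC implementation; the cutoff rule here is an illustrative assumption):

```python
def edit_distance(a, b):
    # Classic Levenshtein dynamic program: minimum number of
    # single-character insertions, deletions, and substitutions
    # needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def lookup_name_fuzzy(name, candidates):
    # Keep the first candidate with the strictly smallest distance
    # (ties favor earlier candidates), then apply a closeness cutoff
    # so unrelated names are not suggested.
    best, best_dist = None, None
    for c in candidates:
        d = edit_distance(name, c)
        if best_dist is None or d < best_dist:
            best, best_dist = c, d
    if best is None:
        return None
    cutoff = max(len(name), len(best)) // 2
    return best if best_dist <= cutoff else None
```

With this sketch, the "singed" vs "signed" case from the patch description produces a suggestion, while a completely unrelated identifier produces none.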
[PATCH 2/2] C++ FE: handle misspelled identifiers and typenames
This is a port of the C frontend's r237714 [1] to the C++ frontend:
  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01052.html
offering spelling suggestions for misspelled identifiers, macro names, and
some keywords (e.g. "singed" vs "signed", aka PR c/70339).

Unlike the C frontend, there doesn't seem to be an easy way to distinguish
between cases where we're expecting a typename vs a variable name, so some
of the logic is a little different.  Examples of suggestions can be seen in
the test case.

Successfully bootstrapped on x86_64-pc-linux-gnu (in combination with the
prior patch); adds 240 PASS results to g++.sum.

OK for trunk?

Dave

[1] aka 8469aece13814deddf2cd80538d33c2d0a8d60d9 in the git mirror

gcc/c/ChangeLog:
	PR c/70339
	* c-decl.c (struct edit_distance_traits): Move to spellcheck-tree.h.
	(best_macro_match): Likewise, converting from a typedef to a
	subclass.
	(find_closest_macro_cpp_cb): Move to spellcheck-tree.c.
	(lookup_name_fuzzy): Update for change of best_macro_match to a
	subclass with a ctor that calls cpp_forall_identifiers.

gcc/cp/ChangeLog:
	PR c/70339
	* name-lookup.c: Include gcc-rich-location.h, spellcheck-tree.h,
	and parser.h.
	(suggest_alternatives_for): If no candidates are found, try
	lookup_name_fuzzy and report if it finds a suggestion.
	(consider_binding_level): New function.
	(consider_binding_levels): New function.
	(lookup_name_fuzzy): New function.
	* parser.c: Include gcc-rich-location.h.
	(cp_lexer_next_token_is_decl_specifier_keyword): Move most of
	logic into...
	(cp_keyword_starts_decl_specifier_p): ...this new function.
	(cp_parser_diagnose_invalid_type_name): When issuing "does not
	name a type" errors, attempt to make a suggestion using
	lookup_name_fuzzy.
	* parser.h (cp_keyword_starts_decl_specifier_p): New prototype.
	* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Don't
	attempt to access TYPE_FIELDS within a TYPE_PACK_EXPANSION.
gcc/ChangeLog:
	PR c/70339
	* diagnostic-show-locus.c (diagnostic_show_locus): If this is the
	same location as last time, don't skip if we have fix-it hints.
	Clarify the skipping logic by converting it from one "if" clause
	to repeated "if" clauses.
	* spellcheck-tree.c: Include "cpplib.h".
	(find_closest_macro_cpp_cb): Move here from c/c-decl.c.
	(best_macro_match::best_macro_match): New constructor.
	* spellcheck-tree.h (struct edit_distance_traits): Move here from
	c/c-decl.c.
	(class best_macro_match): Move here from c/c-decl.c, converting
	from a typedef to a subclass, gaining a ctor.

gcc/testsuite/ChangeLog:
	PR c/70339
	* g++.dg/spellcheck-identifiers.C: New test case, based on
	gcc.dg/spellcheck-identifiers.c.
	* g++.dg/spellcheck-typenames.C: New test case, based on
	gcc.dg/spellcheck-typenames.c.
---
 gcc/c/c-decl.c                                |  42 +
 gcc/cp/name-lookup.c                          | 140 +-
 gcc/cp/parser.c                               |  43 +++--
 gcc/cp/parser.h                               |   1 +
 gcc/cp/search.c                               |   4 +
 gcc/diagnostic-show-locus.c                   |  15 +-
 gcc/spellcheck-tree.c                         |  31
 gcc/spellcheck-tree.h                         |  26 +++
 gcc/testsuite/g++.dg/spellcheck-identifiers.C | 255 ++
 gcc/testsuite/g++.dg/spellcheck-typenames.C   |  84 +
 10 files changed, 584 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-identifiers.C
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-typenames.C

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 09f7f79..dc14485 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -3955,45 +3955,6 @@ lookup_name_in_scope (tree name, struct c_scope *scope)
   return NULL_TREE;
 }

-/* Specialization of edit_distance_traits for preprocessor macros.  */
-
-template <>
-struct edit_distance_traits<cpp_hashnode *>
-{
-  static size_t get_length (cpp_hashnode *hashnode)
-  {
-    return hashnode->ident.len;
-  }
-
-  static const char *get_string (cpp_hashnode *hashnode)
-  {
-    return (const char *)hashnode->ident.str;
-  }
-};
-
-/* Specialization of best_match<> for finding the closest preprocessor
-   macro to a given identifier.  */
-
-typedef best_match<tree, cpp_hashnode *> best_macro_match;
-
-/* A callback for cpp_forall_identifiers, for use by lookup_name_fuzzy.
-   Process HASHNODE and update the best_macro_match instance pointed to
-   by USER_DATA.  */
-
-static int
-find_closest_macro_cpp_cb (cpp_reader *, cpp_hashnode *hashnode,
-			   void *user_data)
-{
-  if (hashnode->type != NT_MACRO)
-    return 1;
-
-  best_macro_match *bmm = (best_macro_match *)user_data;
-
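find_closest_macro_cpp_cb illustrates the pattern being moved here: an opaque walker (cpp_forall_identifiers) plus a callback that filters out non-macro hash nodes and folds each remaining candidate into a running best match threaded through the user_data pointer. A rough sketch of that pattern (the distance function below is an illustrative stand-in, not GCC's edit-distance code):

```python
class BestMatch:
    """Running best candidate, by a caller-supplied distance function."""
    def __init__(self, goal, distance):
        self.goal, self.distance = goal, distance
        self.best, self.best_dist = None, None

    def consider(self, candidate):
        d = self.distance(self.goal, candidate)
        if self.best_dist is None or d < self.best_dist:
            self.best, self.best_dist = candidate, d

def forall_identifiers(table, callback, user_data):
    # Stand-in for cpp_forall_identifiers: visit every hash-table entry,
    # passing the opaque user_data through to the callback; a falsy
    # return value stops the walk.
    for node in table:
        if not callback(node, user_data):
            break

def find_closest_macro_cb(node, bmm):
    kind, name = node
    if kind != "macro":          # mirrors the NT_MACRO filter
        return True
    bmm.consider(name)
    return True
```

A toy distance (count of mismatched aligned characters plus the length difference) is enough to exercise the accumulator.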
[RFC] Change order of Fortran BLOCK_VARS (PR fortran/71687)
Hi!

While usually the order of vars in block scope isn't that significant,
e.g. on the following testcase with -fopenmp -fstack-arrays it does matter
if we have local declaration of a temporary and then local declaration of
a VLA that uses that other temporary as the TYPE_SIZE_UNIT, or the other
way around (we ICE if we first see the VLA and firstprivatize the variable,
only to later on discover it is a local var).  pushdecl is called on them
in the right order (scalar temporary first, then VLA that uses it), but
the names chain is LIFO.  Actually, with gfc_merge_block_scope it seems we
end up in quite randomish order.

So, is it intentional that pushdecl / getdecls acts like a LIFO?  Though,
it seems user vars are pushdecled in the reverse order of declarations,
but gfc_add_decl_to_function is called in the user declared order, so
perhaps for the following patch we'd also need to
  decl = nreverse (saved_function_decls);
in gfc_generate_function_code and similarly in gfc_process_block_locals
  decl = nreverse (saved_local_decls);

What do you think about this?  Note, nreverse is fairly cheap, it is
linear in the chain length.

2016-06-30  Jakub Jelinek

	PR fortran/71687
	* f95-lang.c (struct binding_level): Add reversed field.
	(clear_binding_level): Adjust initializer.
	(getdecls): If reversed is clear, set it and nreverse the names
	chain before returning it.
	(poplevel): Use getdecls.

	* gfortran.dg/gomp/pr71687.f90: New test.

--- gcc/fortran/f95-lang.c.jj	2016-06-29 10:46:33.0 +0200
+++ gcc/fortran/f95-lang.c	2016-06-30 17:11:44.350240191 +0200
@@ -286,6 +286,9 @@ binding_level {
   tree blocks;
   /* The binding level containing this one (the enclosing binding level).  */
   struct binding_level *level_chain;
+  /* True if nreverse has been already called on names; if false, names
+     are ordered from newest declaration to oldest one.  */
+  bool reversed;
 };

 /* The binding level currently in effect.  */
@@ -296,7 +299,7 @@ static GTY(()) struct binding_level *cur
 static GTY(()) struct binding_level *global_binding_level;

 /* Binding level structures are initialized by copying this one.  */
-static struct binding_level clear_binding_level = { NULL, NULL, NULL };
+static struct binding_level clear_binding_level = { NULL, NULL, NULL, false };

 /* Return true if we are in the global binding level.  */
@@ -310,6 +313,11 @@ global_bindings_p (void)
 tree
 getdecls (void)
 {
+  if (!current_binding_level->reversed)
+    {
+      current_binding_level->reversed = true;
+      current_binding_level->names = nreverse (current_binding_level->names);
+    }
   return current_binding_level->names;
 }
@@ -347,7 +355,7 @@ poplevel (int keep, int functionbody)
    binding level that we are about to exit and which is returned by this
    routine.  */
   tree block_node = NULL_TREE;
-  tree decl_chain = current_binding_level->names;
+  tree decl_chain = getdecls ();
   tree subblock_chain = current_binding_level->blocks;
   tree subblock_node;

--- gcc/testsuite/gfortran.dg/gomp/pr71687.f90.jj	2016-06-30 17:12:27.279702475 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr71687.f90	2016-06-30 17:12:53.480374296 +0200
@@ -0,0 +1,11 @@
+! PR fortran/71687
+! { dg-do compile }
+! { dg-additional-options "-fstack-arrays -O2" }
+
+subroutine s (n, x)
+  integer :: n
+  real :: x(n)
+!$omp parallel
+  x(1:n) = x(n:1:-1)
+!$omp end parallel
+end

	Jakub
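The LIFO behaviour under discussion, and why nreverse is cheap, can be modelled with a singly linked chain (a simplified sketch, not the GCC tree chain itself): pushdecl prepends, so walking the chain yields newest-first order, and an in-place reversal restores declaration order in one linear pass.

```python
class Decl:
    def __init__(self, name):
        self.name, self.chain = name, None

def pushdecl(names, decl):
    # Like the binding-level names chain: prepend, so the chain is LIFO.
    decl.chain = names
    return decl

def nreverse(head):
    # In-place reversal, linear in the chain length.
    prev = None
    while head is not None:
        head.chain, prev, head = prev, head, head.chain
    return prev

def chain_names(head):
    out = []
    while head is not None:
        out.append(head.name)
        head = head.chain
    return out
```

Pushing a scalar temporary and then the VLA that uses it leaves the VLA first on the chain, which is exactly the order the ICE scenario above describes; nreverse recovers the declared order.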
Re: [PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
On Thu, 30 Jun 2016, Bill Schmidt wrote: > +# Return 1 if the target supports __float128 at run time, > +# 0 otherwise. > + > +proc check_effective_target___float128_runnable { } { I'd think you should have an effective-target for this that's shared between _Float128 and __float128, possibly one that gets anded with the one for the relevant type being available at all. > +# Return 1 if the *-*-*gnu* target supports __float128 at run time, > +# 0 otherwise. *-*-*gnu* here is actually a proxy for feenableexcept trap enablement being available; if you're adding an effective-target, I'd think it should be one for feenableexcept with an actual test of function availability, which gets anded with the others. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, rs6000] Enable some existing __float128 tests for powerpc64*
On Thu, 30 Jun 2016, Bill Schmidt wrote: > On Wed, 2016-06-29 at 16:37 +, Joseph Myers wrote: > > On Tue, 28 Jun 2016, Bill Schmidt wrote: > > > > > -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ > > > +/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* powerpc64*-*-* > > > } } */ > > > /* { dg-options "-pedantic" } */ > > > +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } > > > } */ > > > > Rather than duplicating powerpc64 references everywhere, wouldn't it be > > better to add an effective-target keyword __float128, meaning that > > __float128 is available? Along with { dg-add-options float128 }. > > Sure, I can do that. I guess I need to stay away from the name > check_effective_target_float128, though, as in your pending patch that > will mean availability of _Float128. Do you have a naming preference? My suggestion is check_effective_target___float128, so __float128 is the effective-target keyword. (You don't need separate naming for the dg-add-options case, since the same options will work for both __float128 and _Float128.) -- Joseph S. Myers jos...@codesourcery.com
[committed] Fix ICE with !$omp target update (PR fortran/71705)
Hi!

The following patch fixes an ICE on target update construct clauses.
While the testcase is (at runtime) invalid in OpenMP 4.0, it is valid in
4.5, and we shouldn't ICE on it anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for 6.2 backport.

2016-06-30  Jakub Jelinek

	PR fortran/71705
	* trans-openmp.c (gfc_trans_omp_clauses): Set TREE_ADDRESSABLE on
	decls in to/from clauses.

	* gfortran.dg/gomp/pr71705.f90: New test.

--- gcc/fortran/trans-openmp.c.jj	2016-06-03 21:25:17.0 +0200
+++ gcc/fortran/trans-openmp.c	2016-06-30 12:39:15.253931614 +0200
@@ -2182,6 +2182,8 @@ gfc_trans_omp_clauses (stmtblock_t *bloc
	    tree decl = gfc_get_symbol_decl (n->sym);
	    if (gfc_omp_privatize_by_reference (decl))
	      decl = build_fold_indirect_ref (decl);
+	    else if (DECL_P (decl))
+	      TREE_ADDRESSABLE (decl) = 1;
	    if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
	      {
		tree type = TREE_TYPE (decl);

--- gcc/testsuite/gfortran.dg/gomp/pr71705.f90.jj	2016-06-30 12:50:49.076186209 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr71705.f90	2016-06-30 12:48:06.0 +0200
@@ -0,0 +1,7 @@
+! PR fortran/71705
+! { dg-do compile }
+
+  real :: x
+  x = 0.0
+  !$omp target update to(x)
+end

	Jakub
[committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)
Hi!

The Fortran parser apparently relies, in functions that still have an
undecided kind of the result, on the artificial ST_GET_FCN_CHARACTERISTICS
statement being returned before any executable statements in the function.
In normal statements that is ensured through decode_statement calling
decode_specification_statement, which parses just a subset of statements,
but for OpenMP we need to do something similar.  If we figure out we want
only the case_omp_decl statements, for any other we just try to gfc_match
the keyword, and if we match it, it means we'd be about to return an OpenMP
executable statement, so instead we return ST_GET_FCN_CHARACTERISTICS.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for 6.2 backport.

Cesar, note OpenACC will need something similar (though,
decode_acc_statement uses just the match macro, so you'll need another one
for the executable statements).

2016-06-30  Jakub Jelinek

	PR fortran/71704
	* parse.c (matchs, matcho): Move right before decode_omp_directive.
	If spec_only, only gfc_match the keyword and if successful, goto
	do_spec_only.
	(matchds, matchdo): Define.
	(decode_omp_directive): Add spec_only local var and set it.
	Use matchds or matchdo macros instead of matchs or matcho for
	declare target, declare simd, declare reduction and threadprivate
	directives.  Return ST_GET_FCN_CHARACTERISTICS if a non-declarative
	directive could be matched.
	(next_statement): For ST_GET_FCN_CHARACTERISTICS restore
	gfc_current_locus from old_locus even if there is no label.

	* gfortran.dg/gomp/pr71704.f90: New test.

--- gcc/fortran/parse.c.jj	2016-06-01 14:20:51.0 +0200
+++ gcc/fortran/parse.c	2016-06-30 15:32:20.003410398 +0200
@@ -589,28 +589,6 @@ decode_statement (void)
   return ST_NONE;
 }

-/* Like match, but set a flag simd_matched if keyword matched.
*/ -#define matchs(keyword, subr, st) \ -do { \ - if (match_word_omp_simd (keyword, subr, _locus, \ - _matched) == MATCH_YES) \ - return st; \ - else \ - undo_new_statement (); \ -} while (0); - -/* Like match, but don't match anything if not -fopenmp. */ -#define matcho(keyword, subr, st) \ -do { \ - if (!flag_openmp)\ - ; \ - else if (match_word (keyword, subr, _locus) \ - == MATCH_YES)\ - return st; \ - else \ - undo_new_statement (); \ -} while (0); - static gfc_statement decode_oacc_directive (void) { @@ -702,12 +680,63 @@ decode_oacc_directive (void) return ST_NONE; } +/* Like match, but set a flag simd_matched if keyword matched + and if spec_only, goto do_spec_only without actually matching. */ +#define matchs(keyword, subr, st) \ +do { \ + if (spec_only && gfc_match (keyword) == MATCH_YES) \ + goto do_spec_only; \ + if (match_word_omp_simd (keyword, subr, _locus, \ + _matched) == MATCH_YES) \ + return st; \ + else \ + undo_new_statement (); \ +} while (0); + +/* Like match, but don't match anything if not -fopenmp + and if spec_only, goto do_spec_only without actually matching. */ +#define matcho(keyword, subr, st) \ +do { \ + if (!flag_openmp)\ + ; \ + else if (spec_only && gfc_match (keyword) == MATCH_YES) \ + goto do_spec_only; \ + else if (match_word (keyword, subr, _locus) \ + == MATCH_YES)\ + return st; \ + else \ + undo_new_statement (); \ +} while (0); + +/* Like match, but set a flag simd_matched if keyword matched. */ +#define matchds(keyword, subr, st)
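The effect of the matcho/matchds split can be modelled simply: the directive keyword is always matched tentatively first, and in spec-only mode a match on anything non-declarative bails out with the ST_GET_FCN_CHARACTERISTICS sentinel instead of consuming an executable statement. A simplified model of that control flow (the directive tables below are illustrative, not the full OpenMP set):

```python
DECLARATIVE = {"declare target", "declare simd", "declare reduction",
               "threadprivate"}
EXECUTABLE = {"parallel", "target update", "task"}

def decode_omp_directive(line, spec_only):
    # Try longer keywords first so e.g. "declare simd" is not shadowed
    # by a shorter prefix.
    for kw in sorted(DECLARATIVE | EXECUTABLE, key=len, reverse=True):
        if line.startswith(kw):
            if spec_only and kw not in DECLARATIVE:
                # We'd be about to return an executable OpenMP statement
                # inside the specification part: bail out instead.
                return "ST_GET_FCN_CHARACTERISTICS"
            return "ST_OMP_" + kw.replace(" ", "_").upper()
    return "ST_NONE"
```

Declarative directives still parse normally in spec-only mode, matching the intent of the new matchds/matchdo macros.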
[PATCH] Fix tree-ssa-strlen.c ICE (PR tree-optimization/71707)
Hi!

I thought for ADDR_EXPRs we couldn't reach this, but we actually can, when
there is an SSA_NAME with POINTER_PLUS_EXPR of ADDR_EXPR and constant.
The following patch handles it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-06-30  Jakub Jelinek

	PR tree-optimization/71707
	* tree-ssa-strlen.c (get_stridx_plus_constant): Handle already
	present strinfo even for ADDR_EXPR ptr.

	* gcc.dg/strlenopt-29.c: New test.

--- gcc/tree-ssa-strlen.c.jj	2016-06-29 16:10:27.0 +0200
+++ gcc/tree-ssa-strlen.c	2016-06-30 10:28:30.605430361 +0200
@@ -677,8 +677,14 @@ get_stridx_plus_constant (strinfo *bases
       {
	if (r == 0)
	  {
-	    gcc_assert (TREE_CODE (ptr) == SSA_NAME);
-	    ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
+	    if (TREE_CODE (ptr) == SSA_NAME)
+	      ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
+	    else
+	      {
+		int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0));
+		if (pidx != NULL && *pidx == 0)
+		  *pidx = si->idx;
+	      }
	    return si->idx;
	  }
	break;

--- gcc/testsuite/gcc.dg/strlenopt-29.c.jj	2016-06-30 10:36:34.673371348 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-29.c	2016-06-30 10:38:21.731032244 +0200
@@ -0,0 +1,27 @@
+/* PR tree-optimization/71707 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+char a[32];
+size_t b;
+
+__attribute__((noinline, noclone)) char *
+foo (void)
+{
+  char *p = memcpy (a, "a", 2) + 1;
+  memcpy (&a[1], "b", 2);
+  b = strlen (a) + strlen (&a[1]) + strlen (p);
+  return p;
+}
+
+int
+main ()
+{
+  if (foo () != &a[1] || b != 4)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */

	Jakub
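The shape of the fix, recording the string index in an address-keyed table when ptr is an ADDR_EXPR and in the SSA-version table otherwise, can be sketched with two maps (a simplified model, not the actual tree-ssa-strlen.c data structures):

```python
class StrlenTracker:
    # Two lookup tables: one keyed by SSA name, one keyed by an
    # (object, byte-offset) pair standing in for an ADDR_EXPR.
    def __init__(self):
        self.ssa_to_idx = {}
        self.addr_to_idx = {}

    def record(self, ptr, idx):
        kind, key = ptr
        if kind == "ssa":
            self.ssa_to_idx[key] = idx
        else:
            # Like "if (pidx != NULL && *pidx == 0)": don't overwrite
            # an index that is already present.
            self.addr_to_idx.setdefault(key, idx)

    def lookup(self, ptr):
        kind, key = ptr
        table = self.ssa_to_idx if kind == "ssa" else self.addr_to_idx
        return table.get(key)
```

The testcase's `&a[1]` is exactly the address-keyed case: the pointer is a constant offset into a known object, not an SSA name.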
[PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
Hi,

My previous patch added to the specific target list for the tests, and
Joseph suggested it would be better to use an effective target check.
This version does this.  To avoid a name clash with Joseph's pending
patch, the effective target names include __float128 rather than float128.

I discovered that most of the tests work even for -m32 on big-endian,
except those that depend on int128 support, so I've loosened the 64-bit
restriction.

I used the term "runnable" rather than the more common "hw_available" for
the dg-run tests, to avoid a point of potential confusion.
__float128_hw_available for PowerPC implies hardware support for the
individual instructions, but we want to run these tests even when those
are not present.  However, we need VSX hardware support because of ABI
restrictions for __float128.

Tested on powerpc64[le]-unknown-linux-gnu.  Is this ok for trunk, and
eventual backport to gcc-6-branch?

Thanks,
Bill

2016-06-30  Bill Schmidt

	* gcc.dg/const-float128-ped.c: Require __float128 effective
	target, and specify additional options for powerpc*-*-*.
	* gcc.dg/const-float128.c: Likewise.
	* gcc.dg/torture/float128-cmp-invalid.c: Require
	__float128_runnable effective target, and specify additional
	options for powerpc*-*-*.
	* gcc.dg/torture/float128-div-underflow.c: Likewise.
	* gcc.dg/torture/float128-exact-underflow.c: Require
	__float128_runnable_gnu effective target, and specify additional
	options for powerpc*-*-*.
	* gcc.dg/torture/float128-extend-nan.c: Require
	__float128_runnable effective target, and specify additional
	options for powerpc*-*-*.
	* gcc.dg/torture/float128-nan.c: Likewise.
	* gcc.dg/torture/fp-int-convert-float128-timode-2.c: Likewise.
	* gcc.dg/torture/fp-int-convert-float128-timode-3.c: Likewise.
	* gcc.dg/torture/fp-int-convert-float128-timode.c: Likewise.
	* lib/target-supports.exp (check_effective_target___float128): New.
	(check_effective_target___float128_runnable): New.
	(check_effective_target___float128_runnable_gnu): New.
Index: gcc/testsuite/gcc.dg/const-float128-ped.c === --- gcc/testsuite/gcc.dg/const-float128-ped.c (revision 237802) +++ gcc/testsuite/gcc.dg/const-float128-ped.c (working copy) @@ -1,5 +1,7 @@ /* Test 'q' suffix with -pedantic on __float128 type constants. */ -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-do compile } */ +/* { dg-require-effective-target __float128 } */ /* { dg-options "-pedantic" } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ __float128 a = 123.456789q; /* { dg-warning "non-standard suffix on floating constant" } */ Index: gcc/testsuite/gcc.dg/const-float128.c === --- gcc/testsuite/gcc.dg/const-float128.c (revision 237802) +++ gcc/testsuite/gcc.dg/const-float128.c (working copy) @@ -1,6 +1,8 @@ /* Test 'q' and 'Q' suffixes on __float128 type constants. */ -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-do compile } */ +/* { dg-require-effective-target __float128 } */ /* { dg-options "" } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ __float128 a = 123.456789q; __float128 b = 123.456789Q; Index: gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c === --- gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c (revision 237802) +++ gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c (working copy) @@ -1,7 +1,9 @@ /* Test for "invalid" exceptions from __float128 comparisons. 
*/ -/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* } } */ +/* { dg-do run } */ /* { dg-options "" } */ +/* { dg-require-effective-target __float128_runnable } */ /* { dg-require-effective-target fenv_exceptions } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ #include #include Index: gcc/testsuite/gcc.dg/torture/float128-div-underflow.c === --- gcc/testsuite/gcc.dg/torture/float128-div-underflow.c (revision 237802) +++ gcc/testsuite/gcc.dg/torture/float128-div-underflow.c (working copy) @@ -1,7 +1,9 @@ /* Test for spurious underflow from __float128 division. */ -/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* } } */ +/* { dg-do run } */ /* { dg-options "" } */ +/* { dg-require-effective-target __float128_runnable } */ /* { dg-require-effective-target fenv_exceptions } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ #include #include Index: gcc/testsuite/gcc.dg/torture/float128-exact-underflow.c === ---
[lra] Cleanup the use of offmemok and don't count spilling cost for it
On 29/06/16 22:31, Jiong Wang wrote: Andreas Krebbel writes: On 06/28/2016 04:16 PM, Jiong Wang wrote: ... So my first impression is TARGET_LEGITIMATE_ADDRESS_P on s390 do need a fix here. The following draft patch fix this, my fix may be in correct as normally we will allow illegal constant offset if it's associated with virtual frame register before virtual register elimination, I don't know if that should be considered here. Also I don't know if such check should be constrainted under some architecture features. --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx addr, bool strict) || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS)) return false; } + + if (ad.disp + && !s390_short_displacement (ad.disp)) +return false; + return true; } Whether short displacement is required or not depends on the instruction. We always relied on reload to take care of this. legitimate_address_p needs to accept the union of all valid adresses on S/390. That change probably would cause a severe performance regression. Meanwhile I feel LRA might be too strict here, as we can't assume all address legitimized before lra, right? we still need to handle reload illegal address. While now, a reload of set (mem (reg + reg + offset)) regA (The offset is out of range that the inner address doesn't pass constraint check.) into regB <- reg + reg + offset (set (mem (regB)) regA) This is exactly what is supposed to happen in that case. All addresses which might be invalid to be used directly in an insn can be handled with load address or load address relative long (+ secondary reload for odd addresses). is treate as spill after r237277, so I feel the following check in lra-constraints.c also needs a relax? if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0)) Comments? I am testing a fix which teach LRA don't add spill cost if it's "offmemok". 
Here is the patch.

From my understanding, "offmemok" is used to represent a memory operand
whose address we want to reload, and searching its reference locations
seems to confirm my understanding, as it's always used together with a
MEM_P check.

So this patch does the following modifications:

  * Only set offmemok to true if MEM_P is also true, as otherwise
    offmemok is not used.
  * Remove the redundant MEM_P check where it was used together with
    offmemok.
  * Avoid the addition of spilling cost if offmemok is true, as an
    address calculation reload is not spilling.

bootstrap & gcc/g++ regression OK on x86_64/aarch64/arm.

OK for trunk?

2016-06-30  Jiong Wang

gcc
	* lra-constraints.c (process_alt_operands): Only set "offmemok"
	for MEM_P.  Remove redundant MEM_P check if it's used together
	with "offmemok" check.  Don't add spilling cost for "offmemok".
	(curr_insn_transform): Remove redundant MEM_P check.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bf08dce..4546f1e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1967,7 +1967,6 @@ process_alt_operands (int only_alternative)
		   combination, because we can't reload it.  */
		if (curr_alt_offmemok[m]
-		    && MEM_P (*curr_id->operand_loc[m])
		    && curr_alt[m] == NO_REGS && ! curr_alt_win[m])
		  continue;
	      }
@@ -2089,7 +2088,8 @@ process_alt_operands (int only_alternative)
		      || equiv_substition_p[nop])
		    badop = false;
		  constmemok = true;
-		  offmemok = true;
+		  if (MEM_P (op))
+		    offmemok = true;
		  break;

		case CT_ADDRESS:
@@ -2436,8 +2436,7 @@ process_alt_operands (int only_alternative)
		      reject += LRA_MAX_REJECT;
		  }

-	      if (! (MEM_P (op) && offmemok)
-		  && ! (const_to_mem && constmemok))
+	      if (!offmemok && ! (const_to_mem && constmemok))
		{
		  /* We prefer to reload pseudos over reloading other
		     things, since such reloads may be able to be
@@ -2488,7 +2487,9 @@ process_alt_operands (int only_alternative)
		 Code below increases the reject for both pseudo and
		 non-pseudo spill.  */
-	      if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
+	      if (no_regs_p
+		  && !offmemok
+		  && !(REG_P (op) && hard_regno[nop] < 0))
		{
		  if (lra_dump_file != NULL)
		    fprintf
@@ -3909,7 +3910,7 @@ curr_insn_transform (bool check_only_p)
	 appearing where an offsettable address will do.  It also may
	 be a case when the address should be special in other words
	 not a general one (e.g. it needs no index reg).  */
-      if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))
+      if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i])
	{
	  enum reg_class rclass;
	  rtx *loc = &XEXP (op, 0);
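The distinction this patch draws, that an address reload is not a spill, can be illustrated with a toy legitimizer: when a displacement does not fit the machine's addressing mode, only the address computation moves into a fresh register, while the operand itself stays in memory. (The displacement limit below is an arbitrary stand-in for illustration, not the s390 short-displacement rule.)

```python
SHORT_DISP_MAX = 4095  # assumed displacement limit, illustration only

def reload_address(base, index, disp, fresh_reg):
    # Returns (memory_operand, extra_insns).  If the displacement fits,
    # the memory operand is legitimate as-is.  Otherwise compute the
    # full address into fresh_reg and use mem(fresh_reg): an address
    # reload, which should not be costed like spilling the operand.
    if 0 <= disp <= SHORT_DISP_MAX:
        return ("mem", base, index, disp), []
    insns = [(fresh_reg, "add", base, index, disp)]
    return ("mem", fresh_reg, None, 0), insns
```

In both outcomes the data never leaves memory, which is the argument for not adding spill cost when offmemok holds.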
Re: [PATCH][ARM] Delete thumb_reload_in_hi
Ping.  Looking back at this it's probably obvious, but since I had already
asked for approval...

Thanks,
Kyrill

On 17/06/16 15:38, Kyrill Tkachov wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00821.html
>
> Thanks,
> Kyrill
>
> On 10/06/16 15:55, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This function just ICEs and isn't actually called from anywhere.
>> It was introduced back in 2000 as part of a large merge introducing
>> Thumb support and was aborting even then.  I don't think having it
>> around is of any benefit.
>>
>> Tested on arm-none-eabi.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-06-10  Kyrylo Tkachov
>>
>>     * config/arm/arm.c (thumb_reload_in_hi): Delete.
>>     * config/arm/arm-protos.h (thumb_reload_in_hi): Delete prototype.
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
This is the full test failure summary. I will compare with the previous commit tomorrow. The asan failure and the uninit-19 failure are interesting. uninit-19 should not have failed, but I have no idea what's going on with asan. I'll have a closer look tomorrow and look for other failing tests. cat <<'EOF' | Native configuration is x86_64-apple-darwin15.3.0 === g++ tests === Running target unix FAIL: g++.dg/debug/dwarf2/imported-decl-2.C -std=gnu++98 scan-assembler-times ascii "0".*ascii "0".*DIE .0x[0-9a-f]*. DW_TAG_imported_declaration 1 FAIL: g++.dg/debug/dwarf2/imported-decl-2.C -std=gnu++11 scan-assembler-times ascii "0".*ascii "0".*DIE .0x[0-9a-f]*. DW_TAG_imported_declaration 1 FAIL: g++.dg/debug/dwarf2/imported-decl-2.C -std=gnu++14 scan-assembler-times ascii "0".*ascii "0".*DIE .0x[0-9a-f]*. DW_TAG_imported_declaration 1 FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 92) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 93) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 94) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 95) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 96) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 98) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 99) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 100) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 101) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 102) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 92) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 93) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 94) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 95) FAIL: 
g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 96) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 98) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 99) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 100) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 101) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 102) FAIL: g++.dg/ext/sync-4.C -std=gnu++98 execution test FAIL: g++.dg/ext/sync-4.C -std=gnu++11 execution test FAIL: g++.dg/ext/sync-4.C -std=gnu++14 execution test FAIL: g++.dg/ipa/pr67056.C scan-ipa-dump cp "Speculative outer type:struct CompositeClass" FAIL: c-c++-common/pr60226.c -std=gnu++98 (test for excess errors) FAIL: c-c++-common/pr60226.c -std=gnu++11 (test for excess errors) FAIL: c-c++-common/pr60226.c -std=gnu++14 (test for excess errors) FAIL: g++.dg/tree-prof/pr57451.C compilation, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr57451.C execution, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr57451.C compilation, -fprofile-use -D_PROFILE_USE UNRESOLVED: g++.dg/tree-prof/pr57451.C execution,-fprofile-use -D_PROFILE_USE FAIL: g++.dg/tree-prof/pr63581.C compilation, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr63581.C execution, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr63581.C compilation, -fprofile-use -D_PROFILE_USE UNRESOLVED: g++.dg/tree-prof/pr63581.C execution,-fprofile-use -D_PROFILE_USE === g++ Summary === # of expected passes98650 # of unexpected failures32 # of expected failures314 # of unresolved testcases6 # of unsupported tests4169 /Users/manishearth/dev/gcc-build/gcc/testsuite/g++/../../xg++ version 7.0.0 20160630 (experimental) (GCC) === gcc tests === Running target unix FAIL: c-c++-common/asan/strncpy-overflow-1.c -O0 execution test FAIL: 
gcc.dg/debug/dwarf2/prod-options.c scan-assembler DW_AT_producer: "GNU C FAIL: gcc.dg/darwin-minversion-1.c (test for excess errors) FAIL: gcc.dg/darwin-minversion-2.c (test for excess errors) FAIL: gcc.dg/darwin-version-1.c (test for excess errors) FAIL: gcc.dg/framework-1.c (test for excess errors) FAIL: gcc.dg/uninit-19.c (test for warnings, line 22) FAIL: gcc.dg/uninit-19.c (test for excess errors) FAIL: c-c++-common/pr60226.c -Wc++-compat (test for excess errors) FAIL: gcc.dg/graphite/scop-19.c scan-tree-dump-times graphite "number of SCoPs: 0" 1 FAIL: gcc.dg/torture/pr68264.c -O0 execution test FAIL: gcc.dg/torture/pr68264.c -O1 execution test FAIL: gcc.dg/torture/pr68264.c -O2 execution test FAIL: gcc.dg/torture/p
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, 30 Jun 2016, Richard Biener wrote: points-to analysis already has the constraint that POINTER_PLUS_EXPR cannot leave the object op0 points to. Of course currently nothing uses the fact whether points-to computes pointed-to as nothing (aka NULL) - so the argument may be moot. Anyway, one of my points to the original patch was that POINTER_PLUS_EXPR handling should be clearly separate from PLUS_EXPR and that we have flag_delete_null_pointer_checks to allow targest to declare that 0 is a valid object pointer (and thus you can do 4 + -4 and reach NULL). Thanks. So the tricky point is that we are not allowed to transform g into f below: char*f(char*p){return p+4;} char*g(char*p){return (char*)((intptr_t)p+4);} That makes sense and seems much easier to guarantee than I feared, nice. (on the other hand, only RTL is able to simplify (long)p+4-(long)(p+4)) -- Marc Glisse
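The "4 + -4 and reach NULL" remark can be made concrete with modular arithmetic: once a pointer has been laundered through an integer type, the addition is free to wrap and legitimately land on address 0, which is exactly why g above cannot be rewritten as f (whose POINTER_PLUS_EXPR is assumed to stay within the pointed-to object). A sketch assuming a flat 64-bit address space:

```python
MASK = (1 << 64) - 1   # assume 64-bit addresses, illustration only

def int_plus(p, n):
    # Models (char *)((intptr_t)p + n): plain modular integer
    # arithmetic, with no object-boundary assumptions at all.
    return (p + n) & MASK
```

On targets where 0 is a valid object pointer (the flag_delete_null_pointer_checks case mentioned above), even the pointer form may reach NULL, which is why that flag gates the transformation.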
Re: [PATCH][AArch64] Increase code alignment
Evandro Menezes wrote: On 06/29/16 07:59, James Greenhalgh wrote: > On Tue, Jun 21, 2016 at 02:39:23PM +0100, Wilco Dijkstra wrote: >> ping >> >> >> From: Wilco Dijkstra >> Sent: 03 June 2016 11:51 >> To: GCC Patches >> Cc: nd; philipp.toms...@theobroma-systems.com; pins...@gmail.com; >> jim.wil...@linaro.org; benedikt.hu...@theobroma-systems.com; Evandro Menezes >> Subject: [PATCH][AArch64] Increase code alignment >> >> Increase loop alignment on Cortex cores to 8 and set function alignment to >> 16. This makes things consistent across big.LITTLE cores, improves >> performance of benchmarks with tight loops and reduces performance variations >> due to small changes in code layout. It looks almost all AArch64 cores agree >> on alignment of 16 for function, and 8 for loops and branches, so we should >> change -mcpu=generic as well if there is no disagreement - feedback welcome. >> >> OK for commit? > Hi Wilco, > > Sorry for the delay. > > This patch is OK for trunk. > > I hope we can continue the discussion as to whether there is a set of > values for -mcpu=generic that better suits the range of cores we now > support. > > After Wilco's patch, and using the values in the proposed vulcan and > qdf24xx structures, and with the comments from this thread, the state of > the tuning structures will be: > > -mcpu= : function-jump-loop alignment > > cortex-a35: 16-8-8 > cortex-a53: 16-8-8 > cortex-a57: 16-8-8 > cortex-a72: 16-8-8 > cortex-a73: 16-8-4 (Though I'm guessing this is just an ordering issue on > when Kyrill copied the cost table. Kyrill/Wilco do you > want to spin an update for the Cortex-A73 alignment? > Consider it preapproved if you do) > exynos-m1 : 4-4-4 (2: 16-4-16, 3: 8-4-4) > thunderx : 8-8-8 (But 16-8-8 acceptable/maybe better) > xgene1 :16-8-16 > qdf24xx :16-8-16 > vulcan :16-8-16 > > Generic is currently set to 8-8-4, which doesn't look representative of > these individual cores at all. 
> > Running down that list, I see a very compelling case for 16 for function > alignment (from comments in this thread, it is second choice, but not too > bad for exynos-m1, thunderx is maybe better at 16, but should not be > worse). I also see a close to unanimous case for 8 for jump alignment, > though I note that in Evandro's experiments 8 never came out as "best" > for exynos-m1. Did you get a feel in your experiments for what the > performance penalty of aligning jumps to 8 would be? That generic is > currently set to 8 seems like a good tie-breaker if the performance impact > for exynos-m1 would be minimal, as it gives no change from the current > behaviour. > > For loop alignment I see a split straight down the middle between 8 > and 16 (exynos-m1/cortex-a73 are the outliers at 4, but second place for > exynos-m1 was 16-4-16, and cortex-a73 might just be an artifact of when the > table was copied). > > From that, and as a starting point for discussion, I'd suggest that > 16-8-8 or 16-8-16 are most representative from the core support that has > been added to GCC. > > Wilco, the Cortex-A cores tip the balance in favour of 16-8-8. As you've > most recently looked at the cost of setting loop alignment to 16, do you > have any comments on why you chose to standardize on 16-8-8 rather than > 16-8-16, or what the cost would be of 16-8-16? > > I've widened the CC list, and I'd appreciate any comments on what we all > think will be the right alignment parameters for -mcpu=generic. This is a very good summary. However, I think that it should also consider the effects on code size. Evidently, an alignment of 16 has the greatest probability of increasing code size, whereas an alignment of 4 doesn't increase it. Likewise, what is aligned also matters, for each of them have a different frequency of instances. Arguably, the function alignment has the least effect in code size, followed by jumps and then loops, based on typical frequencies. 
In specific cases, the weights may move somewhat, of course.

As I stated in my previous reply, I also tracked the code size when experimenting with different alignments. It was clear from my data that the jump alignment was the most critical to code size (~3% on average), followed by loop alignment (<1%) and function alignment (<1%). Therefore, I'd argue against setting the alignment for jumps to 16. The case can be made for an alignment of 8 for them, where the cost to code size is more modest.

From a performance perspective, generic alignments at 16-8-8 would rank 4th on Exynos M1, with a negligible code size penalty (<1%) over the current 8-4-4.

--

Hi Evandro,

Agreed, the codesize impact is important as well, so there is a balance to be made. Looking at the codesize cost, branch alignment has the largest cost but also the least benefit. It isn't helped by GCC's very simplistic way of aligning - our goal is to improve fetching of hot code,
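For reference, the function/jump/loop tuples being debated map directly onto GCC's generic alignment options, so candidate -mcpu=generic settings can be compared for size and speed without rebuilding the backend tuning tables. A sketch of such an experiment (the -falign-* flags are real GCC options; `bench.c` and the output names are placeholders):

```shell
# Compare the two candidate generic tunings, 16-8-8 vs 16-8-16,
# by overriding function/jump/loop alignment explicitly.
gcc -O2 -falign-functions=16 -falign-jumps=8 -falign-loops=8  -o bench-16-8-8  bench.c
gcc -O2 -falign-functions=16 -falign-jumps=8 -falign-loops=16 -o bench-16-8-16 bench.c
size bench-16-8-8 bench-16-8-16   # code-size cost of the loop-alignment choice
```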
Re: [patch,avr]: ad PR71151: Make test cases pass on smaller targets.
On Jun 23, 2016, at 5:21 AM, Senthil Kumar Selvaraj wrote:
>
> 2. Even if (1) is fixed, the custom section (.foo) is not mapped to
> any output section or region in the linker script. The linker can
> error out only if the contents overflow a region.

If the segment isn't loaded, then its size can't matter any? If it is loaded, and it goes onto a machine with a finite size, then the loader can (should) complain if it can't fit, and/or the linker can know that size, as it is a fixed finite size. The size would usually be part of a board support package and can include a memory segment with that size.
Re: [patch,avr]: ad PR71151: Make test cases pass on smaller targets.
On Jun 23, 2016, at 4:00 AM, Georg-Johann Lay wrote:
> Binutils don't produce a message

That's unfortunate.

> so there is nothing to scan for. Hacking binutils is beyond my scope.

That's fine.

> avrtest behaves just as if the program under test would call abort.

That's unfortunate. Would be nice if they said, can't load program, or some other unique thing we could check for.

> There are at least 2 other AVR simulators; dunno how they would handle the
> situation.

That doesn't matter, as the tool chain has to be ported to each. Of course, if they can self agree on common things, that would be best. If they do, then the work on our side is 1/2 the work.

> I don't see how an a-posteriori test could be independent of simulator,
> independent of board descriptions and all that stuff.

I can explain in detail, if you want. If the program loader always says 'program load failed, program too big' when that is true, then dejagnu can look for this string and always know that the test case is unsupported; it doesn't matter the board file, the platform, the OS, the chip architecture or the time of day. The reason is the program loader must know onto what it is loading.

Likewise, when linking, the board file can have details, like this target has 96KB of ram, and some linkers, given that detail, can create a memory segment that is that size, and as the linker fills that segment, it can print, segment overflowed. Binutils usually does this for free, _if_ you give it memory segments. On virtual memory systems, we never do, as there is little concept of memory overflow at static link time; but, for fixed memory systems, it is perfectly reasonable to have memory segments that have this size, if one wants. A program loader that can't complain in some fashion is, well, not a nice program.

For binutils, you can have:

MEMORY
{
  SRAM (rwx) : ORIGIN = 0x, LENGTH = 128K
}

to define SRAM as being 128KB, for example.
The linker script can then put things into sram, and when it overflows, it will print an error. By doing this, one never has to worry about load failures.

> Is this one ok?

Ok.
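To make the idea above concrete, a sketch of a complete linker-script fragment (the region name and LENGTH follow the quoted example; the ORIGIN value and section list are illustrative, since the original message elides the origin): once output sections are assigned to a sized region, GNU ld reports an error such as "region `SRAM' overflowed by N bytes" at static link time, which is exactly the diagnosable failure the discussion is after.

```ld
MEMORY
{
  SRAM (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
  /* Anything placed in SRAM counts against the 128K budget;
     overflow becomes a hard link-time error instead of a
     silent load failure in the simulator.  */
  .data : { *(.data*) } > SRAM
  .bss  : { *(.bss*)  } > SRAM
}
```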
Re: [PATCH, rs6000] Enable some existing __float128 tests for powerpc64*
On Wed, 2016-06-29 at 16:37 +, Joseph Myers wrote: > On Tue, 28 Jun 2016, Bill Schmidt wrote: > > > -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ > > +/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* powerpc64*-*-* } > > } */ > > /* { dg-options "-pedantic" } */ > > +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } } > > */ > > Rather than duplicating powerpc64 references everywhere, wouldn't it be > better to add an effective-target keyword __float128, meaning that > __float128 is available? Along with { dg-add-options float128 }. Sure, I can do that. I guess I need to stay away from the name check_effective_target_float128, though, as in your pending patch that will mean availability of _Float128. Do you have a naming preference? > > Also: powerpc-* targets with -m64 should always be handled in tests > identically to powerpc64 targets, while powerpc64 targets with -m32 should > presumably not run these tests. That is, testing for powerpc64*-*-* is > actually (always, for any test, not just these ones) incorrect in both > directions (just as it's always incorrect for a test to list just one of > i?86-*-* and x86_64-*-*, rather than listing both and then adding any > restrictions required to 32-bit or 64-bit). > > > -/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* } } */ > > +/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* powerpc64*-*-* } } */ > > /* { dg-options "" } */ > > /* { dg-require-effective-target fenv_exceptions } */ > > +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } } > > */ > > Also: for execution tests, if you add extra options, you also need to > restrict the test to running when the hardware actually supports the > required features. > All good points, will fix in next revision. Thanks! Bill
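To sketch what Joseph's suggestion would look like in a converted test header (the effective-target keyword spelling is still open, per the _Float128 naming concern in this exchange; the dg-add-options implementation would then supply -mfloat128 -mvsx on PowerPC in one central place):

```
/* { dg-do compile } */
/* { dg-require-effective-target __float128 } */
/* { dg-add-options float128 } */
/* { dg-options "-pedantic" } */
```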
Re: [PATCH PR70729] The second part of patch.
Richard,

Could you please review an additional simple fix for 70729 - we need to nullify the safelen field of loops containing simduid intrinsics like GOMP_SIMD_LANE (introduced e.g. for private variables). I checked that this fix cures the regression, which I missed since an AVX2 machine is required for libgomp.fortran/examples-4/simd-2.f90. Regression testing and bootstrapping did not show any new failures.

Is it OK for trunk?

Patch:

diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 2669813..9fbd183 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table *htab)
       gcc_assert (TREE_CODE (arg) == SSA_NAME);
       simduid_to_vf *p = NULL, data;
       data.simduid = DECL_UID (SSA_NAME_VAR (arg));
+      /* Need to nullify loop safelen field since its value is not
+	 valid after transformation.  */
+      if (bb->loop_father && bb->loop_father->safelen > 0)
+	bb->loop_father->safelen = 0;
       if (htab)
	 {
	   p = htab->find ();

ChangeLog:

2016-06-30  Yuri Rumyantsev

	PR tree-optimization/70729
	* tree-vectorizer.c (adjust_simduid_builtins): Nullify safelen
	field of loop since it may not be valid after transformation.

2016-06-30 17:28 GMT+03:00 Jakub Jelinek :
> On Thu, Jun 30, 2016 at 04:25:16PM +0200, Thomas Schwinge wrote:
>> Hi!
>>
>> On Thu, 30 Jun 2016 16:48:25 +0300, Yuri Rumyantsev wrote:
>> > Thanks for your help.
>> > Could you please look at the following simple patch which cures >> > regression - we need to nullify loop safelen field in >> > adjust_simduid_builtins: >> > >> > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c >> > index 2669813..f70380c 100644 >> > --- a/gcc/tree-vectorizer.c >> > +++ b/gcc/tree-vectorizer.c >> > @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table >> > *htab) >> > gcc_assert (TREE_CODE (arg) == SSA_NAME); >> > simduid_to_vf *p = NULL, data; >> > data.simduid = DECL_UID (SSA_NAME_VAR (arg)); >> > + /* Need to nullify safelen fielf of loop since it's vale is not >> > +valid after transformation. */ > > s/fielf/field/ > s/vale/value/ > >> > + if (bb->loop_father && bb->loop_father->safelen > 0) >> > + bb->loop_father->safelen = 0; >> > if (htab) >> > { >> > p = htab->find (); > > Jakub
Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
On 07/06/16 17:56, Kyrill Tkachov wrote:
> Hi all,
>
> This patch addresses a deficiency we have in handling vector
> lane-to-lane moves in the AArch64 backend. Generally we can use the
> INS (element) instruction but, as a user complains in
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html we don't. James
> had a patch adding an appropriate combine pattern some time ago
> (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it
> never got applied.
>
> This patch is a rebase of that patch that adds the necessary
> vec_merge+vec_duplicate+vec_select combine pattern.
> I chose to use a define_insn rather than the define_insn_and_split in
> that patch that just deletes the instruction when the source and
> destination registers are the same, as I think it's not the combine
> pattern's job to delete the redundant instruction but rather some
> other pass's job. Also, I was not able to create a testcase where it
> would make a difference.
>
> Also, this patch doesn't reimplement the vcopy*lane* intrinsics from
> inline assembly to a vget_lane+vset_lane combo.
> This can be done as a separate patch on top of this one.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
>
> Ok for trunk?

OK.
R.

> Thanks,
> Kyrill
>
> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane):
> 	New define_insn.
> 	(*aarch64_simd_vec_copy_lane_): Likewise.
>
> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* gcc.target/aarch64/vget_set_lane_1.c: New test.
> > aarch64-ins-vec.patch > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set" >[(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")] > ) > > +(define_insn "*aarch64_simd_vec_copy_lane" > + [(set (match_operand:VALL 0 "register_operand" "=w") > + (vec_merge:VALL > + (vec_duplicate:VALL > + (vec_select: > + (match_operand:VALL 3 "register_operand" "w") > + (parallel > + [(match_operand:SI 4 "immediate_operand" "i")]))) > + (match_operand:VALL 1 "register_operand" "0") > + (match_operand:SI 2 "immediate_operand" "i")))] > + "TARGET_SIMD" > + { > +int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2]))); > +operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt); > +operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); > + > +return "ins\t%0.[%p2], %3.[%4]"; > + } > + [(set_attr "type" "neon_ins")] > +) > + > +(define_insn "*aarch64_simd_vec_copy_lane_" > + [(set (match_operand:VALL 0 "register_operand" "=w") > + (vec_merge:VALL > + (vec_duplicate:VALL > + (vec_select: > + (match_operand: 3 "register_operand" "w") > + (parallel > + [(match_operand:SI 4 "immediate_operand" "i")]))) > + (match_operand:VALL 1 "register_operand" "0") > + (match_operand:SI 2 "immediate_operand" "i")))] > + "TARGET_SIMD" > + { > +int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2]))); > +operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt); > +operands[4] = GEN_INT (ENDIAN_LANE_N (mode, > +INTVAL (operands[4]))); > + > +return "ins\t%0.[%p2], %3.[%4]"; > + } > + [(set_attr "type" "neon_ins")] > +) > + > (define_insn "aarch64_simd_lshr" > [(set (match_operand:VDQ_I 0 "register_operand" "=w") > (lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w") > diff --git 
a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c > b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c > new file mode 100644 > index > ..07a77de319206c5c6dad1c0d2d9bcc998583f9c1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c > @@ -0,0 +1,72 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "arm_neon.h" > + > +#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \ > +TYPE1 __attribute__((noinline,noclone)) \ > +test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b) > \ > +{\ > + return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\ > + a, INDEX1); \ > +} > + > +BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6) > +BUILD_TEST (int8x8_t, int8x8_t, , , s8, 7, 6) > +BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6) > +/* { dg-final
Re: [PATCH 3/4] Add support to run auto-vectorization tests for multiple effective targets
On May 5, 2016, at 8:14 AM, Robert Suchanek wrote:
>
> I'm resending this patch as it has been rebased and updated. I reverted a
> change to the check_effective_target_vect_call_lrint procedure because it
> does not use the cached result.

Ok. Please ensure that the compilation flag is mixed into the test case name so that as you iterate over them, the test case names are unique.
[PATCH][ARM] -mpure-code option for ARM
Hello, This patch adds the -mpure-code option for ARM. This option ensures functions are put into sections that contain only code and no data. To ensure this throughout compilation we give these sections the ARM processor-specific ELF section attribute "SHF_ARM_PURECODE". This option is only supported for non-pic code for armv7-m targets. This patch introduces a new target hook 'TARGET_ASM_ELF_FLAGS_NUMERIC'. This target hook enables a target to use the numeric value for elf section attributes rather than their alphabetical representation. If TARGET_ASM_ELF_FLAGS_NUMERIC returns TRUE, the existing 'default_elf_asm_named_section', will print the numeric value of the section attributes for the current section. This target hook has two parameters: unsigned int FLAGS, the input parameter that tells the function the current section's attributes; unsigned int *NUM, used to pass down the numerical representation of the section's attributes. The default implementation for TARGET_ASM_ELF_FLAGS_NUMERIC will return false, so existing behavior is not changed. Bootstrapped and tested for arm-none-linux-gnueabihf. Further tested for arm-none-eabi with a Cortex-M3 target. gcc/ChangeLog: 2016-06-30 Andre VieiraTerry Guo * target.def (elf_flags_numeric): New target hook. * targhooks.h (default_asm_elf_flags_numeric): New. * varasm.c (default_asm_elf_flags_numeric): New. (default_elf_asm_named_section): Use new target hook. * config/arm/arm.opt (mpure-code): New. * config/arm/arm.h (SECTION_ARM_PURECODE): New. * config/arm/arm.c (arm_asm_init_sections): Add section attribute to default text section if -mpure-code. (arm_option_check_internal): Diagnose use of option with non supported targets and/or options. (arm_asm_elf_flags_numeric): New. (arm_function_section): New. (arm_elf_section_type_flags): New. * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable for -mpure-code. * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New. 
* gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise. gcc/testsuite/ChangeLog: 2016-06-30 Andre Vieira Terry Guo * gcc.target/arm/pure-code/ffunction-sections.c: New. * gcc.target/arm/pure-code/no-literal-pool.c: New. * gcc.target/arm/pure-code/pure-code.exp: New. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index f0cdd669191689bc5dcf3a7c2b60da5a2d201e3f..d10605cee0e6e0e07bbb4e1910d30c91443f8d17 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -2263,4 +2263,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); /* For switching between functions with different target attributes. */ #define SWITCHABLE_TARGET 1 +/* Define SECTION_ARM_PURECODE as the ARM specific section attribute + representation for SHF_ARM_PURECODE in GCC. */ +#define SECTION_ARM_PURECODE SECTION_MACH_DEP + #endif /* ! GCC_ARM_H */ >From 6f3e37973e0c4dbd393325addf42265c42726240 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Tue, 14 Jun 2016 11:17:12 +0100 Subject: [PATCH] -mpure-code for ARM --- gcc/config/arm/arm.c | 150 - gcc/config/arm/arm.h | 4 + gcc/config/arm/arm.opt | 6 + gcc/config/arm/elf.h | 3 +- gcc/doc/invoke.texi| 11 +- gcc/doc/tm.texi| 6 + gcc/doc/tm.texi.in | 2 + gcc/target.def | 10 ++ gcc/targhooks.h| 2 + .../gcc.target/arm/pure-code/ffunction-sections.c | 17 +++ .../gcc.target/arm/pure-code/no-literal-pool.c | 73 ++ .../gcc.target/arm/pure-code/pure-code.exp | 54 gcc/varasm.c | 57 +--- 13 files changed, 366 insertions(+), 29 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/ffunction-sections.c create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool.c create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f60955438d6f1cc5d996e7eacd4b453213044181..f5cb301a0efb23dc66e4d220b4a8f32e3670e744 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -214,8 +214,8 @@ static bool 
arm_return_in_memory (const_tree, const_tree); static void arm_unwind_emit (FILE *, rtx_insn *); static bool arm_output_ttype (rtx); static void arm_asm_emit_except_personality (rtx); -static void arm_asm_init_sections (void); #endif +static void arm_asm_init_sections (void); static rtx
Re: [PATCH PR70729] The second part of patch.
On Thu, Jun 30, 2016 at 04:25:16PM +0200, Thomas Schwinge wrote: > Hi! > > On Thu, 30 Jun 2016 16:48:25 +0300, Yuri Rumyantsev> wrote: > > Thanks for your help. > > Could you please look at the following simple patch which cures > > regression - we need to nullify loop safelen field in > > adjust_simduid_builtins: > > > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > > index 2669813..f70380c 100644 > > --- a/gcc/tree-vectorizer.c > > +++ b/gcc/tree-vectorizer.c > > @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table > > *htab) > > gcc_assert (TREE_CODE (arg) == SSA_NAME); > > simduid_to_vf *p = NULL, data; > > data.simduid = DECL_UID (SSA_NAME_VAR (arg)); > > + /* Need to nullify safelen fielf of loop since it's vale is not > > +valid after transformation. */ s/fielf/field/ s/vale/value/ > > + if (bb->loop_father && bb->loop_father->safelen > 0) > > + bb->loop_father->safelen = 0; > > if (htab) > > { > > p = htab->find (); Jakub
Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
Ping.

Thanks,
Kyrill

On 22/06/16 11:07, Kyrill Tkachov wrote:

Ping. Richard, Marcus, do you have any feedback on this?
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html

Thanks,
Kyrill

On 14/06/16 10:36, James Greenhalgh wrote:

On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:

Hi all,

This patch addresses a deficiency we have in handling vector lane-to-lane moves in the AArch64 backend. Generally we can use the INS (element) instruction but, as a user complains in https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html we don't. James had a patch adding an appropriate combine pattern some time ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never got applied.

This patch is a rebase of that patch that adds the necessary vec_merge+vec_duplicate+vec_select combine pattern. I chose to use a define_insn rather than the define_insn_and_split in that patch that just deletes the instruction when the source and destination registers are the same, as I think it's not the combine pattern's job to delete the redundant instruction but rather some other pass's job. Also, I was not able to create a testcase where it would make a difference.

Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline assembly to a vget_lane+vset_lane combo. This can be done as a separate patch on top of this one.

Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf.

Ok for trunk?

This looks OK to me, but as it is based on my code I probably can't approve it within the spirit of the write access policies (I only have localized review permission). Best wait for Richard/Marcus or a global reviewer to take a look.

Thanks,
Kyrill

2016-06-07  James Greenhalgh
	    Kyrylo Tkachov

	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane):
	New define_insn.
	(*aarch64_simd_vec_copy_lane_): Likewise.

Watch your ChangeLog formatting.
Thanks, James 2016-06-07 James Greenhalgh Kyrylo Tkachov * gcc.target/aarch64/vget_set_lane_1.c: New test.
Re: [PATCH PR70729] The second part of patch.
Hi! On Thu, 30 Jun 2016 16:48:25 +0300, Yuri Rumyantsevwrote: > Thanks for your help. > Could you please look at the following simple patch which cures > regression - we need to nullify loop safelen field in > adjust_simduid_builtins: > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > index 2669813..f70380c 100644 > --- a/gcc/tree-vectorizer.c > +++ b/gcc/tree-vectorizer.c > @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table *htab) > gcc_assert (TREE_CODE (arg) == SSA_NAME); > simduid_to_vf *p = NULL, data; > data.simduid = DECL_UID (SSA_NAME_VAR (arg)); > + /* Need to nullify safelen fielf of loop since it's vale is not > +valid after transformation. */ > + if (bb->loop_father && bb->loop_father->safelen > 0) > + bb->loop_father->safelen = 0; > if (htab) > { > p = htab->find (); Quickly testing that by manually compiling libgomp.fortran/examples-4/simd-2.f90 with -O2, I see that the dump files generated by -fdump-tree-all are now back to what they were before your change, and when manually running that executable, it exits normally, and the following valgrind diagnostics no longer show up (that I saw after your change): ==5942== Use of uninitialised value of size 8 ==5942==at 0x40083D: __simd2_mod_MOD_work (simd-2.f90:18) ==5942==by 0x4008D4: MAIN__ (simd-2.f90:68) ==5942==by 0x4005DC: main (simd-2.f90:58) ==5942== ==5942== Use of uninitialised value of size 8 ==5942==at 0x400845: __simd2_mod_MOD_work (simd-2.f90:18) ==5942==by 0x4008D4: MAIN__ (simd-2.f90:68) ==5942==by 0x4005DC: main (simd-2.f90:58) ==5942== ==5942== Invalid read of size 8 ==5942==at 0x400845: __simd2_mod_MOD_work (simd-2.f90:18) ==5942==by 0x4008D4: MAIN__ (simd-2.f90:68) ==5942==by 0x4005DC: main (simd-2.f90:58) ==5942== Address 0x5d3 is not stack'd, malloc'd or (recently) free'd (Address 0x5d3 looks highly suspicious.) 
In case that's relevant: $ sed -e '/^$/Q' < /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 58 model name : Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz stepping: 9 microcode : 0x1b cpu MHz : 1744.875 cache size : 6144 KB physical id : 0 siblings: 8 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt bugs: bogomips: 5382.58 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: Grüße Thomas > 2016-06-30 15:21 GMT+03:00 Jakub Jelinek : > > On Thu, Jun 30, 2016 at 03:16:51PM +0300, Yuri Rumyantsev wrote: > >> Hi Grüße. 
> >> > >> Could you please tell me how to reproduce your regression - did not > >> see any new failures in my local area: > >> > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > >> -funroll-loops -fpeel-loops -ftracer -finline-functions (test for > >> excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution > >> test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess > >> errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test > >> > >> I used HASWELL machine for it since test requires target avx. > > > > It fails everywhere for me, without offloading configured, both 32-bit and > > 64-bit. > > $ /home/jakub/src/gcc/obj20/gcc/xgcc -B/home/jakub/src/gcc/obj20/gcc/ >
Re: [Patch AArch64] Fix PR target/63874
On 31/03/16 14:11, Ramana Radhakrishnan wrote: > Hi, > > In this PR we have a situation where we aren't really detecting > weak references vs weak definitions. If one has a weak definition > that binds locally there's no reason not to put out PC relative > relocations. > > However if you have a genuine weak reference that is > known not to bind locally it makes very little sense > to put out an entry into the literal pool which doesn't always > work with DSOs and shared objects. > > Tested aarch64-none-linux-gnu bootstrap and regression test with no > regressions > > This is not a regression and given what we've seen recently with protected > symbols and binds_locally_p I'd rather this were queued for GCC 7. > > Ok ? > Yes, this looks fine. Sorry for the delay replying. R. > regards > Ramana > > gcc/ > > * config/aarch64/aarch64.c (aarch64_classify_symbol): Typo in comment fixed. > Only force to memory if it is a weak external reference. > > > gcc/testsuite > > * gcc.target/aarch64/pr63874.c: New test. > > > pr63874.txt > > > commit e41d4bd6abbee99628909d4af612504844dee640 > Author: Ramana Radhakrishnan> Date: Thu Mar 31 13:47:33 2016 +0100 > > fix PR63874 > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index cf1239d..6782316 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -9387,15 +9387,18 @@ aarch64_classify_symbol (rtx x, rtx offset) >switch (aarch64_cmodel) > { > case AARCH64_CMODEL_TINY: > - /* When we retreive symbol + offset address, we have to make sure > + /* When we retrieve symbol + offset address, we have to make sure >the offset does not cause overflow of the final address. But >we have no way of knowing the address of symbol at compile time >so we can't accurately say if the distance between the PC and >symbol + offset is outside the addressible range of +/-1M in the >TINY code model. 
So we rely on images not being greater than >1M and cap the offset at 1M and anything beyond 1M will have to > - be loaded using an alternative mechanism. */ > - if (SYMBOL_REF_WEAK (x) > + be loaded using an alternative mechanism. Furthermore if the > + symbol is a weak reference to something that isn't known to > + resolve to a symbol in this module, then force to memory. */ > + if ((SYMBOL_REF_WEAK (x) > +&& !aarch64_symbol_binds_local_p (x)) > || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575) > return SYMBOL_FORCE_TO_MEM; > return SYMBOL_TINY_ABSOLUTE; > @@ -9403,7 +9406,8 @@ aarch64_classify_symbol (rtx x, rtx offset) > case AARCH64_CMODEL_SMALL: > /* Same reasoning as the tiny code model, but the offset cap here is >4G. */ > - if (SYMBOL_REF_WEAK (x) > + if ((SYMBOL_REF_WEAK (x) > +&& !aarch64_symbol_binds_local_p (x)) > || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263), > HOST_WIDE_INT_C (4294967264))) > return SYMBOL_FORCE_TO_MEM; > diff --git a/gcc/testsuite/gcc.target/aarch64/pr63874.c > b/gcc/testsuite/gcc.target/aarch64/pr63874.c > new file mode 100644 > index 000..1a745a0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr63874.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-skip-if "Not applicable for mcmodel=large" { aarch64*-*-* } { > "-mcmodel=large" } { "" } } */ > + > +extern void __attribute__((weak)) foo_weakref (void); > +void __attribute__((weak, noinline)) bar (void) > +{ > + return; > +} > +void (*f) (void); > +void (*g) (void); > + > +int > +main (void) > +{ > + f = _weakref; > + g = > + return 0; > +} > + > +/* { dg-final { scan-assembler-not "adr*foo_weakref" } } */ > +/* { dg-final { scan-assembler-not "\\.(word|xword)\tbar" } } */ >
Re: [PATCH, libstdc++/testsuite, ping2] 29_atomics/atomic/65913.cc: require atomic-builtins rather than specific target
On Wednesday 29 June 2016 21:58:40 Jonathan Wakely wrote: > On 29/06/16 13:49 -0700, Mike Stump wrote: > >Please include the libstdc++ list, they don't all read the other list. > > And the documentation clearly says (in two places) that all libstdc++ > patches must go to the libstdc++ list. Oops, my apologies; I did not realize that libstdc++ is a separate component from GCC. I'll keep that in mind for next time. Committed now. Best regards, Thomas
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
I'm getting a failure on ./gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c with the latest patch. Unsure if it's related; we're not doing any pointer arithmetic there. I'll test it again on trunk without the patch and see what happens. -Manish On Thu, Jun 30, 2016 at 7:18 PM, Richard Bienerwrote: > On Thu, Jun 30, 2016 at 3:38 PM, Marc Glisse wrote: >> On Thu, 30 Jun 2016, Manish Goregaokar wrote: >> >>> Alright, the following patch was tested and it works >>> >>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >>> index 3b9500d..0d82018 100644 >>> --- a/gcc/fold-const.c >>> +++ b/gcc/fold-const.c >>> @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >>> switch (code) >>> { >>> case POINTER_PLUS_EXPR: >>> + return flag_delete_null_pointer_checks >>> +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >>> +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >>> case PLUS_EXPR: >>> if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >>> { >> >> >> So, what prevents us from deciding that p+(0-p) is nonzero when p is? Not >> sure if it is strictly legal in all languages, but I wouldn't be surprised >> if there was at least one language where optimizations could lead to such >> expressions. >> >> I think this is an exciting optimization, but I was always too scared to try >> anything like this. > > ;) > > points-to analysis already has the constraint that POINTER_PLUS_EXPR cannot > leave the object op0 points to. Of course currently nothing uses the > fact whether > points-to computes pointed-to as nothing (aka NULL) - so the argument > may be moot. > > Anyway, one of my points to the original patch was that POINTER_PLUS_EXPR > handling should be clearly separate from PLUS_EXPR and that we have > flag_delete_null_pointer_checks to allow targest to declare that 0 is a valid > object pointer (and thus you can do 4 + -4 and reach NULL). > > Richard. > >> -- >> Marc Glisse
[PATCH][RFC] Fix PR71632, remove parts of TER
The following patch fixes PR71632 by removing delayed expansion of TERed defs. Instead it adds code to apply the scheduling effect to the GIMPLE IL (so you also get better interleaved GIMPLE stmt / generated RTL dumps in .expand). This removes the quadratic re-expansion of TERed RHS as seen in the testcase. It affects initial RTL generation in following ways: 1) it may expand stmts with unused LHS 2) it may expand dead code when use stmts are expanded and those use get_gimple_for_ssa, expanding its operands (again). 3) it no longer automatically tries to combine with defs doing memory loads (in case we TERed the load and expansion of the RHS resulted in a MEM). 4) the depth-first, left-to-right "expansion" of ops might not be a 100% match of what expand currently does I expect that 1) and 2) are mostly a non-issue (and not worse than the effect seen in the PR). I also expect that 3) is "quickly" recovered by fwprop or combine. While I'm quite sure 3) exists I'm not 100% sure we never return a non-constant/reg for SSA name expansion (considering expand modifiers like EXPAND_SUM). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. I think this is also an important step towards eventually doing a meaningful GIMPLE-level scheduling (for register pressure, also to remove this awkward code-placement code from reassoc for example). Without this TER would disrupt the result quite a bit (so the scheduling part of TER would be disabled when scheduling on GIMPLE). Comments? I'll throw this on one of our spec testers tonight any testing on non-x86 archs appreciated. Thanks, Richard. PS: avoid_deep_ter_for_debug could go away as well I think. 2016-06-30 Richard BienerPR middle-end/71632 * expr.c (expand_expr_real_1): Do not expand TERed SSA name def rhs. * cfgexpand.c (expand_gimple_basic_block): Expand defs of TERed SSA names. Move debug stmt use handling ... * tree-outof-ssa.c (move_uses_r): ... here. New helper for ... (rewrite_trees): ... 
code to apply the scheduling effect of TER to the GIMPLE IL. (remove_ssa_form): Move rewrite_trees call. * gcc.dg/torture/pr71632.c: New testcase. Index: gcc/expr.c === *** gcc/expr.c (revision 237873) --- gcc/expr.c (working copy) *** expand_expr_real_1 (tree exp, rtx target *** 9695,9704 LAST_VIRTUAL_REGISTER + 1); } ! g = get_gimple_for_ssa_name (exp); ! /* For EXPAND_INITIALIZER try harder to get something simpler. */ ! if (g == NULL ! && modifier == EXPAND_INITIALIZER && !SSA_NAME_IS_DEFAULT_DEF (exp) && (optimize || !SSA_NAME_VAR (exp) || DECL_IGNORED_P (SSA_NAME_VAR (exp))) --- 9677,9685 LAST_VIRTUAL_REGISTER + 1); } ! g = NULL; ! /* For EXPAND_INITIALIZER try to get something simpler. */ ! if (modifier == EXPAND_INITIALIZER && !SSA_NAME_IS_DEFAULT_DEF (exp) && (optimize || !SSA_NAME_VAR (exp) || DECL_IGNORED_P (SSA_NAME_VAR (exp))) Index: gcc/cfgexpand.c === *** gcc/cfgexpand.c (revision 237873) --- gcc/cfgexpand.c (working copy) *** expand_gimple_basic_block (basic_block b *** 5510,5611 basic_block new_bb; stmt = gsi_stmt (gsi); - - /* If this statement is a non-debug one, and we generate debug -insns, then this one might be the last real use of a TERed -SSA_NAME, but where there are still some debug uses further -down. Expanding the current SSA name in such further debug -uses by their RHS might lead to wrong debug info, as coalescing -might make the operands of such RHS be placed into the same -pseudo as something else. Like so: - a_1 = a_0 + 1; // Assume a_1 is TERed and a_0 is dead - use(a_1); - a_2 = ... -#DEBUG ... => a_1 -As a_0 and a_2 don't overlap in lifetime, assume they are coalesced. -If we now would expand a_1 by it's RHS (a_0 + 1) in the debug use, -the write to a_2 would actually have clobbered the place which -formerly held a_0. - -So, instead of that, we recognize the situation, and generate -debug temporaries at the last real use of TERed SSA names: - a_1 = a_0 + 1; -#DEBUG #D1 => a_1 - use(a_1); - a_2 = ... -#DEBUG ... 
=> #D1 -*/ - if (MAY_HAVE_DEBUG_INSNS - && SA.values - && !is_gimple_debug (stmt)) - { - ssa_op_iter iter; - tree op; - gimple *def; - - location_t sloc = curr_insn_location (); - - /* Look for SSA names that have their
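For context, here is a minimal sketch of the kind of code the removed comment describes: a_1 has a single real use, so TER lets expansion emit a_0 + 1 directly inside that use, and the deleted block existed to keep later debug uses of such TERed names correct.

```c
/* Shape of the comment's example: a_1 = a_0 + 1 is used exactly once,
   so out-of-ssa/expand may forward the addition into the use site.  */
int ter_shape (int a0)
{
  int a1 = a0 + 1;      /* single-use def: a TER candidate */
  return a1 * 2;        /* the only real use of a1 */
}
```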
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, Jun 30, 2016 at 3:38 PM, Marc Glissewrote: > On Thu, 30 Jun 2016, Manish Goregaokar wrote: > >> Alright, the following patch was tested and it works >> >> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >> index 3b9500d..0d82018 100644 >> --- a/gcc/fold-const.c >> +++ b/gcc/fold-const.c >> @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >> switch (code) >> { >> case POINTER_PLUS_EXPR: >> + return flag_delete_null_pointer_checks >> +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >> +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >> case PLUS_EXPR: >> if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >> { > > > So, what prevents us from deciding that p+(0-p) is nonzero when p is? Not > sure if it is strictly legal in all languages, but I wouldn't be surprised > if there was at least one language where optimizations could lead to such > expressions. > > I think this is an exciting optimization, but I was always too scared to try > anything like this. ;) points-to analysis already has the constraint that POINTER_PLUS_EXPR cannot leave the object op0 points to. Of course currently nothing uses the fact whether points-to computes pointed-to as nothing (aka NULL) - so the argument may be moot. Anyway, one of my points to the original patch was that POINTER_PLUS_EXPR handling should be clearly separate from PLUS_EXPR and that we have flag_delete_null_pointer_checks to allow targest to declare that 0 is a valid object pointer (and thus you can do 4 + -4 and reach NULL). Richard. > -- > Marc Glisse
Re: [PATCH PR70729] The second part of patch.
Hi Jakub, Thanks for your help. Could you please look at the following simple patch which cures the regression - we need to nullify the loop safelen field in adjust_simduid_builtins: diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 2669813..f70380c 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table *htab) gcc_assert (TREE_CODE (arg) == SSA_NAME); simduid_to_vf *p = NULL, data; data.simduid = DECL_UID (SSA_NAME_VAR (arg)); + /* Need to nullify safelen field of loop since its value is not +valid after transformation. */ + if (bb->loop_father && bb->loop_father->safelen > 0) + bb->loop_father->safelen = 0; if (htab) { p = htab->find (&data); Thanks. Yuri. 2016-06-30 15:21 GMT+03:00 Jakub Jelinek: > On Thu, Jun 30, 2016 at 03:16:51PM +0300, Yuri Rumyantsev wrote: >> Hi Grüße. >> >> Could you please tell me how to reproduce your regression - did not >> see any new failures in my local area: >> >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions (test for >> excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >> test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess >> errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution
test >> >> I used HASWELL machine for it since test requires target avx. > > It fails everywhere for me, without offloading configured, both 32-bit and > 64-bit. > $ /home/jakub/src/gcc/obj20/gcc/xgcc -B/home/jakub/src/gcc/obj20/gcc/ > ../../../../libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90 > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/ > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs > -I/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp > -I../../../../libgomp/testsuite/../../include > -I../../../../libgomp/testsuite/.. -fmessage-length=0 > -fno-diagnostics-show-caret -Wno-hsa -fdiagnostics-color=never -fopenmp > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ >-O2 -msse2 -mavx > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs > > -fintrinsic-modules-path=/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp >-L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs > -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ > > -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs > -lgfortran -lm -o ./simd-2.exe -Wl,-rpath,../.libs/ -march=haswell > $ ./simd-2.exe > > Program received signal SIGSEGV: Segmentation fault - invalid memory > reference. > > Backtrace for this error: > #0 0x7F7548CDBD58 > #1 0x7F7548CDAEE0 > #2 0x3405234A4F > #3 0x400849 in __simd2_mod_MOD_work > #4 0x4008D4 in MAIN__ at simd-2.f90:? > Segmentation fault (core dumped) > > With/without -march=haswell, -mtune=haswell etc. > > Jakub
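For reference, the safelen value being discussed comes from loops such as the following minimal sketch; the clause asserts that executing up to the given number of iterations in lockstep is safe, a promise that no longer describes the loop once it has been vectorized, which is why the fix clears loop->safelen.

```c
/* '#pragma omp simd safelen(8)' tells the compiler that running up to
   8 consecutive iterations in lockstep is safe, so it may vectorize
   with vectors of up to 8 elements without dependence analysis.  */
void add_arrays (int *restrict a, const int *restrict b, int n)
{
  #pragma omp simd safelen(8)
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
```

Without -fopenmp the pragma is simply ignored and the loop behaves as a plain scalar loop.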
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, 30 Jun 2016, Manish Goregaokar wrote: Alright, the following patch was tested and it works diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..0d82018 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, switch (code) { case POINTER_PLUS_EXPR: + return flag_delete_null_pointer_checks +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); case PLUS_EXPR: if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) { So, what prevents us from deciding that p+(0-p) is nonzero when p is? Not sure if it is strictly legal in all languages, but I wouldn't be surprised if there was at least one language where optimizations could lead to such expressions. I think this is an exciting optimization, but I was always too scared to try anything like this. -- Marc Glisse
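Marc's p + (0 - p) worry is easy to make concrete. The sketch below models it with uintptr_t arithmetic (well-defined modular arithmetic, standing in for sizetype) rather than real pointer arithmetic, which would be out of bounds: both operands of the addition can be individually nonzero while the sum is exactly 0, so "either operand nonzero" alone does not justify a nonzero result; hence the additional reliance on flag_delete_null_pointer_checks.

```c
#include <stdint.h>

/* Integer model of p + (0 - p): op0 is a nonzero "address" and op1 is
   its modular negation (unsigned wraparound, like sizetype), yet the
   POINTER_PLUS-style sum lands exactly on 0.  */
uintptr_t pointer_plus_model (uintptr_t p)
{
  uintptr_t offset = 0 - p;   /* nonzero whenever p is nonzero */
  return p + offset;          /* always 0, even for nonzero p */
}
```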
Re: [PATCH 2/2] gcc/genrecog: Don't warn for missing mode on special predicates
* Richard Sandiford[2016-06-15 19:07:56 +0100]: > Andrew Burgess writes: > > In md.texi it says: > > > > Predicates written with @code{define_special_predicate} do not get any > > automatic mode checks, and are treated as having special mode handling > > by @command{genrecog}. > > > > However, in genrecog, when validating a SET pattern, if either the > > source or destination is missing a mode then a warning is given, even if > > there's a predicate defined with define_special_predicate. > > > > This commit silences the warning for special predicates. > > > > gcc/ChangeLog: > > > > * genrecog.c (validate_pattern): Don't warn about missing mode for > > define_special_predicate predicates. > > Acked-by: Andrew Burgess > > --- > > gcc/ChangeLog | 5 + > > gcc/genrecog.c | 22 +++--- > > 2 files changed, 24 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/genrecog.c b/gcc/genrecog.c > > index a9f5a4a..7596552 100644 > > --- a/gcc/genrecog.c > > +++ b/gcc/genrecog.c > > @@ -674,9 +674,25 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx > > set, int set_code) > > && !CONST_WIDE_INT_P (src) > > && GET_CODE (src) != CALL) > > { > > - const char *which; > > - which = (dmode == VOIDmode ? "destination" : "source"); > > - message_at (info->loc, "warning: %s missing a mode?", which); > > + const char *which_msg; > > + rtx which; > > + const char *pred_name; > > + const struct pred_data *pred; > > + > > + which_msg = (dmode == VOIDmode ? "destination" : "source"); > > + which = (dmode == VOIDmode ? dest : src); > > + pred_name = XSTR (which, 1); > > + if (pred_name[0] != 0) > > + { > > + pred = lookup_predicate (pred_name); > > + if (!pred) > > + error_at (info->loc, "unknown predicate '%s'", pred_name); > > + } > > + else > > + pred = 0; > > + if (!pred || !pred->special) > > + message_at (info->loc, "warning: %s missing a mode?", > > + which_msg); > > There's no guarantee at this point that "which" is a match_operand. 
> Also, I think the earlier: > > /* The operands of a SET must have the same mode unless one > is VOIDmode. */ > else if (dmode != VOIDmode && smode != VOIDmode && dmode != smode) > error_at (info->loc, "mode mismatch in set: %smode vs %smode", > GET_MODE_NAME (dmode), GET_MODE_NAME (smode)); > > should be skipped for special predicates too. > > How about generalising: > > /* The mode of an ADDRESS_OPERAND is the mode of the memory > reference, not the mode of the address. */ > if (GET_CODE (src) == MATCH_OPERAND > && ! strcmp (XSTR (src, 1), "address_operand")) > ; > > to: > > if (special_predicate_operand_p (src) > || special_predicate_operand_p (dest)) > ; > > with a new special_predicate_operand_p helper? I don't think we should > duplicate the "unknown predicate" error here; the helper can just return > false for unknown predicates. Thanks for taking the time to review and provide feedback. Sorry it has take a while to get around to this patch again. I've updated the patch inline with your feedback. How's this? Thanks, Andrew --- gcc/genrecog: Don't warn for missing mode on special predicates In md.texi it says: Predicates written with @code{define_special_predicate} do not get any automatic mode checks, and are treated as having special mode handling by @command{genrecog}. In genrecog, when validating a SET pattern, there is already a special case for 'address_operand' which is a special predicate, however, other special predicates fall through to the code which checks for incorrect use of VOIDmode. This commit adds a new function for detecting special predicates, and then generalises the check in validate_pattern so that mode checking is skipped for all special predicates. gcc/ChangeLog: * genrecog.c (special_predicate_operand_p): New function. (predicate_name): Move function. (validate_pattern): Don't warn about missing mode for all define_special_predicate predicates. 
--- gcc/ChangeLog | 7 +++ gcc/genrecog.c | 50 +++--- 2 files changed, 42 insertions(+), 15 deletions(-) diff --git a/gcc/genrecog.c b/gcc/genrecog.c index a9f5a4a..7c56225 100644 --- a/gcc/genrecog.c +++ b/gcc/genrecog.c @@ -463,6 +463,38 @@ constraints_supported_in_insn_p (rtx insn) || GET_CODE (insn) == DEFINE_PEEPHOLE2); } +/* Return the name of the predicate matched by MATCH_RTX. */ + +static const char * +predicate_name (rtx match_rtx) +{ + if (GET_CODE (match_rtx) == MATCH_SCRATCH) +return "scratch_operand"; + else +return XSTR (match_rtx, 1);
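The control flow the patch aims for can be modelled in a few lines. This is a toy table, not GCC's actual predicate machinery, and the predicate entries are illustrative: look the predicate up by name, treat unknown predicates as non-special so no duplicate error is emitted, and skip the mode warning when the predicate is special.

```c
#include <string.h>

/* Toy stand-ins for genrecog's pred_data records.  */
struct pred_data { const char *name; int special; };

static const struct pred_data preds[] = {
  { "register_operand", 0 },
  { "address_operand",  1 },   /* the case the old code special-cased */
};

static const struct pred_data *
lookup_predicate (const char *name)
{
  for (size_t i = 0; i < sizeof preds / sizeof preds[0]; i++)
    if (strcmp (preds[i].name, name) == 0)
      return &preds[i];
  return NULL;
}

/* Analogue of the suggested special_predicate_operand_p helper:
   unknown predicates simply yield false, no error duplicated here.  */
int special_predicate_p (const char *name)
{
  const struct pred_data *p = lookup_predicate (name);
  return p != NULL && p->special;
}
```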
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
Alright, the following patch was tested and it works diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..0d82018 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, switch (code) { case POINTER_PLUS_EXPR: + return flag_delete_null_pointer_checks +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); case PLUS_EXPR: if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) { -- 2.8.3 -Manish On Thu, Jun 30, 2016 at 6:45 PM, Richard Bienerwrote: > On Thu, Jun 30, 2016 at 2:03 PM, Manish Goregaokar wrote: >> What about ptr + intptr_t? I guess we should check nonnegative for op1 then? > > op1 will always be nonnegative as it is forced to sizetype type (which > is unsigned). > > Richard. > >> -Manish >> >> >> On Thu, Jun 30, 2016 at 5:25 PM, Richard Biener >> wrote: >>> On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar >>> wrote: gcc/ChangeLog: PR c/71699 * fold-const.c (tree_binary_nonzero_warnv_p): Allow pointer addition to also be considered nonzero. --- gcc/fold-const.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..eda713e 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, { case POINTER_PLUS_EXPR: case PLUS_EXPR: - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + || POINTER_TYPE_P (type)) { + /* Pointers are always nonnegative, check integers. */ + if (ANY_INTEGRAL_TYPE_P (type)) + { + sub_strict_overflow_p = false; + if (!tree_expr_nonnegative_warnv_p (op0, + _strict_overflow_p) + || !tree_expr_nonnegative_warnv_p (op1, + _strict_overflow_p)) +return false; + } /* With the presence of negative values it is hard to say something. 
*/ - sub_strict_overflow_p = false; - if (!tree_expr_nonnegative_warnv_p (op0, - _strict_overflow_p) - || !tree_expr_nonnegative_warnv_p (op1, - _strict_overflow_p)) -return false; + >>> >>> Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be >>> treated >>> as signed. >>> >>> POINTER_PLUS_EXPR is special in other ways in that iff >>> flag_delete_null_pointer_checks >>> is set we can assume that ptr + non-zero is never zero. Thus simply do >>> >>>case POINTER_PLUS_EXPR: >>> return flag_delete_null_pointer_checks >>>&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >>> || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >>> >>> OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR >>> perspective as GCC does >>> not have a special tree code for pointer subtraction (but IIRC it uses >>> MINUS_EXPR on integers >>> for this). >>> >>> Richard. >>> >>> /* One of operands must be positive and the other non-negative. */ /* We don't set *STRICT_OVERFLOW_P here: even if this value overflows, on a twos-complement machine the sum of two -- 2.8.3
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, Jun 30, 2016 at 2:03 PM, Manish Goregaokarwrote: > What about ptr + intptr_t? I guess we should check nonnegative for op1 then? op1 will always be nonnegative as it is forced to sizetype type (which is unsigned). Richard. > -Manish > > > On Thu, Jun 30, 2016 at 5:25 PM, Richard Biener > wrote: >> On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar >> wrote: >>> gcc/ChangeLog: >>> PR c/71699 >>> * fold-const.c (tree_binary_nonzero_warnv_p): Allow >>> pointer addition to also be considered nonzero. >>> --- >>> gcc/fold-const.c | 20 +--- >>> 1 file changed, 13 insertions(+), 7 deletions(-) >>> >>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >>> index 3b9500d..eda713e 100644 >>> --- a/gcc/fold-const.c >>> +++ b/gcc/fold-const.c >>> @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >>> { >>> case POINTER_PLUS_EXPR: >>> case PLUS_EXPR: >>> - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >>> + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >>> + || POINTER_TYPE_P (type)) >>> { >>> + /* Pointers are always nonnegative, check integers. */ >>> + if (ANY_INTEGRAL_TYPE_P (type)) >>> + { >>> + sub_strict_overflow_p = false; >>> + if (!tree_expr_nonnegative_warnv_p (op0, >>> + _strict_overflow_p) >>> + || !tree_expr_nonnegative_warnv_p (op1, >>> + _strict_overflow_p)) >>> +return false; >>> + } >>>/* With the presence of negative values it is hard >>> to say something. */ >>> - sub_strict_overflow_p = false; >>> - if (!tree_expr_nonnegative_warnv_p (op0, >>> - _strict_overflow_p) >>> - || !tree_expr_nonnegative_warnv_p (op1, >>> - _strict_overflow_p)) >>> -return false; >>> + >> >> Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be >> treated >> as signed. >> >> POINTER_PLUS_EXPR is special in other ways in that iff >> flag_delete_null_pointer_checks >> is set we can assume that ptr + non-zero is never zero. 
Thus simply do >> >>case POINTER_PLUS_EXPR: >> return flag_delete_null_pointer_checks >>&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >> || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >> >> OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR >> perspective as GCC does >> not have a special tree code for pointer subtraction (but IIRC it uses >> MINUS_EXPR on integers >> for this). >> >> Richard. >> >> >>>/* One of operands must be positive and the other non-negative. */ >>>/* We don't set *STRICT_OVERFLOW_P here: even if this value >>> overflows, on a twos-complement machine the sum of two >>> -- >>> 2.8.3
Re: Determine more IVs to be non-overflowing
Hi, I sent this email before, but it seems it never left my machine. Sorry for possible duplicates. This is an updated version which does not overflow (by capping nit), but still uses widest_int for the calculations. It seems more readable to do so than using wi::add/wi::mult/wi::lt and overflow checks, but I can definitely update the patch to do it, too. Bootstrapped/regtested x86_64-linux, OK? Honza * tree-scalar-evolution.h (iv_can_overflow_p): Declare. * tree-scalar-evolution.c (iv_can_overflow_p): New function. (simple_iv): Use it. * tree-ssa-loop-niter.c (nowrap_type_p): Use ANY_INTEGRAL_TYPE_P * gcc.dg/tree-ssa/scev-14.c: New testcase. Index: testsuite/gcc.dg/tree-ssa/scev-14.c === --- testsuite/gcc.dg/tree-ssa/scev-14.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/scev-14.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ +int a[100]; +void t(unsigned int n) +{ + unsigned int i; + for (i=0; iwidest_int::from (type_max, sgn) + || (widest_int::from (type_min, sgn) + > (widest_int::from (base_min, base_sgn) ++ widest_int::from (step_min, step_sgn) * (nit + 1 +return true; + return false; +} + /* Checks whether use of OP in USE_LOOP behaves as a simple affine iv with respect to WRTO_LOOP and returns its base and step in IV if possible (see analyze_scalar_evolution_in_loop for more details on USE_LOOP @@ -3375,8 +3430,12 @@ simple_iv (struct loop *wrto_loop, struc if (tree_contains_chrecs (iv->base, NULL)) return false; - iv->no_overflow = (!folded_casts && ANY_INTEGRAL_TYPE_P (type) -&& TYPE_OVERFLOW_UNDEFINED (type)); + iv->no_overflow = !folded_casts && nowrap_type_p (type); + + if (!iv->no_overflow + && !iv_can_overflow_p (wrto_loop, type, iv->base, iv->step)) +iv->no_overflow = true; + /* Try to simplify iv base: Index: tree-scalar-evolution.h === --- tree-scalar-evolution.h (revision 237856) +++ tree-scalar-evolution.h (working copy) @@ -38,6 +38,7 @@ extern unsigned int scev_const_prop (voi extern
bool expression_expensive_p (tree); extern bool simple_iv (struct loop *, struct loop *, tree, struct affine_iv *, bool); +extern bool iv_can_overflow_p (struct loop *, tree, tree, tree); extern tree compute_overall_effect_of_inner_loop (struct loop *, tree); /* Returns the basic block preceding LOOP, or the CFG entry block when Index:
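The numeric core of iv_can_overflow_p can be sketched independently of GCC's types. In this simplified model, __int128 plays the role of widest_int and nit is assumed to be the already-capped iteration count: widen base and step, compute the final value, and check it against the type's bounds.

```c
#include <stdint.h>

/* Does the IV's final value  base + step * nit  stay inside
   [type_min, type_max]?  Computed in a wider type so the check itself
   cannot overflow, mirroring the patch's use of widest_int.  */
int iv_stays_in_range (int64_t base, int64_t step, uint64_t nit,
                       int64_t type_min, int64_t type_max)
{
  __int128 last = (__int128) base + (__int128) step * (__int128) nit;
  return last >= type_min && last <= type_max;
}
```

For the scev-14.c loop, an unsigned int IV starting at 0 with step 1 stays in range for any iteration count below the type's maximum, which is what lets simple_iv mark it no_overflow.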
Re: [BUILDROBOT] Selftest failed for i686-wrs-vxworks
Jan-Benedict, I haven't given it any additional manual testing so far. It's pre-installation though. Maybe I'd just set WIND_BASE to some arbitrary value, just to make xgcc pass it's initial start-up test so that it can continue with self-testing? Or shall we set some value in gcc/Makefile.in for running the self-test? As I recall, WIND_BASE is expected to point at a vxworks install to pick up header files. It is an error not to have it set (silently skipping it leads to user confusion). If that's irrelevant for this testing environment, then setting it to something (probably just "", but safer might be "/These.are.not.the.dirs.you.are.looking.for") should be fine. nathan
Re: [PATCH PR70729] The second part of patch.
On Thu, Jun 30, 2016 at 03:16:51PM +0300, Yuri Rumyantsev wrote: > Hi Grüße. > > Could you please tell me how to reproduce your regression - did not > see any new failures in my local area: > > PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for > excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution > test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test > > I used HASWELL machine for it since test requires target avx. It fails everywhere for me, without offloading configured, both 32-bit and 64-bit. $ /home/jakub/src/gcc/obj20/gcc/xgcc -B/home/jakub/src/gcc/obj20/gcc/ ../../../../libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90 -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/ -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs -I/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp -I../../../../libgomp/testsuite/../../include -I../../../../libgomp/testsuite/.. 
-fmessage-length=0 -fno-diagnostics-show-caret -Wno-hsa -fdiagnostics-color=never -fopenmp -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ -O2 -msse2 -mavx -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs -fintrinsic-modules-path=/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs -lgfortran -lm -o ./simd-2.exe -Wl,-rpath,../.libs/ -march=haswell $ ./simd-2.exe Program received signal SIGSEGV: Segmentation fault - invalid memory reference. Backtrace for this error: #0 0x7F7548CDBD58 #1 0x7F7548CDAEE0 #2 0x3405234A4F #3 0x400849 in __simd2_mod_MOD_work #4 0x4008D4 in MAIN__ at simd-2.f90:? Segmentation fault (core dumped) With/without -march=haswell, -mtune=haswell etc. Jakub
Re: [PATCH PR70729] The second part of patch.
Hi Grüße. Could you please tell me how to reproduce your regression - did not see any new failures in my local area: PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test I used HASWELL machine for it since test requires target avx. 2016-06-29 19:19 GMT+03:00 Thomas Schwinge: > Hi! > > On Tue, 28 Jun 2016 18:50:37 +0300, Yuri Rumyantsev > wrote: >> Here is the second part of patch to improve loop invariant code motion >> for loop marked with pragma omp simd. >> >> Bootstrapping and regression testing did not show any new failures. > > That got committed in r237844. In my testing, that causes (or exposes?) 
> the following regressions (both with offloading enabled and disabled): > > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess > errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess > errors) > -PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > +FAIL: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess > errors) > -PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > +FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess > errors) > -PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > +FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess > errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test > PASS: libgomp.fortran/examples-4/simd-3.f90 -O0 (test for excess > errors) > > These are segmentation faults, for example: > > Program received signal SIGSEGV: Segmentation fault - invalid memory > reference. > > Backtrace for this error: > #0 0x7fb65eeb52ef in ??? 
> #1 0x400845 in __simd2_mod_MOD_add2 > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:18 > #2 0x400845 in __simd2_mod_MOD_work > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:29 > #3 0x4008d4 in MAIN__ > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:68 > #4 0x4005dc in main > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:58 > Segmentation fault (core dumped) > >> PR tree-optimization/70729 >> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider memory reference as >> independent in loops having positive safelen value. >> * tree-vect-loop.c (vect_transform_loop): Clear-up safelen value since >> it may be not valid after vectorization. > > > Grüße > Thomas
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
What about ptr + intptr_t? I guess we should check nonnegative for op1 then? -Manish On Thu, Jun 30, 2016 at 5:25 PM, Richard Biener wrote: > On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar wrote: >> gcc/ChangeLog: >> PR c/71699 >> * fold-const.c (tree_binary_nonzero_warnv_p): Allow >> pointer addition to also be considered nonzero. >> --- >> gcc/fold-const.c | 20 +--- >> 1 file changed, 13 insertions(+), 7 deletions(-) >> >> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >> index 3b9500d..eda713e 100644 >> --- a/gcc/fold-const.c >> +++ b/gcc/fold-const.c >> @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >> { >> case POINTER_PLUS_EXPR: >> case PLUS_EXPR: >> - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >> + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >> + || POINTER_TYPE_P (type)) >> { >> + /* Pointers are always nonnegative, check integers. */ >> + if (ANY_INTEGRAL_TYPE_P (type)) >> + { >> + sub_strict_overflow_p = false; >> + if (!tree_expr_nonnegative_warnv_p (op0, >> + &sub_strict_overflow_p) >> + || !tree_expr_nonnegative_warnv_p (op1, >> + &sub_strict_overflow_p)) >> +return false; >> + } >>/* With the presence of negative values it is hard >> to say something. */ >> - sub_strict_overflow_p = false; >> - if (!tree_expr_nonnegative_warnv_p (op0, >> - &sub_strict_overflow_p) >> - || !tree_expr_nonnegative_warnv_p (op1, >> - &sub_strict_overflow_p)) >> -return false; >> + > > Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be > treated > as signed. > > POINTER_PLUS_EXPR is special in other ways in that iff > flag_delete_null_pointer_checks > is set we can assume that ptr + non-zero is never zero. 
Thus simply do > >case POINTER_PLUS_EXPR: > return flag_delete_null_pointer_checks >&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) > || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); > > OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR > perspective as GCC does > not have a special tree code for pointer subtraction (but IIRC it uses > MINUS_EXPR on integers > for this). > > Richard. > > >>/* One of operands must be positive and the other non-negative. */ >>/* We don't set *STRICT_OVERFLOW_P here: even if this value >> overflows, on a twos-complement machine the sum of two >> -- >> 2.8.3
Re: RFC: pass to warn on questionable uses of alloca().
On 06/18/2016 07:55 PM, Martin Sebor wrote: On 06/16/2016 02:32 AM, Aldy Hernandez wrote: p.s. The pass currently warns on all uses of VLAs. I'm not completely sold on this idea, so perhaps we could remove it, or gate it with a flag. I also think that VLA diagnostics would be better controlled by a separate option, and emit a different diagnostic (one that mentions VLA rather than alloca). Although again, and for VLAs even more so than for alloca, providing an option to have GCC use dynamic allocation, would be an even more robust solution than issuing warnings. IIRC, this was the early implementation of VLAs in GCC so there is a precedent for it. (Though this seems complementary to the warnings.) In addition, I'm of the opinion that potentially unbounded VLA allocation should be checked at runtime and made trap on size overflow in C and throw an exception in C++ (e.g., for int a [A][B], when A * B * sizeof (int) exceeds SIZE_MAX / 2 or some runtime-configurable limit). My C++ patch for bug 69517 does just that (it needs to be resubmitted with the runtime configuration limit added). For a choice of VLA warning options, there already is -Wvla, though it simply points out every VLA definition regardless of its size. It also issues a diagnostic that's questionable in C99 and later modes (that "ISO C90 forbids variable length array" is only relevant in C90; what matters when -Wvla is specified in C99 and C11 modes is that a VLA has been seen). To keep things consistent with -Walloca, perhaps -Wvla could be extended to take an optional argument to specify the VLA threshold. (So that -Wvla=4096 would only diagnose VLA definitions of 4k bytes or more, or unknown size.) Hmmm...I kinda wanted a sane default for -Walloca, and keeping -Walloca and -Wvla consistent would mean I don't get that. Currently, -Walloca would mean "warn on unbounded uses of alloca where n > DEFAULT", whereas -Walloca=0 would mean "warn on all uses of alloca". 
The problem is that -Wvla means "warn on ALL uses". One consistent option would be to change the definition to: -Walloca: warn on all uses of alloca (strict mode). -Walloca=N: warn on unbounded uses of alloca or bounded uses of n > N. -Wvla: warn on all uses of VLAs (as currently implemented). -Wvla=N: warn on unbounded uses of VLA or bounded uses of n > N. This means we get no defaults, and the user must make the limit explicit, but at least we're consistent with the current use of -Wvla. How about we just use -Walloca (and no separate variants for -Wvla=??), but we document that -Walloca will also flag VLAs (and explain why). Also, we make sure the error messages say "variable-length array" not "alloca" in the VLA case. This way we can have a default, and perhaps a more consistent flag: -Walloca: warn on all unbounded uses of alloca/vlas and bounded uses where n > DEFAULT. -Walloca=0: warn on all uses of alloca/vlas (strict mode). -Walloca=N: warn on all unbounded uses of alloca/vlas and bounded uses where n > N. -Wvla: untouched; as is currently implemented. Would this be acceptable, or are y'all really against one -Walloca flag to rule it all? Aldy
Re: [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost
On 06/21/2016 05:50 PM, James Greenhalgh wrote: For the default implementation, if the parameters are not set, I just multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short of ideas on how best to form the default implementation. Yeah, this does seem kind of arbitrary. It looks especially odd considering that BRANCH_COST is likely to already vary between the size/speed cases. What's wrong with just multiplying through by CNI(1)? I'm not sure we want params for this; targets should just eventually upgrade their cost models. The new default causes some changes in generated conditional move sequences for x86_64. Whether these changes are for the better or not I can't say. How about arm/aarch64? I think some benchmark results might be good to have. Bernhard already pointed out some issues with the patch; I'll omit these. +(max_noce_ifcvt_seq_cost, + "This hook should return a value in the same units as\n\ +@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for\n\ +a sequence generated by the RTL if-conversion pass when conditional\n\ +execution is not available. There's still the issue that we're also replacing instructions when doing if-conversion. Let's say in this case, /* Convert "if (test) x = a; else x = b", for A and B constant. Also allow A = y + c1, B = y + c2, with a common y between A and B. */ we're removing two assignments for the purposes of optimizing for size, and one assignment when considering optimization for speed. This needs to factor into the cost calculations somehow if we want to do it properly. I think we can leave the hook as-is, but maybe add documentation to the effect of "The caller should increase the limit by the cost of whatever instructions are removed in the transformation." +/* Default implementation of TARGET_RTX_BRANCH_COST. */ Wrong name for the hook. 
+ +unsigned int +default_max_noce_ifcvt_seq_cost (bool speed_p, edge e) +{ + bool predictable_p = predictable_edge_p (e); + /* For size, some targets like to set a BRANCH_COST of zero to disable + ifcvt, continue to allow that. Then multiply through by + COSTS_N_INSNS (1) so we're in a comparable base. */ + + if (!speed_p) +return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1); Blank line before the comment would be more readable. + enum compiler_param param = predictable_p + ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST + : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST; When splitting expressions across multiple lines, wrap in parens so that emacs formats them automatically. Bernd
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar wrote: > gcc/ChangeLog: > PR c/71699 > * fold-const.c (tree_binary_nonzero_warnv_p): Allow > pointer addition to also be considered nonzero. > --- > gcc/fold-const.c | 20 +--- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c > index 3b9500d..eda713e 100644 > --- a/gcc/fold-const.c > +++ b/gcc/fold-const.c > @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, > { > case POINTER_PLUS_EXPR: > case PLUS_EXPR: > - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) > + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) > + || POINTER_TYPE_P (type)) > { > + /* Pointers are always nonnegative, check integers. */ > + if (ANY_INTEGRAL_TYPE_P (type)) > + { > + sub_strict_overflow_p = false; > + if (!tree_expr_nonnegative_warnv_p (op0, > + &sub_strict_overflow_p) > + || !tree_expr_nonnegative_warnv_p (op1, > + &sub_strict_overflow_p)) > +return false; > + } >/* With the presence of negative values it is hard > to say something. */ > - sub_strict_overflow_p = false; > - if (!tree_expr_nonnegative_warnv_p (op0, > - &sub_strict_overflow_p) > - || !tree_expr_nonnegative_warnv_p (op1, > - &sub_strict_overflow_p)) > -return false; > + Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be treated as signed. POINTER_PLUS_EXPR is special in other ways in that iff flag_delete_null_pointer_checks is set we can assume that ptr + non-zero is never zero. Thus simply do case POINTER_PLUS_EXPR: return flag_delete_null_pointer_checks && (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR perspective as GCC does not have a special tree code for pointer subtraction (but IIRC it uses MINUS_EXPR on integers for this). Richard. >/* One of operands must be positive and the other non-negative. 
*/ >/* We don't set *STRICT_OVERFLOW_P here: even if this value > overflows, on a twos-complement machine the sum of two > -- > 2.8.3
[BUILDROBOT] Selftest failed for i686-wrs-vxworks
Hi Nathan! The recent self-testing fails for i686-wrs-vxworks: /home/jbglaw/build-configlist_mk/i686-wrs-vxworks/build-gcc/mk/i686-wrs-vxworks/./gcc/xgcc -B/home/jbglaw/build-configlist_mk/i686-wrs-vxworks/build-gcc/mk/i686-wrs-vxworks/./gcc/ -xc -S -c /dev/null -fself-test xgcc: fatal error: environment variable ‘WIND_BASE’ not defined compilation terminated. Makefile:1877: recipe for target 's-selftest' failed make[2]: *** [s-selftest] Error 1 That's straight out of config-list.mk, see e.g. build http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=567929 I haven't given it any additional manual testing so far. It's pre-installation though. Maybe I'd just set WIND_BASE to some arbitrary value, just to make xgcc pass its initial start-up test so that it can continue with self-testing? Or shall we set some value in gcc/Makefile.in for running the self-test? MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: 17:44 <@uschebit> Evangelist ist doch ein Vertriebler the second : für unverkäufliche Produkte, oder? (#korsett, 20120821)
Re: [PATCH, www] Mention download_prerequisites in installation docs
Improved wording after a chat with `redi` on IRC. gcc/ChangeLog: * doc/install.texi: Mention download_prerequisites --- gcc/doc/install.texi | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 9248f0d..de43e36 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -351,7 +351,10 @@ versions may work in some cases, but it's safer to use the exact versions documented. We appreciate bug reports about problems with newer versions, though. If your OS vendor provides packages for the support libraries then using those packages may be the simplest way to -install the libraries. +install the libraries. Alternatively, running +@code{contrib/download_prerequisites} from the root directory of the source +tree will fetch these libraries and place them in the appropriate locations +for the build. @table @asis @item GNU Multiple Precision Library (GMP) version 4.3.2 (or later) -- 2.8.3 -Manish On Thu, Jun 30, 2016 at 12:37 PM, Manish Goregaokar wrote: > 2016-06-30 Manish Goregaokar > > gcc/ChangeLog: > * doc/install.texi: Mention download_prerequisites > --- > gcc/doc/install.texi | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi > index 9248f0d..0070fdf 100644 > --- a/gcc/doc/install.texi > +++ b/gcc/doc/install.texi > @@ -351,7 +351,9 @@ versions may work in some cases, but it's safer to > use the exact > versions documented. We appreciate bug reports about problems with > newer versions, though. If your OS vendor provides packages for the > support libraries then using those packages may be the simplest way to > -install the libraries. > +install the libraries. Alternatively, running > +@code{contrib/download_prerequisites} from the repository root will fetch > +these libraries and place them in the appropriate locations for build. > > @table @asis > @item GNU Multiple Precision Library (GMP) version 4.3.2 (or later) > -- > 2.8.3
[PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
gcc/ChangeLog: PR c/71699 * fold-const.c (tree_binary_nonzero_warnv_p): Allow pointer addition to also be considered nonzero. --- gcc/fold-const.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..eda713e 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, { case POINTER_PLUS_EXPR: case PLUS_EXPR: - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + || POINTER_TYPE_P (type)) { + /* Pointers are always nonnegative, check integers. */ + if (ANY_INTEGRAL_TYPE_P (type)) + { + sub_strict_overflow_p = false; + if (!tree_expr_nonnegative_warnv_p (op0, + &sub_strict_overflow_p) + || !tree_expr_nonnegative_warnv_p (op1, + &sub_strict_overflow_p)) +return false; + } /* With the presence of negative values it is hard to say something. */ - sub_strict_overflow_p = false; - if (!tree_expr_nonnegative_warnv_p (op0, - &sub_strict_overflow_p) - || !tree_expr_nonnegative_warnv_p (op1, - &sub_strict_overflow_p)) -return false; + /* One of operands must be positive and the other non-negative. */ /* We don't set *STRICT_OVERFLOW_P here: even if this value overflows, on a twos-complement machine the sum of two -- 2.8.3
[PATCH] [ARC] Various small miscellaneous fixes.
Small patches batch. Ok to apply? Claudiu gcc/ 2016-05-09 Claudiu Zissulescu * config/arc/arc.c (arc_process_double_reg_moves): Change. * config/arc/arc.md (movsi_insn): Disable unsupported move instructions for ARCv2 cores. (movdi): Use prepare_move_operands. (movsf, movdf): Use move_dest_operand predicate. (arc_process_double_reg_moves): Change. * config/arc/constraints.md (Chs): Enable when barrel shifter is present. * config/arc/fpu.md (divsf3): Change. * config/arc/fpx.md (dexcl_3op_peep2_insn): Dx data register is also a destination. (dexcl_3op_peep2_insn_nores): Likewise. * config/arc/arc.h (SHIFT_COUNT_TRUNCATED): Define to one. (LINK_COMMAND_SPEC): Remove. --- gcc/config/arc/arc.c | 5 + gcc/config/arc/arc.h | 27 +++ gcc/config/arc/arc.md | 35 +++ gcc/config/arc/constraints.md | 3 ++- gcc/config/arc/fpu.md | 4 +++- gcc/config/arc/fpx.md | 26 -- 6 files changed, 40 insertions(+), 60 deletions(-) diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c index 985df81..0830af3 100644 --- a/gcc/config/arc/arc.c +++ b/gcc/config/arc/arc.c @@ -9025,10 +9025,7 @@ arc_process_double_reg_moves (rtx *operands) rtx srcLow = simplify_gen_subreg (SImode, src, DFmode, TARGET_BIG_ENDIAN ? 4 : 0); - emit_insn (gen_rtx_UNSPEC_VOLATILE (Pmode, - gen_rtvec (3, dest, srcHigh, srcLow), - VUNSPEC_ARC_DEXCL_NORES)); - + emit_insn (gen_dexcl_2op (dest, srcHigh, srcLow)); } else gcc_unreachable (); diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index f1705d5..f8f195d 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -124,24 +124,6 @@ extern const char *arc_cpu_to_as (int argc, const char **argv); %{!marclinux*: %{pg|p|profile:-marclinux_prof;: -marclinux}} \ %{!z:-z max-page-size=0x2000 -z common-page-size=0x2000} \ %{shared:-shared}" -/* Like the standard LINK_COMMAND_SPEC, but add %G when building - a shared library with -nostdlib, so that the hidden functions of libgcc - will be incorporated. 
- N.B., we don't want a plain -lgcc, as this would lead to re-exporting - non-hidden functions, so we have to consider libgcc_s.so.* first, which in - turn should be wrapped with --as-needed. */ -#define LINK_COMMAND_SPEC "\ -%{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:\ -%(linker) %l " LINK_PIE_SPEC "%X %{o*} %{A} %{d} %{e*} %{m} %{N} %{n} %{r}\ -%{s} %{t} %{u*} %{x} %{z} %{Z} %{!A:%{!nostdlib:%{!nostartfiles:%S}}}\ -%{static:} %{L*} %(mfwrap) %(link_libgcc) %o\ -%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1):\ - %:include(libgomp.spec)%(link_gomp)}\ -%(mflib)\ -%{fprofile-arcs|fprofile-generate|coverage:-lgcov}\ -%{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\ -%{!A:%{!nostdlib:%{!nostartfiles:%E}}} %{T*} }}" - #else #define LINK_SPEC "%{mbig-endian:-EB} %{EB} %{EL}\ %{pg|p:-marcelf_prof;mA7|mARC700|mcpu=arc700|mcpu=ARC700: -marcelf}" @@ -1563,13 +1545,10 @@ extern int arc_return_address_regs[4]; /* Undo the effects of the movmem pattern presence on STORE_BY_PIECES_P . */ #define MOVE_RATIO(SPEED) ((SPEED) ? 15 : 3) -/* Define this to be nonzero if shift instructions ignore all but the low-order - few bits. Changed from 1 to 0 for rotate pattern testcases - (e.g. 20020226-1.c). This change truncates the upper 27 bits of a word - while rotating a word. Came to notice through a combine phase - optimization viz. a << (32-b) is equivalent to a << (-b). +/* Define this to be nonzero if shift instructions ignore all but the + low-order few bits. */ -#define SHIFT_COUNT_TRUNCATED 0 +#define SHIFT_COUNT_TRUNCATED 1 /* Value is 1 if truncating an integer of INPREC bits to OUTPREC bits is done just by pretending it is already truncated. */ diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md index 5accf4a..c86fc02 100644 --- a/gcc/config/arc/arc.md +++ b/gcc/config/arc/arc.md @@ -699,9 +699,9 @@ ; the iscompact attribute allows the epilogue expander to know for which ; insns it should lengthen the return insn. ; N.B. 
operand 1 of alternative 7 expands into pcl,symbol@gotpc . -(define_insn "*movsi_insn" ; 0 1 23 4 5 6 7 8 9 10 11 1213 14 15 16 17 18 19 202122 23 2425 26 27 28 29 - [(set (match_operand:SI 0 "move_dest_operand" "=Rcq,Rcq#q,w, h, w,w, w, w, w, w,???w, ?w, w,Rcq#q, w,Rcq, S, Us<,RcqRck,!*x, r,!*Rsd,!*Rcd,r,Ucm, Usd,m,???m,VUsc,VUsc") - (match_operand:SI 1 "move_src_operand" " cL,
Re: [PATCH 0/9] remove some manual memory management
On 06/29/2016 02:26 PM, tbsaunde+...@tbsaunde.org wrote: patches individually bootstrapped and regtested on x86_64-linux-gnu, ok? I think these all look sensible. ChangeLogs ought to have slightly more information than "Adjust" in some cases, especially when you're changing function arguments. Bernd
[PATCH 1/2] [ARC] Update INSN_LENGTH_ALIGNMENT.
Update the INSN_LENGTH_ALIGNMENT macro to handle jump tables placed in program memory. Ok to apply? Claudiu gcc/ 2016-06-20 Claudiu Zissulescu * config/arc/arc.h (INSN_LENGTH_ALIGNMENT): Change. --- gcc/config/arc/arc.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index 3376ad8..d2adf4d 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -1519,10 +1519,10 @@ extern int arc_return_address_regs[4]; #define ASM_OUTPUT_BEFORE_CASE_LABEL(FILE, PREFIX, NUM, TABLE) \ ASM_OUTPUT_ALIGN ((FILE), ADDR_VEC_ALIGN (TABLE)); -#define INSN_LENGTH_ALIGNMENT(INSN) \ - ((JUMP_P (INSN) \ +#define INSN_LENGTH_ALIGNMENT(INSN) \ + ((JUMP_TABLE_DATA_P (INSN) \ && GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC \ -&& GET_MODE (PATTERN (INSN)) == QImode) \ +&& GET_MODE (PATTERN (INSN)) == QImode) \ ? 0 : length_unit_log) /* Define if operations between registers always perform the operation -- 1.9.1
[PATCH 2/2] [ARC] Fix mul32x16 patterns.
The mululw instructions are wrongly used in the patterns; fix them. Ok to apply? Claudiu gcc/ 2016-06-28 Claudiu Zissulescu * config/arc/arc.md (umul_600): Change. (umul64_600): Likewise. --- gcc/config/arc/arc.md | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md index 2464a19..41b8eed 100644 --- a/gcc/config/arc/arc.md +++ b/gcc/config/arc/arc.md @@ -1937,13 +1937,11 @@ (const_int 0 (clobber (match_operand:SI 3 "acc1_operand" ""))] "TARGET_MULMAC_32BY16_SET" - "@mululw 0, %0, %1 -mululw 0, %0, %1 -mululw%? 0, %1, %0" + "mululw 0, %0, %1" [(set_attr "length" "4,4,8") - (set_attr "type" "mulmac_600, mulmac_600, mulmac_600") - (set_attr "predicable" "no, no, yes") - (set_attr "cond" "nocond, canuse_limm, canuse")]) + (set_attr "type" "mulmac_600") + (set_attr "predicable" "no") + (set_attr "cond" "nocond")]) (define_insn "mac_600" [(set (match_operand:SI 2 "acc2_operand" "") @@ -2372,13 +2370,11 @@ (const_int 0 ] "TARGET_MULMAC_32BY16_SET" - "@mululw 0, %0, %1 -mululw 0, %0, %1 -mululw%? 0, %1, %0" + "mululw 0, %0, %1" [(set_attr "length" "4,4,8") (set_attr "type" "mulmac_600") - (set_attr "predicable" "no,no,yes") - (set_attr "cond" "nocond, canuse_limm, canuse")]) + (set_attr "predicable" "no") + (set_attr "cond" "nocond")]) (define_insn "umac64_600" -- 1.9.1
Re: [Patch AArch64] Fix PR target/63874
On Thu, May 5, 2016 at 8:45 AM, Ramana Radhakrishnan wrote: > On Thu, Mar 31, 2016 at 2:11 PM, Ramana Radhakrishnan > wrote: >> Hi, >> >> In this PR we have a situation where we aren't really detecting >> weak references vs weak definitions. If one has a weak definition >> that binds locally there's no reason not to put out PC relative >> relocations. >> >> However if you have a genuine weak reference that is >> known not to bind locally it makes very little sense >> to put out an entry into the literal pool which doesn't always >> work with DSOs and shared objects. >> >> Tested aarch64-none-linux-gnu bootstrap and regression test with no >> regressions >> >> This is not a regression and given what we've seen recently with protected >> symbols and binds_locally_p I'd rather this were queued for GCC 7. >> >> Ok ? > > Ping ^ 2. > > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01680.html Ping ^3 https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01680.html Ramana > > regards > Ramana >> >> regards >> Ramana >> >> gcc/ >> >> * config/aarch64/aarch64.c (aarch64_classify_symbol): Typo in comment fixed. >> Only force to memory if it is a weak external reference. >> >> >> gcc/testsuite >> >> * gcc.target/aarch64/pr63874.c: New test.
Re: [PATCH] gcc/arc: Avoid add_s REG,REG,pcl on ARCv2 targets
* Claudiu Zissulescu [2016-06-30 07:36:32 +]: > I think the compactsi makes no sense for ARCv2 without the add_s > reg, reg,PCL instruction. Moreover, I have proposed a patch about > this issue months ago, See: > https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01462.html. However, > our maintainer is missing in action. OK, thanks for the link, I don't see a need for my patch then. Let's hope we get some of your outstanding patches reviewed soon. Thanks, Andrew
Fix PR fortran/71688
Hi, PR 71688 is about an ICE in cgraphunit.c caused by the fact that the Fortran FE creates two separate call-graph nodes for a single function decl; if you are interested, complete backtraces leading to the point of creating them are in bugzilla. The intuitive fix, changing one of these points so that it calls cgraph_node::get_create rather than cgraph_node::create, works, and given the comment just before the line it also seems like the correct thing to do: /* Register this function with cgraph just far enough to get it added to our parent's nested function list. If there are static coarrays in this function, the nested _caf_init function has already called cgraph_create_node, which also created the cgraph node for this function. */ It is interesting that the bug lurked so long there. I have bootstrapped and tested the patch below on x86_64-linux; is it OK for trunk and (after a while) for all active release branches? Thanks, Martin 2016-06-29 Martin Jambor PR fortran/71688 * trans-decl.c (gfc_generate_function_code): Use get_create rather than create to get a call graph node. testsuite/ gfortran.dg/pr71688.f90: New test. diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c index 2f5e434..0e68736 100644 --- a/gcc/fortran/trans-decl.c +++ b/gcc/fortran/trans-decl.c @@ -6336,7 +6336,7 @@ gfc_generate_function_code (gfc_namespace * ns) function has already called cgraph_create_node, which also created the cgraph node for this function. */ if (!has_coarray_vars || flag_coarray != GFC_FCOARRAY_LIB) - (void) cgraph_node::create (fndecl); + (void) cgraph_node::get_create (fndecl); } else cgraph_node::finalize_function (fndecl, true); diff --git a/gcc/testsuite/gfortran.dg/pr71688.f90 b/gcc/testsuite/gfortran.dg/pr71688.f90 new file mode 100644 index 000..dbb6d18 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr71688.f90 @@ -0,0 +1,13 @@ +! { dg-do compile } +! 
{ dg-options "-fcoarray=lib" } + +program p + call s +contains + subroutine s + real :: x[*] = 1 + block + end block + x = 2 + end +end
Re: [PATCH] Mark -fstack-protect as optimization flag (PR middle-end/71585)
On 06/29/2016 01:00 PM, Richard Biener wrote: > Ok. Might want to backport the inline_call hunk to all affected branches. > > Richard. Done in GCC-6 branch. Older branches are not affected.
Re: [PATCH][AArch64] Canonicalize Cortex core tunings
On 29/06/16 18:03, Wilco Dijkstra wrote: > This patch sets the branch cost to the same most optimal setting for all > Cortex cores, > reducing codesize and improving performance due to using more CSEL > instructions. > Set the autoprefetcher model in Cortex-A72 to weak like the others. Enable > AES fusion > in Cortex-A35. As a result generated code is now more similar as well as > more optimal > across Cortex cores. > > Regress passes, OK for commit? > OK. R. > ChangeLog: > 2016-06-29 Wilco Dijkstra > > * config/aarch64/aarch64.c (cortexa35_tunings): > Enable AES fusion. Use cortexa57_branch_cost. > (cortexa53_tunings): Use cortexa57_branch_cost. > (cortexa72_tunings): Use cortexa57_branch_cost. > Use AUTOPREFETCHER_WEAK. > (cortexa73_tunings): Use cortexa57_branch_cost. > --- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > 7617f9fb273d17653df95eaf22b3cbeae3862230..5a20e175e72b2848499f07fcdf1856b3a7817e49 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -448,11 +448,11 @@ static const struct tune_params cortexa35_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost */ >1, /* issue_rate */ > - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD > + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD > | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ >8, /* function_align. */ >8, /* jump_align. 
*/ > @@ -474,7 +474,7 @@ static const struct tune_params cortexa53_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost */ >2, /* issue_rate */ > @@ -526,7 +526,7 @@ static const struct tune_params cortexa72_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost */ >3, /* issue_rate */ > @@ -542,7 +542,7 @@ static const struct tune_params cortexa72_tunings = >2, /* min_div_recip_mul_df. */ >0, /* max_case_values. */ >0, /* cache_line_size. */ > - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ > + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ >(AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */ > }; > > @@ -552,7 +552,7 @@ static const struct tune_params cortexa73_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost. */ >2, /* issue_rate. */ >
Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
On 28/06/16 08:54, Richard Biener wrote: On Thu, 16 Jun 2016, Kyrill Tkachov wrote: On 15/06/16 22:53, Marc Glisse wrote: On Wed, 15 Jun 2016, Kyrill Tkachov wrote: This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following feedback. I've changed the code to cast the operand to an unsigned type before applying the multiplication algorithm and cast it back to the signed type at the end. Whether to perform the cast is now determined by the function cast_mult_synth_to_unsigned in which I've implemented the cases that Marc mentioned in [1]. Please do let me know if there are any other cases that need to be handled. Ah, I never meant those cases as an exhaustive list, I was just looking for examples showing that the transformation was unsafe, and those 2 came to mind: - x*15 -> x*16-x the second one can overflow even when the first one doesn't. - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's redundant with the negate_variant check?) On the other hand, as long as we remain in the 'positive' operations, turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2 cannot cause spurious overflows. But I didn't look at the algorithm closely enough to characterize the safe cases. Now if you have done it, that's good :-) Otherwise, we might want to err on the side of caution. I'll be honest, I didn't give it much thought beyond convincing myself that the two cases you listed are legitimate. Looking at expand_mult_const in expmed.c can be helpful (where it updates val_so_far for checking purposes) to see the different algorithm cases. I think the only steps that could cause overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the final step is negate_variant, which are what you listed (and covered in this patch). richi is away on PTO for the time being though, so we have some time to convince ourselves :) ;) I think the easiest way would be to always use unsigned arithmetic. 
While VRP doesn't do anything on vector operations we still have some match.pd patterns that rely on correct overflow behavior and those may be enabled for vector types as well. That's fine with me. Here's the patch that performs the casts to unsigned and back when the input type doesn't wrap on overflow. Bootstrapped and tested on arm, aarch64, x86_64. How's this? Thanks, Kyrill 2016-06-30 Kyrylo Tkachov PR target/65951 * tree-vect-patterns.c: Include mult-synthesis.h (target_supports_mult_synth_alg): New function. (apply_binop_and_append_stmt): Likewise. (vect_synth_mult_by_constant): Likewise. (target_has_vecop_for_code): Likewise. (vect_recog_mult_pattern): Use above functions to synthesize vector multiplication by integer constants. 2016-06-30 Kyrylo Tkachov * gcc.dg/vect/vect-mult-const-pattern-1.c: New test. * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c new file mode 100644 index ..e5dba82d7fa955a6a37a0eabf980127e464ac77b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c @@ -0,0 +1,41 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 256 + +__attribute__ ((noinline)) void +foo (long long *arr) +{ + for (int i = 0; i < N; i++) +arr[i] *= 123; +} + +int +main (void) +{ + check_vect (); + long long data[N]; + int i; + + for (i = 0; i < N; i++) +{ + data[i] = i; + __asm__ volatile (""); +} + + foo (data); + for (i = 0; i < N; i++) +{ + if (data[i] / 123 != i) + __builtin_abort (); + __asm__ volatile (""); +} + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c new file mode 100644 index ..83019c96910b866e364a7c2e00261a1ded13cb53 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c @@ -0,0 +1,41 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ + +#include +#include "tree-vect.h" + +#define N 256 + +__attribute__ ((noinline)) void +foo (long long *arr) +{ + for (int i = 0; i < N; i++) +arr[i] *= -19594LL; +} + +int +main (void) +{ + check_vect (); + long long data[N]; + int i; + + for (i = 0; i < N; i++) +{ + data[i] = i; + __asm__ volatile (""); +} + + foo (data); + for (i = 0; i < N; i++) +{ + if (data[i] / -19594LL != i) + __builtin_abort (); +
Re: [patch] Get rid of stack trampolines for nested functions
On Thu, Jun 30, 2016 at 12:08 AM, Eric Botcazou wrote:
> Hi,
>
> this patch implements generic support for the elimination of stack trampolines
> and, consequently, of the need to make the stack executable when pointers to
> nested functions are used.  That's done on a per-language and per-target basis
> (i.e. there is 1 language hook and 1 target hook to parameterize it) and there
> are no changes whatsoever in code generation if both are not turned on (and
> the patch implements a -ftrampolines option to let the user override them).
>
> The idea is based on the fact that, for targets using function descriptors as
> per their ABI like IA-64, AIX or VMS platforms, stack trampolines "degenerate"
> into descriptors built at run time on the stack and thus made up of data only,
> which in turn means that the stack doesn't need to be made executable.

As you are not able to handle all trampolines that way I always wondered if
we can allocate space for the trampolines statically for those cases we can
compute do not happen in a recursive path (or avoid them if we do not need a
static chain anyway, not sure if we optimize for this case currently).

Richard.

> This descriptor-based scheme is implemented generically for nested functions,
> i.e. the nested function lowering pass builds generic descriptors instead of
> trampolines on the stack when encountering pointers to nested functions, which
> means that there are 2 kinds of pointers to functions and therefore a run-time
> identification mechanism is needed for indirect calls to distinguish them.
>
> Because of that, enabling the support breaks binary compatibility (for code
> manipulating pointers to nested functions).  That's OK for Ada, and nested
> functions are first-class citizens in the language anyway so we really need
> this, but not for C, so for example Ada doesn't use it at the interface with C
> (when objects have "convention C" in Ada parlance).
>
> This was bootstrapped/regtested on x86_64-suse-linux but AdaCore has been
> using it on native platforms (Linux, Windows, Solaris, etc) for years.
>
> OK for the mainline?
>
>
> 2016-06-29  Eric Botcazou
>
>	PR ada/37139
>	PR ada/67205
>	* common.opt (-ftrampolines): New option.
>	* doc/invoke.texi (Code Gen Options): Document it.
>	* doc/tm.texi.in (Trampolines): Add TARGET_CUSTOM_FUNCTION_DESCRIPTORS.
>	* doc/tm.texi: Regenerate.
>	* builtins.def: Add init_descriptor and adjust_descriptor.
>	* builtins.c (expand_builtin_init_trampoline): Do not issue a warning
>	on platforms with descriptors.
>	(expand_builtin_init_descriptor): New function.
>	(expand_builtin_adjust_descriptor): Likewise.
>	(expand_builtin) : New case.
>	: Likewise.
>	* calls.c (prepare_call_address): Remove SIBCALLP parameter and add
>	FLAGS parameter.  Deal with indirect calls by descriptor and adjust.
>	Set STATIC_CHAIN_REG_P on the static chain register, if any.
>	(call_expr_flags): Set ECF_BY_DESCRIPTOR for calls by descriptor.
>	(expand_call): Likewise.  Move around call to prepare_call_address
>	and pass all flags to it.
>	* cfgexpand.c (expand_call_stmt): Reinstate CALL_EXPR_BY_DESCRIPTOR.
>	* gimple.h (enum gf_mask): New GF_CALL_BY_DESCRIPTOR value.
>	(gimple_call_set_by_descriptor): New setter.
>	(gimple_call_by_descriptor_p): New getter.
>	* gimple.c (gimple_build_call_from_tree): Set CALL_EXPR_BY_DESCRIPTOR.
>	(gimple_call_flags): Deal with GF_CALL_BY_DESCRIPTOR.
>	* langhooks.h (struct lang_hooks): Add custom_function_descriptors.
>	* langhooks-def.h (LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS): Define.
>	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS.
>	* rtl.h (STATIC_CHAIN_REG_P): New macro.
>	* rtlanal.c (find_first_parameter_load): Skip static chain registers.
>	* target.def (custom_function_descriptors): New POD hook.
>	* tree.h (FUNC_ADDR_BY_DESCRIPTOR): New flag on ADDR_EXPR.
>	(CALL_EXPR_BY_DESCRIPTOR): New flag on CALL_EXPR.
>	* tree-core.h (ECF_BY_DESCRIPTOR): New mask.
>	Document FUNC_ADDR_BY_DESCRIPTOR and CALL_EXPR_BY_DESCRIPTOR.
>	* tree.c (make_node_stat) : Set function alignment to
>	DEFAULT_FUNCTION_ALIGNMENT instead of FUNCTION_BOUNDARY.
>	(build_common_builtin_nodes): Initialize init_descriptor and
>	adjust_descriptor.
>	* tree-nested.c: Include target.h.
>	(struct nesting_info): Add 'any_descr_created' field.
>	(get_descriptor_type): New function.
>	(lookup_element_for_decl): New function extracted from...
>	(create_field_for_decl): Likewise.
>	(lookup_tramp_for_decl): ...here.  Adjust.
>	(lookup_descr_for_decl): New function.
>	(convert_tramp_reference_op): Deal with
RE: [PATCH] gcc/arc: Avoid add_s REG,REG,pcl on ARCv2 targets
I think the compactsi makes no sense for ARCv2 without the add_s reg,reg,pcl
instruction.  Moreover, I have proposed a patch about this issue months ago,
see: https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01462.html.  However, our
maintainer is missing in action.

Cheers,
Claudiu

> -----Original Message-----
> From: Andrew Burgess [mailto:andrew.burg...@embecosm.com]
> Sent: Thursday, June 30, 2016 1:43 AM
> To: gcc-patches@gcc.gnu.org; g...@amylaar.uk
> Cc: claudiu.zissule...@synopsys.com; Andrew Burgess
> Subject: [PATCH] gcc/arc: Avoid add_s REG,REG,pcl on ARCv2 targets
>
> The ARCv2 targets don't support add_s REG,REG,pcl like earlier ARC
> targets did.  This instruction format is used when generating case jump
> tables.
>
> This commit updates the code so that ARCv2 targets avoid the unsupported
> instruction.  This does mean the code is slightly longer on ARCv2, so
> the instruction length has changed accordingly.
>
> gcc/ChangeLog:
>
>	* config/arc/arc.md (casesi_compact_jump): Avoid generating add_s
>	REG,REG,pcl on ARCv2 targets.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.target/arc/switch-1.c: New file.
> ---
>  gcc/ChangeLog                           |  5
>  gcc/config/arc/arc.md                   | 50 ++---
>  gcc/testsuite/ChangeLog                 |  4 +++
>  gcc/testsuite/gcc.target/arc/switch-1.c | 42 +++
>  4 files changed, 79 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/switch-1.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 1102c53..0f39d49 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -3950,6 +3950,7 @@
>    int unalign = arc_get_unalign ();
>    rtx xop[3];
>    const char *s;
> +  int vec_offset = TARGET_V2 ? 12 : 10;
>
>    xop[0] = operands[0];
>    xop[2] = operands[2];
> @@ -3962,11 +3963,11 @@
>       2 of these are for alignment, and are anticipated in the length
>       of the ADDR_DIFF_VEC.  */
>    if (unalign && !satisfies_constraint_Rcq (xop[0]))
> -     s = \"add2 %2,pcl,%0\n\tld_s %2,[%2,12]\";
> +     s = \"add2 %2,pcl,%0\n\tld_s %2,[%2,12]\"; /* Length = 6 bytes.  */
>    else if (unalign)
> -     s = \"add_s %2,%0,2\n\tld.as %2,[pcl,%2]\";
> +     s = \"add_s %2,%0,2\n\tld.as %2,[pcl,%2]\"; /* Length = 6 bytes.  */
>    else
> -     s = \"add %2,%0,2\n\tld.as %2,[pcl,%2]\";
> +     s = \"add %2,%0,2\n\tld.as %2,[pcl,%2]\"; /* Length = 8 bytes.  */
>    arc_clear_unalign ();
>    break;
>  case HImode:
> @@ -3974,26 +3975,26 @@
>      {
>	 if (satisfies_constraint_Rcq (xop[0]))
>	   {
> -	     s = \"add_s %2,%0,%1\n\tld%_.as %2,[pcl,%2]\";
> -	     xop[1] = GEN_INT ((10 - unalign) / 2U);
> +	     s = \"add_s %2,%0,%1\n\tld%_.as %2,[pcl,%2]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT ((vec_offset - unalign) / 2U);
>	   }
>	 else
>	   {
> -	     s = \"add1 %2,pcl,%0\n\tld%__s %2,[%2,%1]\";
> -	     xop[1] = GEN_INT (10 + unalign);
> +	     s = \"add1 %2,pcl,%0\n\tld%__s %2,[%2,%1]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT (vec_offset + unalign);
>	   }
>      }
>    else
>      {
>	 if (satisfies_constraint_Rcq (xop[0]))
>	   {
> -	     s = \"add_s %2,%0,%1\n\tld%_.x.as %2,[pcl,%2]\";
> -	     xop[1] = GEN_INT ((10 - unalign) / 2U);
> +	     s = \"add_s %2,%0,%1\n\tld%_.x.as %2,[pcl,%2]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT ((vec_offset - unalign) / 2U);
>	   }
>	 else
>	   {
> -	     s = \"add1 %2,pcl,%0\n\tld%__s.x %2,[%2,%1]\";
> -	     xop[1] = GEN_INT (10 + unalign);
> +	     s = \"add1 %2,pcl,%0\n\tld%__s.x %2,[%2,%1]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT (vec_offset + unalign);
>	   }
>      }
>    arc_toggle_unalign ();
> @@ -4001,17 +4002,18 @@
>  case QImode:
>    if (ADDR_DIFF_VEC_FLAGS (diff_vec).offset_unsigned)
>      {
> -	if ((rtx_equal_p (xop[2], xop[0])
> -	     || find_reg_note (insn, REG_DEAD, xop[0]))
> +	if (!TARGET_V2
> +	    && (rtx_equal_p (xop[2], xop[0])
> +		|| find_reg_note (insn, REG_DEAD, xop[0]))
>	    && satisfies_constraint_Rcq (xop[0]))
>	  {
> -	    s = \"add_s %0,%0,pcl\n\tldb_s %2,[%0,%1]\";
> -	    xop[1] = GEN_INT (8 + unalign);
> +	    s = \"add %0,%0,pcl\n\tldb_s %2,[%0,%1]\"; /* Length = 6 bytes.  */
> +	    xop[1] = GEN_INT ((vec_offset - 2) + unalign);
>	  }
>	else
>	  {
> -	    s = \"add %2,%0,pcl\n\tldb_s %2,[%2,%1]\";
> -	    xop[1] = GEN_INT (10 + unalign);
> +	    s = \"add %2,%0,pcl\n\tldb_s %2,[%2,%1]\"; /* Length = 6 bytes.  */
> +	    xop[1] = GEN_INT (vec_offset + unalign);
>
Re: [PATCH] Fix fold_binary RROTATE_EXPR opts (PR middle-end/71693)
On Wed, 29 Jun 2016, Jakub Jelinek wrote:
> Hi!
>
> This issue is latent on the trunk/6.2, fails with checking in 5.x/4.9.
> As usually in fold_*_loc, arg0/arg1 is STRIP_NOPS op0/op1, so might have
> incompatible type, so we need to convert it to the right type first
> before optimizing.
>
> Bootstrapped/regtested on x86_64-linux/i686-linux, ok everywhere?

Ok.

Thanks,
Richard.

> 2016-06-29  Jakub Jelinek
>
>	PR middle-end/71693
>	* fold-const.c (fold_binary_loc) : Cast
>	TREE_OPERAND (arg0, 0) and TREE_OPERAND (arg0, 1) to type
>	first when permuting bitwise operation with rotate.  Cast
>	TREE_OPERAND (arg0, 0) to type when cancelling two rotations.
>
>	* gcc.c-torture/compile/pr71693.c: New test.
>
> --- gcc/fold-const.c.jj	2016-06-14 12:17:20.530282919 +0200
> +++ gcc/fold-const.c	2016-06-29 16:39:15.175562715 +0200
> @@ -10294,11 +10294,15 @@ fold_binary_loc (location_t loc,
>	      || TREE_CODE (arg0) == BIT_IOR_EXPR
>	      || TREE_CODE (arg0) == BIT_XOR_EXPR)
>	     && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
> -	return fold_build2_loc (loc, TREE_CODE (arg0), type,
> -				fold_build2_loc (loc, code, type,
> -						 TREE_OPERAND (arg0, 0), arg1),
> -				fold_build2_loc (loc, code, type,
> -						 TREE_OPERAND (arg0, 1), arg1));
> +	{
> +	  tree arg00 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
> +	  tree arg01 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
> +	  return fold_build2_loc (loc, TREE_CODE (arg0), type,
> +				  fold_build2_loc (loc, code, type,
> +						   arg00, arg1),
> +				  fold_build2_loc (loc, code, type,
> +						   arg01, arg1));
> +	}
>
>      /* Two consecutive rotates adding up to the some integer
>	  multiple of the precision of the type can be ignored.  */
> @@ -10307,7 +10311,7 @@ fold_binary_loc (location_t loc,
>	     && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
>	     && wi::umod_trunc (wi::add (arg1, TREE_OPERAND (arg0, 1)),
>				prec) == 0)
> -	return TREE_OPERAND (arg0, 0);
> +	return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
>
>      return NULL_TREE;
>
> --- gcc/testsuite/gcc.c-torture/compile/pr71693.c.jj	2016-06-29 16:39:15.175562715 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr71693.c	2016-06-29 16:39:15.175562715 +0200
> @@ -0,0 +1,10 @@
> +/* PR middle-end/71693 */
> +
> +unsigned short v;
> +
> +void
> +foo (int x)
> +{
> +  v = ((((unsigned short) (0x0001 | (x & 0x0070) | 0x0100) & 0x00ffU) << 8)
> +       | (((unsigned short) (0x0001 | (x & 0x0070) | 0x0100) >> 8) & 0x00ffU));
> +}
>
>	Jakub

-- 
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)
[PATCH, www] Mention download_prerequisites in installation docs
2016-06-30  Manish Goregaokar

gcc/ChangeLog:

	* doc/install.texi: Mention download_prerequisites.
---
 gcc/doc/install.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 9248f0d..0070fdf 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -351,7 +351,9 @@ versions may work in some cases, but it's safer to use the exact versions
 documented.  We appreciate bug reports about problems with newer
 versions, though.  If your OS vendor provides packages for the support
 libraries then using those packages may be the simplest way to
-install the libraries.
+install the libraries.  Alternatively, running
+@code{contrib/download_prerequisites} from the repository root will fetch
+these libraries and place them in the appropriate locations for build.

 @table @asis
 @item GNU Multiple Precision Library (GMP) version 4.3.2 (or later)
-- 
2.8.3
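Typical usage of the script the new text mentions might look like the following (the checkout directory and configure flags are illustrative, not part of the patch):

```shell
# From a fresh GCC source checkout: fetch GMP, MPFR and MPC and link
# them into the tree so the top-level configure picks them up
# automatically during the build.
cd gcc                                # repository root (illustrative path)
./contrib/download_prerequisites
mkdir ../build && cd ../build
../gcc/configure --prefix="$HOME/gcc-install"   # illustrative flags
make && make install
```
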
Fix for PR71696 in Libiberty Demangler (6)
Hi,

The attached patch fixes the stack overflow in the demangler due to cycles in
the references of “remembered” mangled types
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71696).

The methods demangle_signature and do_arg in cplus-dem.c allow “remembering”
mangled type names that can later be referenced and will also be demangled.
The method demangle_args demangles those types following any references.  So,
if there is a cycle in the referencing (or, in the simplest case, a
self-reference), the method enters infinite recursion.

The patch tracks the mangled types that are currently being demangled in a
new variable called work->proctypevec.  If a referenced type is currently
being demangled, the demangling is marked as not successful.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.  Test case added
to libiberty/testsuite/demangle-expected and checked that PR71696 is
resolved.

Index: libiberty/ChangeLog
===================================================================
--- libiberty/ChangeLog (revision 237852)
+++ libiberty/ChangeLog (working copy)
@@ -1,3 +1,21 @@
+2016-06-30  Marcel Böhme
+
+	* cplus-dem.c: Prevent infinite recursion when there is a cycle
+	in the referencing of remembered mangled types.
+	(work_stuff): New stack to keep track of the remembered mangled
+	types that are currently being processed.
+	(push_processed_type): New method to push currently processed
+	remembered type onto the stack.
+	(pop_processed_type): New method to pop currently processed
+	remembered type from the stack.
+	(work_stuff_copy_to_from): Copy values of new variables.
+	(delete_non_B_K_work_stuff): Free stack memory.
+	(demangle_args): Push/Pop currently processed remembered type.
+	(do_type): Do not demangle a cyclic reference and push/pop
+	referenced remembered type.
+
+	* testsuite/demangle-expected: Add tests.
+
 2016-05-31  Alan Modra

	* xmemdup.c (xmemdup): Use xmalloc rather than xcalloc.
Index: libiberty/cplus-dem.c
===================================================================
--- libiberty/cplus-dem.c (revision 237852)
+++ libiberty/cplus-dem.c (working copy)
@@ -144,6 +144,9 @@ struct work_stuff
   string* previous_argument; /* The last function argument demangled.  */
   int nrepeats; /* The number of times to repeat the previous argument.  */
+  int *proctypevec; /* Indices of currently processed remembered typevecs.  */
+  int proctypevec_size;
+  int nproctypes;
 };

 #define PRINT_ANSI_QUALIFIERS (work -> options & DMGL_ANSI)
@@ -436,6 +439,10 @@
 static void remember_type (struct work_stuff *, const char *, int);

+static void push_processed_type (struct work_stuff *, int);
+
+static void pop_processed_type (struct work_stuff *);
+
 static void remember_Btype (struct work_stuff *, const char *, int, int);

 static int register_Btype (struct work_stuff *);
@@ -1302,6 +1309,13 @@ work_stuff_copy_to_from (struct work_stuff *to, st
       memcpy (to->btypevec[i], from->btypevec[i], len);
     }

+  if (from->proctypevec)
+    {
+      to->proctypevec = XNEWVEC (int, from->proctypevec_size);
+      memcpy (to->proctypevec, from->proctypevec,
+	      from->proctypevec_size * sizeof (int));
+    }
+
   if (from->ntmpl_args)
     to->tmpl_argvec = XNEWVEC (char *, from->ntmpl_args);

@@ -1336,6 +1350,12 @@ delete_non_B_K_work_stuff (struct work_stuff *work
       work -> typevec = NULL;
       work -> typevec_size = 0;
     }
+  if (work -> proctypevec != NULL)
+    {
+      free (work -> proctypevec);
+      work -> proctypevec = NULL;
+      work -> proctypevec_size = 0;
+    }
   if (work->tmpl_argvec)
     {
       int i;
@@ -3554,6 +3574,8 @@ static int
 do_type (struct work_stuff *work, const char **mangled, string *result)
 {
   int n;
+  int i;
+  int is_proctypevec;
   int done;
   int success;
   string decl;
@@ -3566,6 +3588,7 @@ do_type (struct work_stuff *work, const char **man
   done = 0;
   success = 1;
+  is_proctypevec = 0;
   while (success && !done)
     {
       int member;
@@ -3626,7 +3649,14 @@ do_type (struct work_stuff *work, const char **man
	      success = 0;
	    }
	  else
-	    {
+	    for (i = 0; i < work -> nproctypes; i++)
+	      if (work -> proctypevec [i] == n)
+		success = 0;
+
+	  if (success)
+	    {
+	      is_proctypevec = 1;
+	      push_processed_type (work, n);
	      remembered_type = work -> typevec[n];
	      mangled = &remembered_type;
	    }
@@ -3849,6 +3879,9 @@ do_type (struct work_stuff *work, const char **man
   string_delete (result);
   string_delete (&decl);

+  if (is_proctypevec)
+    pop_processed_type (work);
+
   if (success)
     /* Assume an integral type, if we're not
Re: [AArch64] - feedback on the approach followed for vulcan scheduler
On Wed, Jun 29, 2016 at 9:59 AM, Virendra Pathak wrote:
> Hi gcc-patches group,
>
> I am working on adding vulcan.md (machine description) for the vulcan cpu
> in the aarch64 port.  However, before proposing the final patch, I would like
> the basic approach to be reviewed by you all (as changes are there in
> aarch64.md).
>
> In vulcan, a (load/store) instruction could be scheduled to cpu units in
> different ways based on the addressing mode (e.g. load, or load+integer).
> So the requirement is to identify the addressing mode of a (load/store)
> instruction's operand while scheduling.
>
> For this purpose, a new attribute "addr_type" has been added in the
> aarch64.md file.  This helps in identifying which operands of a (load/store)
> instruction should be considered for finding the addressing mode.
>
> vulcan.md, while scheduling, calls a new function aarch64_mem_type_p
> in aarch64.c (via match_test) to decide the scheduling option based
> on the addressing mode.
>
> I have copied the code snippet below (the complete patch is attached with
> this mail).
>
> Kindly review and give your feedback/comments.
> Also if you think there could be a better alternative way, kindly suggest.

Since this is only about post and pre increment being split into two
micro-ops, I suspect something like what the rs6000 back-end does is better:

;; Does this instruction use update addressing?
;; This is used for load and store insns.  See the comments for "indexed".
(define_attr "update" "no,yes"
  (if_then_else (ior (match_operand 0 "update_address_mem")
		     (match_operand 1 "update_address_mem"))
		(const_string "yes")
		(const_string "no")))

Where update_address_mem is defined as:

;; Return 1 if the operand is a MEM with an update-form address.  This may
;; also include update-indexed form.
(define_special_predicate "update_address_mem"
  (match_test "(MEM_P (op)
		&& (GET_CODE (XEXP (op, 0)) == PRE_INC
		    || GET_CODE (XEXP (op, 0)) == PRE_DEC
		    || GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))

Note you need to include the POST ones for AARCH64 but it should be similar
enough.  And then you just check the attr update.

Thanks,
Andrew

>
> Thanks in advance for your time.
>
>
> FILE - gcc/config/aarch64/aarch64-protos.h
>
> /* Mask bits to use for aarch64_mem_type_p.  Unshifted/shifted index
>    register variants are separated for scheduling purposes because the
>    distinction matters on some cores.  */
> #define AARCH64_ADDR_REG_IMM       0x01
> #define AARCH64_ADDR_REG_WB        0x02
> #define AARCH64_ADDR_REG_REG       0x04
> #define AARCH64_ADDR_REG_SHIFT     0x08
> #define AARCH64_ADDR_REG_EXT       0x10
> #define AARCH64_ADDR_REG_SHIFT_EXT 0x20
> #define AARCH64_ADDR_LO_SUM        0x40
> #define AARCH64_ADDR_SYMBOLIC      0x80
>
>
> FILE - gcc/config/aarch64/aarch64.md
>
> (define_attr "addr_type" "none,op0,op1,op0addr,op1addr,lo_sum,wb"
>   (const_string "none"))
>
> (define_insn "*mov_aarch64"
>   [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r,*w, m, m, r,*w,*w")
>	  (match_operand:SHORT 1 "general_operand" " r,M,D,m, m,rZ,*w,*w, r,*w"))]
>   "(register_operand (operands[0], mode)
>     || aarch64_reg_or_zero (operands[1], mode))"
> {
>    switch (which_alternative)
>      {
>      case 0:
>	 return "mov\t%w0, %w1";
>      case 1:
>	 return "mov\t%w0, %1";
>      case 2:
>	 return aarch64_output_scalar_simd_mov_immediate (operands[1], mode);
>      case 3:
>	 return "ldr\t%w0, %1";
>      case 4:
>	 return "ldr\t%0, %1";
>      case 5:
>	 return "str\t%w1, %0";
>      case 6:
>	 return "str\t%1, %0";
>      case 7:
>	 return "umov\t%w0, %1.[0]";
>      case 8:
>	 return "dup\t%0., %w1";
>      case 9:
>	 return "dup\t%0, %1.[0]";
>      default:
>	 gcc_unreachable ();
>      }
> }
>   [(set_attr "type" "mov_reg,mov_imm,neon_move,load1,load1,store1,store1,\
>		       neon_to_gp,neon_from_gp,neon_dup")
>    (set_attr "simd" "*,*,yes,*,*,*,*,yes,yes,yes")
>    (set_attr "addr_type" "*,*,*,op1,op1,op0,op0,*,*,*")]
> )
>
>
> FILE - gcc/config/aarch64/vulcan.md
>
> ;; Integer loads and stores.
>
> (define_insn_reservation "vulcan_load_basic" 4
>   (and (eq_attr "tune" "vulcan")
>	 (eq_attr "type" "load1")
>	 (match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_SYMBOLIC
>						 | AARCH64_ADDR_REG_IMM
>						 | AARCH64_ADDR_LO_SUM)"))
>   "vulcan_ls01")
>
> (define_insn_reservation "vulcan_load_automod" 4
>   (and (eq_attr "tune" "vulcan")
>	 (eq_attr "type" "load1")
>	 (match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_REG_WB)"))
>   "vulcan_ls01,vulcan_i012")
>
> FILE -
"*,*,*,op1,op1,op0,op0,*,*,*")] > ) > > > > > FILE - gcc/config/aarch64/vulcan.md > ;; Integer loads and stores. > > (define_insn_reservation "vulcan_load_basic" 4 > (and (eq_attr "tune" "vulcan") >(eq_attr "type" "load1") >(match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_SYMBOLIC > | AARCH64_ADDR_REG_IMM > | AARCH64_ADDR_LO_SUM)")) > "vulcan_ls01") > > (define_insn_reservation "vulcan_load_automod" 4 > (and (eq_attr "tune" "vulcan") >(eq_attr "type" "load1") >(match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_REG_WB)")) > "vulcan_ls01,vulcan_i012") > > > > > FILE -