[PATCH][2/n] Always 64bit-HWI cleanups
The following changes the configury to insist on [u]int64_t being available and removes the very old __int64 case.  Autoconf doesn't check for it, support came in via a big merge in Dec 2002, r60174, and it was never used on the libcpp side until I fixed that with the last patch of this series, so we couldn't have relied on it at least since libcpp was introduced.

Both libcpp and vmsdbg.h now use [u]int64_t; switching HOST_WIDE_INT to literally use int64_t has to be done with the grand renaming of all users due to us using 'unsigned HOST_WIDE_INT'.

Btw, I couldn't find any standard way of writing [u]int64_t literals (substitution for HOST_WIDE_INT_C) nor one for printf formats (substitutions for HOST_WIDE_INT_PRINT and friends).  I'll consider doing s/HOST_WIDE_INT/[U]INT64/ there if nobody comes up with a better plan.  Unfortunately any followup will be the whole renaming game at once due to the 'unsigned' issue.  I'll make sure to propose a hwint.h-only patch with a renaming guide for review and expect the actual renaming to take place using a script.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok?

After this patch you may use [u]int64_t freely in host sources (lto-plugin already does that - heh).

Thanks,
Richard.

2014-05-23  Richard Biener  <rguent...@suse.de>

	libcpp/
	* configure.ac: Remove long long and __int64 type checks,
	add check for uint64_t and fail if that wasn't found.
	* include/cpplib.h (cpp_num_part): Use uint64_t.
	* config.in: Regenerate.
	* configure: Likewise.

	gcc/
	* configure.ac: Drop __int64 type check.  Insist that we
	found uint64_t and int64_t.
	* hwint.h (HOST_BITS_PER___INT64): Remove.
	(HOST_BITS_PER_WIDE_INT): Define to 64 and remove __int64 case.
	(HOST_WIDE_INT_PRINT_*): Remove 32bit case.
	(HOST_WIDEST_INT*): Define to HOST_WIDE_INT*.
	(HOST_WIDEST_FAST_INT): Remove __int64 case.
	* vmsdbg.h (struct _DST_SRC_COMMAND): Use int64_t for
	dst_q_src_df_rms_cdt.
	* configure: Regenerate.
	* config.in: Likewise.
Index: libcpp/config.in
===================================================================
*** libcpp/config.in	(revision 210847)
--- libcpp/config.in	(working copy)
***************
*** 180,188 ****
  /* Define to 1 if you have the <locale.h> header file.  */
  #undef HAVE_LOCALE_H
  
- /* Define to 1 if the system has the type `long long'. */
- #undef HAVE_LONG_LONG
- 
  /* Define to 1 if you have the <memory.h> header file.  */
  #undef HAVE_MEMORY_H
--- 180,185 ----
***************
*** 231,239 ****
  /* Define to 1 if you have the <unistd.h> header file.  */
  #undef HAVE_UNISTD_H
  
- /* Define to 1 if the system has the type `__int64'. */
- #undef HAVE___INT64
- 
  /* Define as const if the declaration of iconv() needs const. */
  #undef ICONV_CONST
--- 228,233 ----
***************
*** 264,275 ****
  /* The size of `long', as computed by sizeof. */
  #undef SIZEOF_LONG
  
- /* The size of `long long', as computed by sizeof. */
- #undef SIZEOF_LONG_LONG
- 
- /* The size of `__int64', as computed by sizeof. */
- #undef SIZEOF___INT64
- 
  /* If using the C implementation of alloca, define if you know the
     direction of stack growth for your system; otherwise it will be
     automatically deduced at runtime.
--- 258,263 ----
***************
*** 340,345 ****
--- 328,338 ----
  /* Define to 1 if you need to in order for `stat' and other things to work. */
  #undef _POSIX_SOURCE
  
+ /* Define for Solaris 2.5.1 so the uint64_t typedef from <sys/synch.h>,
+    <pthread.h>, or <semaphore.h> is not used. If the typedef were allowed, the
+    #define below would cause a syntax error. */
+ #undef _UINT64_T
+ 
  /* Define to empty if `const' does not conform to ANSI C. */
  #undef const
***************
*** 361,366 ****
--- 354,363 ----
  /* Define to `int' if <sys/types.h> does not define. */
  #undef ssize_t
  
+ /* Define to the type of an unsigned integer type of width exactly 64 bits if
+    such a type exists and the standard includes do not define it. */
+ #undef uint64_t
+ 
  /* Define to the type of an unsigned integer type wide enough to hold a
     pointer, if such a type exists, and if the system does not define it.
     */
  #undef uintptr_t

Index: libcpp/configure
===================================================================
*** libcpp/configure	(revision 210847)
--- libcpp/configure	(working copy)
*************** $as_echo "$ac_res" >&6; }
*** 1822,1827 ****
--- 1822,1879 ----
  } # ac_fn_c_check_type
  
+ # ac_fn_c_find_uintX_t LINENO BITS VAR
+ # ------------------------------------
+ # Finds an unsigned integer type with width BITS, setting cache variable VAR
+ # accordingly.
+ ac_fn_c_find_uintX_t ()
+ {
+   as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+   { $as_echo "$as_me:${as_lineno-$LINENO}: checking for uint$2_t" >&5
+ $as_echo_n "checking for uint$2_t... " >&6
Re: [COMMITTED 1/2] Just enumerate all GF_OMP_FOR_KIND_* and GF_OMP_TARGET_KIND_*.
I think it was supposed to note that it uses two bits in the mask.  Did Jakub approve these patches you are committing now?

Thanks,
Richard.

On Fri, May 23, 2014 at 1:32 PM, Thomas Schwinge <tho...@codesourcery.com> wrote:

From: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>

	gcc/
	* gimple.h (enum gf_mask): Rewrite "<< 0" shift expressions used
	for GF_OMP_FOR_KIND_MASK, GF_OMP_FOR_KIND_FOR,
	GF_OMP_FOR_KIND_DISTRIBUTE, GF_OMP_FOR_KIND_SIMD,
	GF_OMP_FOR_KIND_CILKSIMD, GF_OMP_TARGET_KIND_MASK,
	GF_OMP_TARGET_KIND_REGION, GF_OMP_TARGET_KIND_DATA,
	GF_OMP_TARGET_KIND_UPDATE.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@210854 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog | 7 +++++++
 gcc/gimple.h  | 18 +++++++++---------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index d351c0b..fa2f3c3 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,5 +1,12 @@
 2014-05-23  Thomas Schwinge  <tho...@codesourcery.com>
 
+	* gimple.h (enum gf_mask): Rewrite "<< 0" shift expressions used
+	for GF_OMP_FOR_KIND_MASK, GF_OMP_FOR_KIND_FOR,
+	GF_OMP_FOR_KIND_DISTRIBUTE, GF_OMP_FOR_KIND_SIMD,
+	GF_OMP_FOR_KIND_CILKSIMD, GF_OMP_TARGET_KIND_MASK,
+	GF_OMP_TARGET_KIND_REGION, GF_OMP_TARGET_KIND_DATA,
+	GF_OMP_TARGET_KIND_UPDATE.
+
 	* gimplify.c (omp_notice_variable) <case OMP_CLAUSE_DEFAULT_NONE>:
 	Explicitly enumerate the expected region types.
diff --git gcc/gimple.h gcc/gimple.h
index 9df45de..b1970e5 100644
--- gcc/gimple.h
+++ gcc/gimple.h
@@ -91,17 +91,17 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
-    GF_OMP_FOR_KIND_FOR		= 0 << 0,
-    GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
-    GF_OMP_FOR_KIND_SIMD	= 2 << 0,
-    GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= (1 << 2) - 1,
+    GF_OMP_FOR_KIND_FOR		= 0,
+    GF_OMP_FOR_KIND_DISTRIBUTE	= 1,
+    GF_OMP_FOR_KIND_SIMD	= 2,
+    GF_OMP_FOR_KIND_CILKSIMD	= 3,
     GF_OMP_FOR_COMBINED		= 1 << 2,
     GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
-    GF_OMP_TARGET_KIND_MASK	= 3 << 0,
-    GF_OMP_TARGET_KIND_REGION	= 0 << 0,
-    GF_OMP_TARGET_KIND_DATA	= 1 << 0,
-    GF_OMP_TARGET_KIND_UPDATE	= 2 << 0,
+    GF_OMP_TARGET_KIND_MASK	= (1 << 2) - 1,
+    GF_OMP_TARGET_KIND_REGION	= 0,
+    GF_OMP_TARGET_KIND_DATA	= 1,
+    GF_OMP_TARGET_KIND_UPDATE	= 2,
 
     /* True on an GIMPLE_OMP_RETURN statement if the return does not require
        a thread synchronization via some sort of barrier.  The exact barrier
-- 
1.9.1
Re: [PATCH] Fix PR rtl-optimization/61278
On Fri, May 23, 2014 at 12:33 PM, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:

On 23 May 2014 17:05, Richard Biener <richard.guent...@gmail.com> wrote:

On Fri, May 23, 2014 at 9:23 AM, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:

Hi,

The patch fixes PR rtl-optimization/61278.  The root cause of the issue is that df_live does not exist at -O1.

Bootstrap and no make check regression on X86-64.  OK for trunk?

Why do you need to give up?  It seems you can simply avoid marking the block as dirty (though df_get_live_in/out also hands you back DF_LR_IN/OUT if !df_live).  So isn't the df_grow_bb_info the real fix?

The df_get_live_in of the new basic block will be used to analyse later INSNs.  If it is not set or incorrect, it will impact later analysis.  df_grow_bb_info is to make sure the live_in data structure is allocated for the new basic block (although I have not found any case fail without it).  After bitmap_copy (...), we can use it for later INSNs.

Note that df_get_live_in/out are functions tailored to IRA that knows that they handle both df_live and df_lr dependent on optimization level.  Is shrink-wrapping supposed to work with both problems as well?

Yes.  But it seems not perfect at handling the df_lr problem.  When I fixed PR 57637 (https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00897.html), we selected "if DF_LIVE doesn't exist, i.e. at -O1, just give up searching NEXT_BLOCK".

Ok, I see.  Maybe it would be better to completely disable shrink-wrapping when LIVE is not available.

Patch is ok.

Thanks,
Richard.

Thanks!
-Zhenqiang

ChangeLog:
2014-05-23  Zhenqiang Chen  <zhenqiang.c...@linaro.org>

	PR rtl-optimization/61278
	* shrink-wrap.c (move_insn_for_shrink_wrap): Check df_live.

testsuite/ChangeLog:
2014-05-23  Zhenqiang Chen  <zhenqiang.c...@linaro.org>

	* gcc.dg/lto/pr61278_0.c: New test.
	* gcc.dg/lto/pr61278_1.c: New test.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index f09cfe7..be17829 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -204,8 +204,15 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
       /* Create a new basic block on the edge.  */
       if (EDGE_COUNT (next_block->preds) == 2)
 	{
+	  /* If DF_LIVE doesn't exist, i.e. at -O1, just give up.  */
+	  if (!df_live)
+	    return false;
+
 	  next_block = split_edge (live_edge);
 
+	  /* We create a new basic block.  Call df_grow_bb_info to make sure
+	     all data structures are allocated.  */
+	  df_grow_bb_info (df_live);
 	  bitmap_copy (df_get_live_in (next_block),
 		       df_get_live_out (bb));
 	  df_set_bb_dirty (next_block);
diff --git a/gcc/testsuite/gcc.dg/lto/pr61278_0.c b/gcc/testsuite/gcc.dg/lto/pr61278_0.c
new file mode 100644
index 000..03a24ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr61278_0.c
@@ -0,0 +1,30 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -flto -O0 } } } */
+/* { dg-extra-ld-options "-flto -O1" } */
+
+static unsigned int
+fn1 (int p1, int p2)
+{
+  return 0;
+}
+
+char a, b, c;
+
+char
+foo (char *p)
+{
+  int i;
+  for (b = 1; b > 0; b++)
+    {
+      for (i = 0; i < 2; i++)
+	;
+      for (a = 1; a > 0; a++)
+	{
+	  char d[1] = { 0 };
+	  if (*p)
+	    break;
+	  c ^= fn1 (fn1 (fn1 (0, 0), 0), 0);
+	}
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr61278_1.c b/gcc/testsuite/gcc.dg/lto/pr61278_1.c
new file mode 100644
index 000..b02c8ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr61278_1.c
@@ -0,0 +1,13 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -flto -O1 } } } */
+
+extern char foo (char *);
+
+char d;
+
+int
+main ()
+{
+  foo (&d);
+  return 0;
+}
Re: [PATCH] Disable unroll loop that has header count less than iteration count.
On May 23, 2014 7:26:04 PM CEST, Jan Hubicka <hubi...@ucw.cz> wrote:

On Thu, May 22, 2014 at 11:36 PM, Dehao Chen <de...@google.com> wrote:

If a loop's header count is less than iteration count, the iteration estimation is apparently incorrect for this loop.  Thus disable unrolling of such loops.  Testing ongoing.  OK for trunk if tests pass?

No.  Why don't you instead plug the hole in expected_loop_iterations ()?  That is, why may not loop->header be bogus?  Isn't it maybe the bounding you run into?

/* Returns expected number of LOOP iterations.  The returned value is bounded
   by REG_BR_PROB_BASE.  */
unsigned
expected_loop_iterations (const struct loop *loop)
{
  gcov_type expected = expected_loop_iterations_unbounded (loop);
  return (expected > REG_BR_PROB_BASE ? REG_BR_PROB_BASE : expected);
}

I miss a testcase as well.

Richard.

It is an autoFDO thing.  Dehao is basically pushing out changes that should make the compiler more sane about bogus profiles.  At the moment we have tests bb->count != 0 to figure out if the profile is reliable, i.e. we assume that profile feedback is always good and a guessed profile is never good.  Loop optimizers then do not trust the guessed profile to give realistic estimates on the number of iterations and bail out into simple heuristics for estimated profiles that are usually not good at this job - i.e.

  int niters = 0;
  if (desc->const_iter)
    niters = desc->niter;
  else if (loop->header->count)
    niters = expected_loop_iterations (loop);
  ...

This changes with FDO, as the FDO profiles are sometimes bogus (and there is not much to do about it).  I would say that for loop optimization we want a flag saying whether the expected number of iterations is reliable.  We probably want

  if (expected_loop_iterations_reliable_p (loop))
    niters = expected_loop_iterations (loop);

We probably also want to store this information in the loop structure rather than computing it all the time from the profile, since the profile may get inaccurate and we already know how to maintain upper bounds on numbers of iterations, so it should be easy to implement.

This would allow us to preserve info like

  for (i = 0; i < __builtin_expect (n, 10); i++)

that would be a nice feature to have.

Honza

But why not simply never return bogus values from expected_loop_iterations?  We probably want a flag in the .gcda file on whether it was from auto-fdo and only not trust profiles from those.

Richard.

Thanks,
Dehao

gcc/ChangeLog:
2014-05-21  Dehao Chen  <de...@google.com>

	* cfgloop.h (expected_loop_iterations_reliable_p): New func.
	* cfgloopanal.c (expected_loop_iterations_reliable_p): Likewise.
	* loop-unroll.c (decide_unroll_runtime_iterations): Disable unroll
	loop that has unreliable iteration counts.

Index: gcc/cfgloop.h
===================================================================
--- gcc/cfgloop.h	(revision 210717)
+++ gcc/cfgloop.h	(working copy)
@@ -307,8 +307,8 @@ extern bool just_once_each_iteration_p (const stru
 gcov_type expected_loop_iterations_unbounded (const struct loop *);
 extern unsigned expected_loop_iterations (const struct loop *);
 extern rtx doloop_condition_get (rtx);
+extern bool expected_loop_iterations_reliable_p (const struct loop *);
 
-
 /* Loop manipulation.  */
 extern bool can_duplicate_loop_p (const struct loop *loop);

Index: gcc/cfgloopanal.c
===================================================================
--- gcc/cfgloopanal.c	(revision 210717)
+++ gcc/cfgloopanal.c	(working copy)
@@ -285,6 +285,15 @@ expected_loop_iterations (const struct loop *loop)
   return (expected > REG_BR_PROB_BASE ? REG_BR_PROB_BASE : expected);
 }
 
+/* Returns true if the loop header's profile count is smaller than expected
+   loop iteration.  */
+
+bool
+expected_loop_iterations_reliable_p (const struct loop *loop)
+{
+  return expected_loop_iterations (loop) > loop->header->count;
+}
+
 /* Returns the maximum level of nesting of subloops of LOOP.  */
 
 unsigned

Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c	(revision 210717)
+++ gcc/loop-unroll.c	(working copy)
@@ -988,6 +988,15 @@ decide_unroll_runtime_iterations (struct loop *loo
       return;
     }
 
+  if (profile_status_for_fn (cfun) == PROFILE_READ
+      && expected_loop_iterations_reliable_p (loop))
+    {
+      if (dump_file)
+	fprintf (dump_file, ";; Not unrolling loop, loop iteration "
+		 "not reliable.");
+      return;
+    }
+
   /* Check whether the loop rolls.  */
   if ((get_estimated_loop_iterations (loop, &iterations)
       || get_max_loop_iterations (loop, &iterations))
Re: Fix broken graph dump
On Sat, May 24, 2014 at 6:44 AM, Xinliang David Li <davi...@google.com> wrote:

The graph dump is broken in trunk (and gcc-4.9) -- the subgraph of the last function gets dumped.  The patch fixes the problem.  Also fixed the function header dump -- the cgraph uid is still used in many places such as the -fdisable|enable-xxx options.

Ok for trunk?

Ok if tested ok.

Thanks,
Richard.

David
Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)
On Mon, May 26, 2014 at 10:14 AM, FX fxcoud...@gmail.com wrote: .././../gcc-4.10-20140518/gcc/wide-int.cc:1274:23: error: invalid use of a cast in a inline asm context requiring an l-value: remove the cast or build with -fheinous-gnu-extensions umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ()); ~~~^ This is PR 61146. You can get around it by adding -fheinous-gnu-extensions to BOOT_CFLAGS. This causes GCC bootstrap to fail on Darwin systems (whose system compiler is clang-based). Since PR 61146 was resolved as INVALID (but I’m not sure it’s the right call, see below), I’ve filed a separate report for the bootstrap issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61315). Regarding PR 61146, I agree with Marc Glisse (comment #3) that the casts in question look weird and should probably be removed, as was done in GMP. This code should be cleaned up, and it will fix bootstrap on clang-based target coincidentally, without adding another kludge. Please post a patch. Thanks, Richard. FX
[PATCH][4/4] Always 64bit-HWI cleanups
This is the last cleanup bit.  Remaining is getting rid of HOST_WIDE_INT in favor of [u]int64_t and adjusting interfaces and interface names.  That's too disruptive at the moment (thus appropriate for a delay until 4.9.1 is out) and I'm not sure if we want to split that work or if such splitting is even possible sensibly.  For example wide-int should get its own abstraction of its storage type (which could be simply [u]int64_t - the actual compute routines can happily split a [u]int64_t into two pieces again if that is desirable on some architectures - I suspect GCC is good enough in optimizing the two element case inline for them).

This moves the relatively new HALF_WIDE_INT stuff to its only user, wide-int.cc, and uses the PRI*64 stuff in the HOST_WIDE_INT_PRINT macros.

Bootstrap / testing in progress on x86_64-unknown-linux-gnu.

Richard.

2014-05-26  Richard Biener  <rguent...@suse.de>

	* hwint.h (*_HALF_WIDE_INT*): Move to ...
	* wide-int.cc (HOST_BITS_PER_HALF_WIDE_INT, HOST_HALF_WIDE_INT):
	... here and remove the rest.
	* hwint.h (HOST_WIDE_INT_PRINT_*): Define in terms of PRI*64.

Index: gcc/hwint.h
===================================================================
*** gcc/hwint.h.orig	2014-05-26 10:49:23.009339524 +0200
--- gcc/hwint.h	2014-05-26 11:09:43.597255488 +0200
*************** extern char sizeof_long_long_must_be_8[s
*** 64,103 ****
  # endif
  #endif
  
- /* Print support for half a host wide int.  */
- #define HOST_BITS_PER_HALF_WIDE_INT 32
- #if HOST_BITS_PER_HALF_WIDE_INT == HOST_BITS_PER_LONG
- # define HOST_HALF_WIDE_INT long
- # define HOST_HALF_WIDE_INT_PRINT HOST_LONG_FORMAT
- # define HOST_HALF_WIDE_INT_PRINT_C "L"
- # define HOST_HALF_WIDE_INT_PRINT_DEC "%" HOST_HALF_WIDE_INT_PRINT "d"
- # define HOST_HALF_WIDE_INT_PRINT_DEC_C HOST_HALF_WIDE_INT_PRINT_DEC HOST_HALF_WIDE_INT_PRINT_C
- # define HOST_HALF_WIDE_INT_PRINT_UNSIGNED "%" HOST_HALF_WIDE_INT_PRINT "u"
- # define HOST_HALF_WIDE_INT_PRINT_HEX "%#" HOST_HALF_WIDE_INT_PRINT "x"
- # define HOST_HALF_WIDE_INT_PRINT_HEX_PURE "%" HOST_HALF_WIDE_INT_PRINT "x"
- #elif HOST_BITS_PER_HALF_WIDE_INT == HOST_BITS_PER_INT
- # define HOST_HALF_WIDE_INT int
- # define HOST_HALF_WIDE_INT_PRINT ""
- # define HOST_HALF_WIDE_INT_PRINT_C ""
- # define HOST_HALF_WIDE_INT_PRINT_DEC "%" HOST_HALF_WIDE_INT_PRINT "d"
- # define HOST_HALF_WIDE_INT_PRINT_DEC_C HOST_HALF_WIDE_INT_PRINT_DEC HOST_HALF_WIDE_INT_PRINT_C
- # define HOST_HALF_WIDE_INT_PRINT_UNSIGNED "%" HOST_HALF_WIDE_INT_PRINT "u"
- # define HOST_HALF_WIDE_INT_PRINT_HEX "%#" HOST_HALF_WIDE_INT_PRINT "x"
- # define HOST_HALF_WIDE_INT_PRINT_HEX_PURE "%" HOST_HALF_WIDE_INT_PRINT "x"
- #elif HOST_BITS_PER_HALF_WIDE_INT == HOST_BITS_PER_SHORT
- # define HOST_HALF_WIDE_INT short
- # define HOST_HALF_WIDE_INT_PRINT ""
- # define HOST_HALF_WIDE_INT_PRINT_C ""
- # define HOST_HALF_WIDE_INT_PRINT_DEC "%" HOST_HALF_WIDE_INT_PRINT "d"
- # define HOST_HALF_WIDE_INT_PRINT_DEC_C HOST_HALF_WIDE_INT_PRINT_DEC HOST_HALF_WIDE_INT_PRINT_C
- # define HOST_HALF_WIDE_INT_PRINT_UNSIGNED "%" HOST_HALF_WIDE_INT_PRINT "u"
- # define HOST_HALF_WIDE_INT_PRINT_HEX "%#" HOST_HALF_WIDE_INT_PRINT "x"
- # define HOST_HALF_WIDE_INT_PRINT_HEX_PURE "%" HOST_HALF_WIDE_INT_PRINT "x"
- #else
- #error Please add support for HOST_HALF_WIDE_INT
- #endif
- 
  #define HOST_WIDE_INT_UC(X) HOST_WIDE_INT_C (X ## U)
  #define HOST_WIDE_INT_1 HOST_WIDE_INT_C (1)
  #define HOST_WIDE_INT_1U HOST_WIDE_INT_UC (1)
--- 64,69 ----
*************** extern char sizeof_long_long_must_be_8[s
*** 109,156 ****
     typedef before using the __asm_fprintf__ format attribute.  */
  typedef HOST_WIDE_INT __gcc_host_wide_int__;
  
- /* Various printf format strings for HOST_WIDE_INT.  */
- 
- #if HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG
- # define HOST_WIDE_INT_PRINT HOST_LONG_FORMAT
- # define HOST_WIDE_INT_PRINT_C "L"
-   /* HOST_BITS_PER_WIDE_INT is 64 bits.  */
- # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \
-     "0x%" HOST_LONG_FORMAT "x%016" HOST_LONG_FORMAT "x"
- # define HOST_WIDE_INT_PRINT_PADDED_HEX \
-     "%016" HOST_LONG_FORMAT "x"
- #else
- # define HOST_WIDE_INT_PRINT HOST_LONG_LONG_FORMAT
- # define HOST_WIDE_INT_PRINT_C "LL"
-   /* HOST_BITS_PER_WIDE_INT is 64 bits.  */
- # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \
-     "0x%" HOST_LONG_LONG_FORMAT "x%016" HOST_LONG_LONG_FORMAT "x"
- # define HOST_WIDE_INT_PRINT_PADDED_HEX \
-     "%016" HOST_LONG_LONG_FORMAT "x"
- #endif /* HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG */
- 
- #define HOST_WIDE_INT_PRINT_DEC "%" HOST_WIDE_INT_PRINT "d"
- #define HOST_WIDE_INT_PRINT_DEC_C HOST_WIDE_INT_PRINT_DEC HOST_WIDE_INT_PRINT_C
- #define HOST_WIDE_INT_PRINT_UNSIGNED "%" HOST_WIDE_INT_PRINT "u"
- #define HOST_WIDE_INT_PRINT_HEX "%#" HOST_WIDE_INT_PRINT "x"
- #define HOST_WIDE_INT_PRINT_HEX_PURE "%" HOST_WIDE_INT_PRINT "x"
- 
  /* Provide C99 <inttypes.h> style format definitions for 64bits.  */
  #ifndef HAVE_INTTYPES_H
  #undef PRId64
! #define PRId64 HOST_WIDE_INT_PRINT "d"
  #undef PRIi64
! #define PRIi64 HOST_WIDE_INT_PRINT "i"
  #undef PRIo64
! #define
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Mon, May 26, 2014 at 12:22 PM, Eric Botcazou <ebotca...@adacore.com> wrote:

Hi,

the motivation of this work is to get rid of the range check on Z in:

  function F (X : Integer; Y : Integer) return Positive is
     Z : Integer;
  begin
     if X >= Y then
        return 1;
     end if;
     Z := Y - X;
     return Z;
  end;

An equivalent C testcase is:

  extern void abort (void);

  int foo (int x, int y)
  {
    int z;
    if (x >= y)
      return 1;
    z = y - x;
    if (z <= 0)
      abort ();
    return z;
  }

for which the call to abort is not optimized at -O2.  fold_comparison knows how to optimize X +- C1 CMP C2 to X CMP C2 -+ C1 with undefined overflow, so optimizing X - Y CMP 0 to X CMP Y is a generalization.  Once this is done, forwprop will immediately optimize:

  extern void abort (void);

  int foo (int x, int y)
  {
    int z;
    if (x >= y)
      return 1;
    z = y - x;
    if (z <= 0)
      abort ();
    return 0;
  }

because 'z' has a single use, but the original testcase won't be optimized.  The attached patch implements the missing bits in vrp_evaluate_conditional by manually propagating the operands of a defining PLUS_EXPR or MINUS_EXPR for an SSA name in a condition; another possibility could be DOM instead of VRP.  This comes with 4 testcases: the original Ada testcase, the C equivalent, a testcase for the folding and another one for the -Wstrict-overflow warning.  Tested on x86_64-suse-linux with no regressions.

+	  tree new_const
+	    = fold_build2_loc (loc, reverse_op, TREE_TYPE (arg1), const2, const1);
 
 	  /* If the constant operation overflowed this can be
 	     simplified as a comparison against INT_MAX/INT_MIN.  */
-	  if (TREE_CODE (lhs) == INTEGER_CST
-	      && TREE_OVERFLOW (lhs))
+	  if (TREE_OVERFLOW (new_const))

well, either use int_const_binop above or retain the check (or use TREE_OVERFLOW_P).  Bonus points for using wide-ints here and not relying on TREE_OVERFLOW.

+      /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.  */
+      if (TREE_CODE (arg0) == MINUS_EXPR
+	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1))

any good reason for using TYPE_OVERFLOW_UNDEFINED on the type of arg1 instead of on the type of the MINUS (yes, they should match, but it really looks odd ... the overflow of the minus has to be undefined)?  Also for EQ_EXPR and NE_EXPR the transform is fine even when !TYPE_OVERFLOW_UNDEFINED (and we seem to perform it already somewhere).  Please look where and try to add the undefined overflow case to it.

As for the VRP pieces I don't understand what is missing here (well, compare_range_with_value and/or compare_values might not handle this case?  Then better fix that to improve symbolic range handling in general?).  Also I have a longstanding patch in my tree that does

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c	(revision 210931)
+++ gcc/tree-vrp.c	(working copy)
@@ -6919,14 +6919,15 @@ vrp_evaluate_conditional_warnv_with_ops_
   vr0 = (TREE_CODE (op0) == SSA_NAME) ? get_value_range (op0) : NULL;
   vr1 = (TREE_CODE (op1) == SSA_NAME) ? get_value_range (op1) : NULL;
 
+  tree res = NULL_TREE;
   if (vr0 && vr1)
-    return compare_ranges (code, vr0, vr1, strict_overflow_p);
-  else if (vr0 && vr1 == NULL)
-    return compare_range_with_value (code, vr0, op1, strict_overflow_p);
-  else if (vr0 == NULL && vr1)
-    return (compare_range_with_value
+    res = compare_ranges (code, vr0, vr1, strict_overflow_p);
+  if (!res && vr0)
+    res = compare_range_with_value (code, vr0, op1, strict_overflow_p);
+  if (!res && vr1)
+    res = (compare_range_with_value
 	    (swap_tree_comparison (code), vr1, op0, strict_overflow_p));
-  return NULL;
+  return res;
 }
 
 /* Helper function for vrp_evaluate_conditional_warnv.  */

maybe that helps as well.

Thanks,
Richard.

2014-05-26  Eric Botcazou  <ebotca...@adacore.com>

	* fold-const.c (fold_comparison): Clean up and simplify X +- C1 CMP C2
	to X CMP C2 -+ C1 transformation, add X - Y CMP 0 to X CMP Y.
	* tree-vrp.c (vrp_evaluate_conditional_warnv_with_fold): New function.
	(vrp_evaluate_conditional): Call it on SSA names with defining
	PLUS_EXPR or MINUS_EXPR if the evaluation of the condition yielded
	nothing.

2014-05-26  Eric Botcazou  <ebotca...@adacore.com>

	* gcc.dg/fold-compare-8.c: New test.
	* gcc.dg/Wstrict-overflow-25.c: Likewise.
	* gcc.dg/tree-ssa/vrp92.c: Likewise.
	* gnat.dg/opt38.adb: Likewise.

-- 
Eric Botcazou
[PATCH] Fix primitive wi::int_traits
The following patch completes the primitive int_traits with long and long long variants and drops the HWI case.  This fixes Darwin builds where HOST_WIDE_INT is 'long' but int64_t is 'long long'.

Bootstrapped on x86_64-unknown-linux-gnu (and on Darwin by Ian), soon to be committed.

Richard.

2014-05-26  Richard Biener  <rguent...@suse.de>

	* wide-int.h (wi::int_traits <long>, wi::int_traits <unsigned long>,
	wi::int_traits <long long>, wi::int_traits <unsigned long long>):
	Provide specializations.
	(wi::int_traits <HOST_WIDE_INT>,
	wi::int_traits <unsigned HOST_WIDE_INT>): Remove specializations.

Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	(revision 210931)
+++ gcc/wide-int.h	(working copy)
@@ -1446,12 +1446,22 @@ namespace wi
     : public primitive_int_traits <unsigned int, false> {};
 
   template <>
-  struct int_traits <HOST_WIDE_INT>
-    : public primitive_int_traits <HOST_WIDE_INT, true> {};
+  struct int_traits <long>
+    : public primitive_int_traits <long, true> {};
 
   template <>
-  struct int_traits <unsigned HOST_WIDE_INT>
-    : public primitive_int_traits <unsigned HOST_WIDE_INT, false> {};
+  struct int_traits <unsigned long>
+    : public primitive_int_traits <unsigned long, false> {};
+
+#if defined HAVE_LONG_LONG
+  template <>
+  struct int_traits <long long>
+    : public primitive_int_traits <long long, true> {};
+
+  template <>
+  struct int_traits <unsigned long long>
+    : public primitive_int_traits <unsigned long long, false> {};
+#endif
 }
 
 namespace wi
Re: [PATCH][4/4] Always 64bit-HWI cleanups
On May 27, 2014 3:58:06 AM CEST, David Edelsohn dje@gmail.com wrote:

This patch (further) broke bootstrap on AIX.  AIX defaults to 32 bit, although it supports 64 bit HWI.

/nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'int print_statistics(bitmap_descriptor_d**, output_info*)':
/nasfarm/edelsohn/src/src/gcc/bitmap.c:2168:24: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'void dump_bitmap_statistics()':
/nasfarm/edelsohn/src/src/gcc/bitmap.c:2202:15: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/bt-load.c: In function 'void migrate_btr_defs(reg_class, int)':
/nasfarm/edelsohn/src/src/gcc/bt-load.c:1414:34: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_edge_info(FILE*, edge, int, int)':
/nasfarm/edelsohn/src/src/gcc/cfg.c:489:25: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_bb_info(FILE*, basic_block, int, int, bool, bool)':
/nasfarm/edelsohn/src/src/gcc/cfg.c:737:33: error: expected ')' before 'PRId64'

This means AIX has inttypes.h but not the standard-mandated PRI?64 macros.  Does it have stdint.h and therein int64_t?  A possibility is to define the GCC fallbacks when the defines are not defined, as opposed to only when inttypes.h is not available.  Can you investigate the situation on AIX some more?

Thanks,
Richard.

bitmap.i looks like:

int
print_statistics (bitmap_descriptor_d **slot, output_info *i)
{
  bitmap_descriptor d = *slot;
  char s[4096];

  if (d->allocated)
    {
      const char *s1 = d->file;
      const char *s2;
      while ((s2 = strstr (s1, "gcc/")))
	s1 = s2 + 4;
      sprintf (s, "%s:%i (%s)", s1, d->line, d->function);
      s[41] = 0;
      fprintf ((&_iob[2]), "%-41s %9u %15" PRId64 " %15" PRId64 " %15" PRId64
	       " %10" PRId64 " %10" PRId64 "\n",
	       s, d->created, d->allocated, d->peak, d->current,
	       d->nsearches, d->search_iter);
      i->size += d->allocated;
      i->count += d->created;
    }
  return 1;
}

- David
Re: [PATCH] Warn on and fold NULL checks against inline functions
On Mon, May 26, 2014 at 9:01 PM, Patrick Palka <patr...@parcs.ath.cx> wrote:

Hi,

This patch teaches the C++ frontend to warn on NULL checks against inline functions and it teaches the middle-end to fold NULL checks against inline functions.  These two things are currently done for non-inline C++ functions, but inline functions are exceptional in that the C++ frontend marks them as weak, and NULL checks against weak symbols cannot be folded in general because the symbol may be mapped to NULL at link-time.  But in the case of an inline function, the fact that it's a weak symbol is an implementation detail.  And because it is not permitted to explicitly give an inline function the weak attribute (see handle_weak_attribute), in order to achieve $SUBJECT it suffices to assume that all inline functions are non-null, which is what this patch does.

Bootstrap and regtest against x86_64-unknown-linux-gnu in progress.  Is this patch OK assuming no regressions?

What about always_inline function wrappers as used in FORTIFY_SOURCE?  IIRC the actual referenced function may be declared weak and thus the address can compare to NULL?  Sth like

extern void __foo (void) __attribute__((weak,asm("foo")));
extern inline __attribute__((always_inline,gnu_inline)) void foo (void)
{
  __foo ();
}

int main()
{
  if (foo == 0)
    return 0;
  abort ();
}

The issue is that the inline and alias may be hidden to the user and thus he'll get unwanted and confusing warnings.

Richard.

2014-05-26  Patrick Palka  <patr...@parcs.ath.cx>

	* c-family/c-common.c (decl_with_nonnull_addr_p): Assume inline
	functions are non-null.
	* fold-const.c (tree_single_nonzero_warnv_p): Likewise.
---
 gcc/c-family/c-common.c         | 4 +++-
 gcc/fold-const.c                | 5 ++++-
 gcc/testsuite/g++.dg/inline-1.C | 14 ++++++++++++++
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/inline-1.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 6ec14fc..d4747a0 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -4508,7 +4508,9 @@ decl_with_nonnull_addr_p (const_tree expr)
   return (DECL_P (expr)
 	  && (TREE_CODE (expr) == PARM_DECL
 	      || TREE_CODE (expr) == LABEL_DECL
-	      || !DECL_WEAK (expr)));
+	      || !DECL_WEAK (expr)
+	      || (TREE_CODE (expr) == FUNCTION_DECL
+		  && DECL_DECLARED_INLINE_P (expr))));
 }
 
 /* Prepare expr to be an argument of a TRUTH_NOT_EXPR,
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 188b179..2796079 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -16052,7 +16052,10 @@ tree_single_nonzero_warnv_p (tree t, bool *strict_overflow_p)
 	  || (DECL_CONTEXT (base)
 	      && TREE_CODE (DECL_CONTEXT (base)) == FUNCTION_DECL
 	      && auto_var_in_fn_p (base, DECL_CONTEXT (base))))
-	return !VAR_OR_FUNCTION_DECL_P (base) || !DECL_WEAK (base);
+	return !VAR_OR_FUNCTION_DECL_P (base)
+	       || !DECL_WEAK (base)
+	       || (TREE_CODE (base) == FUNCTION_DECL
+		   && DECL_DECLARED_INLINE_P (base));
 
       /* Constants are never weak.  */
       if (CONSTANT_CLASS_P (base))
diff --git a/gcc/testsuite/g++.dg/inline-1.C b/gcc/testsuite/g++.dg/inline-1.C
new file mode 100644
index 000..65beff1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/inline-1.C
@@ -0,0 +1,14 @@
+// { dg-options "-Waddress" }
+
+inline void
+foo (void)
+{
+}
+
+int
+bar (void)
+{
+  return foo == 0; // { dg-warning "never be NULL" }
+}
+
+// { dg-final { scan-assembler-not "foo" } }
Re: [PATCH] Fix an AARCH64/ARM performance regression
On Tue, May 27, 2014 at 10:10 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, I have been informed, that the following check-in may cause a slight performance regression on aarch64 and arm on trunk and gcc-4_9-branch: See https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=205899 This can be seen with the following test example:

test.c:

typedef struct { int *x; int *y; int *z; int *m; int *n; int *o; } A;
typedef struct { int x; int y; int z; int m; int n; int o; } B;

A a[6];
B *b;

void test(int i)
{
  A *a1 = &a[i];
  B *b1 = &b[i];
  a1->x = &b1->x;
  a1->y = &b1->y;
  a1->z = &b1->z;
  a1->m = &b1->m;
  a1->n = &b1->n;
  a1->o = &b1->o;
}

when compiled with aarch64-elf-gcc -O3 -S we get the following assembler code:

test:
	adrp	x1, b
	sxtw	x3, w0
	mov	w5, 24
	adrp	x2, a
	add	x2, x2, :lo12:a
	ldr	x4, [x1, #:lo12:b]
	lsl	x1, x3, 2
	sub	x1, x1, x3
	lsl	x1, x1, 4
	smaddl	x0, w0, w5, x4
	add	x3, x2, x1
	add	x7, x0, 8
	add	x6, x0, 16
	add	x8, x0, 4
	add	x5, x0, 12
	add	x4, x0, 20
	str	x0, [x2, x1]
	mov	x0, x3
	mov	x1, x3
	str	x8, [x3, 8]
	str	x7, [x0, 16]!
	str	x6, [x1, 32]!
	str	x5, [x0, 8]
	str	x4, [x1, 8]
	ret

Note the two mov instructions, and that two extra registers are used to store the values. The mov instructions have not been there before the patch.
After some investigation I found out how this happened: The difference is first visible with -fdump-rtl-expand-slim: 1: NOTE_INSN_DELETED 4: NOTE_INSN_BASIC_BLOCK 2 2: r83:SI=x0:SI 3: NOTE_INSN_FUNCTION_BEG 6: r74:DI=sign_extend(r83:SI) 7: r85:DI=high(`b') 8: r84:DI=r85:DI+low(`b') REG_EQUAL `b' 9: r87:SI=0x18 10: r86:DI=sign_extend(r83:SI)*sign_extend(r87:SI) 11: r88:DI=[r84:DI] 12: r76:DI=r88:DI+r86:DI 13: r90:DI=high(`a') 14: r89:DI=r90:DI+low(`a') REG_EQUAL `a' 15: r91:DI=sign_extend(r83:SI) 16: r92:DI=r91:DI 17: r93:DI=r92:DI0x2 18: r94:DI=r93:DI-r91:DI REG_EQUAL r91:DI*0x3 19: r95:DI=r94:DI0x4 20: r94:DI=r95:DI REG_EQUAL r91:DI*0x30 21: r96:DI=r89:DI+r94:DI 22: [r96:DI]=r76:DI 23: r98:DI=high(`a') 24: r97:DI=r98:DI+low(`a') REG_EQUAL `a' 25: r99:DI=sign_extend(r83:SI) 26: r100:DI=r99:DI 27: r101:DI=r100:DI0x2 28: r102:DI=r101:DI-r99:DI REG_EQUAL r99:DI*0x3 29: r103:DI=r102:DI0x4 30: r102:DI=r103:DI REG_EQUAL r99:DI*0x30 31: r104:DI=r97:DI+r102:DI 32: r105:DI=r76:DI+0x4 33: [r104:DI+0x8]=r105:DI 34: r107:DI=high(`a') 35: r106:DI=r107:DI+low(`a') REG_EQUAL `a' 36: r108:DI=sign_extend(r83:SI) 37: r109:DI=r108:DI 38: r110:DI=r109:DI0x2 39: r111:DI=r110:DI-r108:DI REG_EQUAL r108:DI*0x3 40: r112:DI=r111:DI0x4 41: r111:DI=r112:DI REG_EQUAL r108:DI*0x30 42: r113:DI=r106:DI+r111:DI 43: r114:DI=r113:DI+0x10 44: r115:DI=r76:DI+0x8 45: [r114:DI]=r115:DI 46: r117:DI=high(`a') 47: r116:DI=r117:DI+low(`a') REG_EQUAL `a' 48: r118:DI=sign_extend(r83:SI) 49: r119:DI=r118:DI 50: r120:DI=r119:DI0x2 51: r121:DI=r120:DI-r118:DI REG_EQUAL r118:DI*0x3 52: r122:DI=r121:DI0x4 53: r121:DI=r122:DI REG_EQUAL r118:DI*0x30 54: r123:DI=r116:DI+r121:DI 55: r124:DI=r123:DI+0x10 56: r125:DI=r76:DI+0xc 57: [r124:DI+0x8]=r125:DI 58: r127:DI=high(`a') 59: r126:DI=r127:DI+low(`a') REG_EQUAL `a' 60: r128:DI=sign_extend(r83:SI) 61: r129:DI=r128:DI 62: r130:DI=r129:DI0x2 63: r131:DI=r130:DI-r128:DI REG_EQUAL r128:DI*0x3 64: r132:DI=r131:DI0x4 65: r131:DI=r132:DI REG_EQUAL r128:DI*0x30 66: 
r133:DI=r126:DI+r131:DI 67: r134:DI=r133:DI+0x20 68: r135:DI=r76:DI+0x10 69: [r134:DI]=r135:DI 70: r137:DI=high(`a') 71: r136:DI=r137:DI+low(`a') REG_EQUAL `a' 72: r138:DI=sign_extend(r83:SI) 73: r139:DI=r138:DI 74: r140:DI=r139:DI0x2 75: r141:DI=r140:DI-r138:DI REG_EQUAL r138:DI*0x3 76: r142:DI=r141:DI0x4 77: r141:DI=r142:DI REG_EQUAL r138:DI*0x30 78: r143:DI=r136:DI+r141:DI 79: r144:DI=r143:DI+0x20 80: r145:DI=r76:DI+0x14 81: [r144:DI+0x8]=r145:DI Note the offset +0x8 on the instructions 33, 57, 81 but not on 22, 45, 69. This artefact was not there before the patch. Well, this causes little ripples in the later rtl-passes. There is one pass that does a pretty good job at compensating
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Tue, May 27, 2014 at 11:25 AM, Eric Botcazou ebotca...@adacore.com wrote:

+ tree new_const
+ = fold_build2_loc (loc, reverse_op, TREE_TYPE (arg1), const2, const1);
 /* If the constant operation overflowed this can be simplified as a comparison against INT_MAX/INT_MIN. */
- if (TREE_CODE (lhs) == INTEGER_CST
- && TREE_OVERFLOW (lhs))
+ if (TREE_OVERFLOW (new_const))

well, either use int_const_binop above or retain the check (or use TREE_OVERFLOW_P). Bonus points for using wide-ints here and not relying on TREE_OVERFLOW.

The check is useless (you get either NULL_TREE or INTEGER_CST here) but I'll use int_const_binop.

+ /* Transform comparisons of the form X - Y CMP 0 to X CMP Y. */
+ if (TREE_CODE (arg0) == MINUS_EXPR
+ && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1))

any good reason for using TYPE_OVERFLOW_UNDEFINED on the type of arg1 instead of on the type of the MINUS (yes, they should match, but it really looks odd ... the overflow of the minus has to be undefined).

For the sake of consistency with the X +- C1 CMP C2 case just above, but I can change both.

Also for EQ_EXPR and NE_EXPR the transform is fine even when !TYPE_OVERFLOW_UNDEFINED (and we seem to perform it already somewhere). Please look where and try to add the undefined overflow case to it.

Yes, but it's the same for the X +- C1 CMP C2 case, i.e. there are specific cases for X +- C1 EQ/NE C2 and X - Y EQ/NE 0 in fold_binary, so I'm not sure what you're asking.

I'm asking to merge them (move them to fold_comparison).

As for the VRP pieces I don't understand what is missing here (well, compare_range_with_value and/or compare_values might not handle this case? then better fix that to improve symbolic range handling in general?).
Also I have a longstanding patch in my tree that does

Yes, there is an explicit non-handling of symbolic ranges for PLUS_EXPR and MINUS_EXPR in VRP (extract_range_from_binary_expr_1) and the patch works around it by propagating the code instead of the ranges, which is far easier and sufficient here. If you think that the way to go is to handle symbolic ranges for PLUS_EXPR and MINUS_EXPR instead, fine with me, I can try.

Yeah, it would be nice to see some support. The most interesting cases will be symbolic-singleton +- CST where the offset shrinks a constant offset in a symbolic A +- CST (thus we don't get into any overflow issues). Thus handling [a + 1, a + 1] - [1, 1] -> [a, a] for example. We get the offsetted singleton symbolic ranges from conditional asserts a lot.

Thanks, Richard.

-- Eric Botcazou
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Tue, May 27, 2014 at 11:59 AM, Eric Botcazou ebotca...@adacore.com wrote:

I'm asking to merge them (move them to fold_comparison).

OK, I'll repost a first patch doing only the fold-const.c massaging.

Yeah, it would be nice to see some support. The most interesting cases will be symbolic-singleton +- CST where the offset shrinks a constant offset in a symbolic A +- CST (thus we don't get into any overflow issues). Thus handling [a + 1, a + 1] - [1, 1] -> [a, a] for example. We get the offsetted singleton symbolic ranges from conditional asserts a lot.

For the case at hand, you have [x + 1, INF] for y and you want to evaluate the range of y - x, so you really don't want to use the range of x...

Ok. That's similar to the issue I try to address with the VRP snippet I posted yesterday. I'd say we still handle basic symbolic range ops in extract_range_from_binary_1 but in extract_range_from_binary_expr for a symbolic op0 we try to simplify it with both [op1, op1] and with the value-range of op1 until we get a non-varying range as result. Not sure if it's worth restricting that to the case where op0's value-range refers to op1 or vice versa, and eventually only use op1 symbolically then.

Richard.

-- Eric Botcazou
[PATCH] Try harder for conditional evaluation in VRP
Especially for ops with symbolic ranges it may be preferable to compare one range with an op such as in [x + 1, x + 1] cmp x instead of expanding the range of x on the rhs to sth unrelated. So, try harder. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-05-27 Richard Biener rguent...@suse.de

* tree-vrp.c (vrp_evaluate_conditional_warnv_with_ops_using_ranges): Try using literal operands when comparing value-ranges failed.

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c	(revision 210931)
+++ gcc/tree-vrp.c	(working copy)
@@ -6919,14 +6919,15 @@ vrp_evaluate_conditional_warnv_with_ops_
   vr0 = (TREE_CODE (op0) == SSA_NAME) ? get_value_range (op0) : NULL;
   vr1 = (TREE_CODE (op1) == SSA_NAME) ? get_value_range (op1) : NULL;
 
+  tree res = NULL_TREE;
   if (vr0 && vr1)
-    return compare_ranges (code, vr0, vr1, strict_overflow_p);
-  else if (vr0 && vr1 == NULL)
-    return compare_range_with_value (code, vr0, op1, strict_overflow_p);
-  else if (vr0 == NULL && vr1)
-    return (compare_range_with_value
+    res = compare_ranges (code, vr0, vr1, strict_overflow_p);
+  if (!res && vr0)
+    res = compare_range_with_value (code, vr0, op1, strict_overflow_p);
+  if (!res && vr1)
+    res = (compare_range_with_value
 	    (swap_tree_comparison (code), vr1, op0, strict_overflow_p));
-  return NULL;
+  return res;
 }
 
 /* Helper function for vrp_evaluate_conditional_warnv. */
Re: [PATCH][4/4] Always 64bit-HWI cleanups
On Tue, 27 May 2014, Richard Biener wrote: On May 27, 2014 3:58:06 AM CEST, David Edelsohn dje@gmail.com wrote: This patch (further) broke bootstrap on AIX. AIX defaults to 32 bit, although it supports 64 bit HWI.

/nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'int print_statistics(bitmap_descriptor_d**, output_info*)':
/nasfarm/edelsohn/src/src/gcc/bitmap.c:2168:24: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'void dump_bitmap_statistics()':
/nasfarm/edelsohn/src/src/gcc/bitmap.c:2202:15: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/bt-load.c: In function 'void migrate_btr_defs(reg_class, int)':
/nasfarm/edelsohn/src/src/gcc/bt-load.c:1414:34: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_edge_info(FILE*, edge, int, int)':
/nasfarm/edelsohn/src/src/gcc/cfg.c:489:25: error: expected ')' before 'PRId64'
/nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_bb_info(FILE*, basic_block, int, int, bool, bool)':
/nasfarm/edelsohn/src/src/gcc/cfg.c:737:33: error: expected ')' before 'PRId64'

This means AIX has inttypes.h but not the standard-mandated PRI?64 macros. Does it have stdint.h and therein int64_t? A possibility is to define the GCC fallbacks when the defines are not defined as opposed to only when inttypes.h is not available. Can you investigate some more the situation on AIX? Proposed patch below.

Richard.

Index: gcc/hwint.h
===================================================================
--- gcc/hwint.h	(revision 210932)
+++ gcc/hwint.h	(working copy)
@@ -135,21 +135,30 @@ typedef HOST_WIDE_INT __gcc_host_wide_in
 #define HOST_WIDE_INT_PRINT_HEX "%#" HOST_WIDE_INT_PRINT "x"
 #define HOST_WIDE_INT_PRINT_HEX_PURE "%" HOST_WIDE_INT_PRINT "x"
 
-/* Provide C99 inttypes.h style format definitions for 64bits.  */
-#ifndef HAVE_INTTYPES_H
-#undef PRId64
+/* Provide C99 inttypes.h style format definitions for 64bits.  As
+   64bit types are only conditionally supported also provide fallback
+   definitions for the 64bit variants even if inttypes.h is available.  */
+#ifndef PRId64
 #define PRId64 HOST_WIDE_INT_PRINT "d"
-#undef PRIi64
+#endif
+#ifndef PRIi64
 #define PRIi64 HOST_WIDE_INT_PRINT "i"
-#undef PRIo64
+#endif
+#ifndef PRIo64
 #define PRIo64 HOST_WIDE_INT_PRINT "o"
-#undef PRIu64
+#endif
+#ifndef PRIu64
 #define PRIu64 HOST_WIDE_INT_PRINT "u"
-#undef PRIx64
+#endif
+#ifndef PRIx64
 #define PRIx64 HOST_WIDE_INT_PRINT "x"
-#undef PRIX64
+#endif
+#ifndef PRIX64
 #define PRIX64 HOST_WIDE_INT_PRINT "X"
 #endif
+#ifndef HAVE_INTTYPES_H
+/* 32bit variants go here once we start using them.  */
+#endif
 
 /* Define HOST_WIDEST_FAST_INT to the widest integer type supported
    efficiently in hardware.  (That is, the widest integer type that fits
Re: [PATCH][4/4] Always 64bit-HWI cleanups
On Tue, 27 May 2014, Jakub Jelinek wrote: On Tue, May 27, 2014 at 08:26:35AM +0200, Richard Biener wrote: /nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_bb_info(FILE*, basic_block, int, int, bool, bool)': /nasfarm/edelsohn/src/src/gcc/cfg.c:737:33: error: expected ')' before 'PRId64' This means aix has inttypes.h but not the standard mandated PRI?64 macros. Does it have stdint.h and therein int64_t? A possibility is to define the GCC fallbacks when the defines are not defined as opposed to only when inttypes.h is not available. BTW, we should be prepared for C++11 udlit and use spaces around the PRI* macros, in case some compiler treats those as udlits unconditionally. Though that's not likely as it would break existing and valid code. Richard.
[PATCH] Fix HWI AIX breakage
This fixes the AIX breakage by defining __STDC_FORMAT_MACROS earlier, before stdio.h on AIX gets to include inttypes.h. Committed as obvious.

Richard.

2014-05-27 Richard Biener rguent...@suse.de

* system.h (__STDC_FORMAT_MACROS): Define as very first thing.

Index: gcc/system.h
===================================================================
--- gcc/system.h	(revision 210972)
+++ gcc/system.h	(working copy)
@@ -22,6 +22,12 @@ along with GCC; see the file COPYING3.
 #ifndef GCC_SYSTEM_H
 #define GCC_SYSTEM_H
 
+/* Define this so that inttypes.h defines the PRI?64 macros even
+   when compiling with a C++ compiler.  Define it here so in the
+   event inttypes.h gets pulled in by another header it is already
+   defined.  */
+#define __STDC_FORMAT_MACROS
+
 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>
 
@@ -491,7 +497,6 @@ extern void *realloc (void *, size_t);
 #endif
 
 #ifdef HAVE_INTTYPES_H
-#define __STDC_FORMAT_MACROS
 #include <inttypes.h>
 #endif
Re: [PATCH] Try harder for conditional evaluation in VRP
On Tue, 27 May 2014, Steven Bosscher wrote: On Tue, May 27, 2014 at 12:27 PM, Richard Biener wrote: * tree-vrp.c (vrp_evaluate_conditional_warnv_with_ops_using_ranges): Try using literal operands when comparing value-ranges failed. No test case? Sorry ;) Happens to patches I uncover in my dev tree. I'll try to come up with sth (I remember coding it when working on some PR ... but I've lost track of which one). Richard.
Re: [PATCH] Try harder for conditional evaluation in VRP
On Tue, 27 May 2014, Richard Biener wrote: On Tue, 27 May 2014, Steven Bosscher wrote: On Tue, May 27, 2014 at 12:27 PM, Richard Biener wrote: * tree-vrp.c (vrp_evaluate_conditional_warnv_with_ops_using_ranges): Try using literal operands when comparing value-ranges failed.

No test case? Sorry ;) Happens to patches I uncover in my dev tree. I'll try to come up with sth (I remember coding it when working on some PR ... but I've lost track of which one).

Here is one (yeah, a bit artificial ...). Committed.

Richard.

2014-05-27 Richard Biener rguent...@suse.de

* gcc.dg/tree-ssa/vrp92.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/vrp92.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp92.c	(working copy)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
+
+void bar (void);
+int foo (int i, int j)
+{
+  int res = 1;
+  if (i < j)
+    {
+      /* We should be able to simplify the following conditional
+	 during propagation.  */
+      if (i > j)
+	res = 0;
+    }
+  /* And compute res as having a value-range of [1,1].  */
+  if (res)
+    return i;
+  return j;
+}
+
+/* { dg-final { scan-tree-dump "res_.: \\\[1, 1\\\]" "vrp1" } } */
+/* { dg-final { scan-tree-dump-not "Threaded" "vrp1" } } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */
Re: [PATCH] Warn on and fold NULL checks against inline functions
On Tue, May 27, 2014 at 4:06 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Tue, May 27, 2014 at 8:32 AM, Patrick Palka patr...@parcs.ath.cx wrote: On Tue, May 27, 2014 at 3:33 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, May 26, 2014 at 9:01 PM, Patrick Palka patr...@parcs.ath.cx wrote: Hi, This patch teaches the C++ frontend to warn on NULL checks against inline functions and it teaches the middle-end to fold NULL checks against inline functions. These two things are currently done for non-inline C++ functions, but inline functions are exceptional in that the C++ frontend marks them as weak, and NULL checks against weak symbols cannot be folded in general because the symbol may be mapped to NULL at link-time. But in the case of an inline function, the fact that it's a weak symbol is an implementation detail. And because it is not permitted to explicitly give an inline function the weak attribute (see handle_weak_attribute), in order to acheive $SUBJECT it suffices to assume that all inline functions are non-null, which is what this patch does. Bootstrap and regtest against x86_64-unknown-linux-gnu in progress. Is this patch OK assuming no regressions? What about always_inline function wrappers as used in FORTIFY_SOURCE? IIRC the actual referenced function may be declared weak and thus the address can compare to NULL? Sth like extern void __foo (void) __attribute__((weak,asm(foo))); extern inline __attribute__((always_inline,gnu_inline)) void foo (void) { __foo (); } int main() { if (foo == 0) return 0; abort (); } The issue is that the inline and alias may be hidden to the user and thus he'll get unwanted and confusing warnings. Richard. So as it stands the foo == 0 check in the above example gets folded with or without my patch. The issue here seems to be related to the use of gnu_inline. The address of an inline function marked gnu_inline shouldn't be considered non-NULL because a standalone function does not actually get emitted. 
Of course, one could inspect aliases but in general it is not correct to assume such functions are non-NULL. Does this sound right? This issue has slight overlap with the issue that my patch tries to fix. I could try to handle it as well.

Oh, disregard the above. I think I see what you mean now, Richard. Because __foo is an alias for foo, and __foo is weak, is it safe to assume that foo is non-NULL? That is a tricky question...

No (I wasn't making a folding validity remark). I want to make sure we won't see diagnostics on code like

#include <string.h>
int main()
{
  if (malloc != 0)
    ...
}

just because it is built with -D_FORTIFY_SOURCE. Whether we may fold that check at all is another question, of course (and again that should _not_ differ between -D_FORTIFY_SOURCE or -U_FORTIFY_SOURCE, but it may well also be a glibc issue). And yes, aliases are tricky beasts ...

Richard.
Re: Mark more constants readonly
On Tue, May 27, 2014 at 3:13 PM, Bernd Schmidt ber...@codesourcery.com wrote: I noticed that string constants built by the Fortran frontend don't set TREE_READONLY for STRING_CST (and subsequently noticed that nothing seems to set it for COMPLEX_CST). That was confusing the ptx backend I'm working on. The -fwritable-strings option for C was removed a while ago, so I tested the following patch to just set the flag unconditionally, and passed bootstrap and test on x86_64-linux. Ok? Hmm? Not so obvious. TREE_READONLY has no purpose on tcc_constant nodes if I read documentation correctly (tree.h). In fact base.readonly_flag is unused for tcc_constant (and would be redundant with TREE_CONSTANT). Richard. Bernd
Re: Don't copy constants in build_constant_desc
On Tue, May 27, 2014 at 3:26 PM, Bernd Schmidt ber...@codesourcery.com wrote: When putting a constant into the constant pool, we make a copy of the tree in build_constant_desc. This caused me some problems with the ptx backend, which has a pass to modify the address spaces of all variables and constants to match the requirements of the backend. It would be nice for it to be able to find them all by walking the varpool and the gimple of all functions, and I ran into some testcases where these copies can slip through. After a while it occurred to me that the copy is probably a relic of old obstack times and not necessary anymore. And indeed, in 2.7.2.3 the only call to it looked like this:

  push_obstacks_nochange ();
  suspend_momentary ();
  p->exp = copy_constant (exp);
  pop_obstacks ();

It seems unlikely that anything in the compiler would want to modify a constant after it has been put into the constant pool, and even if so, it should have the responsibility of making the copy itself. So, as an experiment I've bootstrapped and tested the following patch (x86_64-linux), and that seemed to work. Ok to apply along with removal of the now unused copy_constant?

Ok if nobody raises issues within 24h. Thanks, Richard.

Bernd
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On May 27, 2014 6:12:58 PM CEST, Eric Botcazou ebotca...@adacore.com wrote: I'm asking to merge them (move them to fold_comparison). Done (in the end the patch removes more lines than it adds :-). That's what I like! Tested on x86_64-suse-linux with no regressions. OK. Thanks, Richard. 2014-05-27 Eric Botcazou ebotca...@adacore.com * fold-const.c (fold_comparison): Clean up and extend X +- C1 CMP C2 to X CMP C2 -+ C1 transformation to EQ_EXPR/NE_EXPR. Add X - Y CMP 0 to X CMP Y transformation. (fold_binary_loc) EQ_EXPR/NE_EXPR: Remove same transformations. 2014-05-27 Eric Botcazou ebotca...@adacore.com * gcc.dg/fold-compare-8.c: New test. * gcc.dg/Wstrict-overflow-25.c: Likewise.
Re: Fix old bug in div_and_round_double
On Tue, May 27, 2014 at 10:29 PM, Eric Botcazou ebotca...@adacore.com wrote: This is an old bug in div_and_round_double for ROUND_DIV_EXPR: when the code detects that it needs to adjust the quotient, it needs to decide whether it increases or decreases it by 1. This only depends on the expected sign of the quotient, but the test reads:

  if (*hquo < 0)

So if the quotient is 0, the code will always bump it, thus yielding 1, even if the expected quotient is negative. This causes the attached Ada testcase to fail at -O on all active branches... except for mainline, because of the correct implementation of the same test in wi::div_round:

  if (wi::neg_p (x, sgn) != wi::neg_p (y, sgn))
    return quotient - 1;
  else
    return quotient + 1;

It turns out that div_and_round_double has a predicate quo_neg that implements the same test as the wide-int one and ought to be used here. Tested on x86_64-suse-linux (of course this probably doesn't mean anything on the mainline, but it was also tested with 4.7 and 4.9 compilers) and applied on the mainline as obvious.

I suppose you also install on branches? Thanks, Richard.

2014-05-27 Eric Botcazou ebotca...@adacore.com

* double-int.c (div_and_round_double) <ROUND_DIV_EXPR>: Use the proper predicate to detect a negative quotient.

2014-05-27 Eric Botcazou ebotca...@adacore.com

* gnat.dg/overflow_fixed.adb: New test.

-- Eric Botcazou
Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)
On Wed, May 28, 2014 at 8:50 AM, Jakub Jelinek ja...@redhat.com wrote: On Mon, May 26, 2014 at 08:36:31AM -0700, Mike Stump wrote: On May 26, 2014, at 2:22 AM, FX fxcoud...@gmail.com wrote: This causes GCC bootstrap to fail on Darwin systems (whose system compiler is clang-based). Since PR 61146 was resolved as INVALID (but I'm not sure it's the right call, see below), I've filed a separate report for the bootstrap issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61315). Since my PR has been closed twice by Andrew Pinski ("it's clang's fault, bouh ouh"), I'd ask the maintainers to step in. Can we please provide a GCC that works for the default darwin setup? Or at least drop darwin as secondary target and document the failure?

The best course of action, post a patch, have it reviewed and put in. Current action, a patch has been posted, the review is outstanding, I'd like to see it put in; though, I am curious why the casts were there in the first place.

Note, haven't added them there, but from what I can test, the casts there can serve as a compile time check that the right type is used, e.g.

unsigned long i;
void foo (void)
{
  asm volatile ("# %0 %1" : "=r" ((unsigned long long) i) : "0" ((unsigned long long) 0));
}

Ah, interesting. A not-so-heinous extension then.

errors out on x86_64 -m32, but compiles fine with -m64, because in the latter case the type has the correct size, while in the former case it doesn't. So, perhaps instead of removing the casts we should replace them with some kind of static assertions (whether

extern char foobar[sizeof (arg) == sizeof (UDItype) && __builtin_classify_type (arg) == 1 ? 1 : -1];

or __builtin_types_compatible_p, or C++ templates for C++, ...

Yeah, a portable (C and C++) static assert would be nice. And also pushing this to gmp then. In the meantime I see nothing wrong in merging from GMP.

Richard.

Jakub
[PATCH] Fix PR61335
This fixes PR61335 - symbolic range propagation in VRP is so weak that we didn't notice this very old serious bug ... compare_values can return don't know which is -2, but PHI node visiting handles it the same as -1 (less than...). Oops. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-05-28 Richard Biener rguent...@suse.de PR tree-optimization/61335 * tree-vrp.c (vrp_visit_phi_node): If the compare of old and new range fails, drop to varying. * gfortran.dg/pr61335.f90: New testcase. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 210973) --- gcc/tree-vrp.c (working copy) *** vrp_visit_phi_node (gimple phi) *** 8323,8330 --- 8323,8336 edges == old_edges lhs_vr-type != VR_UNDEFINED) { + /* Compare old and new ranges, fall back to varying if the + values are not comparable. */ int cmp_min = compare_values (lhs_vr-min, vr_result.min); + if (cmp_min == -2) + goto varying; int cmp_max = compare_values (lhs_vr-max, vr_result.max); + if (cmp_max == -2) + goto varying; /* For non VR_RANGE or for pointers fall back to varying if the range changed. */ Index: gcc/testsuite/gfortran.dg/pr61335.f90 === *** gcc/testsuite/gfortran.dg/pr61335.f90 (revision 0) --- gcc/testsuite/gfortran.dg/pr61335.f90 (working copy) *** *** 0 --- 1,117 + ! { dg-do run } + ! { dg-additional-options -fbounds-check } + MODULE cp_units + + INTEGER, PARAMETER :: default_string_length=80, dp=KIND(0.0D0) + + LOGICAL, PRIVATE, PARAMETER :: debug_this_module=.TRUE. + CHARACTER(len=*), PARAMETER, PRIVATE :: moduleN = 'cp_units' + INTEGER, SAVE, PRIVATE :: last_unit_id=0, last_unit_set_id=0 + + INTEGER, PARAMETER, PUBLIC :: cp_unit_max_kinds=8, cp_unit_basic_desc_length=15, +cp_unit_desc_length=cp_unit_max_kinds*cp_unit_basic_desc_length, cp_ukind_max=9 + + CONTAINS + + FUNCTION cp_to_string(i) RESULT(res) + INTEGER, INTENT(in) :: i + CHARACTER(len=6) :: res + + INTEGER :: iostat + REAL(KIND=dp):: tmp_r + + IF (i99 .OR. 
i-9) THEN +tmp_r=i +WRITE (res,fmt='(es6.1)',iostat=iostat) tmp_r + ELSE +WRITE (res,fmt='(i6)',iostat=iostat) i + END IF + IF (iostat/=0) THEN +STOP 7 + END IF + END FUNCTION cp_to_string + + SUBROUTINE cp_unit_create(string) + CHARACTER(len=*), INTENT(in) :: string + + CHARACTER(len=*), PARAMETER :: routineN = 'cp_unit_create', + routineP = moduleN//':'//routineN + + CHARACTER(default_string_length) :: desc + CHARACTER(LEN=40):: formatstr + INTEGER :: i_high, i_low, i_unit, + len_string, next_power + INTEGER, DIMENSION(cp_unit_max_kinds):: kind_id, power, unit_id + LOGICAL :: failure + + failure=.FALSE. + unit_id=cp_units_none + kind_id=cp_ukind_none + power=0 + i_low=1 + i_high=1 + len_string=LEN(string) + i_unit=0 + next_power=1 + DO WHILE(i_lowlen_string) +IF (string(i_low:i_low)/=' ') EXIT +i_low=i_low+1 + END DO + i_high=i_low + DO WHILE(i_high=len_string) +IF ( string(i_high:i_high)==' '.OR.string(i_high:i_high)=='^'.OR. + string(i_high:i_high)=='*'.OR.string(i_high:i_high)=='/') EXIT +i_high=i_high+1 + END DO + DO WHILE(.NOT.failure) +IF (i_high=i_low.OR.i_lowlen_string) EXIT +i_unit=i_unit+1 +IF (i_unitcp_unit_max_kinds) THEN + EXIT +END IF +power(i_unit)=next_power +! parse op +i_low=i_high +DO WHILE(i_low=len_string) + IF (string(i_low:i_low)/=' ') EXIT + i_low=i_low+1 +END DO +i_high=i_low +DO WHILE(i_high=len_string) + IF ( string(i_high:i_high)==' '.OR.string(i_high:i_high)=='^'.OR. +string(i_high:i_high)=='*'.OR.string(i_high:i_high)=='/') EXIT + i_high=i_high+1 +END DO +IF (i_highi_low.OR.i_lowlen_string) EXIT + +IF (i_high=len_string) THEN + IF (string(i_low:i_high)=='^') THEN + i_low=i_high+1 + DO WHILE(i_low=len_string) + IF (string(i_low:i_low)/=' ') EXIT + i_low=i_low+1 + END DO + i_high=i_low + DO WHILE(i_high=len_string) + SELECT CASE(string(i_high:i_high)) + CASE('+','-','0','1','2','3','4','5','6','7
Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)
On Wed, May 28, 2014 at 10:24 AM, FX fxcoud...@gmail.com wrote: Yeah, a portable (C and C++) static assert would be nice. And also pushing this to gmp then. In the meantime I see nothing wrong in merging from GMP.

One question, one comment: 1. can I count your "I see nothing wrong" as an approval, as in "global reviewers can approve changes to any part of the compiler or associated libraries"?

Well, kind of. But Jakub is as well, so I don't want to override him. So please wait for an ack from Jakub. I agree with him that the casts served a purpose and that, if removed, they need to be replaced with an appropriate assertion measure. Given that inline asm is a GCC extension, calling those casts another extension from the LLVM side is really odd. In fact I think the casts are a very good way of doing this kind of assertions. Are they documented in that regard? If so I'd say it's really LLVM that should be fixed and the workaround on the GCC side is to pass that -fheinous-gnu-extensions flag.

2. I think your quotes around "merging" mean you're not actually thinking of a merge, but for clarification's sake: GMP's longlong.h has apparently a long history of its own, and has many differences with GCC's version. The closest thing to an "upstream" for us would probably be glibc (see the diff attached), from which we last merged on 2014-04-22.

I see. I suppose the gcc side includes the proposed patch and glibc still has those casts, right? In that case there is nothing to merge from glibc (but to glibc eventually).

Richard.

Thanks, FX
Re: Darwin bootstrap failure following wide int merge
On Wed, May 28, 2014 at 10:24 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, May 28, 2014 at 8:50 AM, Jakub Jelinek ja...@redhat.com wrote: On Mon, May 26, 2014 at 08:36:31AM -0700, Mike Stump wrote: On May 26, 2014, at 2:22 AM, FX fxcoud...@gmail.com wrote: This causes GCC bootstrap to fail on Darwin systems (whose system compiler is clang-based). Since PR 61146 was resolved as INVALID (but I'm not sure it's the right call, see below), I've filed a separate report for the bootstrap issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61315). Since my PR has been closed twice by Andrew Pinski ("it's clang's fault, bouh ouh"), I'd ask the maintainers to step in. Can we please provide a GCC that works for the default darwin setup? Or at least drop darwin as secondary target and document the failure?

The best course of action, post a patch, have it reviewed and put in. Current action, a patch has been posted, the review is outstanding, I'd like to see it put in; though, I am curious why the casts were there in the first place.

Note, haven't added them there, but from what I can test, the casts there can serve as a compile time check that the right type is used, e.g.

unsigned long i;
void foo (void)
{
  asm volatile ("# %0 %1" : "=r" ((unsigned long long) i) : "0" ((unsigned long long) 0));
}

Ah, interesting. A not-so-heinous extension then.

In that case, how about just protecting the include with:

#if GCC_VERSION >= 4300 && (W_TYPE_SIZE == 32 || defined (__SIZEOF_INT128__))

rather than:

#if GCC_VERSION >= 3000 && (W_TYPE_SIZE == 32 || defined (__SIZEOF_INT128__))

so that clang will fail the version check? At the end of the day we only really care what happens during stage 2 and 3. Cross-compilers built with recentish gccs will still benefit.

Works for me (thus, pre-approved with a comment explaining the version choice).

Thanks, Richard.

Thanks, Richard
Re: Darwin bootstrap failure following wide int merge
On Wed, May 28, 2014 at 11:40 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, May 28, 2014 at 10:24 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, May 28, 2014 at 8:50 AM, Jakub Jelinek ja...@redhat.com wrote: On Mon, May 26, 2014 at 08:36:31AM -0700, Mike Stump wrote: On May 26, 2014, at 2:22 AM, FX fxcoud...@gmail.com wrote: This causes GCC bootstrap to fail on Darwin systems (whose system compiler is clang-based). Since PR 61146 was resolved as INVALID (but I’m not sure it’s the right call, see below), I’ve filed a separate report for the bootstrap issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61315). Since my PR has been closed twice by Andrew Pinski (“it’s clang’s fault, bouh ouh”), I’d ask the maintainers to step in. Can we please provide a GCC that works for the default darwin setup? Or at least drop darwin as secondary target and document the failure? The best coarse of action, post a patch, have it reviewed and put in. Current action, a patch has been posted, the review is outstanding, I’d like to see it put in; though, I am curious why the casts were there in the first place. Note, haven't added them there, but from what I can test, the casts there can serve as a compile time check that the right type is used, e.g. unsigned long i; void foo (void) { asm volatile (# %0 %1 : =r ((unsigned long long) i) : 0 ((unsigned long long) 0)); } Ah, interesting. A not-so-hineous extension then. In that case, how about just protecting the include with: #if GCC_VERSION = 4300 (W_TYPE_SIZE == 32 || defined (__SIZEOF_INT128__)) rather than: #if GCC_VERSION = 3000 (W_TYPE_SIZE == 32 || defined (__SIZEOF_INT128__)) so that clang will fail the version check? At the end of the day we only really care what happens during stage 2 and 3. Cross-compilers built with recentish gccs will still benefit. Works for me (thus, pre-approved with a comment explaining the version choice). 
Btw, testing coverage would ask for defined (__OPTIMIZE__), too, disabling that path in stage1 unconditionally even for new GCC. Richard. Thanks, Richard. Thanks, Richard
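The gating being discussed can be sketched as a plain preprocessor test. In this self-contained version the `W_TYPE_SIZE` value and the `USE_LONGLONG_H`/`use_longlong_h` names are illustrative stand-ins for what ansidecl.h and longlong.h actually define:

```cpp
// stand-in for the ansidecl.h definition; clang defines __GNUC__ but
// should not be treated as a real GCC for this purpose
#if defined(__GNUC__) && !defined(__clang__)
# define GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__)
#else
# define GCC_VERSION 0
#endif

#define W_TYPE_SIZE 64   // illustrative; hwint.h computes this

// mirrors the proposed guard: require a recent real GCC and, for a
// 64-bit word type, a __int128 to serve as the double-word type
#if GCC_VERSION >= 4300 && (W_TYPE_SIZE == 32 || defined (__SIZEOF_INT128__))
# define USE_LONGLONG_H 1
#else
# define USE_LONGLONG_H 0
#endif

int use_longlong_h () { return USE_LONGLONG_H; }
```

With clang (or any non-GCC compiler) `GCC_VERSION` evaluates to 0, so the `>= 4300` test fails and the longlong.h path is skipped.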
Re: [PATCH, V2] Fix an AARCH64/ARM performance regression
On Wed, May 28, 2014 at 11:02 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, But the comment previously read /* A constant address in TO_RTX can have VOIDmode, we must not try to call force_reg for that case. Avoid that case. */ and you are removing that check. So I guess you want to retain GET_MODE (XEXP (to_rtx, 0)) != VOIDmode or investigate why we don't need to avoid this case anymore. In fact, that's probably the only change necessary, just drop the == BLKmode check? Your lengthy description doesn't reveal if you investigated that. It seems to me it tries to avoid adjusting MEM (symbol-ref) for example? Maybe also for optimization reasons? I have now found out what kind of MEM it is which has a VOIDmode address... Just a few lines above, there is this: if (!MEM_P (to_rtx)) { /* We can get constant negative offsets into arrays with broken user code. Translate this to a trap instead of ICEing. */ gcc_assert (TREE_CODE (offset) == INTEGER_CST); expand_builtin_trap (); to_rtx = gen_rtx_MEM (BLKmode, const0_rtx); } Now, I have never ever seen this block executed. Even the examples from PR 23714 do not execute this block any more. But const0_rtx has VOIDmode. This would satisfy the condition. So if I try: to_rtx = gen_rtx_MEM (BLKmode, const0_rtx); force_reg (mode1, to_rtx); Voila: assertion in emit_move_insn. Somehow this matches literally what Jeff wrote in the comment: A constant address in TO_RTX can have VOIDmode, we must not try to call force_reg for that case. But nothing happens with: to_rtx = gen_rtx_MEM (BLKmode, const0_rtx); to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT); also the rest of the code generation works, and of course this expands a null pointer access. Adding a new comment before the block reflecting your analysis above would be nice as well. Done, see the attached patch v2.
I have also removed the check for MEM_P (to_rtx) in the first if-statement, because this is always satisfied after the if (!MEM_P (to_rtx)) block above. Now both places have exactly the same logic. Bootstrapped and regression-tested on x86_64-linux-gnu. OK for trunk? Ok. Thanks, Richard. Thanks Bernd.
Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)
On Wed, May 28, 2014 at 11:48 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, May 28, 2014 at 11:38:55AM +0200, Richard Biener wrote: On Wed, May 28, 2014 at 10:24 AM, FX fxcoud...@gmail.com wrote: Yeah, a portable (C and C++) static assert would be nice. And also pushing this to gmp then. In the meantime I see nothing wrong in merging from GMP. One question, one comment: 1. can I count your “I see nothing wrong” as an approval, as in “global reviewers can approve changes to any part of the compiler or associated libraries”? Well, kind of. But Jakub is as well, so I don't want to override him. So please wait for an ack from Jakub. I agree with him that the casts served a purpose and that, if removed, they need to be replaced with an appropriate assertion measure. I think my preference would be to change include/ansidecl.h to: /* This macro simplifies testing whether we are using gcc, and if it is of a particular minimum version. (Both major minor numbers are significant.) This macro will evaluate to 0 if we are not using gcc at all.*/ #ifndef GCC_VERSION +/* Some compilers pretend to be GCC, even when they are not. */ +#if defined(__clang__) || defined(__INTEL_COMPILER) +#define GCC_VERSION 0 +#else #define GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__) +#endif #endif /* GCC_VERSION */ so that we really can trust the GCC_VERSION macro, casts in lhs of inline asm isn't the only incompatibility clang has. Any other compilers that define __GNUC__? Every one I guess. ICC 9 has it defined to 4, __GNUC_MINOR__ to 0. Richard. Jakub
Re: Darwin bootstrap failure following wide int merge (was: we are starting the wide int merge)
On Wed, May 28, 2014 at 12:03 PM, Richard Biener richard.guent...@gmail.com wrote: On Wed, May 28, 2014 at 11:48 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, May 28, 2014 at 11:38:55AM +0200, Richard Biener wrote: On Wed, May 28, 2014 at 10:24 AM, FX fxcoud...@gmail.com wrote: Yeah, a portable (C and C++) static assert would be nice. And also pushing this to gmp then. In the meantime I see nothing wrong in merging from GMP. One question, one comment: 1. can I count your “I see nothing wrong” as an approval, as in “global reviewers can approve changes to any part of the compiler or associated libraries”? Well, kind of. But Jakub is as well, so I don't want to override him. So please wait for an ack from Jakub. I agree with him that the casts served a purpose and that, if removed, they need to be replaced with an appropriate assertion measure. I think my preference would be to change include/ansidecl.h to: /* This macro simplifies testing whether we are using gcc, and if it is of a particular minimum version. (Both major minor numbers are significant.) This macro will evaluate to 0 if we are not using gcc at all.*/ #ifndef GCC_VERSION +/* Some compilers pretend to be GCC, even when they are not. */ +#if defined(__clang__) || defined(__INTEL_COMPILER) +#define GCC_VERSION 0 +#else #define GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__) +#endif #endif /* GCC_VERSION */ so that we really can trust the GCC_VERSION macro, casts in lhs of inline asm isn't the only incompatibility clang has. Any other compilers that define __GNUC__? Every one I guess. ICC 9 has it defined to 4, __GNUC_MINOR__ to 0. So if we want to go down that route I'd rather change the configury that checks whether we are using GCC to be more pedantic and for example parse $CC --version output (not for the actual version but for whether $CC is GCC). Richard. Richard. Jakub
Re: Darwin bootstrap failure following wide int merge
On Wed, May 28, 2014 at 11:49 AM, Richard Sandiford rsand...@linux.vnet.ibm.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, May 28, 2014 at 11:40 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, May 28, 2014 at 10:24 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, May 28, 2014 at 8:50 AM, Jakub Jelinek ja...@redhat.com wrote: On Mon, May 26, 2014 at 08:36:31AM -0700, Mike Stump wrote: On May 26, 2014, at 2:22 AM, FX fxcoud...@gmail.com wrote: This causes GCC bootstrap to fail on Darwin systems (whose system compiler is clang-based). Since PR 61146 was resolved as INVALID (but I’m not sure it’s the right call, see below), I’ve filed a separate report for the bootstrap issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61315). Since my PR has been closed twice by Andrew Pinski (“it’s clang’s fault, bouh ouh”), I’d ask the maintainers to step in. Can we please provide a GCC that works for the default darwin setup? Or at least drop darwin as secondary target and document the failure? The best coarse of action, post a patch, have it reviewed and put in. Current action, a patch has been posted, the review is outstanding, I’d like to see it put in; though, I am curious why the casts were there in the first place. Note, haven't added them there, but from what I can test, the casts there can serve as a compile time check that the right type is used, e.g. unsigned long i; void foo (void) { asm volatile (# %0 %1 : =r ((unsigned long long) i) : 0 ((unsigned long long) 0)); } Ah, interesting. A not-so-hineous extension then. In that case, how about just protecting the include with: #if GCC_VERSION = 4300 (W_TYPE_SIZE == 32 || defined (__SIZEOF_INT128__)) rather than: #if GCC_VERSION = 3000 (W_TYPE_SIZE == 32 || defined (__SIZEOF_INT128__)) so that clang will fail the version check? At the end of the day we only really care what happens during stage 2 and 3. 
Cross-compilers built with recentish gccs will still benefit. Works for me (thus, pre-approved with a comment explaining the version choice). Btw, testing coverage would ask for defined (__OPTIMIZE__), too, disabling that path in stage1 unconditionally even for new GCC. Ah, but surely the day is near when we use -O or -Og for stage1? I've been using -O for a good few years now and it definitely makes things faster (and without affecting debugging too much -- in the rare cases where it does affect debugging, a recompile with -g is very quick). ATM we get the testing coverage for i686 and ppc32 hosts. So TBH I'd prefer to keep it simple and just bump the version number. Works for me (though see Jakubs idea on the other thread, so please wait until we settled on a solution). Richard. Thanks, Richard
Re: Mark more constants readonly
On Wed, May 28, 2014 at 12:00 PM, Bernd Schmidt ber...@codesourcery.com wrote: On 05/27/2014 04:57 PM, Richard Biener wrote: On Tue, May 27, 2014 at 3:13 PM, Bernd Schmidt ber...@codesourcery.com wrote: I noticed that string constants built by the Fortran frontend don't set TREE_READONLY for STRING_CST (and subsequently noticed that nothing seems to set it for COMPLEX_CST). That was confusing the ptx backend I'm working on. The -fwritable-strings option for C was removed a while ago, so I tested the following patch to just set the flag unconditionally, and passed bootstrap and test on x86_64-linux. Ok? Hmm? Not so obvious. TREE_READONLY has no purpose on tcc_constant nodes if I read documentation correctly (tree.h). In fact base.readonly_flag is unused for tcc_constant (and would be redundant with TREE_CONSTANT). Well, the C frontend still sets it for strings (in fix_string_type), and gcc.dg/lvalue-5.c fails if that is removed. So things are a little inconsistent between frontends. Then fix that instead. The ptx backend shouldn't use TREE_READONLY on tcc_constant, so you can resolve the issue completely within the ptx backend anyway. Richard. Bernd
Re: Fix old bug in div_and_round_double
On Wed, May 28, 2014 at 12:17 PM, Eric Botcazou ebotca...@adacore.com wrote: I suppose you also install on branches? No plan to do so since this isn't a regression, unless you insist. :-) Well, a wrong-code bug plus a very obvious fix certainly qualifies. Richard. -- Eric Botcazou
Re: Fix old bug in div_and_round_double
On Wed, May 28, 2014 at 12:23 PM, Eric Botcazou ebotca...@adacore.com wrote: Well, a wrong-code bug plus a very obvious fix certainly qualifies. Fine with me, onto which branch(es) do you want me to put it? Where you have tested it already, no need to spend extra cycles. Richard. -- Eric Botcazou
[PATCH] Fix PR61045
The following fixes bogus folding (introducing signed overflow) for the X +- C1 CMP Y +- C2 to X CMP Y +- C2 +- C1 transform, where we let through a sign change of the remaining constant operand. Fixed as follows, bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard.

2014-05-28  Richard Biener  rguent...@suse.de

	PR middle-end/61045
	* fold-const.c (fold_comparison): When folding
	X +- C1 CMP Y +- C2 to X CMP Y +- C2 +- C1 also ensure
	the sign of the remaining constant operand stays the same.

	* gcc.dg/pr61045.c: New testcase.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 211011)
+++ gcc/fold-const.c	(working copy)
@@ -9239,7 +9239,7 @@ fold_comparison (location_t loc, enum tr
   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
      the resulting offset is smaller in absolute value than the
-     original one.  */
+     original one and has the same sign.  */
   if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
       && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
       && (TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
@@ -9258,32 +9258,35 @@ fold_comparison (location_t loc, enum tr
				      a comparison");
       /* Put the constant on the side where it doesn't overflow and is
-	 of lower absolute value than before.  */
+	 of lower absolute value and of same sign than before.  */
       cst = int_const_binop (TREE_CODE (arg0) == TREE_CODE (arg1)
			      ? MINUS_EXPR : PLUS_EXPR,
			      const2, const1);
       if (!TREE_OVERFLOW (cst)
-	  && tree_int_cst_compare (const2, cst) == tree_int_cst_sgn (const2))
+	  && tree_int_cst_compare (const2, cst) == tree_int_cst_sgn (const2)
+	  && tree_int_cst_sgn (cst) == tree_int_cst_sgn (const2))
	{
	  fold_overflow_warning (warnmsg, WARN_STRICT_OVERFLOW_COMPARISON);
	  return fold_build2_loc (loc, code, type,
-				  variable1,
-				  fold_build2_loc (loc,
-						   TREE_CODE (arg1), TREE_TYPE (arg1),
-						   variable2, cst));
+				  variable1,
+				  fold_build2_loc (loc, TREE_CODE (arg1),
						   TREE_TYPE (arg1),
+						   variable2, cst));
	}

       cst = int_const_binop (TREE_CODE (arg0) == TREE_CODE (arg1)
			      ? MINUS_EXPR : PLUS_EXPR,
			      const1, const2);
       if (!TREE_OVERFLOW (cst)
-	  && tree_int_cst_compare (const1, cst) == tree_int_cst_sgn (const1))
+	  && tree_int_cst_compare (const1, cst) == tree_int_cst_sgn (const1)
+	  && tree_int_cst_sgn (cst) == tree_int_cst_sgn (const1))
	{
	  fold_overflow_warning (warnmsg, WARN_STRICT_OVERFLOW_COMPARISON);
	  return fold_build2_loc (loc, code, type,
-				  fold_build2_loc (loc, TREE_CODE (arg0), TREE_TYPE (arg0),
-						   variable1, cst),
-				  variable2);
+				  fold_build2_loc (loc, TREE_CODE (arg0),
+						   TREE_TYPE (arg0),
+						   variable1, cst),
+				  variable2);
	}
     }

Index: gcc/testsuite/gcc.dg/pr61045.c
===================================================================
--- gcc/testsuite/gcc.dg/pr61045.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr61045.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-fstrict-overflow" } */
+
+int main ()
+{
+  int a = 0;
+  int b = __INT_MAX__;
+  int t = (a - 2) > (b - 1);
+  if (t != 0)
+    __builtin_abort ();
+  return 0;
+}
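The added condition can be illustrated outside of fold-const.c with plain integers standing in for the tree constants. `transform_ok` is a hypothetical name; its two tests mirror the existing shrinking-offset check and the newly added same-sign check:

```cpp
// c2 is the constant that stays beside Y, cst the combined constant
// (C2 -+ C1).  The fold is only safe if cst moved strictly toward
// zero *and* kept the sign of c2 -- the second test is the fix.
static int sgn (long x) { return (x > 0) - (x < 0); }

bool transform_ok (long c2, long cst)
{
  int cmp = (c2 > cst) - (c2 < cst);   // tree_int_cst_compare analogue
  return cmp == sgn (c2)               // old check: smaller magnitude
         && sgn (cst) == sgn (c2);     // new check: same sign
}
```

For the PR 61045 input (a - 2) > (b - 1) we get c2 = -1 and cst = 1: the old check passes (the compare result equals the sign of c2) but the sign flipped, so the new check correctly rejects the fold.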
[PATCH] Less noisy VRP dump, improve copy handling
The following patch makes the VRP dump less vertical space noisy. It also makes handling of the two forms of copies, a_1 = b_2; and a_1 = PHI b_2 behave more similar by also copying VR_UNDEFINED ranges in the first case and by creating a [b_2, b_2] symbolic range in the second case (if b_2 has a VR_VARYING range). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2014-05-28 Richard Biener rguent...@suse.de * tree-ssa-propagate.c (add_control_edge): Print less vertical space. * tree-vrp.c (extract_range_from_ssa_name): Also copy VR_UNDEFINED. (vrp_visit_assignment_or_call): Print less vertical space. (vrp_visit_stmt): Likewise. (vrp_visit_phi_node): Likewise. For a PHI argument with VR_VARYING range consider recording it as copy. Index: gcc/tree-ssa-propagate.c === *** gcc/tree-ssa-propagate.c(revision 211015) --- gcc/tree-ssa-propagate.c(working copy) *** add_control_edge (edge e) *** 301,307 cfg_blocks_add (bb); if (dump_file (dump_flags TDF_DETAILS)) ! fprintf (dump_file, Adding Destination of edge (%d - %d) to worklist\n\n, e-src-index, e-dest-index); } --- 301,307 cfg_blocks_add (bb); if (dump_file (dump_flags TDF_DETAILS)) ! fprintf (dump_file, \nAdding Destination of edge (%d - %d) to worklist\n, e-src-index, e-dest-index); } Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 211015) --- gcc/tree-vrp.c (working copy) *** extract_range_from_ssa_name (value_range *** 1810,1816 { value_range_t *var_vr = get_value_range (var); ! if (var_vr-type != VR_UNDEFINED var_vr-type != VR_VARYING) copy_value_range (vr, var_vr); else set_value_range (vr, VR_RANGE, var, var, NULL); --- 1810,1816 { value_range_t *var_vr = get_value_range (var); ! if (var_vr-type != VR_VARYING) copy_value_range (vr, var_vr); else set_value_range (vr, VR_RANGE, var, var, NULL); *** vrp_visit_assignment_or_call (gimple stm *** 6679,6685 print_generic_expr (dump_file, lhs, 0); fprintf (dump_file, : ); dump_value_range (dump_file, new_vr); ! 
fprintf (dump_file, \n\n); } if (new_vr.type == VR_VARYING) --- 6679,6685 print_generic_expr (dump_file, lhs, 0); fprintf (dump_file, : ); dump_value_range (dump_file, new_vr); ! fprintf (dump_file, \n); } if (new_vr.type == VR_VARYING) *** vrp_visit_stmt (gimple stmt, edge *taken *** 7473,7479 { fprintf (dump_file, \nVisiting statement:\n); print_gimple_stmt (dump_file, stmt, 0, dump_flags); - fprintf (dump_file, \n); } if (!stmt_interesting_for_vrp (stmt)) --- 7473,7478 *** vrp_visit_phi_node (gimple phi) *** 8242,8248 if (dump_file (dump_flags TDF_DETAILS)) { fprintf (dump_file, ! \nArgument #%d (%d - %d %sexecutable)\n, (int) i, e-src-index, e-dest-index, (e-flags EDGE_EXECUTABLE) ? : not ); } --- 8241,8247 if (dump_file (dump_flags TDF_DETAILS)) { fprintf (dump_file, ! Argument #%d (%d - %d %sexecutable)\n, (int) i, e-src-index, e-dest-index, (e-flags EDGE_EXECUTABLE) ? : not ); } *** vrp_visit_phi_node (gimple phi) *** 8260,8275 /* Do not allow equivalences or symbolic ranges to leak in from backedges. That creates invalid equivalencies. See PR53465 and PR54767. */ ! if (e-flags EDGE_DFS_BACK ! (vr_arg.type == VR_RANGE ! || vr_arg.type == VR_ANTI_RANGE)) { ! vr_arg.equiv = NULL; ! if (symbolic_range_p (vr_arg)) { ! vr_arg.type = VR_VARYING; ! vr_arg.min = NULL_TREE; ! vr_arg.max = NULL_TREE; } } } --- 8259,8288 /* Do not allow equivalences or symbolic ranges to leak in from backedges. That creates invalid equivalencies. See PR53465 and PR54767. */ ! if (e-flags EDGE_DFS_BACK) { ! if (vr_arg.type == VR_RANGE ! || vr_arg.type == VR_ANTI_RANGE) { ! vr_arg.equiv = NULL; ! if (symbolic_range_p (vr_arg)) ! { ! vr_arg.type = VR_VARYING; ! vr_arg.min
Re: Darwin bootstrap failure following wide int merge
On Wed, May 28, 2014 at 3:15 PM, FX fxcoud...@gmail.com wrote: After lengthy IRC discussions, what Richard and I can live with is !defined(__clang__) in this particular case that uses longlong.h in GCC sources, with a comment why. I’ll test this patch and commit if there is no problem. But right now, current trunk doesn’t build on x86_64-apple-darwin due to error below. Richard, could this be due to your revision 211013? Hum, yeah. But why does it even warn if sizeof (long) == sizeof (long long)? I suppose casting the result of CWI_ELT () to uint64_t fixes this. Do similar errors happen elsewhere? (the hex printfs expect unsigned types but CWI_ELT returns a signed HWI) Richard. FX ../../trunk/gcc/rtl.c: In function ‘void cwi_output_hex(FILE*, const_rtx)’: ../../trunk/gcc/rtl.c:239:62: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long int’ [-Werror=format=] fprintf (outfile, HOST_WIDE_INT_PRINT_HEX, CWI_ELT (x, --i)); ^ ../../trunk/gcc/rtl.c:239:62: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long int’ [-Werror=format=] ../../trunk/gcc/rtl.c:241:69: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long int’ [-Werror=format=] fprintf (outfile, HOST_WIDE_INT_PRINT_PADDED_HEX, CWI_ELT (x, i)); ^ ../../trunk/gcc/rtl.c:241:69: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long int’ [-Werror=format=] cc1plus: all warnings being treated as errors make[3]: *** [build/rtl.o] Error 1 make[2]: *** [all-stage2-gcc] Error 2 make[1]: *** [stage2-bubble] Error 2 make: *** [all] Error 2
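The warnings FX quotes are the usual signed-vs-unsigned mismatch for hex conversions. A minimal sketch of the fix Richard suggests, with `print_padded_hex` as an illustrative stand-in for the rtl.c fprintf calls:

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Format a signed 64-bit element the way rtl.c prints CWI_ELT values.
// %x conversions take an unsigned argument, so casting the element to
// uint64_t makes the call match a "%016" PRIx64 format exactly and
// silences -Wformat.
void print_padded_hex (char *buf, std::size_t n, int64_t elt)
{
  std::snprintf (buf, n, "%016" PRIx64, (uint64_t) elt);
}
```

For example, `print_padded_hex (buf, sizeof buf, -1)` yields "ffffffffffffffff" regardless of whether int64_t is long or long long on the host.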
[PATCH] Revert HWI printing change
I have reverted the following for now. Richard. 2014-05-28 Richard Biener rguent...@suse.de Revert 2014-05-28 Richard Biener rguent...@suse.de * hwint.h (HOST_WIDE_INT_PRINT_*): Define in terms of PRI*64. Index: gcc/hwint.h === *** gcc/hwint.h.orig2014-05-26 10:49:23.009339524 +0200 --- gcc/hwint.h 2014-05-26 11:09:43.597255488 +0200 *** extern char sizeof_long_long_must_be_8[s *** 109,156 typedef before using the __asm_fprintf__ format attribute. */ typedef HOST_WIDE_INT __gcc_host_wide_int__; - /* Various printf format strings for HOST_WIDE_INT. */ - - #if HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG - # define HOST_WIDE_INT_PRINT HOST_LONG_FORMAT - # define HOST_WIDE_INT_PRINT_C L - /* HOST_BITS_PER_WIDE_INT is 64 bits. */ - # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \ - 0x% HOST_LONG_FORMAT x%016 HOST_LONG_FORMAT x - # define HOST_WIDE_INT_PRINT_PADDED_HEX \ - %016 HOST_LONG_FORMAT x - #else - # define HOST_WIDE_INT_PRINT HOST_LONG_LONG_FORMAT - # define HOST_WIDE_INT_PRINT_C LL - /* HOST_BITS_PER_WIDE_INT is 64 bits. */ - # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \ - 0x% HOST_LONG_LONG_FORMAT x%016 HOST_LONG_LONG_FORMAT x - # define HOST_WIDE_INT_PRINT_PADDED_HEX \ - %016 HOST_LONG_LONG_FORMAT x - #endif /* HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG */ - - #define HOST_WIDE_INT_PRINT_DEC % HOST_WIDE_INT_PRINT d - #define HOST_WIDE_INT_PRINT_DEC_C HOST_WIDE_INT_PRINT_DEC HOST_WIDE_INT_PRINT_C - #define HOST_WIDE_INT_PRINT_UNSIGNED % HOST_WIDE_INT_PRINT u - #define HOST_WIDE_INT_PRINT_HEX %# HOST_WIDE_INT_PRINT x - #define HOST_WIDE_INT_PRINT_HEX_PURE % HOST_WIDE_INT_PRINT x - /* Provide C99 inttypes.h style format definitions for 64bits. */ #ifndef HAVE_INTTYPES_H #undef PRId64 ! #define PRId64 HOST_WIDE_INT_PRINT d #undef PRIi64 ! #define PRIi64 HOST_WIDE_INT_PRINT i #undef PRIo64 ! #define PRIo64 HOST_WIDE_INT_PRINT o #undef PRIu64 ! #define PRIu64 HOST_WIDE_INT_PRINT u #undef PRIx64 ! #define PRIx64 HOST_WIDE_INT_PRINT x #undef PRIX64 ! 
#define PRIX64 HOST_WIDE_INT_PRINT X #endif /* Define HOST_WIDEST_FAST_INT to the widest integer type supported efficiently in hardware. (That is, the widest integer type that fits in a hardware register.) Normally this is long but on some hosts it --- 75,119 typedef before using the __asm_fprintf__ format attribute. */ typedef HOST_WIDE_INT __gcc_host_wide_int__; /* Provide C99 inttypes.h style format definitions for 64bits. */ #ifndef HAVE_INTTYPES_H + #if HOST_BITS_PER_LONG == 64 + # define GCC_PRI64 HOST_LONG_FORMAT + #else + # define GCC_PRI64 HOST_LONG_LONG_FORMAT + #endif #undef PRId64 ! #define PRId64 GCC_PRI64 d #undef PRIi64 ! #define PRIi64 GCC_PRI64 i #undef PRIo64 ! #define PRIo64 GCC_PRI64 o #undef PRIu64 ! #define PRIu64 GCC_PRI64 u #undef PRIx64 ! #define PRIx64 GCC_PRI64 x #undef PRIX64 ! #define PRIX64 GCC_PRI64 X ! #endif ! ! /* Various printf format strings for HOST_WIDE_INT. */ ! ! #if HOST_BITS_PER_LONG == 64 ! # define HOST_WIDE_INT_PRINT HOST_LONG_FORMAT ! # define HOST_WIDE_INT_PRINT_C L ! #else ! # define HOST_WIDE_INT_PRINT HOST_LONG_LONG_FORMAT ! # define HOST_WIDE_INT_PRINT_C LL #endif + #define HOST_WIDE_INT_PRINT_DEC % PRId64 + #define HOST_WIDE_INT_PRINT_DEC_C % PRId64 HOST_WIDE_INT_PRINT_C + #define HOST_WIDE_INT_PRINT_UNSIGNED % PRIu64 + #define HOST_WIDE_INT_PRINT_HEX %# PRIx64 + #define HOST_WIDE_INT_PRINT_HEX_PURE % PRIx64 + #define HOST_WIDE_INT_PRINT_DOUBLE_HEX 0x% PRIx64 %016 PRIx64 + #define HOST_WIDE_INT_PRINT_PADDED_HEX %016 PRIx64 + /* Define HOST_WIDEST_FAST_INT to the widest integer type supported efficiently in hardware. (That is, the widest integer type that fits in a hardware register.) Normally this is long but on some hosts it
Re: [PATCH][2/n] Always 64bit-HWI cleanups
On Wed, 28 May 2014, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The following changes the configury to insist on [u]int64_t being available and removes the very old __int64 case. Autoconf doesn't check for it, support came in via a big merge in Dec 2002, r60174, and it was never used on the libcpp side until I fixed that with the last patch of this series, so we couldn't have relied on it at least since libcpp was introduced. Both libcpp and vmsdbg.h now use [u]int64_t, switching HOST_WIDE_INT to literally use int64_t has to be done with the grand renaming of all users due to us using 'unsigned HOST_WIDE_INT'. Btw, I couldn't find any standard way of writing [u]int64_t literals (substitution for HOST_WIDE_INT_C) nor one for printf formats (substitutions for HOST_WIDE_INT_PRINT and friends). I'll consider doing s/HOST_WIDE_INT/[U]INT64/ there if nobody comes up with a better plan. Not sure whether you meant that to apply to both groups, but as far as the HOST_WIDE_INT_C replacement goes, how about just using int64_t (N) instead of HOST_WIDE_INT_C (N) or INT64_C (N)? int64_t (N) would already be taken, but yes, I considered [U]INT64_C (N). Let's see. Richard.
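Richard Sandiford's suggestion in code form, relying only on the C++ functional-cast notation (helper names here are illustrative):

```cpp
#include <cstdint>

// Widen before shifting: shifting a plain int 1 by 48 would overflow,
// while the functional cast gives a well-defined 64-bit result.
uint64_t bit48 ()
{
  return uint64_t (1) << 48;
}

// The cast notation also works for non-constant operands, which a
// HOST_WIDE_INT_C-style literal-pasting macro cannot do.
int64_t scale_to_64 (int n)
{
  return int64_t (n) * 10000000000;  // forces 64-bit arithmetic
}
```

So for the literal case `uint64_t (1) << 48` replaces `HOST_WIDE_INT_C (1) << 48`, at the cost of spelling the type at each use.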
[PATCH] Fix PR61378
Committed as obvious.

Richard.

2014-06-02  Richard Biener  rguent...@suse.de

	PR tree-optimization/61378
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Initialize
	valueized_anything.

Index: gcc/tree-ssa-sccvn.c
===================================================================
*** gcc/tree-ssa-sccvn.c	(revision 211021)
--- gcc/tree-ssa-sccvn.c	(working copy)
*************** vn_reference_lookup_3 (ao_ref *ref, tree
*** 1613,1619 ****
  	 conditional calls to free falsely clobbering ref because
  	 of imprecise points-to info of the argument.  */
        tree oldargs[4];
!       bool valueized_anything;
        for (unsigned i = 0; i < gimple_call_num_args (def_stmt); ++i)
  	{
  	  oldargs[i] = gimple_call_arg (def_stmt, i);
--- 1613,1619 ----
  	 conditional calls to free falsely clobbering ref because
  	 of imprecise points-to info of the argument.  */
        tree oldargs[4];
!       bool valueized_anything = false;
        for (unsigned i = 0; i < gimple_call_num_args (def_stmt); ++i)
  	{
  	  oldargs[i] = gimple_call_arg (def_stmt, i);
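The bug class is simple enough to show in isolation: a flag that is only set on some paths must start out false, otherwise the later read is undefined. The names below are illustrative, not the sccvn code:

```cpp
// Only some paths set *valueized, as with the conditional
// valueization in vn_reference_lookup_3.  Initializing it up front
// (the fix) makes the caller's later test well-defined on every path.
int maybe_valueize (int arg, bool *valueized)
{
  *valueized = false;   // without this, the positive path leaves junk
  if (arg < 0)
    {
      *valueized = true;
      return -arg;
    }
  return arg;
}
```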
GCC 4.7.4 Status Report, branch frozen for release (candidate)
Status
======

The GCC 4.7 branch is now frozen as I am preparing a first release candidate for GCC 4.7.4. All changes from now on require release manager approval. After GCC 4.7.4 is released the branch will be closed.

Previous Report
===============

https://gcc.gnu.org/ml/gcc/2013-04/msg00121.html

I will send the next status report once GCC 4.7.4 RC1 is ready.
[PATCH] Testcase for PR61346
Committed. Richard. 2014-06-02 Richard Biener rguent...@suse.de PR tree-optimization/61346 * gcc.dg/torture/pr61346.c: New testcase. Index: gcc/testsuite/gcc.dg/torture/pr61346.c === --- gcc/testsuite/gcc.dg/torture/pr61346.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr61346.c (working copy) @@ -0,0 +1,162 @@ +/* { dg-do run } */ + +extern void abort (void); + +typedef int int32_t __attribute__ ((mode (SI))); +typedef int int64_t __attribute__ ((mode (DI))); +typedef __SIZE_TYPE__ size_t; + +struct slice +{ + unsigned char *data; + int64_t len; + int64_t cap; +}; + +void fail (int32_t) __attribute__ ((noinline)); +void +fail (int32_t c) +{ + if (c != 0) +abort (); +} + +struct decode_rune_ret +{ + int32_t r; + int64_t width; +}; + +struct decode_rune_ret decode_rune (struct slice) __attribute__ ((noinline)); +struct decode_rune_ret +decode_rune (struct slice s) +{ + struct decode_rune_ret dr; + dr.r = s.data[0]; + dr.width = 1; + return dr; +} + +_Bool is_space (int32_t) __attribute__ ((noinline)); +_Bool +is_space (int32_t r) +{ + return r == ' '; +} + +struct ret +{ + int64_t advance; + struct slice token; +}; + +struct ret scanwords (struct slice, _Bool) __attribute__ ((noinline)); + +struct ret +scanwords (struct slice data, _Bool ateof) +{ + int64_t advance; + struct slice token; + int64_t start = 0; + { +int64_t width; +for (width = 0; start data.len; start += width) + { + int32_t r = 0; + struct slice s; + if (start data.cap || start 0) + fail (3); + s.data = data.data + (size_t) start; + s.len = data.len - start; + s.cap = data.cap - start; + struct decode_rune_ret dr = decode_rune (s); + r = dr.r; + width = dr.width; + if (!is_space (r)) + break; + } + } + _Bool tmp = ateof; + if (tmp != 0) +goto L1; + else +goto L2; + L1: + tmp = data.len == 0; + L2: + if (tmp != 0) +goto L11; + else +goto L12; + L11: +{ + struct ret r; + advance = 0; + token.data = 0; + token.len = 0; + token.cap = 0; + r.advance = advance; + r.token = token; + return r; +} + L12:; + 
int64_t width; + int64_t i; + for (width = 0, i = start; i data.len; i += width) +{ + int32_t r; + struct slice s; + if (i data.cap || i 0) + fail (3); + s.data = data.data + i; + s.len = data.len - i; + s.cap = data.cap - i; + struct decode_rune_ret dr = decode_rune (s); + r = dr.r; + width = dr.width; + if (is_space (r)) + { + if (i start || i data.cap || i 0) + fail (3); + if (start data.cap || start 0) + fail (3); + struct ret r; + advance = i + width; + token.data = data.data + (size_t) start; + token.len = i - start; + token.cap = data.cap - start; + r.advance = advance; + r.token = token; + return r; + } +} + { +struct ret r; +advance = 0; +token.data = 0; +token.len = 0; +token.cap = 0; +r.advance = advance; +r.token = token; +return r; + } +} + +int +main () +{ + unsigned char buf[1000]; + struct slice s; + __builtin_memset (buf, 0, sizeof (buf)); + buf[0] = ' '; + buf[1] = 'a'; + buf[2] = ' '; + s.data = buf; + s.len = 3; + s.cap = sizeof (buf); + struct ret r; + r = scanwords (s, 1); + if (r.advance != 3 || r.token.data[0] != 'a' || r.token.len != 1) +abort (); + return 0; +}
Re: [PATCH][2/n] Always 64bit-HWI cleanups
On Mon, 2 Jun 2014, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Wed, 28 May 2014, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The following changes the configury to insist on [u]int64_t being available and removes the very old __int64 case. Autoconf doesn't check for it, support came in via a big merge in Dec 2002, r60174, and it was never used on the libcpp side until I fixed that with the last patch of this series, so we couldn't have relied on it at least since libcpp was introduced. Both libcpp and vmsdbg.h now use [u]int64_t, switching HOST_WIDE_INT to literally use int64_t has to be done with the grand renaming of all users due to us using 'unsigned HOST_WIDE_INT'. Btw, I couldn't find any standard way of writing [u]int64_t literals (substitution for HOST_WIDE_INT_C) nor one for printf formats (substitutions for HOST_WIDE_INT_PRINT and friends). I'll consider doing s/HOST_WIDE_INT/[U]INT64/ there if nobody comes up with a better plan. Not sure whether you meant that to apply to both groups, but as far as the HOST_WIDE_INT_C replacement goes, how about just using int64_t (N) instead of HOST_WIDE_INT_C (N) or INT64_C (N)? int64_t (N) would already be taken, but yes, I considered [U]INT64_C (N). Not sure what you mean by taken though. Isn't it just a C++-style cast? The nice thing about [u]int64_t (N) is that it Just Works for both constant and non-constant N. We shouldn't really need a special macro that can only handle constant N. E.g.: #include stdint.h uint64_t x = uint64_t (1) 48 But HOST_WIDE_INT_C can also be used to write 12443875182037483LL. Not sure if large numbers are parsed as 'long long' (give that C++04 doesn't have long long and integer literals not fitting long invoke undefined behavior). For example config/alpha/alpha.c: GEN_INT (HOST_WIDE_INT_C (0x0ffffff0)), config/alpha/alpha.c: word1 = GEN_INT (HOST_WIDE_INT_C (0xa77b0010a43b0018)); ... 
config/rs6000/rs6000.c: const HOST_WIDE_INT lower_32bits = HOST_WIDE_INT_C(0x); config/rs6000/rs6000.c: const HOST_WIDE_INT sign_bit = HOST_WIDE_INT_C(0x8000); ... (no significant other uses). Richard.
[PATCH] Make HOST_WIDE_INT underlying type match int64_t, adjust printing macros again
This computes the underlying type of int64_t in a configure check and uses that for HOST_WIDE_INT. This makes sure we can use the PRI?64 macros from inttypes.h for HOST_WIDE_INT_PRINT_*, which the following patch re-instantiates. The patch also makes sure inttypes.h defines PRId64 and, if not, provides replacements in hwint.h (like previously), matching int64_t. Built on i586-linux-gnu and x86_64-linux-gnu (to verify both 'long' and 'long long' choices). A bootstrap and regtest on x86_64-unknown-linux-gnu is pending. Ok? Thanks, Richard.

2014-06-02  Richard Biener  rguent...@suse.de

	* configure.ac: Check whether the underlying type of int64_t
	is long or long long.
	* configure: Regenerate.
	* config.in: Likewise.
	* hwint.h (HOST_WIDE_INT): Match the underlying type of int64_t.
	(HOST_WIDE_INT_PRINT_*): Define in terms of PRI*64.

Index: gcc/configure.ac
===================================================================
*** gcc/configure.ac	(revision 211125)
--- gcc/configure.ac	(working copy)
*************** if test x$ac_cv_c_uint64_t = xno -o
*** 316,321 ****
--- 316,350 ----
      AC_MSG_ERROR([uint64_t or int64_t not found])
  fi
  
+ # check what underlying integer type int64_t uses
+ AC_LANG_PUSH(C++)
+ AC_CACHE_CHECK(for int64_t underlying type, ac_cv_int64_t_type, [
+ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+ #ifdef HAVE_STDINT_H
+ #include <stdint.h>
+ #endif
+ template <typename T> struct X { };
+ template <>
+ struct X<long> { typedef long t; };
+ ]], [[typename X<int64_t>::t x;]])],[ac_cv_int64_t_type=long],[ac_cv_int64_t_type="long long"])])
+ if test x$ac_cv_int64_t_type = xlong; then
+   AC_DEFINE(INT64_T_IS_LONG, 1,
+   [Define if int64_t uses long as underlying type.])
+ else
+   AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+ #ifdef HAVE_STDINT_H
+ #include <stdint.h>
+ #endif
+ template <typename T> struct X { };
+ template <>
+ struct X<long long> { typedef long long t; };
+ ]], [[typename X<int64_t>::t x;]])],[],[AC_MSG_ERROR([error verifying int64_t uses long long])])
+ fi
+ AC_LANG_POP(C++)
+ 
+ 
  # -
  # Warnings and checking
  # -
*************** LIBS=$save_LIBS
*** 1055,1067 ****
  AC_SUBST(LDEXP_LIB)
  
  # Use <inttypes.h> only if it exists,
! # doesn't clash with <sys/types.h>, and declares intmax_t.
  AC_MSG_CHECKING(for inttypes.h)
  AC_CACHE_VAL(gcc_cv_header_inttypes_h,
  [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
!   [[#include <sys/types.h>
  #include <inttypes.h>]],
!   [[intmax_t i = -1;]])],
  [gcc_cv_header_inttypes_h=yes],
  [gcc_cv_header_inttypes_h=no])])
  AC_MSG_RESULT($gcc_cv_header_inttypes_h)
--- 1084,1101 ----
  AC_SUBST(LDEXP_LIB)
  
  # Use <inttypes.h> only if it exists,
! # doesn't clash with <sys/types.h>, declares intmax_t and defines
! # PRId64
  AC_MSG_CHECKING(for inttypes.h)
  AC_CACHE_VAL(gcc_cv_header_inttypes_h,
  [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
!   [[#define __STDC_FORMAT_MACROS
!   #include <sys/types.h>
  #include <inttypes.h>]],
!   [[intmax_t i = -1;
!   #ifndef PRId64
!   choke me
!   #endif]])],
  [gcc_cv_header_inttypes_h=yes],
  [gcc_cv_header_inttypes_h=no])])
  AC_MSG_RESULT($gcc_cv_header_inttypes_h)
Index: gcc/hwint.h
===================================================================
*** gcc/hwint.h	(revision 211125)
--- gcc/hwint.h	(working copy)
*************** extern char sizeof_long_long_must_be_8[s
*** 46,58 ****
  #endif
  
  /* Set HOST_WIDE_INT, this should be always 64 bits.
! 
!    With a sane ABI, 'long' is the largest efficient host integer type.
!    Thus, we use that unless we have to use 'long long'
!    because we're on a 32-bit host.  */
  
  #define HOST_BITS_PER_WIDE_INT 64
! #if HOST_BITS_PER_LONG == 64
  #   define HOST_WIDE_INT long
  #   define HOST_WIDE_INT_C(X) X ## L
  #else
--- 46,56 ----
  #endif
  
  /* Set HOST_WIDE_INT, this should be always 64 bits.
!    The underlying type is matched to that of int64_t and assumed
!    to be either long or long long.  */
  
  #define HOST_BITS_PER_WIDE_INT 64
! #if INT64_T_IS_LONG
  #   define HOST_WIDE_INT long
  #   define HOST_WIDE_INT_C(X) X ## L
  #else
*************** extern char sizeof_long_long_must_be_8[s
*** 75,122 ****
     typedef before using the __asm_fprintf__ format attribute.  */
  typedef HOST_WIDE_INT __gcc_host_wide_int__;
  
- /* Various printf format strings for HOST_WIDE_INT.  */
- 
- #if HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG
- # define HOST_WIDE_INT_PRINT HOST_LONG_FORMAT
- # define HOST_WIDE_INT_PRINT_C "L"
-   /* HOST_BITS_PER_WIDE_INT is 64 bits.  */
- # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \
-     "0x%" HOST_LONG_FORMAT "x%016" HOST_LONG_FORMAT "x"
- # define HOST_WIDE_INT_PRINT_PADDED_HEX \
-     "%016" HOST_LONG_FORMAT "x"
- #else
- # define HOST_WIDE_INT_PRINT HOST_LONG_LONG_FORMAT
- # define HOST_WIDE_INT_PRINT_C "LL"
-   /* HOST_BITS_PER_WIDE_INT is 64 bits.  */
- # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \
-     "0x%" HOST_LONG_LONG_FORMAT "x%016" HOST_LONG_LONG_FORMAT "x"
- # define
Re: Eliminate write-only variables
On Sat, May 31, 2014 at 8:21 PM, Sandra Loosemore san...@codesourcery.com wrote: On 05/20/2014 04:56 AM, Richard Biener wrote: On Tue, May 20, 2014 at 6:29 AM, Jan Hubicka hubi...@ucw.cz wrote: On 05/18/2014 08:45 PM, Sandra Loosemore wrote: On 05/18/2014 02:59 PM, Jan Hubicka wrote: For cases like local-statics-7 your approach can be saved by adding simple IPA analysis to look for static vars that are used only by one function and keeping your DSE code active for them, so we can still get rid of these special-case assignments during late compilation. I am however not quite convinced it is worth the effort - do you have some real world cases where it helps? Um, the well-known benchmark. ;-) I looked at the generated code for this benchmark and see that your patch is indeed not getting rid of all the pointless static variables, while ours is, so this is quite disappointing. I'm thinking now that we will still need to retain our patch at least locally, although we'd really like to get it on trunk if possible. Your patch can really be improved by adding simple IPA analysis to work out what variables are referenced by one function only instead of using the no-longer-that-relevant local static info. As a last IPA pass, for each static variable with no address taken, look at all references and see if they all come from one function or functions inlined to it. Here is another testcase vaguely based on the same kind of signal-processing algorithm as the benchmark, but coded without reference to it. I have arm-none-eabi builds around for both 4.9.0 with our remove_local_statics patch applied, and trunk with your patch. With -O2, our optimization produces 23 instructions and gets rid of all 3 statics, yours produces 33 and only gets rid of 1 of them. Of course it's lame to use static variables in this way, but OTOH it's lame if GCC can't recognize them and optimize them away, too.
-Sandra

void fir (int *coeffs, int coeff_length, int *input, int input_length, int *output)
{
  static int *coeffp;
  static int *inputp;
  static int *outputp;

Here the inputp read is removed only at the 101.dceloop1 pass. That is really late. coeffp is removed late, too.

  int i, c, acc;
  for (i = 0; i < input_length; i++)
    {
      coeffp = coeffs;
      inputp = input + i;
      outputp = output + i;
      acc = 0;
      for (c = 0; c < coeff_length; c++)
	{
	  acc += *coeffp * *inputp;
	  coeffp++;
	  inputp--;
	}

It seems like AA can not work out the fact that inputp is unchanged until that late. Richi, any ideas? Well, AA is perfectly fine - it's just that this boils down to a partial redundancy problem. The stores can be removed by DCE even with

Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c	(revision 210635)
+++ gcc/tree-ssa-dce.c	(working copy)
@@ -278,10 +278,20 @@ mark_stmt_if_obviously_necessary (gimple
       break;
 
     case GIMPLE_ASSIGN:
-      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
-	  && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
-	return;
-      break;
+      {
+	tree lhs = gimple_assign_lhs (stmt);
+	if (TREE_CODE (lhs) == SSA_NAME
+	    && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
+	  return;
+	if (TREE_CODE (lhs) == VAR_DECL
+	    && !TREE_ADDRESSABLE (lhs)
+	    && TREE_STATIC (lhs)
+	    && !TREE_PUBLIC (lhs) && !DECL_EXTERNAL (lhs)
+	    /* ??? Make sure lhs is only referred to from cfun->decl.  */
+	    && decl_function_context (lhs) == cfun->decl)
+	  return;
+	break;
+      }
     default:
       break;

but I don't have a good way of checking the ??? prerequisite (even without taking its address the statics may be referred to by a) inline copies, b) ipa-split function parts, c) nested functions). I'm sure IPA references are not kept up-to-date. The last reads go away with PRE at the expense of two additional induction variables. If we know that a static variable does not have its address taken and is only referred to from a single function we can in theory simply rewrite it into SSA form during early opts (before inlining its body), for example by SRA.
That isn't even restricted to function-local statics (for statics used in multiple functions but not having address-taken we can do the same with doing function entry / exit load / store). I think that would be a much better IPA enabled optimization than playing games with looking at individual stmts. FAOD, I'm not currently planning to work on this any more myself. I understand Bernd's patch pretty well and could make some minor changes to clean it up if that's what it takes to get it approved, but it sounds like what's wanted is a completely different approach, again, and I don't have any confidence that I could implement
Re: [PATCH] Fix PR ipa/61190
On Mon, Jun 2, 2014 at 11:00 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, the test case g++.old-deja/g++.mike/p4736b.C is mis-compiled with all optimization levels except -O0 and -Og. This probably started with gcc 4.7.x. The constructor Main::Main() is first inlined and then completely optimized away in the dce1 pass. But the thunk is still using the vtable, and therefore crashes. Well, the problem seems to be that the thunk code is not emitted in the normal way using gimple code, but instead by the i386 back-end with a callback. This makes the thunk invisible to the ipa-pure-const pass, which assumes that all values just flow straight through the thunk. See ipa-pure-const.c (analyze_function):

  if (fn->thunk.thunk_p || fn->alias)
    {
      /* Thunk gets propagated through, so nothing interesting happens.  */
      gcc_assert (ipa);
      return l;
    }

But this is not true for a virtual base class: the thunk itself references the vtable, so if nothing else might need the vtable, the optimizer may remove the initialization of the vtable, which happened in this example. The most simple workaround would be the attached patch, which simply falls back to emitting gimple code if virtual base classes are used. Boot-strapped and regression-tested on x86_64-linux-gnu. Ok for trunk? Honza should know better. I'd rather skip the above for thunks going the output_mi_thunk way. That is, the target hook docs should be updated to reflect which kind of thunks it is supposed to handle. Richard. Thanks Bernd.
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Fri, May 30, 2014 at 10:48 AM, Eric Botcazou ebotca...@adacore.com wrote: I'd say we still handle basic symbolic range ops in extract_range_from_binary_1 but in extract_range_from_binary_expr for a symbolic op0 we try to simplify it with both [op1, op1] and with the value-range of op1 until we get a non-varying range as result. Not sure if it's worth restricting that to the case where op0's value-range refers to op1 or vice versa, and eventually only use op1 symbolically then. Patch along these lines attached. A bit heavy as expected, but it's VRP... It deals with my pet problem, you might want to check it does so with yours. Tested on x86_64-suse-linux with no regressions. Looks mostly ok. Any reason why you are not re-creating MINUS_EXPR in build_symbolic_expr? That is, build inv - t (for non-pointers, of course)? Otherwise if a range becomes -t + inv that will no longer match get_single_symbol for further propagation? Then I'm not sure if

+  /* Try with VR0 and [-INF, OP1].  */
+  set_value_range (&new_vr1, VR_RANGE, vrp_val_min (expr_type), op1, NULL);
+  extract_range_from_binary_expr_1 (vr, code, expr_type, &vr0, &new_vr1);
+  if (vr->type != VR_VARYING)
+    return;
+
+  /* Try with VR0 and [OP1, +INF].  */
+  set_value_range (&new_vr1, VR_RANGE, op1, vrp_val_max (expr_type), NULL);
+  extract_range_from_binary_expr_1 (vr, code, expr_type, &vr0, &new_vr1);
+  if (vr->type != VR_VARYING)
+    return;

is a safe thing to do. If it does make a difference to try [-INF, OP1], [OP1, +INF] instead of just [OP1, OP1] then at least it's very suspicious ;) (or an easy missed optimization). So - can you fix the negate thing and drop the four cases trying the +-INF based ranges? Thanks, Richard. 2014-05-30 Eric Botcazou ebotca...@adacore.com * tree-vrp.c (get_single_symbol): New function. (build_symbolic_expr): Likewise. (symbolic_range_based_on_p): New predicate. (extract_range_from_binary_expr_1): Deal with single-symbolic ranges for PLUS and MINUS. Do not drop symbolic ranges at the end.
(extract_range_from_binary_expr): Try harder for PLUS and MINUS if operand is symbolic and based on the other operand. 2014-05-30 Eric Botcazou ebotca...@adacore.com * gcc.dg/tree-ssa/vrp93.c: New test. * gnat.dg/opt38.adb: Likewise. -- Eric Botcazou
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Mon, Jun 2, 2014 at 12:36 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 30, 2014 at 10:48 AM, Eric Botcazou ebotca...@adacore.com wrote: I'd say we still handle basic symbolic range ops in extract_range_from_binary_1 but in extract_range_from_binary_expr for a symbolic op0 we try to simplify it with both [op1, op1] and with the value-range of op1 until we get a non-varying range as result. Not sure if it's worth restricting that to the case where op0's value-range refers to op1 or vice versa, and eventually only use op1 symbolically then. Patch along these lines attached. A bit heavy as expected, but it's VRP... It deals with my pet problem, you might want to check it does so with yours. Tested on x86_64-suse-linux with no regressions. Looks mostly ok. Any reason why you are not re-creating MINUS_EXPR in build_symbolic_expr? That is, build inv - t (for non-pointers, of course)? Otherwise if a range becomes -t + inv that will no longer match get_single_symbol for further propagation? Then I'm not sure if

+  /* Try with VR0 and [-INF, OP1].  */
+  set_value_range (&new_vr1, VR_RANGE, vrp_val_min (expr_type), op1, NULL);
+  extract_range_from_binary_expr_1 (vr, code, expr_type, &vr0, &new_vr1);
+  if (vr->type != VR_VARYING)
+    return;
+
+  /* Try with VR0 and [OP1, +INF].  */
+  set_value_range (&new_vr1, VR_RANGE, op1, vrp_val_max (expr_type), NULL);
+  extract_range_from_binary_expr_1 (vr, code, expr_type, &vr0, &new_vr1);
+  if (vr->type != VR_VARYING)
+    return;

is a safe thing to do. If it does make a difference to try [-INF, OP1], [OP1, +INF] instead of just [OP1, OP1] then at least it's very suspicious ;) (or an easy missed optimization). So - can you fix the negate thing and drop the four cases trying the +-INF based ranges? Btw, the testcases are missing in the patch so I can't have a look myself. Richard. Thanks, Richard. 2014-05-30 Eric Botcazou ebotca...@adacore.com * tree-vrp.c (get_single_symbol): New function. (build_symbolic_expr): Likewise.
(symbolic_range_based_on_p): New predicate. (extract_range_from_binary_expr_1): Deal with single-symbolic ranges for PLUS and MINUS. Do not drop symbolic ranges at the end. (extract_range_from_binary_expr): Try harder for PLUS and MINUS if operand is symbolic and based on the other operand. 2014-05-30 Eric Botcazou ebotca...@adacore.com * gcc.dg/tree-ssa/vrp93.c: New test. * gnat.dg/opt38.adb: Likewise. -- Eric Botcazou
Re: [PATCH] Make HOST_WIDE_INT underlying type match int64_t, adjust printing macros again
On Mon, 2 Jun 2014, Jakub Jelinek wrote: On Mon, Jun 02, 2014 at 11:24:23AM +0200, Richard Biener wrote:

2014-06-02  Richard Biener  rguent...@suse.de

	* configure.ac: Check whether the underlying type of int64_t
	is long or long long.
	* configure: Regenerate.
	* config.in: Likewise.
	* hwint.h (HOST_WIDE_INT): Match the underlying type of int64_t.
	(HOST_WIDE_INT_PRINT_*): Define in terms of PRI*64.

LGTM, but give others 24 hours to comment on it too. Testing with an older host compiler revealed two issues - typename use outside of a template and missing quoting around the cache variable check. Thus I am re-testing the following. Richard.

2014-06-02  Richard Biener  rguent...@suse.de

	* configure.ac: Check whether the underlying type of int64_t
	is long or long long.
	* configure: Regenerate.
	* config.in: Likewise.
	* hwint.h (HOST_WIDE_INT): Match the underlying type of int64_t.
	(HOST_WIDE_INT_PRINT_*): Define in terms of PRI*64.

Index: gcc/configure.ac
===================================================================
*** gcc/configure.ac	(revision 211125)
--- gcc/configure.ac	(working copy)
*************** if test x$ac_cv_c_uint64_t = xno -o
*** 316,321 ****
--- 316,350 ----
      AC_MSG_ERROR([uint64_t or int64_t not found])
  fi
  
+ # check what underlying integer type int64_t uses
+ AC_LANG_PUSH(C++)
+ AC_CACHE_CHECK(for int64_t underlying type, ac_cv_int64_t_type, [
+ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+ #ifdef HAVE_STDINT_H
+ #include <stdint.h>
+ #endif
+ template <typename T> struct X { };
+ template <>
+ struct X<long> { typedef long t; };
+ ]], [[X<int64_t>::t x;]])],[ac_cv_int64_t_type=long],[ac_cv_int64_t_type="long long"])])
+ if test "$ac_cv_int64_t_type" = "long"; then
+   AC_DEFINE(INT64_T_IS_LONG, 1,
+   [Define if int64_t uses long as underlying type.])
+ else
+   AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+ #ifdef HAVE_STDINT_H
+ #include <stdint.h>
+ #endif
+ template <typename T> struct X { };
+ template <>
+ struct X<long long> { typedef long long t; };
+ ]], [[X<int64_t>::t x;]])],[],[AC_MSG_ERROR([error verifying int64_t uses long long])])
+ fi
+ AC_LANG_POP(C++)
+ 
+ 
  # -
  # Warnings and checking
  # -
*************** LIBS=$save_LIBS
*** 1055,1067 ****
  AC_SUBST(LDEXP_LIB)
  
  # Use <inttypes.h> only if it exists,
! # doesn't clash with <sys/types.h>, and declares intmax_t.
  AC_MSG_CHECKING(for inttypes.h)
  AC_CACHE_VAL(gcc_cv_header_inttypes_h,
  [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
!   [[#include <sys/types.h>
  #include <inttypes.h>]],
!   [[intmax_t i = -1;]])],
  [gcc_cv_header_inttypes_h=yes],
  [gcc_cv_header_inttypes_h=no])])
  AC_MSG_RESULT($gcc_cv_header_inttypes_h)
--- 1084,1101 ----
  AC_SUBST(LDEXP_LIB)
  
  # Use <inttypes.h> only if it exists,
! # doesn't clash with <sys/types.h>, declares intmax_t and defines
! # PRId64
  AC_MSG_CHECKING(for inttypes.h)
  AC_CACHE_VAL(gcc_cv_header_inttypes_h,
  [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
!   [[#define __STDC_FORMAT_MACROS
!   #include <sys/types.h>
  #include <inttypes.h>]],
!   [[intmax_t i = -1;
!   #ifndef PRId64
!   choke me
!   #endif]])],
  [gcc_cv_header_inttypes_h=yes],
  [gcc_cv_header_inttypes_h=no])])
  AC_MSG_RESULT($gcc_cv_header_inttypes_h)
Index: gcc/hwint.h
===================================================================
*** gcc/hwint.h	(revision 211125)
--- gcc/hwint.h	(working copy)
*************** extern char sizeof_long_long_must_be_8[s
*** 46,58 ****
  #endif
  
  /* Set HOST_WIDE_INT, this should be always 64 bits.
! 
!    With a sane ABI, 'long' is the largest efficient host integer type.
!    Thus, we use that unless we have to use 'long long'
!    because we're on a 32-bit host.  */
  
  #define HOST_BITS_PER_WIDE_INT 64
! #if HOST_BITS_PER_LONG == 64
  #   define HOST_WIDE_INT long
  #   define HOST_WIDE_INT_C(X) X ## L
  #else
--- 46,56 ----
  #endif
  
  /* Set HOST_WIDE_INT, this should be always 64 bits.
!    The underlying type is matched to that of int64_t and assumed
!    to be either long or long long.  */
  
  #define HOST_BITS_PER_WIDE_INT 64
! #if INT64_T_IS_LONG
  #   define HOST_WIDE_INT long
  #   define HOST_WIDE_INT_C(X) X ## L
  #else
*************** extern char sizeof_long_long_must_be_8[s
*** 75,122 ****
     typedef before using the __asm_fprintf__ format attribute.  */
  typedef HOST_WIDE_INT __gcc_host_wide_int__;
  
- /* Various printf format strings for HOST_WIDE_INT.  */
- 
- #if HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG
- # define HOST_WIDE_INT_PRINT HOST_LONG_FORMAT
- # define HOST_WIDE_INT_PRINT_C "L"
-   /* HOST_BITS_PER_WIDE_INT is 64 bits.  */
- # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \
-     "0x%" HOST_LONG_FORMAT "x%016" HOST_LONG_FORMAT "x"
- # define HOST_WIDE_INT_PRINT_PADDED_HEX \
-     "%016" HOST_LONG_FORMAT "x"
- #else
- # define HOST_WIDE_INT_PRINT HOST_LONG_LONG_FORMAT
- # define HOST_WIDE_INT_PRINT_C "LL"
-   /* HOST_BITS_PER_WIDE_INT is 64 bits
Re: [PATCH, Pointer Bounds Checker 14/x] Pointer Bounds Checker passes
On Fri, May 30, 2014 at 2:25 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds Pointer Bounds Checker passes. Versioning happens before early local passes. Early local passes are split into 3 stages to have everything instrumented before any optimization applies. That looks artificial to me. If you need to split up early_local_passes then do that - nesting three IPA pass groups inside it looks odd to me. Btw - doing this in three IPA phases makes things possibly slower due to cache effects. It might be worth pursuing to move the early stage completely to the lowering pipeline. Btw, fixup_cfg only needs to run once local_pure_const was run on a callee, thus it shouldn't be necessary to run it from the first group.

void
pass_manager::execute_early_local_passes ()
{
-  execute_pass_list (pass_early_local_passes_1->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->next->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->next->next->next->sub);
}

is gross - it should be enough to execute the early local pass list (obsolete comment with the suggestion above). Richard. Bootstrapped and tested on linux-x86_64. Thanks, Ilya --

gcc/

2014-05-29  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-chkp.c: New.
	* tree-chkp.h: New.
	* rtl-chkp.c: New.
	* rtl-chkp.h: New.
	* Makefile.in (OBJS): Add tree-chkp.o, rtl-chkp.o.
	(GTFILES): Add tree-chkp.c.
	* c-family/c.opt (fchkp-check-incomplete-type): New.
	(fchkp-zero-input-bounds-for-main): New.
	(fchkp-first-field-has-own-bounds): New.
	(fchkp-narrow-bounds): New.
	(fchkp-narrow-to-innermost-array): New.
	(fchkp-optimize): New.
	(fchkp-use-fast-string-functions): New.
	(fchkp-use-nochk-string-functions): New.
	(fchkp-use-static-bounds): New.
	(fchkp-use-static-const-bounds): New.
	(fchkp-treat-zero-dynamic-size-as-infinite): New.
	(fchkp-check-read): New.
	(fchkp-check-write): New.
	(fchkp-store-bounds): New.
	(fchkp-instrument-calls): New.
	(fchkp-instrument-marked-only): New.
* cppbuiltin.c (define_builtin_macros_for_compilation_flags): Add __CHKP__ macro when Pointer Bounds Checker is on. * passes.def (pass_ipa_chkp_versioning): New. (pass_before_local_optimization_passes): New. (pass_chkp_instrumentation_passes): New. (pass_ipa_chkp_produce_thunks): New. (pass_local_optimization_passes): New. (pass_chkp_opt): New. * toplev.c: include tree-chkp.h. (compile_file): Add chkp_finish_file call. * tree-pass.h (make_pass_ipa_chkp_versioning): New. (make_pass_ipa_chkp_produce_thunks): New. (make_pass_chkp): New. (make_pass_chkp_opt): New. (make_pass_before_local_optimization_passes): New. (make_pass_chkp_instrumentation_passes): New. (make_pass_local_optimization_passes): New. * tree.h (called_as_built_in): New. * builtins.c (called_as_built_in): Not static anymore. * passes.c (pass_manager::execute_early_local_passes): Execute early passes in three steps. (pass_data_before_local_optimization_passes): New. (pass_before_local_optimization_passes): New. (pass_data_chkp_instrumentation_passes): New. (pass_chkp_instrumentation_passes): New. (pass_data_local_optimization_passes): New. (pass_local_optimization_passes): New. (make_pass_before_local_optimization_passes): New. (make_pass_chkp_instrumentation_passes): New. (make_pass_local_optimization_passes): New.
Re: [PATCH, i386, Pointer Bounds Checker 10/x] Partitions
On Fri, May 30, 2014 at 7:10 PM, Jeff Law l...@redhat.com wrote: On 05/28/14 10:06, Ilya Enkovich wrote: Hi, This patch keeps instrumented and original versions together and preserves the transparent alias chain during symbol name privatization. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2013-05-28 Ilya Enkovich ilya.enkov...@intel.com * lto/lto-partition.c (add_symbol_to_partition_1): Keep original and instrumented versions together. This part is OK. Note lto/ has its own ChangeLog, so put the ChangeLog entry there and don't use the lto/ prefix in the ChangeLog entry. (privatize_symbol_name): Restore transparent alias chain if required. What exactly are you doing here? The comment in the code doesn't really make it clear what you are doing or why.

+  /* We could change name which is a target of transparent alias
+     chain of instrumented function name.  Fix alias chain if so.  */

So are you saying that we want to change the name? Or that it could have been changed and we have to adjust something because it was changed? I'm certainly not as familiar with the LTO stuff as I should be -- what is the purpose behind chaining the DECL_ASSEMBLER_NAME nodes? Something gross:

/* Nonzero in an IDENTIFIER_NODE if the name is a local alias, whose
   uses are to be substituted for uses of the TREE_CHAINed identifier.  */
#define IDENTIFIER_TRANSPARENT_ALIAS(NODE) \
  (IDENTIFIER_NODE_CHECK (NODE)->base.deprecated_flag)

this should be all moved to the symbol table level. (and IDENTIFIER_NODE shouldn't have to have tree_common.chain and thus become smaller). Richard. jeff
Re: [PATCH, Pointer Bounds Checker 13/x] Early versioning
On Mon, Jun 2, 2014 at 12:48 PM, Ilya Enkovich enkovich@gmail.com wrote: On 30 May 10:59, Jeff Law wrote: On 05/29/14 05:05, Ilya Enkovich wrote: Hi, This patch allows to perform function versioning when some structures are not available yet. It is required to make clones for Pointer Bounds Checker right after SSA build. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-05-29 Ilya Enkovich ilya.enkov...@intel.com * tree-inline.c (copy_cfg_body): Check loop tree existence before accessing it. (tree_function_versioning): Check DF info existence before accessing it.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 4293241..23fef90 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -2544,7 +2544,8 @@ copy_cfg_body (copy_body_data * id, gcov_type count, int frequency_scale,
 
   /* If the loop tree in the source function needed fixup, mark the
      destination loop tree for fixup, too.  */
-  if (loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP)
+  if (loops_for_fn (src_cfun)
+      && loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP)
     loops_state_set (LOOPS_NEED_FIXUP);

Hmm, so if I understand things correctly, src_fun has no loop structures attached, thus there's nothing to copy. Presumably at some later point we build loop structures for the copy from scratch? I suppose it is just a simple bug with an absent NULL pointer check. Here is the original code:

  /* Duplicate the loop tree, if available and wanted.  */
  if (loops_for_fn (src_cfun) != NULL && current_loops != NULL)
    {
      copy_loops (id, entry_block_map->loop_father, get_loop (src_cfun, 0));
      /* Defer to cfgcleanup to update loop-father fields of basic-blocks.  */
      loops_state_set (LOOPS_NEED_FIXUP);
    }
  /* If the loop tree in the source function needed fixup, mark the
     destination loop tree for fixup, too.  */
  if (loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP)
    loops_state_set (LOOPS_NEED_FIXUP);

As you may see, we have a check for an absent loops structure in the first if-statement and no check in the second one.
I hit a segfault and added the check. Actually after SSA is built we always have loops (loops are built once we build the CFG, which happens earlier). So all the above checks are no longer necessary now. Similarly for the PTA info, we just build it from scratch in the copy at some point? Here we also have conditional access like

  /* Reset the escaped solution.  */
  if (cfun->gimple_df)
    pt_solution_reset (&cfun->gimple_df->escaped);

and a following unconditional one I've fixed. Likewise. The init_data_structures pass initializes this (init_tree_ssa). So I'm not sure why you need all this given we are in SSA form? Richard. Assuming the answers to both are yes, then this patch is OK for the trunk when the rest of the patches are approved. It's not great, but it's OK. Thanks! Ilya jeff
Re: [GSoC][match-and-simplify] add bitwise patterns to match.pd
On Mon, Jun 2, 2014 at 1:33 PM, Prathamesh Kulkarni bilbotheelffri...@gmail.com wrote: Hi, I have tried to add a few bitwise patterns from tree-ssa-forwprop.c:simplify_bitwise_binary and the patterns mentioned in the Simplifications wiki (https://gcc.gnu.org/wiki/Simplifications). How to write a test-case to match multiple gimple statements? Example: For the pattern ~x | ~y -> ~(x & y), test-case:

int f(int x, int y)
{
  int t1 = ~x;
  int t2 = ~y;
  return t1 | t2;
}

-fdump-tree-forwprop-details output:

gimple_match_and_simplified to _5 = ~_7;
f (int x, int y)
{
  int t2;
  int t1;
  int _5;
  int _7;

  <bb 2>:
  t1_2 = ~x_1(D);
  t2_4 = ~y_3(D);
  _7 = x_1(D) & y_3(D);
  _5 = ~_7;
  return _5;
}

I suppose we want to look for matching both:

  _7 = x_1(D) & y_3(D);
  _5 = ~_7;

so only matching on "gimple_match_and_simplified to _5 = ~_7" won't be correct? Yeah, that's a forwprop debugging dump issue and/or of the API it uses. The gimple_match_and_simplify overload using a gimple_stmt_iterator * inserts the other stmts before it instead of delaying that to the caller (and gives it the chance to dump it). You can do the old-school testing by scanning for the IL itself, not some gimple_match_and_simplified dump. Thus in addition to the above also scan for

{ dg-final { scan-tree-dump "\[xy\]_\\d\+\\(D\\) & \[xy\]_\\d\+\\(D\\)" } }

The patterns for x & 0 -> 0 and x & -1 -> x don't get fired from forwprop. I tried:

int f1(int x)
{
  int t1 = 0;
  return x & t1;
}

-fdump-tree-forwprop-details shows the following:

;; Function f1 (f1, funcdef_no=0, decl_uid=1743, symbol_order=0)
f1 (int x)
{
  int t1;

  <bb 2>:
  return 0;
}

That's because constant propagation (which runs before forwprop) already simplified the statement. I have a patch to make use of match-and-simplify from gimple_fold_stmt_to_constant but I have to clean it up. I'll give writing testcases a thought there (hopefully will post and commit a patch later today). Richard.

[gcc]
* match.pd: Add few patterns from simplify_bitwise_binary.
[gcc/testsuite] * gcc.dg/tree-ssa/match-2.c: Add more test-cases. Thanks and Regards, Prathamesh
[PATCH][match-and-simplify]
This is a patch I worked on last week - it makes gimple_fold_stmt_to_constant[_1] use gimple_match_and_simplify (and avoids building trees we throw away because they are not gimple values). It also adds basic constant folding patterns (and comments on what is missing in match.pd - the actual patch I was using was comparing the result from both simplification methods and errors if they don't agree). The patch below (which has been committed) keeps the old path to not regress due to missing patterns. Committed to the branch. Richard.

2014-06-02  Richard Biener  rguent...@suse.de

	* gimple-fold.c (gimple_fold_stmt_to_constant_1): Rename to
	(gimple_fold_stmt_to_constant_2): ... this and make static.
	(gimple_fold_stmt_to_constant_1): New wrapper using
	gimple_match_and_simplify before falling back to
	gimple_fold_stmt_to_constant_2.
	* match.pd: Add basic constant folding patterns.
	Add two commutations.

Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	(revision 211131)
+++ gcc/gimple-fold.c	(working copy)
@@ -2524,8 +2524,8 @@ maybe_fold_or_comparisons (enum tree_cod
    privatized with the single valueize function used in the various TUs
    to avoid the indirect function call overhead.  */
 
-tree
-gimple_fold_stmt_to_constant_1 (gimple stmt, tree (*valueize) (tree))
+static tree
+gimple_fold_stmt_to_constant_2 (gimple stmt, tree (*valueize) (tree))
 {
   location_t loc = gimple_location (stmt);
   switch (gimple_code (stmt))
@@ -2803,6 +2803,30 @@ gimple_fold_stmt_to_constant_1 (gimple s
     }
 }
 
+tree
+gimple_fold_stmt_to_constant_1 (gimple stmt, tree (*valueize) (tree))
+{
+  tree lhs = gimple_get_lhs (stmt);
+  if (lhs)
+    {
+      tree res = gimple_match_and_simplify (lhs, NULL, valueize);
+      if (res)
+	{
+	  if (dump_file && dump_flags & TDF_DETAILS)
+	    {
+	      fprintf (dump_file, "Match-and-simplified definition of ");
+	      print_generic_expr (dump_file, lhs, 0);
+	      fprintf (dump_file, " to ");
+	      print_generic_expr (dump_file, res, 0);
+	      fprintf (dump_file, "\n");
+	    }
+	  return res;
+	}
+    }
+  /* ???
For now, to avoid regressions. */ + return gimple_fold_stmt_to_constant_2 (stmt, valueize); +} + /* Fold STMT to a constant using VALUEIZE to valueize SSA names. Returns NULL_TREE if folding to a constant is not possible, otherwise returns a constant according to is_gimple_min_invariant. */ Index: gcc/match.pd === --- gcc/match.pd(revision 211131) +++ gcc/match.pd(working copy) @@ -21,6 +21,95 @@ You should have received a copy of the G along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ +/* Simple constant foldings to substitute gimple_fold_stmt_to_constant_2. */ +(match_and_simplify + (plus @0 integer_zerop) + @0) +(match_and_simplify + (pointer_plus @0 integer_zerop) + @0) +(match_and_simplify + (minus @0 integer_zerop) + @0) +(match_and_simplify + (minus @0 @0) + { build_zero_cst (type); }) +(match_and_simplify + (mult @0 integer_zerop@1) + @1) +(match_and_simplify + (mult @0 integer_onep) + @0) +(match_and_simplify + (trunc_div @0 integer_onep) + @0) +/* It's hard to preserve non-folding of / 0 which is done by a + positional check in fold-const.c (to preserve warnings). The + issue here is that we fold too early in frontends. + Also fold happilt folds 0 / x to 0 (even if x turns out to be zero later). 
*/ +(match_and_simplify + (trunc_div integer_zerop@0 @1) + @0) +(match_and_simplify + (trunc_div @0 @0) + { build_one_cst (type); }) +(match_and_simplify + (trunc_mod @0 integer_onep) + { build_zero_cst (type); }) +(match_and_simplify + (trunc_mod integer_zerop@0 @1) + @0) +(match_and_simplify + (trunc_mod @0 @0) + { build_zero_cst (type); }) +(match_and_simplify + (bit_ior @0 integer_zerop) + @0) +(match_and_simplify + (bit_ior @0 integer_all_onesp@1) + @1) +(match_and_simplify + (bit_and @0 integer_all_onesp) + @0) +(match_and_simplify + (bit_and @0 integer_zerop@1) + @1) +(match_and_simplify + (bit_xor @0 integer_zerop) + @0) +(match_and_simplify + (bit_xor @0 @0) + { build_zero_cst (type); }) +/* tree-ssa/ifc-pr44710.c requires a b ? c : d to fold to 1. + ??? probably runs into issue of recursive folding of a b op0. */ +/* tree-ssa/ssa-ccp-16.c wants to fold hello[i_2] to 0 + (fold_const_aggregate_ref_1). */ +/* tree-ssa/ssa-ccp-19.c wants to fold a1_3-i to MEM[(void *)a] + (get_addr_base_and_unit_offset_1). */ +/* tree-ssa/ssa-ccp-22.c wants to fold b_2(D) = t_1 to 1. + We are missing compare constant folding to type boundaries. */ + +/* The following is simplification done by gimple_fold_stmt_to_constant_1 + to aid propagation engines, producing is_gimple_min_invariants from
Re: [GSoC][match-and-simplify] add bitwise patterns to match.pd
On Mon, Jun 2, 2014 at 2:53 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jun 2, 2014 at 1:33 PM, Prathamesh Kulkarni bilbotheelffri...@gmail.com wrote: Hi, I have tried to add a few bitwise patterns from tree-ssa-forwprop.c:simplify_bitwise_binary and the patterns mentioned in the Simplifications wiki (https://gcc.gnu.org/wiki/Simplifications). How to write a test-case to match multiple gimple statements? Example: For the pattern: ~x | ~y -> ~(x & y) test-case:

int f(int x, int y)
{
  int t1 = ~x;
  int t2 = ~y;
  return t1 | t2;
}

-fdump-tree-forwprop-details output: gimple_match_and_simplified to _5 = ~_7;

f (int x, int y)
{
  int t2;
  int t1;
  int _5;
  int _7;

  <bb 2>:
  t1_2 = ~x_1(D);
  t2_4 = ~y_3(D);
  _7 = x_1(D) & y_3(D);
  _5 = ~_7;
  return _5;
}

I suppose we want to look for matching both: _7 = x_1(D) & y_3(D); _5 = ~_7; so only matching on "gimple_match_and_simplified to _5 = ~_7" won't be correct? Yeah, that's a forwprop debugging dump issue and/or of the API it uses. The gimple_match_and_simplify overload using a gimple_stmt_iterator * inserts the other stmts before it instead of delaying that to the caller (and gives it the chance to dump it). You can do the old-school testing by scanning for the IL itself, not some gimple_match_and_simplified dump. Thus in addition to the above also scan for { dg-final { scan-tree-dump "\[xy\]_\\d\+\\(D\\) & \[xy\]_\\d\+\\(D\\)" } } The patterns for x & 0 -> 0 and x & -1 -> x don't get fired from forwprop. I tried:

int f1(int x)
{
  int t1 = 0;
  return x & t1;
}

-fdump-tree-forwprop-details shows the following:

;; Function f1 (f1, funcdef_no=0, decl_uid=1743, symbol_order=0)

f1 (int x)
{
  int t1;

  <bb 2>:
  return 0;
}

That's because constant propagation (which runs before forwprop) already simplified the statement. I have a patch to make use of match-and-simplify from gimple_fold_stmt_to_constant but I have to clean it up. I'll give writing testcases a thought there (hopefully will post and commit a patch later today).
Btw,

/* x & 0 -> 0 */
(match_and_simplify
  (bit_and @0 @1)
  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && @1 == integer_zero_node)
  { integer_zero_node; })

/* x & -1 -> x */
(match_and_simplify
  (bit_and @0 @1)
  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && @1 == integer_minus_one_node)
  @0)

are too restrictive due to comparing @1 against very special tree nodes. The patch I just committed uses integer_zerop and integer_all_onesp instead. I have simplified some of the if-exprs, operands are guaranteed to match. Also if we settle on the solution of providing 'type' as the type of the outermost expression we can simplify them, as bitwise expressions don't change types. Most of the patterns would also apply in commutated form, thus I guess thinking of a good solution for that is next on the list after the decision tree stuff.

/* ((a & b) & ~a) & ~b -> 0 */
(match_and_simplify
  (bit_and (bit_and (bit_and @0 @1) (bit_not @0)) (bit_not @1))
  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && INTEGRAL_TYPE_P (TREE_TYPE (@1)))
  { integer_zero_node; })

isn't this too complex, and instead already (a & b) & ~a is 0? Also that would be still doing two steps in one as we already have a & ~a -> 0, so the pattern (if wanted) should only do the association (a & b) & ~a -> (a & ~a) & b? (note that generally this is something for a reassociation pass, not for simple pattern matching) I have applied the patch with these changes. Richard. Richard. [gcc] * match.pd: Add few patterns from simplify_bitwise_binary. [gcc/testsuite] * gcc.dg/tree-ssa/match-2.c: Add more test-cases. Thanks and Regards, Prathamesh
Re: [PATCH, Pointer Bounds Checker 14/x] Pointer Bounds Checker passes
On Mon, Jun 2, 2014 at 2:44 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-02 15:35 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Fri, May 30, 2014 at 2:25 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds Pointer Bounds Checker passes. Versioning happens before early local passes. Early local passes are split into 3 stages to have everything instrumented before any optimization applies. That looks artificial to me. If you need to split up early_local_passes then do that - nesting three IPA pass groups inside it looks odd to me. Btw - doing this in three IPA phases makes things possibly slower due to cache effects. It might be worth pursuing to move the early stage completely to the lowering pipeline. Early local passes are a special case because these passes are executed separately for new functions. I did not want to get three special passes instead and therefore made the split inside. Yeah, but all passes are already executed via execute_early_local_passes, so it would be only an implementation detail. If you prefer to split the pass itself, I suppose pass_early_local_passes may be replaced with something like pass_build_ssa_passes + pass_chkp_instrumentation_passes + pass_ipa_chkp_produce_thunks + pass_local_optimization_passes. execute_early_local_passes would execute the gimple pass lists of pass_build_ssa_passes, pass_chkp_instrumentation_passes and pass_local_optimization_passes. I think we cannot have the first stage moved into the lowering passes because it should be executed for newly created functions. Well, let's defer that then. Btw, fixup_cfg only needs to run once local_pure_const was run on a callee, thus it shouldn't be necessary to run it from the first group. OK. Will try to remove it from the first group.
void
pass_manager::execute_early_local_passes ()
{
-  execute_pass_list (pass_early_local_passes_1->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->next->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->next->next->next->sub);
}

is gross - it should be enough to execute the early local pass list (obsolete comment with the suggestion above). This function should call only gimple passes for cfun. Therefore we cannot call IPA passes here and have to execute each gimple pass list separately. Ok, given a different split this would then become somewhat more sane anyway. Richard. Ilya Richard. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-05-29 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c: New. * tree-chkp.h: New. * rtl-chkp.c: New. * rtl-chkp.h: New. * Makefile.in (OBJS): Add tree-chkp.o, rtl-chkp.o. (GTFILES): Add tree-chkp.c. * c-family/c.opt (fchkp-check-incomplete-type): New. (fchkp-zero-input-bounds-for-main): New. (fchkp-first-field-has-own-bounds): New. (fchkp-narrow-bounds): New. (fchkp-narrow-to-innermost-array): New. (fchkp-optimize): New. (fchkp-use-fast-string-functions): New. (fchkp-use-nochk-string-functions): New. (fchkp-use-static-bounds): New. (fchkp-use-static-const-bounds): New. (fchkp-treat-zero-dynamic-size-as-infinite): New. (fchkp-check-read): New. (fchkp-check-write): New. (fchkp-store-bounds): New. (fchkp-instrument-calls): New. (fchkp-instrument-marked-only): New. * cppbuiltin.c (define_builtin_macros_for_compilation_flags): Add __CHKP__ macro when Pointer Bounds Checker is on. * passes.def (pass_ipa_chkp_versioning): New. (pass_before_local_optimization_passes): New. (pass_chkp_instrumentation_passes): New. (pass_ipa_chkp_produce_thunks): New. (pass_local_optimization_passes): New. (pass_chkp_opt): New. * toplev.c: Include tree-chkp.h. (compile_file): Add chkp_finish_file call. * tree-pass.h (make_pass_ipa_chkp_versioning): New. (make_pass_ipa_chkp_produce_thunks): New. (make_pass_chkp): New. (make_pass_chkp_opt): New. (make_pass_before_local_optimization_passes): New. (make_pass_chkp_instrumentation_passes): New. (make_pass_local_optimization_passes): New. * tree.h (called_as_built_in): New. * builtins.c (called_as_built_in): Not static anymore. * passes.c (pass_manager::execute_early_local_passes): Execute early passes in three steps. (pass_data_before_local_optimization_passes): New. (pass_before_local_optimization_passes): New. (pass_data_chkp_instrumentation_passes): New. (pass_chkp_instrumentation_passes): New. (pass_data_local_optimization_passes): New. (pass_local_optimization_passes): New
[PATCH] Fix part of PR61383
In comment #3 it's noted that ifcombine happily hoists possibly trapping stmts ... oops. Fixed like the following, bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-06-02 Richard Biener rguent...@suse.de PR tree-optimization/61383 * tree-ssa-ifcombine.c (bb_no_side_effects_p): Make sure stmts can't trap. * gcc.dg/torture/pr61383-1.c: New testcase.

Index: gcc/tree-ssa-ifcombine.c
===
*** gcc/tree-ssa-ifcombine.c	(revision 211125)
--- gcc/tree-ssa-ifcombine.c	(working copy)
*** bb_no_side_effects_p (basic_block bb)
*** 128,133 ****
--- 128,134 ----
        gimple stmt = gsi_stmt (gsi);

        if (gimple_has_side_effects (stmt)
+ 	  || gimple_could_trap_p (stmt)
  	  || gimple_vuse (stmt))
  	return false;
      }

Index: gcc/testsuite/gcc.dg/torture/pr61383-1.c
===
*** gcc/testsuite/gcc.dg/torture/pr61383-1.c	(revision 0)
--- gcc/testsuite/gcc.dg/torture/pr61383-1.c	(working copy)
*** 0 ****
--- 1,35 ----
+ /* { dg-do run } */
+
+ int a, b = 1, c, d, e, f, g;
+
+ int
+ fn1 ()
+ {
+   int h;
+   for (;;)
+     {
+       g = b;
+       g = g ? 0 : 1 % g;
+       e = a + 1;
+       for (; d < 1; d = e)
+ 	{
+ 	  if (f == 0)
+ 	    h = 0;
+ 	  else
+ 	    h = 1 % f;
+ 	  if (f < 1)
+ 	    c = 0;
+ 	  else if (h)
+ 	    break;
+ 	}
+       if (b)
+ 	return 0;
+     }
+ }
+
+ int
+ main ()
+ {
+   fn1 ();
+   return 0;
+ }
Re: Eliminate write-only variables
On June 2, 2014 5:44:13 PM CEST, Jan Hubicka hubi...@ucw.cz wrote: Well, I'm hesitant to add a new pass just to optimize an (irrelevant in practice) benchmark. I'm ok with strengthening existing infrastructure or enhancing existing passes. The issue with these mini-passes is that they are very placement sensitive and you don't easily get secondary effects. See the comment (and patch) about DCE - the ??? comment can be addressed the same way Bernd addressed it (use cgraph_function_possibly_inlined_p). Would that optimize the testcase? Note that the issue with nested function use prevails (not sure how to solve that - same issue in Bernd's patch). I think this part of the patch can be made much cleaner by simply adding a used-by-one-function-only flag to variables. We can compute it at the end of the IPA queue and remove the code playing with local statics, nested functions and inlines. I can easily implement this analysis - perhaps it would be useful for AA or something else, too? Yeah, I discussed this with Martin today on IRC. For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and LTO might change those to hidden-visibility public :( So we want decl_referred_to_by_single_function and decl_may_have_address_taken. Richard. Honza Thanks, Richard. -Sandra
Re: [PATCH] Fix PR61328: fully initialize symbolic number before using it
On June 3, 2014 9:18:29 AM CEST, Thomas Preud'homme thomas.preudho...@arm.com wrote: When a bitwise OR gimple statement has for operands SSA_NAMEs initialized directly from a memory source (no cast or other unary statement intervening), a symbolic number will be used only partly initialized. This was discovered by valgrind and reported as PR61328. This patch fixes that by moving the initialization code into a separate function that can be called from the two places that need it. There was also a problem of a field of a structure that was set in a function, and the value of this field was read before checking the result of the function call. This would lead to missed optimization. ChangeLog is as follows: 2014-05-29 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/61328 * tree-ssa-math-opts.c (init_symbolic_number): Extract symbolic number initialization from find_bswap_or_nop_1. (find_bswap_or_nop_1): Test return value of find_bswap_or_nop_1 stored in source_expr2 before using the size value the function sets. Also make use of init_symbolic_number () in both the old place and find_bswap_or_nop_load () to avoid reading uninitialized memory when doing recursion in the GIMPLE_BINARY_RHS case. Ok for trunk? OK. Thanks, Richard.

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index d9afccf..6c26d6d 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1701,6 +1701,30 @@ verify_symbolic_number_p (struct symbolic_number *n, gimple stmt)
   return true;
 }

+/* Initialize the symbolic number N for the bswap pass from the base element
+   SRC manipulated by the bitwise OR expression.  */
+
+static bool
+init_symbolic_number (struct symbolic_number *n, tree src)
+{
+  n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
+
+  /* Set up the symbolic number N by setting each byte to a value between 1 and
+     the byte size of rhs1.  The highest order byte is set to n->size and the
+     lowest order byte to 1.  */
+  n->size = TYPE_PRECISION (TREE_TYPE (src));
+  if (n->size % BITS_PER_UNIT != 0)
+    return false;
+  n->size /= BITS_PER_UNIT;
+  n->range = n->size;
+  n->n = CMPNOP;
+
+  if (n->size < (int)sizeof (int64_t))
+    n->n &= ((uint64_t)1 << (n->size * BITS_PER_UNIT)) - 1;
+
+  return true;
+}
+
 /* Check if STMT might be a byte swap or a nop from a memory source and returns
    the answer.  If so, REF is that memory source and the base of the memory area
    accessed and the offset of the access from that base are recorded in N.  */
@@ -1713,26 +1737,27 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n)
   HOST_WIDE_INT bitsize, bitpos;
   enum machine_mode mode;
   int unsignedp, volatilep;
+  tree offset, base_addr;

   if (!gimple_assign_load_p (stmt) || gimple_has_volatile_ops (stmt))
     return false;

-  n->base_addr = get_inner_reference (ref, &bitsize, &bitpos, &n->offset,
-				      &mode, &unsignedp, &volatilep, false);
+  base_addr = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
+				   &unsignedp, &volatilep, false);

-  if (TREE_CODE (n->base_addr) == MEM_REF)
+  if (TREE_CODE (base_addr) == MEM_REF)
     {
       offset_int bit_offset = 0;
-      tree off = TREE_OPERAND (n->base_addr, 1);
+      tree off = TREE_OPERAND (base_addr, 1);

       if (!integer_zerop (off))
	{
-	  offset_int boff, coff = mem_ref_offset (n->base_addr);
+	  offset_int boff, coff = mem_ref_offset (base_addr);
	  boff = wi::lshift (coff, LOG2_BITS_PER_UNIT);
	  bit_offset += boff;
	}

-      n->base_addr = TREE_OPERAND (n->base_addr, 0);
+      base_addr = TREE_OPERAND (base_addr, 0);

       /* Avoid returning a negative bitpos as this may wreak havoc later.  */
       if (wi::neg_p (bit_offset))
@@ -1743,11 +1768,11 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n)
	     Subtract it to BIT_OFFSET and add it (scaled) to OFFSET.  */
	  bit_offset -= tem;
	  tem = wi::arshift (tem, LOG2_BITS_PER_UNIT);
-	  if (n->offset)
-	    n->offset = size_binop (PLUS_EXPR, n->offset,
				    wide_int_to_tree (sizetype, tem));
+	  if (offset)
+	    offset = size_binop (PLUS_EXPR, offset,
				 wide_int_to_tree (sizetype, tem));
	  else
-	    n->offset = wide_int_to_tree (sizetype, tem);
+	    offset = wide_int_to_tree (sizetype, tem);
	}

       bitpos += bit_offset.to_shwi ();
@@ -1758,6 +1783,9 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n)
   if (bitsize % BITS_PER_UNIT)
     return false;

+  init_symbolic_number (n, ref);
+  n->base_addr = base_addr;
+  n->offset = offset;
   n->bytepos = bitpos / BITS_PER_UNIT;
   n->alias_set = reference_alias_ptr_type (ref);
   n->vuse = gimple_vuse (stmt);
@@ -1816,28 +1844,12 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
   /* If find_bswap_or_nop_1 returned NULL,
Re: [PATCH][match-and-simplify]
On Mon, 2 Jun 2014, Marc Glisse wrote:

(plus (bit_not @0) @0)
  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
  { build_int_cst (TREE_TYPE (@0), -1); })
+(match_and_simplify
+  (plus @0 (bit_not @0))
+  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+  { build_int_cst (TREE_TYPE (@0), -1); })

Why not just:

(match_and_simplify
  (plus @0 (bit_not @0))
  { build_all_ones_cst (TREE_TYPE (@0)); })

? Works for vector/complex, I don't know what type a bit_not_expr can have where the simplification wouldn't be true. Sure. Richard. 2014-06-03 Richard Biener rguent...@suse.de * match.pd: Generalize ~A + A -> -1 simplification to all types.

Index: gcc/match.pd
===
--- gcc/match.pd	(revision 211135)
+++ gcc/match.pd	(working copy)
@@ -185,12 +185,10 @@ to (minus @1 @0)
 /* ~A + A -> -1 */
 (match_and_simplify
   (plus (bit_not @0) @0)
-  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
-  { build_int_cst (TREE_TYPE (@0), -1); })
+  { build_all_ones_cst (type); })
 (match_and_simplify
   (plus @0 (bit_not @0))
-  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
-  { build_int_cst (TREE_TYPE (@0), -1); })
+  { build_all_ones_cst (type); })
 /* ~A + 1 -> -A */
 (match_and_simplify
Re: [PATCH, i386, Pointer Bounds Checker 18/x] Expand instrumented builtin function calls
On Mon, Jun 2, 2014 at 4:51 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds support for normal builtin function calls (target ones are not instrumented). The basic idea of the patch is to make a call expr copy with no bounds and expand it instead. If the expr is going to be emitted as a function call, then the original instrumented expr takes place. String functions are handled separately because they are of high interest for the checker. It seems this should be user-controllable (always expand builtins with bounds as calls, or not), rather than deciding what is of interest yourself. From a high-level perspective the expansion path doing inline expansion should be separated from that expanding as a call (or expanding as a different call). I don't like the building of yet another GENERIC call expr in this code ... it goes very much backwards of the idea that we should expand from the GIMPLE representation. You are making such a transition even harder. Can't you do that frobbing from cfgexpand.c:expand_call_stmt instead (please!)? Thanks, Richard. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-06-02 Ilya Enkovich ilya.enkov...@intel.com * builtins.c: Include rtl-chkp.h, tree-chkp.h. (expand_builtin_mempcpy_args): Add orig exp as argument. Support BUILT_IN_CHKP_MEMPCPY_NOBND_NOCHK. (expand_builtin_mempcpy): Adjust expand_builtin_mempcpy_args call. (expand_builtin_stpcpy): Likewise. (expand_builtin_memset_args): Support BUILT_IN_CHKP_MEMSET_NOBND_NOCHK. (std_expand_builtin_va_start): Initialize bounds for va_list. (expand_builtin): Support instrumented calls.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7357124..c0140e1 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -59,6 +59,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "ubsan.h"
 #include "cilk.h"
+#include "tree-chkp.h"
+#include "rtl-chkp.h"

 static tree do_mpc_arg1 (tree, tree, int (*)(mpc_ptr, mpc_srcptr, mpc_rnd_t));
@@ -124,7 +126,7 @@
 static rtx builtin_memcpy_read_str (void *, HOST_WIDE_INT, enum machine_mode);
 static rtx expand_builtin_memcpy (tree, rtx);
 static rtx expand_builtin_mempcpy (tree, rtx, enum machine_mode);
 static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx,
-					enum machine_mode, int);
+					enum machine_mode, int, tree);
 static rtx expand_builtin_strcpy (tree, rtx);
 static rtx expand_builtin_strcpy_args (tree, tree, rtx);
 static rtx expand_builtin_stpcpy (tree, rtx, enum machine_mode);
@@ -3284,7 +3286,8 @@ expand_builtin_mempcpy (tree exp, rtx target, enum machine_mode mode)
       tree src = CALL_EXPR_ARG (exp, 1);
       tree len = CALL_EXPR_ARG (exp, 2);
       return expand_builtin_mempcpy_args (dest, src, len,
-					  target, mode, /*endp=*/ 1);
+					  target, mode, /*endp=*/ 1,
+					  exp);
     }
 }
@@ -3296,10 +3299,23 @@
 static rtx
 expand_builtin_mempcpy_args (tree dest, tree src, tree len,
-			     rtx target, enum machine_mode mode, int endp)
+			     rtx target, enum machine_mode mode, int endp,
+			     tree orig_exp)
 {
+  tree fndecl = get_callee_fndecl (orig_exp);
+
   /* If return value is ignored, transform mempcpy into memcpy.  */
-  if (target == const0_rtx && builtin_decl_implicit_p (BUILT_IN_MEMCPY))
+  if (target == const0_rtx
+      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_MEMPCPY_NOBND_NOCHK
+      && builtin_decl_implicit_p (BUILT_IN_CHKP_MEMCPY_NOBND_NOCHK))
+    {
+      tree fn = builtin_decl_implicit (BUILT_IN_CHKP_MEMCPY_NOBND_NOCHK);
+      tree result = build_call_nofold_loc (UNKNOWN_LOCATION, fn, 3,
+					   dest, src, len);
+      return expand_expr (result, target, mode, EXPAND_NORMAL);
+    }
+  else if (target == const0_rtx
+	   && builtin_decl_implicit_p (BUILT_IN_MEMCPY))
     {
       tree fn = builtin_decl_implicit (BUILT_IN_MEMCPY);
       tree result = build_call_nofold_loc (UNKNOWN_LOCATION, fn, 3,
@@ -3484,7 +3500,8 @@ expand_builtin_stpcpy (tree exp, rtx target, enum machine_mode mode)
       lenp1 = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
       ret = expand_builtin_mempcpy_args (dst, src, lenp1,
-					 target, mode, /*endp=*/2);
+					 target, mode, /*endp=*/2,
+					 exp);

       if (ret)
	return ret;
@@ -3778,7 +3795,8 @@ expand_builtin_memset_args (tree dest, tree val, tree len,
  do_libcall:
   fndecl = get_callee_fndecl (orig_exp);
   fcode =
Re: [PATCH, Pointer Bounds Checker 14/x] Pointer Bounds Checker passes
On Mon, Jun 2, 2014 at 5:13 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-02 17:37 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Mon, Jun 2, 2014 at 2:44 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-02 15:35 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Fri, May 30, 2014 at 2:25 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds Pointer Bounds Checker passes. Versioning happens before early local passes. Early local passes are split into 3 stages to have everything instrumented before any optimization applies. That looks artificial to me. If you need to split up early_local_passes then do that - nesting three IPA pass groups inside it looks odd to me. Btw - doing this in three IPA phases makes things possibly slower due to cache effects. It might be worth pursuing to move the early stage completely to the lowering pipeline. Early local passes are a special case because these passes are executed separately for new functions. I did not want to get three special passes instead and therefore made the split inside. Yeah, but all passes are already executed via execute_early_local_passes, so it would be only an implementation detail. If you prefer to split the pass itself, I suppose pass_early_local_passes may be replaced with something like pass_build_ssa_passes + pass_chkp_instrumentation_passes + pass_ipa_chkp_produce_thunks + pass_local_optimization_passes. execute_early_local_passes would execute the gimple pass lists of pass_build_ssa_passes, pass_chkp_instrumentation_passes and pass_local_optimization_passes. I think we cannot have the first stage moved into the lowering passes because it should be executed for newly created functions. Well, let's defer that then. Btw, fixup_cfg only needs to run once local_pure_const was run on a callee, thus it shouldn't be necessary to run it from the first group. OK. Will try to remove it from the first group.

void
pass_manager::execute_early_local_passes ()
{
-  execute_pass_list (pass_early_local_passes_1->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->next->sub);
+  execute_pass_list (pass_early_local_passes_1->sub->next->next->next->sub);
}

is gross - it should be enough to execute the early local pass list (obsolete comment with the suggestion above). This function should call only gimple passes for cfun. Therefore we cannot call IPA passes here and have to execute each gimple pass list separately. Ok, given a different split this would then become somewhat more sane anyway. Sorry, didn't catch it. Should I try a different split or defer it? :) Please try a different split. Defer moving the first part to the lowering stage. Richard. Ilya Richard. Ilya Richard. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-05-29 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c: New. * tree-chkp.h: New. * rtl-chkp.c: New. * rtl-chkp.h: New. * Makefile.in (OBJS): Add tree-chkp.o, rtl-chkp.o. (GTFILES): Add tree-chkp.c. * c-family/c.opt (fchkp-check-incomplete-type): New. (fchkp-zero-input-bounds-for-main): New. (fchkp-first-field-has-own-bounds): New. (fchkp-narrow-bounds): New. (fchkp-narrow-to-innermost-array): New. (fchkp-optimize): New. (fchkp-use-fast-string-functions): New. (fchkp-use-nochk-string-functions): New. (fchkp-use-static-bounds): New. (fchkp-use-static-const-bounds): New. (fchkp-treat-zero-dynamic-size-as-infinite): New. (fchkp-check-read): New. (fchkp-check-write): New. (fchkp-store-bounds): New. (fchkp-instrument-calls): New. (fchkp-instrument-marked-only): New. * cppbuiltin.c (define_builtin_macros_for_compilation_flags): Add __CHKP__ macro when Pointer Bounds Checker is on. * passes.def (pass_ipa_chkp_versioning): New. (pass_before_local_optimization_passes): New. (pass_chkp_instrumentation_passes): New. (pass_ipa_chkp_produce_thunks): New. (pass_local_optimization_passes): New. (pass_chkp_opt): New. * toplev.c: Include tree-chkp.h. (compile_file): Add chkp_finish_file call. * tree-pass.h (make_pass_ipa_chkp_versioning): New. (make_pass_ipa_chkp_produce_thunks): New. (make_pass_chkp): New. (make_pass_chkp_opt): New. (make_pass_before_local_optimization_passes): New. (make_pass_chkp_instrumentation_passes): New. (make_pass_local_optimization_passes): New. * tree.h (called_as_built_in): New. * builtins.c (called_as_built_in): Not static anymore. * passes.c (pass_manager::execute_early_local_passes): Execute early passes in three steps
Re: [PATCH, Pointer Bounds Checker 20/x] Follow transparent alias chains
On Mon, Jun 2, 2014 at 5:15 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, In most cases we follow transparent alias chains when we assemble names, but in some cases it is not done. For instrumented functions it is critical, and the following patch fixes that. It also adds visibility inheritance for instrumented functions. It feels like this should be handled by the symtab code nowadays ... Honza? Richard. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-06-02 Ilya Enkovich ilya.enkov...@intel.com * varasm.c: Include tree-chkp.h. (ultimate_transparent_alias_target): Move up. (make_decl_rtl): For instrumented function use name of the original decl. (assemble_start_function): Mark function as global in case it is an instrumentation clone of a global function. (do_assemble_alias): Follow transparent alias chain for identifier. Check if original alias is public. (maybe_assemble_visibility): Use visibility of the original function for instrumented version. (default_unique_section): Likewise.

diff --git a/gcc/varasm.c b/gcc/varasm.c
index fcae2fa..d473bc7 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "asan.h"
 #include "basic-block.h"
+#include "tree-chkp.h"

 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"		/* Needed for external data
@@ -1200,6 +1201,30 @@ use_blocks_for_decl_p (tree decl)
   return targetm.use_blocks_for_decl_p (decl);
 }

+/* Follow the IDENTIFIER_TRANSPARENT_ALIAS chain starting at *ALIAS
+   until we find an identifier that is not itself a transparent alias.
+   Modify the alias passed to it by reference (and all aliases on the
+   way to the ultimate target), such that they do not have to be
+   followed again, and return the ultimate target of the alias
+   chain.  */
+
+static inline tree
+ultimate_transparent_alias_target (tree *alias)
+{
+  tree target = *alias;
+
+  if (IDENTIFIER_TRANSPARENT_ALIAS (target))
+    {
+      gcc_assert (TREE_CHAIN (target));
+      target = ultimate_transparent_alias_target (&TREE_CHAIN (target));
+      gcc_assert (! IDENTIFIER_TRANSPARENT_ALIAS (target)
+		  && ! TREE_CHAIN (target));
+      *alias = target;
+    }
+
+  return target;
+}
+
 /* Create the DECL_RTL for a VAR_DECL or FUNCTION_DECL.  DECL should
    have static storage duration.  In other words, it should not be an
    automatic variable, including PARM_DECLs.
@@ -1214,6 +1239,7 @@ make_decl_rtl (tree decl)
 {
   const char *name = 0;
   int reg_number;
+  tree id;
   rtx x;

   /* Check that we are not being given an automatic variable.  */
@@ -1271,7 +1297,12 @@ make_decl_rtl (tree decl)
       return;
     }

-  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  id = DECL_ASSEMBLER_NAME (decl);
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && cgraph_get_node (decl)
+      && cgraph_get_node (decl)->instrumentation_clone)
+    ultimate_transparent_alias_target (&id);
+  name = IDENTIFIER_POINTER (id);

   if (name[0] != '*' && TREE_CODE (decl) != FUNCTION_DECL
       && DECL_REGISTER (decl))
@@ -1699,7 +1730,10 @@ assemble_start_function (tree decl, const char *fnname)

   /* Make function name accessible from other files, if appropriate.  */

-  if (TREE_PUBLIC (decl))
+  if (TREE_PUBLIC (decl)
+      || (cgraph_get_node (decl)->instrumentation_clone
+	  && cgraph_get_node (decl)->instrumented_version
+	  && TREE_PUBLIC (cgraph_get_node (decl)->instrumented_version->decl)))
     {
       notice_global_symbol (decl);
@@ -2386,30 +2420,6 @@ mark_decl_referenced (tree decl)
 }

-/* Follow the IDENTIFIER_TRANSPARENT_ALIAS chain starting at *ALIAS
-   until we find an identifier that is not itself a transparent alias.
-   Modify the alias passed to it by reference (and all aliases on the
-   way to the ultimate target), such that they do not have to be
-   followed again, and return the ultimate target of the alias
-   chain.  */
-
-static inline tree
-ultimate_transparent_alias_target (tree *alias)
-{
-  tree target = *alias;
-
-  if (IDENTIFIER_TRANSPARENT_ALIAS (target))
-    {
-      gcc_assert (TREE_CHAIN (target));
-      target = ultimate_transparent_alias_target (&TREE_CHAIN (target));
-      gcc_assert (! IDENTIFIER_TRANSPARENT_ALIAS (target)
-		  && ! TREE_CHAIN (target));
-      *alias = target;
-    }
-
-  return target;
-}
-
 /* Output to FILE (an assembly file) a reference to NAME.  If NAME
    starts with a *, the rest of NAME is output verbatim.  Otherwise
    NAME is transformed in a target-specific way (usually by the
@@ -5544,6 +5554,9 @@ vec<alias_pair, va_gc> *alias_pairs;
 void
 do_assemble_alias (tree decl, tree target)
 {
+  tree orig_decl = decl;
+  tree id;
+
   /* Emulated
Re: [PATCH, Pointer Bounds Checker 21/x] Weakrefs output
On Mon, Jun 2, 2014 at 5:22 PM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch prevents output of both instrumented and not instrumented weakref variants.

Shouldn't one of them be reclaimed instead at some point?

Richard.

Thanks,
Ilya
--
gcc/

2014-06-02  Ilya Enkovich  ilya.enkov...@intel.com

	* cgraphunit.c (output_weakrefs): If there are both instrumented
	and original versions, output only one of them.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index c5c..ae9e699 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2111,9 +2111,13 @@ static void
 output_weakrefs (void)
 {
   symtab_node *node;
+  cgraph_node *cnode;
   FOR_EACH_SYMBOL (node)
     if (node->alias
         && !TREE_ASM_WRITTEN (node->decl)
+        && (!(cnode = dyn_cast <cgraph_node> (node))
+            || !cnode->instrumented_version
+            || !TREE_ASM_WRITTEN (cnode->instrumented_version->decl))
         && node->weakref)
       {
         tree target;
Re: [PATCH, Pointer Bounds Checker 22/x] Inline
On Mon, Jun 2, 2014 at 5:56 PM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch adds support for inlining instrumented calls. Changes are mostly to support returned bounds. Also generated mem-to-mem assignments are registered to be later instrumented with appropriate bounds copy.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-02  Ilya Enkovich  ilya.enkov...@intel.com

	* ipa-inline.c (early_inliner): Check edge has summary allocated.
	* tree-inline.c: Include tree-chkp.h.
	(declare_return_variable): Add arg holding returned bounds slot.
	Create and initialize returned bounds var.
	(remap_gimple_stmt): Handle returned bounds.  Return sequence
	of statements instead of a single statement.
	(insert_init_stmt): Add declaration.
	(remap_gimple_seq): Adjust to new remap_gimple_stmt signature.
	(copy_bb): Adjust to changed return type of remap_gimple_stmt.
	(expand_call_inline): Handle returned bounds.  Add bounds copy
	for generated mem to mem assignments.
	* tree-inline.h (copy_body_data): Add fields retbnd and
	assign_stmts.
	* cgraph.c: Include tree-chkp.h.
	(cgraph_redirect_edge_call_stmt_to_callee): Support returned
	bounds.
	* value-prof.c: Include tree-chkp.h.
	(gimple_ic): Support returned bounds.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 1f684c2..4b6996b 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "expr.h"
 #include "tree-dfa.h"
+#include "tree-chkp.h"

 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
 #include "tree-pass.h"
@@ -1398,6 +1399,31 @@ cgraph_redirect_edge_call_stmt_to_callee (struct cgraph_edge *e)
       e->speculative = false;
       cgraph_set_call_stmt_including_clones (e->caller, e->call_stmt,
					     new_stmt, false);
+
+      /* Fix edges for BUILT_IN_CHKP_BNDRET calls attached to the
	 processed call stmt.
*/
+      if (gimple_call_with_bounds_p (new_stmt)
+          && gimple_call_lhs (new_stmt)
+          && chkp_retbnd_call_by_val (gimple_call_lhs (e2->call_stmt)))
+        {
+          tree dresult = gimple_call_lhs (new_stmt);
+          tree iresult = gimple_call_lhs (e2->call_stmt);
+          gimple dbndret = chkp_retbnd_call_by_val (dresult);
+          gimple ibndret = chkp_retbnd_call_by_val (iresult);
+          struct cgraph_edge *iedge = cgraph_edge (e2->caller, ibndret);
+          struct cgraph_edge *dedge;
+
+          if (dbndret)
+            {
+              dedge = cgraph_create_edge (iedge->caller, iedge->callee,
+                                          dbndret, e->count,
+                                          e->frequency);
+              dedge->frequency = compute_call_stmt_bb_frequency
+                (dedge->caller->decl, gimple_bb (dedge->call_stmt));
+            }
+          iedge->frequency = compute_call_stmt_bb_frequency
+            (iedge->caller->decl, gimple_bb (iedge->call_stmt));
+        }
       e->frequency = compute_call_stmt_bb_frequency
         (e->caller->decl, gimple_bb (e->call_stmt));
       e2->frequency = compute_call_stmt_bb_frequency
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 4051819..a6fc853 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2301,11 +2301,15 @@ early_inliner (void)
          info that might be cleared out for newly discovered edges.  */
       for (edge = node->callees; edge; edge = edge->next_callee)
         {
-          struct inline_edge_summary *es = inline_edge_summary (edge);
-          es->call_stmt_size
-            = estimate_num_insns (edge->call_stmt, eni_size_weights);
-          es->call_stmt_time
-            = estimate_num_insns (edge->call_stmt, eni_time_weights);
+          /* We have no summary for new bound store calls yet.
*/
+          if (inline_edge_summary_vec.length () > (unsigned) edge->uid)
+            {
+              struct inline_edge_summary *es = inline_edge_summary (edge);
+              es->call_stmt_size
+                = estimate_num_insns (edge->call_stmt, eni_size_weights);
+              es->call_stmt_time
+                = estimate_num_insns (edge->call_stmt, eni_time_weights);
+            }
           if (edge->callee->decl
               && !gimple_check_call_matching_types (
                    edge->call_stmt, edge->callee->decl, false))
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 23fef90..6557a95 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include
Re: Eliminate write-only variables
On Mon, Jun 2, 2014 at 8:59 PM, Jan Hubicka hubi...@ucw.cz wrote:

Yeah, I discussed this with Martin today on irc. For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and LTO might change those to hidden-visibility public :( So we want decl_referred_to_by_single_function

OK, I will implement this one in an IPA mini-pass. It is easy. This property changes by cloning and inlining. What does Martin want to use it for? (I.e. my plan would be to compute it as a last IPA pass, making it useless for IPA analysis/propagation.)

He doesn't want to use it, but we talked and he said he'd have a look. Note that it's important the decls are not referred to by global initializers either. Why is a new pass necessary - can't the IPA reference maintaining code update some flag in the varpool magically?

and decl_may_have_address_taken. We currently clear TREE_ADDRESSABLE for statics that have no address taken at WPA time, and that ought to keep the flag cleared at ltrans (I think I even made a testcase for this). What I think we could improve is to not consider string functions as ADDR operations for this analysis, i.e. it is common to only memset to 0. How would may_have_address_taken differ here?

I want tree.h:may_be_aliased to return false if a decl doesn't have its address taken. We don't trust TREE_ADDRESSABLE for TREE_PUBLIC/DECL_EXTERNAL decls because other TUs may take the address. I want the check to be replaced with sth more symtab-aware, that is, bring in benefits from LTO (and at the same time not be confused by statics brought public with hidden visibility).

Richard.

Honza
Re: [PATCH] Fix ICE due to memory corruption in SRA
On Mon, Jun 2, 2014 at 11:21 PM, Teresa Johnson tejohn...@google.com wrote:

This patch fixes an ICE due to memory corruption discovered while building a large application with FDO and LIPO on the google branch. I don't have a small reproducer, but the same code appears on trunk, and I believe it could also silently result in incorrect code generation.

The problem occurs if SRA is applied on a recursive call. In this case, the redirect_callers vec below contains the recursive edge from node->node. When rebuild_cgraph_edges is invoked, it will free the callee edges of node, including the one recorded in redirect_callers. In the case I looked at, after rebuilding the cgraph edges for node, the address recorded in redirect_callers now pointed to a different cgraph edge, and we later got an ICE because the (incorrect) callee that we tried to modify had the wrong number of arguments. To fix, I simply moved the collection of caller edges to after the cgraph edge rebuilding.

Google ref b/15383777.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Ok.

Thanks,
Richard.

Thanks,
Teresa

2014-06-02  Teresa Johnson  tejohn...@google.com

	* tree-sra.c (modify_function): Record caller nodes after rebuild.

Index: tree-sra.c
===
--- tree-sra.c	(revision 211139)
+++ tree-sra.c	(working copy)
@@ -4925,12 +4925,15 @@ modify_function (struct cgraph_node *node, ipa_par
 {
   struct cgraph_node *new_node;
   bool cfg_changed;
-  vec<cgraph_edge_p> redirect_callers = collect_callers_of_node (node);

   rebuild_cgraph_edges ();
   free_dominance_info (CDI_DOMINATORS);
   pop_cfun ();

+  /* This must be done after rebuilding cgraph edges for node above.
+     Otherwise any recursive calls to node that are recorded in
+     redirect_callers will be corrupted.  */
+  vec<cgraph_edge_p> redirect_callers = collect_callers_of_node (node);
   new_node = cgraph_function_versioning (node, redirect_callers,
					 NULL, NULL, false, NULL, NULL,
					 "isra");

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: RFA: PATCH to ctor_for_folding for c++/61020 (ICE with typeid)
On Tue, Jun 3, 2014 at 3:39 AM, Jason Merrill ja...@redhat.com wrote: The C++ front end wants to be able to build up objects of the various typeinfo derived classes (such as __si_class_type_info) without needing to see the full definition of the class. As part of this we build a VAR_DECL for the vtable, but never define it because we don't actually have declarations of the virtual functions, we're only using it for external references. ctor_for_folding was assuming that all vtables are defined, which broke on this case. Tested x86_64-pc-linux-gnu. OK for trunk? Ok. Thanks, Richard.
Re: [PATCH, Pointer Bounds Checker 13/x] Early versioning
On Tue, Jun 3, 2014 at 7:55 AM, Ilya Enkovich enkovich@gmail.com wrote:

2014-06-02 21:27 GMT+04:00 Jeff Law l...@redhat.com:

On 06/02/14 04:48, Ilya Enkovich wrote:

Hmm, so if I understand things correctly, src_fun has no loop structures attached, thus there's nothing to copy. Presumably at some later point we build loop structures for the copy from scratch?

I suppose it is just a simple bug with an absent NULL pointer check. Here is the original code:

  /* Duplicate the loop tree, if available and wanted.  */
  if (loops_for_fn (src_cfun) != NULL
      && current_loops != NULL)
    {
      copy_loops (id, entry_block_map->loop_father,
		  get_loop (src_cfun, 0));
      /* Defer to cfgcleanup to update loop-father fields of basic-blocks.  */
      loops_state_set (LOOPS_NEED_FIXUP);
    }

  /* If the loop tree in the source function needed fixup, mark the
     destination loop tree for fixup, too.  */
  if (loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP)
    loops_state_set (LOOPS_NEED_FIXUP);

As you may see, we have a check for an absent loops structure in the first if-statement and no check in the second one. I hit a segfault and added the check.

Downthread you indicated you're not in SSA form which might explain the inconsistency here. If so, then we need to make sure that the loop df structures do get set up properly later.

That is what the init_data_structures pass will do for us as Richard pointed out. Right?

loops are set up during CFG construction and thus are available everywhere. The df structures are set up in the init_data_structures pass which is run before going into SSA form (I'd like to somehow clean up that area).

Richard.

Ilya

Jeff
Re: [PATCH, Pointer Bounds Checker 24/x] PRE
On Tue, Jun 3, 2014 at 9:13 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch preserves CALL_WITH_BOUNDS flag for calls during PRE. Ok. Richard. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-06-03 Ilya Enkovich ilya.enkov...@intel.com * tree-ssa-pre.c (create_component_ref_by_pieces_1): Store CALL_WITH_BOUNDS_P for calls. (copy_reference_ops_from_call): Restore CALL_WITH_BOUNDS_P flag. diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index 1e55356..d5b9f3b 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -2579,6 +2579,8 @@ create_component_ref_by_pieces_1 (basic_block block, vn_reference_t ref, (TREE_CODE (fn) == FUNCTION_DECL ? build_fold_addr_expr (fn) : fn), nargs, args); + if (currop-op2 == integer_one_node) + CALL_WITH_BOUNDS_P (folded) = true; free (args); if (sc) CALL_EXPR_STATIC_CHAIN (folded) = sc; diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c index f7ec8b6..e83d9dc 100644 --- a/gcc/tree-ssa-sccvn.c +++ b/gcc/tree-ssa-sccvn.c @@ -1124,6 +1124,8 @@ copy_reference_ops_from_call (gimple call, temp.opcode = CALL_EXPR; temp.op0 = gimple_call_fn (call); temp.op1 = gimple_call_chain (call); + if (gimple_call_with_bounds_p (call)) +temp.op2 = integer_one_node; temp.off = -1; result-safe_push (temp);
Re: [PATCH, Pointer Bounds Checker 25/x] DCE
On Tue, Jun 3, 2014 at 9:23 AM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch adjusts the alloc-free removal algorithm in DCE to take into account the BUILT_IN_CHKP_BNDRET call returning bounds of allocated memory.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-03  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-ssa-dce.c: Include target.h.
	(propagate_necessity): For free call fed by alloc check bounds
	are also provided by the same alloc.
	(eliminate_unnecessary_stmts): Handle BUILT_IN_CHKP_BNDRET
	used by free calls.

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 13a71ce..59a0b71 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "flags.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
+#include "target.h"

 static struct stmt_stats
 {
@@ -778,7 +779,23 @@ propagate_necessity (bool aggressive)
		  && DECL_BUILT_IN_CLASS (def_callee) == BUILT_IN_NORMAL
		  && (DECL_FUNCTION_CODE (def_callee) == BUILT_IN_MALLOC
		      || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_CALLOC))
-		continue;
+		{
+		  tree retfndecl
+		    = targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDRET);
+		  gimple bounds_def_stmt;
+		  tree bounds;
+
+		  /* For instrumented calls we should also check used
+		     bounds are returned by the same allocation call.  */
+		  if (!gimple_call_with_bounds_p (stmt)
+		      || ((bounds = gimple_call_arg (stmt, 1))

Please provide an abstraction for this - I seem to recall seeing multiple (variants) of this. Esp. repeated calls to the target hook above look expensive to me (generally it will not be needed). I'd like to see gimple_call_bndarg (stmt) to get rid of the magic 1 and I'd like to see sth similar to gimple_call_builtin_p, for example gimple_call_chkp_p (stmt, BUILT_IN_CHKP_BNDRET) doing the target hook invocation and the fndecl check.

+			  && TREE_CODE (bounds) == SSA_NAME

What about constant bounds? That shouldn't make the call necessary either?

+			  && (bounds_def_stmt = SSA_NAME_DEF_STMT (bounds))
+			  && is_gimple_call (bounds_def_stmt)
+			  && gimple_call_fndecl (bounds_def_stmt) == retfndecl
+			  && gimple_call_arg (bounds_def_stmt, 0) == ptr))
+		    continue;

So this all becomes

  if (!gimple_call_with_bounds_p (stmt)
      || ((bounds = gimple_call_bndarg (stmt))
	  && TREE_CODE (bounds) == SSA_NAME
	  && (bounds_def_stmt = SSA_NAME_DEF_STMT (bounds))
	  && is_gimple_call (bounds_def_stmt)
	  && gimple_call_chkp_p (bounds_def_stmt, BUILT_IN_CHKP_BNDRET)
...

btw, I wonder how BUILT_IN_CHKP_BNDRET is in scope here but you need a target hook to compare the fndecl? Why isn't it enough to just check DECL_FUNCTION_CODE and DECL_BUILT_IN?

Richard.

+		}
	      }

	  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
@@ -1204,6 +1221,23 @@ eliminate_unnecessary_stmts (void)
		      && !gimple_plf (def_stmt, STMT_NECESSARY))
		    gimple_set_plf (stmt, STMT_NECESSARY, false);
		}
+	      /* We did not propagate necessity for free calls fed
+		 by allocation function to allow unnecessary
+		 alloc-free sequence elimination.  For instrumented
+		 calls it also means we did not mark bounds producer
+		 as necessary and it is time to do it in case free
+		 call is not removed.  */
+	      if (gimple_call_with_bounds_p (stmt))
+		{
+		  gimple bounds_def_stmt;
+		  tree bounds = gimple_call_arg (stmt, 1);
+		  gcc_assert (TREE_CODE (bounds) == SSA_NAME);
+		  bounds_def_stmt = SSA_NAME_DEF_STMT (bounds);
+		  if (bounds_def_stmt
+		      && !gimple_plf (bounds_def_stmt, STMT_NECESSARY))
+		    gimple_set_plf (bounds_def_stmt, STMT_NECESSARY,
+				    gimple_plf (stmt, STMT_NECESSARY));
+		}
	    }

	  /* If GSI is not necessary then remove it.  */
@@ -1216,6 +1250,7 @@ eliminate_unnecessary_stmts (void)
	  else if (is_gimple_call (stmt))
	    {
	      tree name = gimple_call_lhs (stmt);
+	      tree retfn = targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDRET);

	      notice_special_calls (stmt);
@@ -1233,7 +1268,9 @@ eliminate_unnecessary_stmts (void)
Re: [PATCH][match-and-simplify]
On Tue, 3 Jun 2014, Marc Glisse wrote:

On Tue, 3 Jun 2014, Richard Biener wrote:

On Mon, 2 Jun 2014, Marc Glisse wrote:

 (plus (bit_not @0) @0)
 if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
 { build_int_cst (TREE_TYPE (@0), -1); })
+(match_and_simplify
+  (plus @0 (bit_not @0))
+  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+  { build_int_cst (TREE_TYPE (@0), -1); })

Why not just:

(match_and_simplify
  (plus @0 (bit_not @0))
  { build_all_ones_cst (TREE_TYPE (@0)); })

? Works for vector/complex, I don't know what type a bit_not_expr can have where the simplification wouldn't be true.

Sure.

Thanks. Sorry for not being clear enough, the same remark applies to basically all the bitwise patterns in the file. I could have saved the remark for the pre-merge RFC period. I mostly posted it now so the next batch of patterns can be directly written in a general way so you (Prathamesh or Richard) have fewer to fix later, but it can certainly wait.

/* ~~x -> x */
(match_and_simplify
  (bit_not (bit_not @0))
  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
  @0)

We can drop the if line.

/* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
(match_and_simplify
  (bit_and (bit_ior @0 INTEGER_CST_P@1) INTEGER_CST_P@2)
  if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
  (bit_ior (bit_and @0 @2) (bit_and @1 @2)))

Drop the if line and replace INTEGER_CST_P with CONSTANT_CLASS_P. etc.

Yeah, though we don't restrict the types that the BIT_*_EXPR operations apply to anywhere and I wonder if this all applies to FP types for example ... (and if we are not supposed to get FP types (or struct types!?) here then the checking code and the tree.def documentation should be amended). I think the conditions we have now are literally copied from tree-ssa-forwprop.c (and the idea was to copy what that does 1:1).

Richard.
Re: [PATCH, Pointer Bounds Checker 26/x] CCP
On Tue, Jun 3, 2014 at 9:38 AM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch allows BUILT_IN_CHKP_BNDRET as a consumer of the result of a BUILT_IN_STACK_SAVE call.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-03  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-ssa-ccp.c (insert_clobber_before_stack_restore): Handle
	BUILT_IN_CHKP_BNDRET calls.

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index eeefeaf..c138337 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -1848,7 +1848,7 @@ insert_clobber_before_stack_restore (tree saved_val, tree var,
				     gimple_htab *visited)
 {
   gimple stmt, clobber_stmt;
-  tree clobber;
+  tree clobber, fndecl;
   imm_use_iterator iter;
   gimple_stmt_iterator i;
   gimple *slot;
@@ -1880,6 +1880,10 @@ insert_clobber_before_stack_restore (tree saved_val, tree var,
     else if (gimple_assign_ssa_name_copy_p (stmt))
       insert_clobber_before_stack_restore (gimple_assign_lhs (stmt), var,
					   visited);
+    else if (gimple_code (stmt) == GIMPLE_CALL
+	     && (fndecl = targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDRET))
+	     && gimple_call_fndecl (stmt) == fndecl)
+      continue;

Please change this to use the proper abstraction once implemented. It should be sth like

    else if (is_gimple_call (stmt)
	     && gimple_call_builtin_p (stmt, BUILT_IN_CHKP_BNDRET))
      continue;

I assume now that the builtins are not target builtins but added to builtins.def.

Richard.

    else
      gcc_assert (is_gimple_debug (stmt));
 }
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Tue, Jun 3, 2014 at 10:11 AM, Eric Botcazou ebotca...@adacore.com wrote: Looks mostly ok. Any reason why you are not re-creating MINUS_EXPR in build_symbolic_expr? That is, build inv - t (for non-pointers, of course)? It's more uniform and compare_values expects an INTEGER_CST on the RHS, although the patch was lacking a small tweak to the function: @@ -1205,19 +1292,23 @@ compare_values_warnv (tree val1, tree va STRIP_USELESS_TYPE_CONVERSION (val2); if ((TREE_CODE (val1) == SSA_NAME + || (TREE_CODE (val1) == NEGATE_EXPR + TREE_CODE (TREE_OPERAND (val1, 0)) == SSA_NAME) || TREE_CODE (val1) == PLUS_EXPR || TREE_CODE (val1) == MINUS_EXPR) (TREE_CODE (val2) == SSA_NAME + || (TREE_CODE (val2) == NEGATE_EXPR + TREE_CODE (TREE_OPERAND (val2, 0)) == SSA_NAME) || TREE_CODE (val2) == PLUS_EXPR || TREE_CODE (val2) == MINUS_EXPR)) { tree n1, c1, n2, c2; enum tree_code code1, code2; - /* If VAL1 and VAL2 are of the form 'NAME [+-] CST' or 'NAME', + /* If VAL1 and VAL2 are of the form '[-]NAME [+-] CST' or 'NAME', return -1 or +1 accordingly. If VAL1 and VAL2 don't use the same name, return -2. */ - if (TREE_CODE (val1) == SSA_NAME) + if (TREE_CODE (val1) == SSA_NAME || TREE_CODE (val1) == NEGATE_EXPR) { code1 = SSA_NAME; n1 = val1; @@ -1239,7 +1330,7 @@ compare_values_warnv (tree val1, tree va } } - if (TREE_CODE (val2) == SSA_NAME) + if (TREE_CODE (val2) == SSA_NAME || TREE_CODE (val2) == NEGATE_EXPR) { code2 = SSA_NAME; n2 = val2; @@ -1262,6 +1353,11 @@ compare_values_warnv (tree val1, tree va } /* Both values must use the same name. */ + if (TREE_CODE (n1) == NEGATE_EXPR TREE_CODE (n2) == NEGATE_EXPR) + { + n1 = TREE_OPERAND (n1, 0); + n2 = TREE_OPERAND (n2, 0); + } if (n1 != n2) return -2; Ah, ok - makes sense. Otherwise if a range becomes -t + inv that will no longer match get_single_symbol for further propagation? Yes, it will, NEGATE_EXPR is handled by get_single_symbol. Now spotted. Then I'm not sure if + /* Try with VR0 and [-INF, OP1]. 
*/
+  set_value_range (&new_vr1, VR_RANGE, vrp_val_min (expr_type), op1, NULL);
+  extract_range_from_binary_expr_1 (vr, code, expr_type, &vr0, &new_vr1);
+  if (vr->type != VR_VARYING)
+    return;
+
+  /* Try with VR0 and [OP1, +INF].  */
+  set_value_range (&new_vr1, VR_RANGE, op1, vrp_val_max (expr_type), NULL);
+  extract_range_from_binary_expr_1 (vr, code, expr_type, &vr0, &new_vr1);
+  if (vr->type != VR_VARYING)
+    return;

is a safe thing to do. If it does make a difference to try [-INF, OP1], [OP1, +INF] instead of just [OP1, OP1] then at least it's very suspicious ;) (or an easy missed optimization).

Yes, it makes a difference and is required to eliminate half-symbolic ranges (the ones deduced from x < y). Otherwise extract_range_from_binary_expr_1 will reject the resulting range because it cannot compare the bounds. As discussed, extract_range_from_binary_expr_1 doesn't do any fiddling with the input ranges, it just computes the resulting range and sees whether it can interpret it. Half-symbolic ranges cannot be interpreted and probably should not be, in the general case at least, because I think it can be very delicate, so extract_range_from_binary_expr will just try all the possible combinations for PLUS and MINUS.

So when computing a range for z in

  z = y - x;

with x = [-INF, y - 1] and y = [x + 1, +INF] (deduced from !(x >= y)) we fail to do sth sensible with [y, y] - [-INF, y - 1] or [x + 1, +INF] - [x, x] but we do sth with [x + 1, +INF] - [-INF, x]? That seems odd to me. With the patch we compute z to [1, +INF(OVF)].

Going the [x + 1, +INF] - [x, x] path first we obtain [1, -x + INF] which fails the sanity checking

  cmp = compare_values (min, max);
  if (cmp == -2 || cmp == 1)
    {
      /* If the new range has its limits swapped around (MIN > MAX),
	 then the operation caused one of them to wrap around, mark
	 the new range VARYING.  */
      set_value_range_to_varying (vr);
    }
  else
    set_value_range (vr, type, min, max, NULL);

but clearly the same reasoning that makes trying with [-INF, x] valid (it's just enlarging the range) can be applied here, too, when computing +INF - x for the upper bound. We can safely increase that to +INF making the range valid for the above test. But I wonder what code path in the routine still relies on that sanity checking to produce a valid result (so I'd rather try removing it, or taking uncomparable bounds as a valid range). Simplest would be to simply do

  set_value_range (vr, type, min, max, NULL);
  return;

and be done with that in the
Re: [PATCH, Pointer Bounds Checker 25/x] DCE
On Tue, Jun 3, 2014 at 1:36 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-03 13:45 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Jun 3, 2014 at 9:23 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adjusts alloc-free removal algorithm in DCE to take into account BUILT_IN_CHKP_BNDRET call returning bounds of allocated memory. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-06-03 Ilya Enkovich ilya.enkov...@intel.com * tree-ssa-dce.c: Include target.h. (propagate_necessity): For free call fed by alloc check bounds are also provided by the same alloc. (eliminate_unnecessary_stmts): Handle BUILT_IN_CHKP_BNDRET used by free calls. diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c index 13a71ce..59a0b71 100644 --- a/gcc/tree-ssa-dce.c +++ b/gcc/tree-ssa-dce.c @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3. If not see #include flags.h #include cfgloop.h #include tree-scalar-evolution.h +#include target.h static struct stmt_stats { @@ -778,7 +779,23 @@ propagate_necessity (bool aggressive) DECL_BUILT_IN_CLASS (def_callee) == BUILT_IN_NORMAL (DECL_FUNCTION_CODE (def_callee) == BUILT_IN_MALLOC || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_CALLOC)) - continue; + { + tree retfndecl + = targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDRET); + gimple bounds_def_stmt; + tree bounds; + + /* For instrumented calls we should also check used +bounds are returned by the same allocation call. */ + if (!gimple_call_with_bounds_p (stmt) + || ((bounds = gimple_call_arg (stmt, 1)) Please provide an abstraction for this - I seem to recall seeing multiple (variants) of this. Esp. repeated calls to the target hook above look expensive to me (generally it will not be needed). I'd like to see gimple_call_bndarg (stmt) to get rid of the magic 1 and I'd like to see sth similar to gimple_call_builtin_p, for example gimple_call_chkp_p (stmt, BUILT_IN_CHKP_BNDRET) doing the target hook invocation and the fndecl check. 
I do not see how to get rid of constant in gimple_call_arg call here. What semantics does proposed gimple_call_bndarg have? It has to return the first bounds arg but it's not clear from its name and function seems to be very specialized. Ah, ok. I see now, so the code is ok as-is. + TREE_CODE (bounds) == SSA_NAME What about constant bounds? That shouldn't make the call necessary either? Yep, it would be useless. So please fix. + (bounds_def_stmt = SSA_NAME_DEF_STMT (bounds)) + is_gimple_call (bounds_def_stmt) + gimple_call_fndecl (bounds_def_stmt) == retfndecl + gimple_call_arg (bounds_def_stmt, 0) == ptr)) + continue; So this all becomes if (!gimple_call_with_bounds_p (stmt) || ((bounds = gimple_call_bndarg (stmt)) TREE_CODE (bounds) == SSA_NAME (bounds_def_stmt = SSA_NAME_DEF_STMT (bounds)) is_gimple_call (bounds_def_stmt) gimple_call_chkp_p (bounds_def_stmt, BUILT_IN_CHKP_BNDRET) ... btw, I wonder how BUILT_IN_CHKP_BNDRET is in scope here but you need a target hook to compare the fndecl? Why isn't it enough to just check DECL_FUNCTION_CODE and DECL_BUILT_IN? Call is required because builtins for Pointer Bounds Checker are provided by target depending on available features. That is why gimple_call_builtin_p is not used. I may add new interface function into tree-chkp.h as you propose (e.g. chkp_gimple_call_builtin_p). But I do not see how it may speed-up the code. New function would still need to switch by function code and compare with proper decl which is basically what we have with target hook. I don't understand - does this mean the builtin calls are in the GIMPLE IL even though the target doesn't have the feature? Isn't the presence of the call enough? It is possible to make faster if use the fact that all chkp builtins codes (normal, not target ones) are consequent. Then we may just check the range and use code as an index in array of chkp fndecls to compare decls. Would it be OK to rely on builtin codes numbering? Yes, but that's not my point here. 
In the above code the target hook is called all the time, even if !gimple_call_with_bounds_p (). Providing a suitable abstraction (or simply relying on DECL_FUNCTION_CODE) should fix that. Richard. Ilya Richard. + } } FOR_EACH_SSA_TREE_OPERAND (use
[PATCH][match-and-simplify] Get rid of some stmt expressions
The following arranges for complex C-expressions (multi-stmt ones) in the transform pattern to be outlined to a separate function. This avoids the need for using stmt expressions which are not necessarily supported by all C++ host compilers. The patch doesn't address the stmt expressions being used by the matching code generator - that will be rewritten anyways. Lightly tested, I plan to install this tomorrow. Note that this also gives way of re-numbering captures before code generation so their number increases when for example walking in pre-order. And it gives an easier possibility for querying the largest capture number as well. Richard. 2014-06-03 Richard Biener rguent...@suse.de * genmatch.c (c_expr): Record cpp_tokens, the number of stmts seen and a function identifier. (c_expr::gen_gimple_transform): Generate textual form from the token vector or a call to the outlined function. (write_nary_simplifiers): Adjust. (outline_c_exprs): New function. (write_gimple): Call it. (parse_c_expr): Record a cpp_token vector. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 211131) +++ gcc/genmatch.c (working copy) @@ -190,9 +190,13 @@ struct expr : public operand struct c_expr : public operand { - c_expr (const char *code_) -: operand (OP_C_EXPR), code (code_) {} - const char *code; + c_expr (cpp_reader *r_, veccpp_token code_, unsigned nr_stmts_) +: operand (OP_C_EXPR), r (r_), code (code_), + nr_stmts (nr_stmts_), fname (NULL) {} + cpp_reader *r; + veccpp_token code; + unsigned nr_stmts; + char *fname; virtual void gen_gimple_match (FILE *, const char *, const char *) { gcc_unreachable (); } virtual void gen_gimple_transform (FILE *f, const char *); }; @@ -440,7 +444,47 @@ expr::gen_gimple_transform (FILE *f, con void c_expr::gen_gimple_transform (FILE *f, const char *) { - fputs (code, f); + /* If this expression has an outlined function variant, call it. 
*/ + if (fname) +{ + fprintf (f, %s (type, captures), fname); + return; +} + + /* All multi-stmt expressions should have been outlined. */ + gcc_assert (nr_stmts = 1); + + for (unsigned i = 0; i code.length (); ++i) +{ + const cpp_token *token = code[i]; + + /* Replace captures for code-gen. */ + if (token-type == CPP_ATSIGN) + { + const cpp_token *n = code[i+1]; + if (n-type == CPP_NUMBER + !(n-flags PREV_WHITE)) + { + if (token-flags PREV_WHITE) + fputc (' ', f); + fprintf (f, captures[%s], n-val.str.text); + ++i; + continue; + } + } + + /* Skip a single stmt delimiter. */ + if (token-type == CPP_SEMICOLON + nr_stmts == 1) + continue; + + if (token-flags PREV_WHITE) + fputc (' ', f); + + /* Output the token as string. */ + char *tk = (char *)cpp_token_as_text (r, token); + fputs (tk, f); +} } void @@ -495,9 +539,9 @@ write_nary_simplifiers (FILE *f, vecsim } if (s-ifexpr) { - fprintf (f, if (!); + fprintf (f, if (!(); s-ifexpr-gen_gimple_transform (f, fail_label); - fprintf (f, ) goto %s;, fail_label); + fprintf (f, )) goto %s;, fail_label); } if (s-result-type == operand::OP_EXPR) { @@ -533,11 +577,82 @@ write_nary_simplifiers (FILE *f, vecsim } static void +outline_c_exprs (FILE *f, struct operand *op) +{ + if (op-type == operand::OP_C_EXPR) +{ + c_expr *e = static_cast c_expr *(op); + static unsigned fnnr = 1; + if (e-nr_stmts 1 + !e-fname) + { + e-fname = (char *)xmalloc (sizeof (cexprfn) + 4); + sprintf (e-fname, cexprfn%d, fnnr); + fprintf (f, static tree cexprfn%d (tree type, tree *captures)\n, + fnnr); + fprintf (f, {\n); + unsigned stmt_nr = 1; + for (unsigned i = 0; i e-code.length (); ++i) + { + const cpp_token *token = e-code[i]; + + /* Replace captures for code-gen. 
*/ + if (token-type == CPP_ATSIGN) + { + const cpp_token *n = e-code[i+1]; + if (n-type == CPP_NUMBER + !(n-flags PREV_WHITE)) + { + if (token-flags PREV_WHITE) + fputc (' ', f); + fprintf (f, captures[%s], n-val.str.text); + ++i; + continue; + } + } + + if (token-flags PREV_WHITE) + fputc (' ', f); + + /* Output the token as string. */ + char *tk = (char *)cpp_token_as_text (e-r, token); + fputs (tk, f); + + if (token-type
Re: [PATCH, Pointer Bounds Checker 25/x] DCE
On Tue, Jun 3, 2014 at 3:27 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-03 15:56 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Jun 3, 2014 at 1:36 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-03 13:45 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Jun 3, 2014 at 9:23 AM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch adjusts the alloc-free removal algorithm in DCE to take into account the BUILT_IN_CHKP_BNDRET call returning bounds of allocated memory. Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-03  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-ssa-dce.c: Include target.h.
	(propagate_necessity): For free call fed by alloc check bounds
	are also provided by the same alloc.
	(eliminate_unnecessary_stmts): Handle BUILT_IN_CHKP_BNDRET
	used by free calls.

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 13a71ce..59a0b71 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "flags.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
+#include "target.h"

 static struct stmt_stats
 {
@@ -778,7 +779,23 @@ propagate_necessity (bool aggressive)
		  && DECL_BUILT_IN_CLASS (def_callee) == BUILT_IN_NORMAL
		  && (DECL_FUNCTION_CODE (def_callee) == BUILT_IN_MALLOC
		      || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_CALLOC))
-		continue;
+		{
+		  tree retfndecl
+		    = targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDRET);
+		  gimple bounds_def_stmt;
+		  tree bounds;
+
+		  /* For instrumented calls we should also check used
+		     bounds are returned by the same allocation call.  */
+		  if (!gimple_call_with_bounds_p (stmt)
+		      || ((bounds = gimple_call_arg (stmt, 1))

Please provide an abstraction for this - I seem to recall seeing multiple (variants) of this. Esp. repeated calls to the target hook above look expensive to me (generally it will not be needed).
I'd like to see gimple_call_bndarg (stmt) to get rid of the magic 1 and I'd like to see sth similar to gimple_call_builtin_p, for example gimple_call_chkp_p (stmt, BUILT_IN_CHKP_BNDRET) doing the target hook invocation and the fndecl check.

I do not see how to get rid of the constant in the gimple_call_arg call here. What semantics does the proposed gimple_call_bndarg have? It has to return the first bounds arg but it's not clear from its name and the function seems to be very specialized.

Ah, ok. I see now, so the code is ok as-is.

+			  && TREE_CODE (bounds) == SSA_NAME

What about constant bounds? That shouldn't make the call necessary either?

Yep, it would be useless. So please fix.

+			  && (bounds_def_stmt = SSA_NAME_DEF_STMT (bounds))
+			  && is_gimple_call (bounds_def_stmt)
+			  && gimple_call_fndecl (bounds_def_stmt) == retfndecl
+			  && gimple_call_arg (bounds_def_stmt, 0) == ptr))
+			continue;

So this all becomes

  if (!gimple_call_with_bounds_p (stmt)
      || ((bounds = gimple_call_bndarg (stmt))
	  && TREE_CODE (bounds) == SSA_NAME
	  && (bounds_def_stmt = SSA_NAME_DEF_STMT (bounds))
	  && is_gimple_call (bounds_def_stmt)
	  && gimple_call_chkp_p (bounds_def_stmt, BUILT_IN_CHKP_BNDRET)
...

btw, I wonder how BUILT_IN_CHKP_BNDRET is in scope here but you need a target hook to compare the fndecl? Why isn't it enough to just check DECL_FUNCTION_CODE and DECL_BUILT_IN?

The call is required because builtins for the Pointer Bounds Checker are provided by the target depending on available features. That is why gimple_call_builtin_p is not used. I may add a new interface function to tree-chkp.h as you propose (e.g. chkp_gimple_call_builtin_p). But I do not see how it may speed up the code. The new function would still need to switch by function code and compare with the proper decl, which is basically what we have with the target hook.

I don't understand - does this mean the builtin calls are in the GIMPLE IL even though the target doesn't have the feature? Isn't the presence of the call enough?

The call is generated using a function decl provided by the target.
Therefore we do not know its code and have to compare using fndecl comparison.

So they are target specific builtins after all? IMNSHO it looks wrong that the middle-end has to care about them in that case. Why can't the builtins themselves be target independent?

It is possible to make it faster if we use the fact that all chkp builtin codes (normal, not target ones) are consecutive. Then we
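For reference, the source-level shape of the pattern the existing alloc/free removal targets (and which the patch extends to bounds-instrumented calls, where free additionally consumes a bounds value produced by a BNDRET call on the same allocation) looks like this — a minimal sketch, not GCC code, and the function name is hypothetical:

```c
#include <stdlib.h>

/* The allocation is never used apart from being freed, so DCE may
   delete the whole malloc/free pair.  With checker instrumentation
   the pass must additionally prove the bounds passed to free come
   from the same allocation's BNDRET call.  */
int wasted_pair (size_t n)
{
  char *p = malloc (n);
  if (!p)
    return -1;
  /* ... no use of *p ... */
  free (p);
  return 0;
}
```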
Re: PR61385: phiopt drops some PHIs
On Tue, Jun 3, 2014 at 3:48 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

apparently it is possible to have a PHI in the middle basic block of value_replacement, so I need to move it as well when I move the statement and remove the block. Bootstrap and testsuite on x86_64-linux-gnu (re-running for various reasons but they had completed successfully yesterday).

2014-06-03  Marc Glisse  marc.gli...@inria.fr

	PR tree-optimization/61385
gcc/
	* tree-ssa-phiopt.c (value_replacement): Copy PHI nodes before
	removing the basic block.
gcc/testsuite/
	* gcc.dg/tree-ssa/pr61385.c: New file.

--
Marc Glisse

Index: gcc/testsuite/gcc.dg/tree-ssa/pr61385.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr61385.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr61385.c	(working copy)
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define assert(x) if (!(x)) __builtin_abort ()
+
+int a, b, c, d, e, f, g;
+
+int
+fn1 ()
+{
+  int *h = &c;
+  for (; c < 1; c++)
+    {
+      int *i = &a, *k = &a;
+      f = 0;
+      if (b)
+	return 0;
+      if (*h)
+	{
+	  int **j = &i;
+	  *j = 0;
+	  d = 0;
+	}
+      else
+	g = e = 0;
+      if (*h)
+	{
+	  int **l = &k;
+	  *l = &g;
+	}
+      d = *h;
+      assert (k == &a || k);
+      assert (i);
+    }
+  return 0;
+}
+
+int
+main ()
+{
+  fn1 ();
+  return 0;
+}

Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c	(revision 211178)
+++ gcc/tree-ssa-phiopt.c	(working copy)
@@ -877,20 +877,39 @@ value_replacement (basic_block cond_bb,
	   && operand_equal_for_phi_arg_p (rhs2, cond_lhs)
	   && neutral_element_p (code_def, cond_rhs, true))
	  || (arg1 == rhs2
	      && operand_equal_for_phi_arg_p (rhs1, cond_lhs)
	      && neutral_element_p (code_def, cond_rhs, false))
	  || (operand_equal_for_phi_arg_p (arg1, cond_rhs)
	      && (operand_equal_for_phi_arg_p (rhs2, cond_lhs)
		  || operand_equal_for_phi_arg_p (rhs1, cond_lhs))
	      && absorbing_element_p (code_def, cond_rhs)))
	{
+	  /* Move potential PHI nodes.
*/
+	  gimple_stmt_iterator psi = gsi_start_phis (middle_bb);
+	  while (!gsi_end_p (psi))
+	    {
+	      gimple phi_moving = gsi_stmt (psi);
+	      gimple newphi = create_phi_node (gimple_phi_result (phi_moving),
+					       cond_bb);
+	      int nargs = cond_bb->preds->length ();
+	      location_t loc = gimple_phi_arg_location (phi_moving, 0);
+	      tree phi_arg = gimple_phi_arg_def (phi_moving, 0);
+	      for (int i = 0; i < nargs; ++i)
+		{
+		  edge e = (*cond_bb->preds)[i];
+		  add_phi_arg (newphi, phi_arg, e, loc);

All arguments get the same value (and the PHI in middle-bb is surely a singleton?), so it's way better to re-materialize the PHI as a gimple assignment at the start of the basic block. If they are singletons (which I expect), the easiest way is to propagate their single argument into all uses and simply remove them.

Richard.

+		}
+	      update_stmt (newphi);
+	      gsi_remove (&psi, false);
+	    }
+
	  gsi = gsi_for_stmt (cond);
	  gimple_stmt_iterator gsi_from = gsi_for_stmt (assign);
	  gsi_move_before (&gsi_from, &gsi);
	  replace_phi_edge_with_variable (cond_bb, e1, phi, lhs);
	  return 2;
	}

  return 0;
}
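The neutral-element case value_replacement handles can be seen at the source level: since 0 is the neutral element of addition, the conditional collapses into the plain operation. A sketch of the before/after pattern (illustrative C, not GCC code):

```c
/* Before phiopt: a PHI in cond_bb merges y (taken when x == 0)
   with x + y from the middle block.  */
int before (int x, int y)
{
  if (x == 0)
    return y;
  return x + y;
}

/* After value_replacement: x == 0 implies x + y == y, so the
   branch and the PHI are redundant.  */
int after (int x, int y)
{
  return x + y;
}
```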
Re: Eliminate write-only variables
On Tue, Jun 3, 2014 at 6:19 PM, Jan Hubicka hubi...@ucw.cz wrote: On Mon, Jun 2, 2014 at 8:59 PM, Jan Hubicka hubi...@ucw.cz wrote:

Yeah, I discussed this with Martin today on irc. For aliasing we'd like to know whether a decl possibly has its address taken. Currently we only trust TREE_ADDRESSABLE for statics - and lto might change those to hidden visibility public :( So we want decl_referred_to_by_single_function

OK, I will implement this one in an IPA mini-pass. It is easy. This property changes by cloning and inlining. What does Martin want to use it for? (I.e. my plan would be to compute it as a last IPA pass, making it useless for IPA analysis/propagation)

He doesn't want to use it but we talked and he said he'd have a look. Note that it's important the decls are not referred to by global initializers either.

Why is a new pass necessary - can't the IPA reference maintaining code update some flag in the varpool magically?

If the code to add/remove a reference was updating the flag, it would become out of date as we remove the callgraph during late compilation (we do not keep references for functions we already compiled). I do not want the flags to be computed before the end of the IPA queue so they won't become out of date as we clone/inline. We basically need to walk the references and check that they are all in either one given function F or clones inlined into it. Assuming that the function does not have an unbounded number of references to the given variable, this is basically a constant-time test.

and decl_may_have_address_taken. We currently clear TREE_ADDRESSABLE for statics that have no address taken at WPA time and that ought to keep the flag cleared at ltrans (I think I even made a testcase for this). What I think we could improve is to not consider string functions as ADDR operations for this analysis, i.e. it is common to only memset to 0. How would may_have_address_taken differ here?

I want tree.h:may_be_aliased to return false if a decl doesn't have its address taken.
We don't trust TREE_ADDRESSABLE for TREE_PUBLIC/DECL_EXTERNAL decls because other TUs may take the address. I want the check to be replaced with sth more symtab aware, that is, bring in benefits from LTO (and at the same time be not confused by statics brought public with hidden visibility).

I see, are you sure you need to ignore TREE_ADDRESSABLE for PUBLIC/EXTERNAL? I believe frontends are responsible to set them for all data that may have its address taken (even externally) and the IPA code maintains it - we clear the flag only for variables that are static.

Not sure - we've always not trusted this flag on public/externals. Probably because there could be aliases to them that have their address taken? (well, that's true for locals as well ...) That said, having a varpool way of querying whether a decl may be aliased by a pointer would be nice (with bonus points for handling the alias case).

Or we can just set the flag to true for externally visible vars early in the IPA queue if this is false.

Well, let me do a simple check (remove the restriction and bootstrap/test).

Richard.

Honza
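The property under discussion is easy to picture at the source level: a static whose address is never taken and which is never read is write-only, so every store to it is dead. A minimal hypothetical example of the kind of variable such an IPA pass could eliminate (illustrative only, not from the patch):

```c
/* 'log_level' is static, never read, and never has its address
   taken, so the store below is dead and the variable itself can
   be removed once all references are accounted for.  */
static int log_level;

int compute (int a, int b)
{
  log_level = a;   /* dead store: no read of log_level exists */
  return a * b;
}
```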
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 4, 2014 at 7:59 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote:

When bswap replaces a bitwise expression involving a memory source by a load possibly followed by a bswap, it is possible that the load has a size smaller than that of the target expression to which the bitwise expression was assigned. So some sort of cast is needed. But there might also be a difference between the size of the expression that was assigned and the size of the load. So three different sizes might be involved. Consider the following example from binutils:

bfd_vma
bfd_getl16 (const void *p)
{
  const bfd_byte *addr = (const bfd_byte *) p;
  return (addr[1] << 8) | addr[0];
}

Here the load will have a size of 16 bits, while the bitwise expression is an int (don't ask me why) but is returned as a 64-bit quantity (bfd_vma maps to the size of host registers). In this case we need two separate casts. One from the 16-bit quantity to int with zero extension, as the high bits are 0. It is always a zero extension because bswap will not do anything in the presence of a sign extension, as depending on the initial value the result would be different (maybe a bswap for a positive number and a random value for a negative number). Then, we need to cast respecting the extension that would have happened had we not replaced the bitwise expression. Here, since the bitwise expression is int, it means we sign extend and then consider the content as being unsigned (bfd_vma is an unsigned quantity). When a bswap is necessary we need to do this double cast after doing the bswap, as the bswap must be done on the loaded value since that's the size expected by the bswap builtin. Finally, this patch also forbids any sign extension *in* the bitwise expression, as the result of the expression would then be unpredictable (depend on the initial value).
The patch works this way:
1) prevent size extension of a bitwise expression
2) record the type of the bitwise expression instead of its size (the size can be determined from the type)
3) use this type to perform a double cast as explained above

2014-05-30  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/61306
	* tree-ssa-math-opts.c (struct symbolic_number): Store type of
	expression instead of its size.
	(do_shift_rotate): Adapt to change in struct symbolic_number.
	(verify_symbolic_number_p): Likewise.
	(init_symbolic_number): Likewise.
	(find_bswap_or_nop_1): Likewise.  Also prevent optimization when
	the result of the expressions is unpredictable due to sign
	extension.
	(convert_via): New function to deal with the casting involved
	from the loaded value to the target SSA.
	(bswap_replace): Rename load_type to range_type to reflect it's
	the type the memory accessed shall have before being cast.
	Select load type according to whether a bswap needs to be done.
	Cast first to rhs with zero extend and then to lhs with sign
	extend to keep semantic of original stmts.
	(pass_optimize_bswap::execute): Adapt to change in struct
	symbolic_number.  Decide if the range accessed should be signed
	or unsigned before being cast to lhs type based on rhs type and
	size.

2014-05-29  Thomas Preud'homme  thomas.preudho...@arm.com

	* gcc.c-torture/execute/pr61306.c: New test.

Patch is in attachment. Is this ok for trunk?

I'd rather change the comparisons

-  if (n->size < (int)sizeof (int64_t))
-    n->n &= ((uint64_t)1 << (n->size * BITS_PER_UNIT)) - 1;
+  if (bitsize / BITS_PER_UNIT < (int)sizeof (int64_t))
+    n->n &= ((uint64_t)1 << bitsize) - 1;

to work in bits, thus bitsize < 8 * sizeof (int64_t) (note that using BITS_PER_UNIT is bogus here - you are dealing with 8-bit bytes on the host, not whatever the target uses). Otherwise it smells like truncation may lose bits (probably not in practice).

+  /* Sign extension: result is dependent on the value.
*/
+  if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (n->type)
+      && type_size > TYPE_PRECISION (n->type))
+    return NULL_TREE;

whether it's sign-extended depends solely on the sign of the converted entity, so I'm not sure why you are restricting this to signed n->type. Consider

  signed char *p;
  ((unsigned int)p[0]) << 8 | ((unsigned int)p[1]) << 16

the loads are still sign-extended but n->type is unsigned.

I'm failing to get why you possibly need two casts ... you should only need one, from the bswap/load result to the final type (zero-extended as you say - so the load type should simply be unsigned which it is already). So I think that the testcase in the patch is fixed already by doing the n->type change (and a proper sign-extension detection). Can you please split that part out? That range_type and convert_via look wrong and unnecessary to me, and it doesn't look like you have a testcase exercising it?

Thanks,
Richard.
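The sign-extension hazard is easy to demonstrate concretely: with a sign-extended byte in the OR-of-shifted-loads expression, the result is no longer a plain little-endian load, so the bswap pass must not rewrite it. A self-contained illustration (assuming 8-bit bytes; the signed variant uses `* 256` instead of `<< 8` only to avoid the undefined behavior of left-shifting a negative value — the bit pattern is the same):

```c
/* Well-behaved pattern: zero-extended byte loads.  This is what the
   bswap pass may replace with a single 16-bit load.  */
unsigned int getl16_unsigned (const unsigned char *p)
{
  return (unsigned int)(p[1] << 8) | p[0];
}

/* Broken pattern: the signed byte sign-extends before being widened,
   smearing ones into the high bits for byte values >= 0x80.  */
unsigned int getl16_signed (const signed char *p)
{
  return (unsigned int)(p[1] * 256) | (unsigned char) p[0];
}

/* Same two bytes {0x12, 0x90} viewed both ways.  */
unsigned int demo_unsigned (void)
{
  unsigned char u[2] = { 0x12, 0x90 };
  return getl16_unsigned (u);          /* 0x9012 */
}

unsigned int demo_signed (void)
{
  signed char s[2] = { 0x12, (signed char) 0x90 };
  return getl16_signed (s);            /* high bits polluted, != 0x9012 */
}
```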
Re: [PATCH] Fix PR61320: disable bswap for unaligned access on SLOW_UNALIGNED_ACCESS targets
On Wed, Jun 4, 2014 at 8:07 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote:

Hi there,

It seems from PR61320 that the bswap pass causes some problems when it replaces an OR expression by an unaligned access. Although it's not clear yet why the unaligned load does not go through the extract_bit_field codepath, it is necessary to provide a solution as this prevents sparc from bootstrapping. This patch takes the simple approach of cancelling the bswap optimization when the load that would replace the OR expression would be an unaligned load and the target has SLOW_UNALIGNED_ACCESS. In the long run this patch should be reverted as soon as the root cause of the current problem is found. The patch also rewrites the test to take into account the fact that the optimization is not done for some targets. It also adds some alignment hints so that more tests can be run even on STRICT_ALIGNMENT targets.

ChangeLog changes are:

*** gcc/ChangeLog ***

2014-06-03  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/61320
	* tree-ssa-math-opts.c (bswap_replace): Cancel bswap
	optimization when load is unaligned and would be slow for this
	target.

*** gcc/testsuite/ChangeLog ***

2014-06-03  Thomas Preud'homme  thomas.preudho...@arm.com

	* gcc.dg/optimize-bswaphi-1.c: Make variables global when
	possible to enforce correct alignment and make the test work
	better on STRICT_ALIGNMENT targets.  Also adjust dg-final
	selectors when alignment cannot be controlled (read_*_3 ()).
	* gcc.dg/optimize-bswapsi-2.c: Likewise.
	* gcc.dg/optimize-bswapdi-3.c: Likewise.

Bootstrapped on x86_64-linux-gnu and no regression found in the testsuite. Patch is in attachment. It applies on top of the one for PR61306 in the email titled "[PATCH] Fix PR61306: improve handling of sign and cast in bswap" but can be trivially modified to apply directly on trunk should that patch (PR61306) need to be improved.

Is this ok for trunk?
I'd rather wait for the other bug to be fixed properly (given this patch is on top of that). Btw,

+  if (align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
+      && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
     return false;

goes to the next line. I too believe that presenting a wider possibly unaligned load to the expander is good and gives it a chance to produce optimal code for the target. I think that PR61320 is probably a dup of PR61306. Still, adding the align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type)) condition is correct here, aligned loads are fine even in the bswap code-path.

Richard.

Best regards,
Thomas
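Outside the pass internals, the portable way to express the wider, possibly unaligned load the pass wants to emit is a memcpy-based access: the compiler expands it to a single load where unaligned accesses are fast and to byte loads on strict-alignment targets. A general C idiom, not code from the patch:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned buffer.  memcpy
   lets the compiler pick the best access sequence for the target
   instead of risking a trapping unaligned direct load.  */
uint32_t load_u32 (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Round-trip check that is independent of host endianness: store a
   value at an odd (unaligned) offset and read it back.  */
int roundtrip_ok (uint32_t x)
{
  unsigned char buf[8] = { 0 };
  memcpy (buf + 1, &x, sizeof x);
  return load_u32 (buf + 1) == x;
}
```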
Re: [PATCH, i386, Pointer Bounds Checker 10/x] Partitions
On Tue, Jun 3, 2014 at 11:24 PM, Jeff Law l...@redhat.com wrote: On 06/02/14 05:41, Richard Biener wrote: this should be all moved to the symbol table level. (and IDENTIFIER_NODE shouldn't have to have tree_common.chain and thus become smaller). Which ought to be independent of the pointer checking work. There's been some proposals for making first class symbol tables in GCC, but nothing that's made any progress at this point. Maybe I'll sick David on it at some point. You probably missed that we now have such symbol table (ok, it still lacks labels and CONST_DECLs). Richard. jeff
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Wed, Jun 4, 2014 at 9:04 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Christophe Lyon [mailto:christophe.l...@linaro.org] On 29 May 2014 11:58, Thomas Preud'homme thomas.preudho...@arm.com wrote: Does the patch solve the problem you had? What about you Christophe? Hi Thomas, After a quick test, it looks OK to me. Great. What about you Andreas? Does it work fine for you? If yes, is this ok for trunk? Ok. Thanks, Richard. Best regards, Thomas
Re: [PATCH, Pointer Bounds Checker 6/x] New static constructor types
On Wed, Apr 16, 2014 at 2:33 PM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch adds new static constructor types used by Pointer Bounds Checker. It was approved earlier for 4.9 and I'll assume the patch is OK for trunk if no objections arise. Patch was bootstrapped and tested for linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-04-16  Ilya Enkovich  ilya.enkov...@intel.com

	* ipa.c (cgraph_build_static_cdtor_1): Support constructors
	with chkp ctor and bnd_legacy attributes.
	* gimplify.c (gimplify_init_constructor): Avoid infinite loop
	during gimplification of bounds initializer.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 7441784..67ab515 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -3803,10 +3803,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
	   individual element initialization.  Also don't do this for small
	   all-zero initializers (which aren't big enough to merit
	   clearing), and don't try to make bitwise copies of
-	   TREE_ADDRESSABLE types.  */
+	   TREE_ADDRESSABLE types.
+
+	   We cannot apply such transformation when compiling chkp static
+	   initializer because creation of initializer image in the memory
+	   will require static initialization of bounds for it.  It should
+	   result in another gimplification of similar initializer and we
+	   may fall into infinite loop.  */
	if (valid_const_initializer
	    && !(cleared || num_nonzero_elements == 0)
-	    && !TREE_ADDRESSABLE (type))
+	    && !TREE_ADDRESSABLE (type)
+	    && (!current_function_decl
+		|| !lookup_attribute ("chkp ctor",
+				      DECL_ATTRIBUTES (current_function_decl))))

Simply make the type TREE_ADDRESSABLE?

	  {
	    HOST_WIDE_INT size = int_size_in_bytes (type);
	    unsigned int align;

diff --git a/gcc/ipa.c b/gcc/ipa.c
index 26e9b03..5ab3aed 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -1345,9 +1345,11 @@ make_pass_ipa_whole_program_visibility (gcc::context *ctxt)
 }

 /* Generate and emit a static constructor or destructor.  WHICH must
-   be one of 'I' (for a constructor) or 'D' (for a destructor).
BODY
-   is a STATEMENT_LIST containing GENERIC statements.  PRIORITY is the
-   initialization priority for this constructor or destructor.
+   be one of 'I' (for a constructor), 'D' (for a destructor), 'P'
+   (for chkp static vars constructor) or 'B' (for chkp static bounds
+   constructor).  BODY is a STATEMENT_LIST containing GENERIC
+   statements.  PRIORITY is the initialization priority for this
+   constructor or destructor.

    FINAL specify whether the externally visible name for collect2 should be produced. */

@@ -1406,6 +1408,20 @@ cgraph_build_static_cdtor_1 (char which, tree body, int priority, bool final)
       DECL_STATIC_CONSTRUCTOR (decl) = 1;
       decl_init_priority_insert (decl, priority);
       break;
+    case 'P':
+      DECL_STATIC_CONSTRUCTOR (decl) = 1;
+      DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("chkp ctor"),
+					  NULL,
+					  NULL_TREE);

Ick. Please try to avoid using attributes for this. Rather adjust the caller of this function to set a flag in the cgraph node. So I don't like this patch at all.

Richard.

+      decl_init_priority_insert (decl, priority);
+      break;
+    case 'B':
+      DECL_STATIC_CONSTRUCTOR (decl) = 1;
+      DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("bnd_legacy"),
+					  NULL,
+					  NULL_TREE);
+      decl_init_priority_insert (decl, priority);
+      break;
     case 'D':
       DECL_STATIC_DESTRUCTOR (decl) = 1;
       decl_fini_priority_insert (decl, priority);
       break;
@@ -1423,9 +1439,11 @@ cgraph_build_static_cdtor_1 (char which, tree body, int priority, bool final)
 }

 /* Generate and emit a static constructor or destructor.  WHICH must
-   be one of 'I' (for a constructor) or 'D' (for a destructor).  BODY
-   is a STATEMENT_LIST containing GENERIC statements.  PRIORITY is the
-   initialization priority for this constructor or destructor.  */
+   be one of 'I' (for a constructor), 'D' (for a destructor), 'P'
+   (for chkp static vars constructor) or 'B' (for chkp static bounds
+   constructor).  BODY is a STATEMENT_LIST containing GENERIC
+   statements.  PRIORITY is the initialization priority for this
+   constructor or destructor.
*/ void cgraph_build_static_cdtor (char which, tree body, int priority)
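The 'I'/'D' kinds correspond to what `__attribute__((constructor))` / `((destructor))` produce at the source level, and the PRIORITY argument orders the generated ctors. A minimal sketch using the GCC attribute syntax (not the checker-specific 'P'/'B' kinds, which exist only inside the compiler; 101 is the lowest priority available to user code, and lower priorities run first):

```c
static int order[2];
static int ctor_count;

__attribute__((constructor (101)))
static void init_first (void)
{
  order[ctor_count++] = 1;   /* runs before init_second */
}

__attribute__((constructor (102)))
static void init_second (void)
{
  order[ctor_count++] = 2;
}

/* Both constructors have run before main is entered.  */
int ctors_ran_in_order (void)
{
  return ctor_count == 2 && order[0] == 1 && order[1] == 2;
}
```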
Re: [PATCH, Pointer Bounds Checker 13/x] Early versioning
On Wed, Jun 4, 2014 at 8:46 AM, Jeff Law l...@redhat.com wrote: On 06/03/14 03:29, Richard Biener wrote: On Tue, Jun 3, 2014 at 7:55 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-02 21:27 GMT+04:00 Jeff Law l...@redhat.com: On 06/02/14 04:48, Ilya Enkovich wrote:

Hmm, so if I understand things correctly, src_fun has no loop structures attached, thus there's nothing to copy. Presumably at some later point we build loop structures for the copy from scratch?

I suppose it is just a simple bug with an absent NULL pointer check. Here is the original code:

  /* Duplicate the loop tree, if available and wanted.  */
  if (loops_for_fn (src_cfun) != NULL
      && current_loops != NULL)
    {
      copy_loops (id, entry_block_map->loop_father,
		  get_loop (src_cfun, 0));
      /* Defer to cfgcleanup to update loop-father fields of
	 basic-blocks.  */
      loops_state_set (LOOPS_NEED_FIXUP);
    }

  /* If the loop tree in the source function needed fixup, mark the
     destination loop tree for fixup, too.  */
  if (loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP)
    loops_state_set (LOOPS_NEED_FIXUP);

As you may see we have a check for an absent loops structure in the first if-statement and no check in the second one. I hit a segfault and added the check.

Downthread you indicated you're not in SSA form which might explain the inconsistency here. If so, then we need to make sure that the loop df structures do get set up properly later.

That is what the init_data_structures pass will do for us, as Richard pointed out. Right?

loops are set up during the CFG construction and thus are available everywhere.

Which would argue that the hunk that checks for the loop tree's existence before accessing it should not be needed. Ilya -- is it possible you hit this prior to Richi's work to build the loop structures as part of CFG construction and maintain them throughout compilation.

That's likely. It's still on my list of janitor things to do to remove all those if (current_loops) checks ...
the df structures are set up in the init_data_structures pass which is run before going into SSA form (I'd like to somehow clean up that area).

OK. So this part should be approved since we've established this code is running prior to going into SSA form.

Agreed.

Thanks,
Richard.

jeff
Re: PR61385: phiopt drops some PHIs
On Wed, Jun 4, 2014 at 9:59 AM, Jeff Law l...@redhat.com wrote: On 06/04/14 01:46, Marc Glisse wrote: On Tue, 3 Jun 2014, Jeff Law wrote: On 06/03/14 08:08, Richard Biener wrote:

All arguments get the same value (and the PHI in middle-bb is surely a singleton?), so it's way better to re-materialize the PHI as a gimple assignment at the start of the basic block. If they are singletons (which I expect), the easiest way is to propagate their single argument into all uses and simply remove them.

I was all for propagating, but replace_uses_by calls fold_stmt on the using statements (that's normal). If I first move the statement and then replace, I may call fold_stmt on a stmt that isn't dominated by the defs of all its arguments. If I first replace, the statement I am supposed to move may have been modified so I need to find out what happened to it. In the end, the assignment seems easier and safer (and not too bad since this case almost never happens in practice).

We certainly want to get rid of them :-) I'd start by finding out which pass left the degenerate, ensure it's not marked as SSA_NAME_OCCURS_IN_ABNORMAL_PHI, then propagate it away. If it's a systemic problem that a particular pass can leave degenerate PHIs, then you might want to schedule the phi-only propagator to run after that pass. It does const/copy propagation seeding from degenerate PHIs, so it ought to be very fast.

This one comes from VRP2 apparently. But it isn't the only pass that produces singleton phis. I didn't look at them closely, but I assume LIM is normal, DOM maybe less so, etc.

DOM & VRP can create them via jump threading. Right now we only fire up the pass_phi_only_cprop after DOM though.

Right, if we want to get rid of the degenerates early we have to run pass_phi_only_cprop after VRP as well. DOM runs not too long after LIM and it will propagate away the degenerates as well.
Not sure where LIM would create degenerate PHIs, but I guess you simply see loop-closed PHIs here which are degenerate. One could argue that propagating away the degenerates ought to occur immediately before any phi-opt pass. I'd say we should run phi_only_cprop after the first VRPs instead and move the 2nd VRP before the existing phi_only_cprop pass. I'll do the appropriate change. bootstrap+testsuite on x86_64-linux-gnu as usual. 2014-06-04 Marc Glisse marc.gli...@inria.fr PR tree-optimization/61385 gcc/ * tree-ssa-phiopt.c (value_replacement): Replace singleton PHIs with assignments. gcc/testsuite/ * gcc.dg/tree-ssa/pr61385.c: New file. Don't you have to verify neither the source nor the destination are marked with SSA_NAME_OCCURS_IN_ABNORMAL_PHI? And if so, you can't propagate away the PHI (which may mean we need to avoid the optimization in that case). I also believe that when turning a PHI into an assignment you have to preserve the parallel nature of the PHI. If the result of one PHI is used as an input in another PHI, then you have to work harder (see the out-of-ssa code). Perhaps this can never be the case with a degenerate. Hmm, indeed that can happen. So I'd say we should instead simply bail out if the middle-bb has a PHI node. Simple and safe. And with the pass ordering fixes above we shouldn't see degenerates anyway. Richard. Jeff
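The phi-only const/copy propagation Jeff describes is conceptually tiny: a degenerate (single-argument) PHI is just a copy, and propagating it away means rewriting each use of its result to the transitively resolved single argument. A toy model over integer value numbers, with all of the real IR bookkeeping assumed away:

```c
#define NVALS 8

/* copy_of[v] == v means v is a real definition; otherwise v is a
   degenerate PHI whose single argument is copy_of[v].  */
static int copy_of[NVALS];

/* Chase copies to the representative defining value, as the
   propagator would rewrite uses before deleting the PHIs.  */
static int resolve (int v)
{
  while (copy_of[v] != v)
    v = copy_of[v];
  return v;
}

int demo (void)
{
  for (int i = 0; i < NVALS; ++i)
    copy_of[i] = i;
  copy_of[3] = 1;   /* v3 = PHI <v1> */
  copy_of[5] = 3;   /* v5 = PHI <v3> */
  return resolve (5) == 1 && resolve (2) == 2;
}
```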
[PATCH][1/2] Improve DSE
This first patch improves DSE by improving the handling of references with non-invariant addresses such as a->b[i].c in stmt_kills_ref_p_1. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-06-04  Richard Biener  rguent...@suse.de

	* tree-ssa-alias.c (stmt_may_clobber_ref_p): Improve handling
	of accesses with non-invariant address.
	* gcc.dg/tree-ssa/ssa-dse-16.c: New testcase.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c	(revision 211213)
--- gcc/tree-ssa-alias.c	(working copy)
*** stmt_may_clobber_ref_p (gimple stmt, tre
*** 2174,2184 ****
  static bool
  stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
  {
!   /* For a must-alias check we need to be able to constrain
!      the access properly.
!      FIXME: except for BUILTIN_FREE.  */
!   if (!ao_ref_base (ref)
!       || ref->max_size == -1)
      return false;

    if (gimple_has_lhs (stmt)
--- 2174,2180 ----
  static bool
  stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
  {
!   if (!ao_ref_base (ref))
      return false;

    if (gimple_has_lhs (stmt)
*** stmt_kills_ref_p_1 (gimple stmt, ao_ref
*** 2191,2199 ****
	   might throw as well.  */
        && !stmt_can_throw_internal (stmt))
      {
!       tree base, lhs = gimple_get_lhs (stmt);
        HOST_WIDE_INT size, offset, max_size, ref_offset = ref->offset;
!       base = get_ref_base_and_extent (lhs, &offset, &size, &max_size);
        /* We can get MEM[symbol: sZ, index: D.8862_1] here,
	   so base == ref->base does not always hold.  */
        if (base != ref->base)
--- 2187,2237 ----
	   might throw as well.  */
        && !stmt_can_throw_internal (stmt))
      {
!       tree lhs = gimple_get_lhs (stmt);
!       /* If LHS is literally a base of the access we are done.  */
!       if (ref->ref)
!	  {
!	    tree base = ref->ref;
!	    if (handled_component_p (base))
!	      {
!		tree saved_lhs0 = NULL_TREE;
!		if (handled_component_p (lhs))
!		  {
!		    saved_lhs0 = TREE_OPERAND (lhs, 0);
!		    TREE_OPERAND (lhs, 0) = integer_zero_node;
!		  }
!		do
!		  {
!		    /* Just compare the outermost handled component, if
!		       they are equal we have found a possible common
!		       base.  */
!		    tree saved_base0 = TREE_OPERAND (base, 0);
TREE_OPERAND (base, 0) = integer_zero_node;
!		    bool res = operand_equal_p (lhs, base, 0);
!		    TREE_OPERAND (base, 0) = saved_base0;
!		    if (res)
!		      break;
!		    /* Otherwise drop handled components of the access.  */
!		    base = saved_base0;
!		  }
!		while (handled_component_p (base));
!		if (saved_lhs0)
!		  TREE_OPERAND (lhs, 0) = saved_lhs0;
!	      }
!	    /* Finally check if lhs is equal to the base candidate
!	       of the access.  */
!	    if (operand_equal_p (lhs, base, 0))
!	      return true;
!	  }
!
!       /* Now look for non-literal equal bases with the restriction of
!	   handling constant offset and size.  */
!       /* For a must-alias check we need to be able to constrain
!	   the access properly.  */
!       if (ref->max_size == -1)
!	  return false;
        HOST_WIDE_INT size, offset, max_size, ref_offset = ref->offset;
!       tree base = get_ref_base_and_extent (lhs, &offset, &size, &max_size);
        /* We can get MEM[symbol: sZ, index: D.8862_1] here,
	   so base == ref->base does not always hold.  */
        if (base != ref->base)
*** stmt_kills_ref_p_1 (gimple stmt, ao_ref
*** 2261,2266 ****
--- 2299,2308 ----
        case BUILT_IN_MEMMOVE_CHK:
        case BUILT_IN_MEMSET_CHK:
	  {
+	    /* For a must-alias check we need to be able to constrain
+	       the access properly.  */
+	    if (ref->max_size == -1)
+	      return false;
	    tree dest = gimple_call_arg (stmt, 0);
	    tree len = gimple_call_arg (stmt, 2);
	    if (!tree_fits_shwi_p (len))

Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-16.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-16.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-16.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1-details" } */
+
+struct X { struct A { int a[2]; } b[10]; };
+void foo (struct X *x, int i)
+{
+  struct A a;
+  /* Confuse SRA here with using a variable index, otherwise it will mess
+     with the IL too much.  */
+  a.a[i] = 3;
+  a.a[1] = 0;
+  /* The following store is dead.  */
+  x->b[i].a[0] = 1;
+  x->b[i] = a;
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store" "dse1"
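The effect of the new testcase can be checked by hand: because `x->b[i] = a` overwrites the whole array element, the earlier store to `x->b[i].a[0]` cannot be observed, which is exactly what the improved stmt_kills_ref_p_1 proves despite the non-invariant index. An executable variant of the same pattern (it only asserts semantic equivalence, it does not inspect the optimization itself):

```c
#include <string.h>

struct X { struct A { int a[2]; } b[10]; };

static void with_dead_store (struct X *x, int i)
{
  struct A a = { { 3, 0 } };
  x->b[i].a[0] = 1;   /* dead: fully overwritten below */
  x->b[i] = a;
}

static void without_dead_store (struct X *x, int i)
{
  struct A a = { { 3, 0 } };
  x->b[i] = a;
}

/* The two functions leave identical final state, proving the first
   store really was dead.  */
int same_final_state (void)
{
  struct X x1, x2;
  memset (&x1, 0, sizeof x1);
  memset (&x2, 0, sizeof x2);
  with_dead_store (&x1, 4);
  without_dead_store (&x2, 4);
  return memcmp (&x1, &x2, sizeof x1) == 0;
}
```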
[PATCH][2/2] Improve DSE
This improves DSE by not stopping the search for a killing stmt at the first may-alias but instead continuing to look for a kill (well, until we hit a may-use or run into limitations of the walker). This fixes some embarrassing missed dead store eliminations (it's also more expensive, but the walking is limited, thus it's O(1) anyway (haha)). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2014-06-04 Richard Biener rguent...@suse.de PR tree-optimization/60098 * tree-ssa-dse.c (dse_possible_dead_store_p): Walk until we hit a kill. (dse_optimize_stmt): Simplify, now that we found a kill earlier. * gcc.dg/tree-ssa/ssa-dse-15.c: New testcase. Index: gcc/tree-ssa-dse.c === *** gcc/tree-ssa-dse.c.orig 2014-05-06 16:15:43.030962178 +0200 --- gcc/tree-ssa-dse.c 2014-06-04 10:12:33.584954818 +0200 *** dse_possible_dead_store_p (gimple stmt, *** 198,207 break; } } ! /* We deliberately stop on clobbering statements and not only on ! killing ones to make walking cheaper. Otherwise we can just ! continue walking until both stores have equal reference trees. */ ! while (!stmt_may_clobber_ref_p (temp, gimple_assign_lhs (stmt))); *use_stmt = temp; --- 198,205 break; } } ! /* Continue walking until we reach a kill. */ ! while (!stmt_kills_ref_p (temp, gimple_assign_lhs (stmt))); *use_stmt = temp; *** dse_optimize_stmt (gimple_stmt_iterator *** 248,304 if (!dse_possible_dead_store_p (stmt, &use_stmt)) return; /* But only remove *this_2(D) ={v} {CLOBBER} if killed by another clobber stmt. */ if (gimple_clobber_p (stmt) && !gimple_clobber_p (use_stmt)) return; ! /* If we have precisely one immediate use at this point and the ! stores are to the same memory location or there is a chain of ! virtual uses from stmt and the stmt which stores to that same ! memory location, then we may have found redundant store. */ ! if ((gimple_has_lhs (use_stmt) ! && (operand_equal_p (gimple_assign_lhs (stmt), ! gimple_get_lhs (use_stmt), 0))) !
|| stmt_kills_ref_p (use_stmt, gimple_assign_lhs (stmt))) ! { ! basic_block bb; ! /* If use_stmt is or might be a nop assignment, e.g. for ! struct { ... } S a, b, *p; ... ! b = a; b = b; ! or ! b = a; b = *p; where p might be &b, ! or ! *p = a; *p = b; where p might be &b, ! or ! *p = *u; *p = *v; where p might be &v, then USE_STMT ! acts as a use as well as definition, so store in STMT ! is not dead. */ ! if (stmt != use_stmt ! && ref_maybe_used_by_stmt_p (use_stmt, gimple_assign_lhs (stmt))) ! return; ! ! if (dump_file && (dump_flags & TDF_DETAILS)) ! { ! fprintf (dump_file, "Deleted dead store '"); ! print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0); ! fprintf (dump_file, "'\n"); ! } ! ! /* Then we need to fix the operand of the consuming stmt. */ ! unlink_stmt_vdef (stmt); ! ! /* Remove the dead store. */ ! bb = gimple_bb (stmt); ! if (gsi_remove (gsi, true)) ! bitmap_set_bit (need_eh_cleanup, bb->index); ! ! /* And release any SSA_NAMEs set in this statement back to the ! SSA_NAME manager. */ ! release_defs (stmt); } } } --- 246,294 if (!dse_possible_dead_store_p (stmt, &use_stmt)) return; + /* Now we know that use_stmt kills the LHS of stmt. */ + /* But only remove *this_2(D) ={v} {CLOBBER} if killed by another clobber stmt. */ if (gimple_clobber_p (stmt) && !gimple_clobber_p (use_stmt)) return; ! basic_block bb; ! ! /* If use_stmt is or might be a nop assignment, e.g. for ! struct { ... } S a, b, *p; ... ! b = a; b = b; ! or ! b = a; b = *p; where p might be &b, ! or ! *p = a; *p = b; where p might be &b, ! or ! *p = *u; *p = *v; where p might be &v, then USE_STMT ! acts as a use as well as definition, so store in STMT ! is not dead. */ ! if (stmt != use_stmt ! && ref_maybe_used_by_stmt_p (use_stmt, gimple_assign_lhs (stmt))) ! return; ! if (dump_file && (dump_flags & TDF_DETAILS)) ! { ! fprintf (dump_file, "Deleted dead store '"); ! print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0); ! fprintf (dump_file, "'\n"); } + + /* Then we need to fix the operand of the consuming stmt
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 4, 2014 at 12:42 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Richard Biener [mailto:richard.guent...@gmail.com] I'd rather change the comparisons - if (n->size < (int)sizeof (int64_t)) - n->n &= ((uint64_t)1 << (n->size * BITS_PER_UNIT)) - 1; + if (bitsize / BITS_PER_UNIT < (int)sizeof (int64_t)) + n->n &= ((uint64_t)1 << bitsize) - 1; to work in bits, thus bitsize < 8 * sizeof (int64_t) (note that using BITS_PER_UNIT is bogus here - you are dealing with 8-bit bytes on the host, not whatever the target uses). Otherwise it smells like truncation may lose bits (probably not in practice). Ah yes, right. + /* Sign extension: result is dependent on the value. */ + if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (n->type) + && type_size > TYPE_PRECISION (n->type)) + return NULL_TREE; whether it's sign-extended depends solely on the sign of the converted entity, so I'm not sure why you are restricting this to signed n->type. Consider signed char *p; ((unsigned int)p[0]) << 8 | ((unsigned int)p[1]) << 16 the loads are still sign-extended but n->type is unsigned. Indeed, I understood it for convert_via (the requirement to be unsigned) but got it wrong here. I'm failing to get why you possibly need two casts ... you should only need one, from the bswap/load result to the final type (zero-extended as you say - so the load type should simply be unsigned which it is already). Because of the type of the shift constant, the bitwise expression is usually of type int. However, if you write a function that does a 32 bit load in host endianness (like a 32 bit little endian load on x86_64) with a result of size 64 bits, then you need to sign extend, since the source type is signed. This is a situation I encountered in bfd_getl32 in binutils I think. Now if you consider bfd_getl16 instead a direct sign extension is out of the question. Suppose bfd_getl16 is called to read from a memory address that contains 0xff 0xfe.
The bitwise expression would thus be equivalent to the value 0xfeff since it's of type int. Then after the sign extension to 64 bits you'd have 0x000000000000feff. But after replacing the bitwise expression you end up with a 16-bit load into a 16-bit SSA variable. If you sign extend that directly to 64 bits you'll end up with 0xfffffffffffffeff, which is wrong. But if you zero extend to an int value (the type of the bitwise OR expression) and then sign extend to the target type you'll have the correct result. Err, but if you zero-extend directly to the target type you have the correct result, too. But you're right, we can do simpler by sign extending if load size == size of bitwise expression and zero extending else. The change of load_type to range_type would still be needed because in case of a load + bswap it's better to load in the same type as the type of the parameter of bswap. After bswap you'd need to convert to a signed or unsigned value according to the logic above (load size == size of bitwise expression) In the original statement, the bitwise OR expression would have the 2 bytes of higher weight be 0 while the 2 bytes of lower weight would be the value read from memory. The result of the sign extension would be So I think that the testcase in the patch is fixed already by doing the n->type change (and a proper sign-extension detection). I don't remember exactly but I think it didn't fix this bug (but it is a necessary fix anyway). Can you please split that part out? Sure, that part would need to be applied on gcc 4.9 too. I'll try to construct a testcase for that. That range_type and convert_via looks wrong and unnecessary to me, and it doesn't look like you have a testcase exercising it? See above. But nothing for the testsuite? The testcase you add falls foul of sign-extending the loads. Richard.
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 4, 2014 at 1:15 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Richard Biener [mailto:richard.guent...@gmail.com] Err, but if you zero-extend directly to the target type you have the correct result, too. Yep, but in some cases we need to sign extend (32-bit bitwise OR stored into a 64-bit result). But you'd convert the 32-bit bitwise OR to the original type anyway (no extension, but sign-change). The extension to the stored type happens with the original IL. As I said, the logic could be simplified by sign extending if load_size == bitwise expression size and zero extending otherwise. I'll rework the patch in this direction. But nothing for the testsuite? The testcase you add falls foul of sign-extending the loads. Ack, I'll add a test for zero extension and one for sign extension. Thanks. Richard. Cheers, Thomas
Re: PR61385: phiopt drops some PHIs
On Wed, Jun 4, 2014 at 1:20 PM, Marc Glisse marc.gli...@inria.fr wrote: On Wed, 4 Jun 2014, Richard Biener wrote: So I'd say we should instead simply bail out if the middle-bb has a PHI node. Sounds good to me, so I am testing the mini-patch I had originally posted to bugzilla: Ok. Thanks, Richard. 2014-06-04 Marc Glisse marc.gli...@inria.fr PR tree-optimization/61385 gcc/ * tree-ssa-phiopt.c (value_replacement): Punt if there are PHI nodes. gcc/testsuite/ * gcc.dg/tree-ssa/pr61385.c: New file. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/pr61385.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr61385.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr61385.c (working copy) @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#define assert(x) if (!(x)) __builtin_abort () + +int a, b, c, d, e, f, g; + +int +fn1 () +{ + int *h = &c; + for (; c < 1; c++) +{ + int *i = &a, *k = &a; + f = 0; + if (b) + return 0; + if (*h) + { + int **j = &i; + *j = 0; + d = 0; + } + else + g = e = 0; + if (*h) + { + int **l = &k; + *l = &g; + } + d = *h; + assert (k == &a || k); + assert (i); +} + return 0; +} + +int +main () +{ + fn1 (); + return 0; +} Index: gcc/tree-ssa-phiopt.c === --- gcc/tree-ssa-phiopt.c (revision 211221) +++ gcc/tree-ssa-phiopt.c (working copy) @@ -842,20 +842,24 @@ value_replacement (basic_block cond_bb, /* Now optimize (x != 0) ? x + y : y to just x + y. The following condition is too restrictive, there can easily be another stmt in middle_bb, for instance a CONVERT_EXPR for the second argument. */ gimple assign = last_and_only_stmt (middle_bb); if (!assign || gimple_code (assign) != GIMPLE_ASSIGN || gimple_assign_rhs_class (assign) != GIMPLE_BINARY_RHS || (!INTEGRAL_TYPE_P (TREE_TYPE (arg0)) && !POINTER_TYPE_P (TREE_TYPE (arg0)))) return 0; + /* Punt if there are (degenerate) PHIs in middle_bb, there should not be. */ + if (!gimple_seq_empty_p (phi_nodes (middle_bb))) +return 0; + /* Only transform if it removes the condition.
*/ if (!single_non_singleton_phi_for_edges (phi_nodes (gimple_bb (phi)), e0, e1)) return 0; /* Size-wise, this is always profitable. */ if (optimize_bb_for_speed_p (cond_bb) /* The special case is useless if it has a low probability. */ && profile_status_for_fn (cfun) != PROFILE_ABSENT && EDGE_PRED (middle_bb, 0)->probability < PROB_EVEN /* If assign is cheap, there is no point avoiding it. */
[PATCH][match-and-simplify] Restore bootstrap somewhat
The following patch is necessary to get to stage2 target library building. Bootstrapped until that point, applied. Richard. 2014-06-04 Richard Biener rguent...@suse.de * genmatch.c (error_cb, fatal_at): Annotate with printf format attribute to silence warning. * gimple-match-head.c (gimple_resimplify1): Check for builtin availability and properly zero out excess operands. (gimple_resimplify2): Likewise. (gimple_resimplify3): Likewise. (maybe_push_res_to_seq): Check for builtin availability. (gimple_match_and_simplify): Likewise. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 211221) +++ gcc/genmatch.c (working copy) @@ -549,6 +549,9 @@ write_gimple (FILE *f, vec<simplify *> static struct line_maps *line_table; static bool +#if GCC_VERSION >= 4001 +__attribute__((format (printf, 6, 0))) +#endif error_cb (cpp_reader *, int, int, source_location location, unsigned int, const char *msg, va_list *ap) { @@ -562,6 +565,9 @@ error_cb (cpp_reader *, int, int, source } static void +#if GCC_VERSION >= 4001 +__attribute__((format (printf, 2, 3))) +#endif fatal_at (const cpp_token *tk, const char *msg, ...) { va_list ap; Index: gcc/gimple-match-head.c === --- gcc/gimple-match-head.c (revision 211221) +++ gcc/gimple-match-head.c (working copy) @@ -105,18 +105,23 @@ gimple_resimplify1 (gimple_seq *seq, else { tree decl = builtin_decl_implicit (*res_code); - tem = fold_builtin_n (UNKNOWN_LOCATION, decl, res_ops, 1, false); - if (tem) + if (decl) { - /* fold_builtin_n wraps the result inside a NOP_EXPR. */ - STRIP_NOPS (tem); - tem = fold_convert (type, tem); + tem = fold_builtin_n (UNKNOWN_LOCATION, decl, res_ops, 1, false); + if (tem) + { + /* fold_builtin_n wraps the result inside a NOP_EXPR. */ + STRIP_NOPS (tem); + tem = fold_convert (type, tem); + } } } if (tem != NULL_TREE && CONSTANT_CLASS_P (tem)) { res_ops[0] = tem; + res_ops[1] = NULL_TREE; + res_ops[2] = NULL_TREE; *res_code = TREE_CODE (res_ops[0]); return true; } @@ -157,17 +162,22 @@ gimple_resimplify2 (gimple_seq *seq, else { tree decl = builtin_decl_implicit (*res_code); - tem = fold_builtin_n (UNKNOWN_LOCATION, decl, res_ops, 2, false); - if (tem) + if (decl) { - /* fold_builtin_n wraps the result inside a NOP_EXPR. */ - STRIP_NOPS (tem); - tem = fold_convert (type, tem); + tem = fold_builtin_n (UNKNOWN_LOCATION, decl, res_ops, 2, false); + if (tem) + { + /* fold_builtin_n wraps the result inside a NOP_EXPR. */ + STRIP_NOPS (tem); + tem = fold_convert (type, tem); + } } } if (tem != NULL_TREE) { res_ops[0] = tem; + res_ops[1] = NULL_TREE; + res_ops[2] = NULL_TREE; *res_code = TREE_CODE (res_ops[0]); return true; } @@ -221,18 +231,23 @@ gimple_resimplify3 (gimple_seq *seq, else { tree decl = builtin_decl_implicit (*res_code); - tem = fold_builtin_n (UNKNOWN_LOCATION, decl, res_ops, 3, false); - if (tem) + if (decl) { - /* fold_builtin_n wraps the result inside a NOP_EXPR. */ - STRIP_NOPS (tem); - tem = fold_convert (type, tem); + tem = fold_builtin_n (UNKNOWN_LOCATION, decl, res_ops, 3, false); + if (tem) + { + /* fold_builtin_n wraps the result inside a NOP_EXPR. */ + STRIP_NOPS (tem); + tem = fold_convert (type, tem); + } } } if (tem != NULL_TREE && CONSTANT_CLASS_P (tem)) { res_ops[0] = tem; + res_ops[1] = NULL_TREE; + res_ops[2] = NULL_TREE; *res_code = TREE_CODE (res_ops[0]); return true; } @@ -297,9 +312,11 @@ maybe_push_res_to_seq (code_helper rcode { if (!seq) return NULL_TREE; + tree decl = builtin_decl_implicit (rcode); + if (!decl) + return NULL_TREE; if (!res) res = make_ssa_name (type, NULL); - tree decl = builtin_decl_implicit (rcode); unsigned nargs = type_num_arguments (TREE_TYPE (decl)); gcc_assert (nargs <= 3); gimple new_stmt = gimple_build_call (decl, nargs, ops[0], ops[1], ops[2]); @@ -401,14 +418,17 @@ gimple_match_and_simplify (enum built_in
[PATCH][match-and-simplify] Fix cutpaste error
Committed. Hopefully that restores bootstrap ... Richard. 2014-06-04 Richard Biener rguent...@suse.de * gimple-match-head.c (gimple_match_and_simplify): Fix cut&paste error. Index: gcc/gimple-match-head.c === --- gcc/gimple-match-head.c (revision 211226) +++ gcc/gimple-match-head.c (working copy) @@ -604,7 +604,7 @@ gimple_match_and_simplify (gimple stmt, if (!arg1) return false; } - tree arg2 = gimple_call_arg (stmt, 0); + tree arg2 = gimple_call_arg (stmt, 1); if (valueize && TREE_CODE (arg2) == SSA_NAME) { arg2 = valueize (arg2);
Re: [patch, mips, tree] align microMIPS functions to 16 bits with -Os
On Tue, 3 Jun 2014, Richard Sandiford wrote: Sandra Loosemore san...@codesourcery.com writes: Catherine included an earlier version of this patch with the microMIPS submission a couple years ago: https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00972.html Richard's response was: "Looks like the wrong place to do this. Please treat this as a separate patch and get a tree expert to comment." So, here is the separate patch, finally. :-) Can a tree expert comment? I dug around and didn't see another obvious hook to set function alignment in a target-specific way depending on attributes of the function. Besides the new test cases, I regression-tested this on mips-sde-elf using Mentor's usual assortment of multilibs, specifically including one for microMIPS. OK, I asked Richi on IRC today. To recap, one of the things I was worried about was a test against the address, like (foo & 2) == 0, being optimised away before we set the alignment. Richi pointed out that my idea downthread about cgraph_create_node would also be too late to avoid that. Also, looking at it now, I see that we don't trust DECL_ALIGN on functions anyway. From get_object_alignment_2: /* Extract alignment information from the innermost object and possibly adjust bitpos and offset. */ if (TREE_CODE (exp) == FUNCTION_DECL) { /* Function addresses can encode extra information besides their alignment. However, if TARGET_PTRMEMFUNC_VBIT_LOCATION allows the low bit to be used as a virtual bit, we know that the address itself must be at least 2-byte aligned. */ if (TARGET_PTRMEMFUNC_VBIT_LOCATION == ptrmemfunc_vbit_in_pfn) align = 2 * BITS_PER_UNIT; } And since we use the low bit to encode the ISA mode on MIPS, the upshot is that we never assume any alignment for functions. So there's nothing to worry about after all. Richi suggested just changing the alignment at output time. I assume that would be a case of replacing the DECL_ALIGN in: /* Tell assembler to move to target machine's alignment for functions.
*/ align = floor_log2 (DECL_ALIGN (decl) / BITS_PER_UNIT); if (align > 0) { ASM_OUTPUT_ALIGN (asm_out_file, align); } with a hook. (Is that right?) Yeah, kind of. Of course if DECL_ALIGN on function-decls is unused then we may as well initialize it to 1 in tree.c and at an appropriate stage adjust it to the result of a target hook invocation. Appropriate stage would be the above place (which means DECL_ALIGN is essentially unused for FUNCTION_DECLs). So ... can you massage the DECL_ALIGN macro to ICE on FUNCTION_DECLs and see where we access it? (generic code will possibly trip on it, but the question is whether there is any user that cares?) Richard.
Re: [PATCH, Pointer Bounds Checker 6/x] New static constructor types
On Wed, Jun 4, 2014 at 3:13 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-06-04 13:58 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Wed, Apr 16, 2014 at 2:33 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds new static constructor types used by Pointer Bounds Checker. It was approved earlier for 4.9 and I'll assume the patch is OK for trunk if no objections arise. Patch was bootstrapped and tested for linux-x86_64. Thanks, Ilya -- gcc/ 2014-04-16 Ilya Enkovich ilya.enkov...@intel.com * ipa.c (cgraph_build_static_cdtor_1): Support constructors with chkp ctor and bnd_legacy attributes. * gimplify.c (gimplify_init_constructor): Avoid infinite loop during gimplification of bounds initializer. diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 7441784..67ab515 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -3803,10 +3803,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, individual element initialization. Also don't do this for small all-zero initializers (which aren't big enough to merit clearing), and don't try to make bitwise copies of - TREE_ADDRESSABLE types. */ + TREE_ADDRESSABLE types. + + We cannot apply such transformation when compiling chkp static + initializer because creation of initializer image in the memory + will require static initialization of bounds for it. It should + result in another gimplification of similar initializer and we + may fall into infinite loop. */ if (valid_const_initializer && !(cleared || num_nonzero_elements == 0) - && !TREE_ADDRESSABLE (type)) + && !TREE_ADDRESSABLE (type) + && (!current_function_decl + || !lookup_attribute ("chkp ctor", + DECL_ATTRIBUTES (current_function_decl Simply make the type TREE_ADDRESSABLE? Wouldn't it be a hack to mark it addressable just to not hit this condition? It would also require to have an unshared copy of the type to not affect other statements with that type.
{ HOST_WIDE_INT size = int_size_in_bytes (type); unsigned int align; diff --git a/gcc/ipa.c b/gcc/ipa.c index 26e9b03..5ab3aed 100644 --- a/gcc/ipa.c +++ b/gcc/ipa.c @@ -1345,9 +1345,11 @@ make_pass_ipa_whole_program_visibility (gcc::context *ctxt) } /* Generate and emit a static constructor or destructor. WHICH must - be one of 'I' (for a constructor) or 'D' (for a destructor). BODY - is a STATEMENT_LIST containing GENERIC statements. PRIORITY is the - initialization priority for this constructor or destructor. + be one of 'I' (for a constructor), 'D' (for a destructor), 'P' + (for chkp static vars constructor) or 'B' (for chkp static bounds + constructor). BODY is a STATEMENT_LIST containing GENERIC + statements. PRIORITY is the initialization priority for this + constructor or destructor. FINAL specify whether the externally visible name for collect2 should be produced. */ @@ -1406,6 +1408,20 @@ cgraph_build_static_cdtor_1 (char which, tree body, int priority, bool final) DECL_STATIC_CONSTRUCTOR (decl) = 1; decl_init_priority_insert (decl, priority); break; +case 'P': + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("chkp ctor"), + NULL, + NULL_TREE); Ick. Please try to avoid using attributes for this. Rather adjust the caller of this function to set a flag in the cgraph node. It is too late because all early local passes are executed by that time and I need this attribute to be set before them. Ok, so where do you call this from? It should be possible to create the function in GIMPLE form directly, avoiding the need to go through gimplification. Richard. Ilya So I don't like this patch at all. Richard.
+ decl_init_priority_insert (decl, priority); + break; +case 'B': + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("bnd_legacy"), + NULL, + NULL_TREE); + decl_init_priority_insert (decl, priority); + break; case 'D': DECL_STATIC_DESTRUCTOR (decl) = 1; decl_fini_priority_insert (decl, priority); @@ -1423,9 +1439,11 @@ cgraph_build_static_cdtor_1 (char which, tree body, int priority, bool final) } /* Generate and emit a static constructor or destructor. WHICH must - be one of 'I' (for a constructor) or 'D' (for a destructor). BODY - is a STATEMENT_LIST containing GENERIC statements. PRIORITY is the - initialization priority for this constructor
Re: [PATCH, PR 61391]
On Wed, Jun 4, 2014 at 3:17 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is a simple fix for 61391 - missed a check that statement basic block is inside loop. With this fix test-case from bug is compiled successfully. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? Ok. Thanks, Richard. ChangeLog: 2014-06-04 Yuri Rumyantsev ysrum...@gmail.com PR tree-optimization/61319 * tree-if-conv.c (is_cond_scalar_reduction): Add missed check that stmt belongs to loop.
Re: [PATCH, PR 61391]
On Wed, Jun 4, 2014 at 3:37 PM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Jun 4, 2014 at 3:17 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is a simple fix for 61391 - missed a check that statement basic block is inside loop. With this fix test-case from bug is compiled successfully. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? Ok. Please also add the testcase. Thanks, Richard. ChangeLog: 2014-06-04 Yuri Rumyantsev ysrum...@gmail.com PR tree-optimization/61319 * tree-if-conv.c (is_cond_scalar_reduction): Add missed check that stmt belongs to loop.