Re: [PATCH, PR38785] Throttle PRE at -O3
On 18/04/2012, at 9:17 PM, Richard Guenther wrote: On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov ma...@codesourcery.com wrote: Steven, Jorn, I am looking into fixing performance regression on EEMBC's bitmnp01, and a version of your combined patch attached to PR38785 still works very well. Would you mind me getting it through upstream review, or are there any issues with contributing this patch to GCC mainline? We (CodeSourcery/Mentor) were carrying this patch in our toolchains since GCC 4.4, and it didn't show any performance or correctness problems on x86, ARM, MIPS, and other architectures. It also reliably fixes bitmnp01 regression, which is still present in current mainline. I have tested this patch on recent mainline on i686-linux-gnu with no regressions. Unless I hear from you to the contrary, I will push this patch for upstream review and, hopefully, get it checked in. Previous discussion of this patch is at http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html The addition of -ftree-pre-partial-partial is ok if you change its name to -ftree-partial-pre and add documentation to invoke.texi. OK. Gerald, does the patch for gcc-4.8/changes.html look OK? + /* Assuming the expression is 50% anticipatable, we have +to multiply the number of insertions needed by two for a cost +comparison. */ why assume 50% anticipatibility if you can compute the exact anticipatibility? Do you mean partial anticipatibility as in the updated patch? To compute exact anticipatibility we would need to traverse the bottom part of CFG, to get the numbers right. + if (!optimize_function_for_speed_p (cfun) please look at how I changed regular PRE to look at whether we want to optimize the path which has the redundancy for speed via optimize_edge_for_speed_p. The same surgerly should be applied to PPRE. OK. In the updated patch we require at least one of the successors the partially anticipates the expression to be on the speed path. Any other comments? Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics pr38785-rel-note.patch Description: Binary data pr38785.ChangeLog Description: Binary data pr38785.patch Description: Binary data
Re: [Patch, Fortran] PR52864 - fix actual/formal checks
Hello Mikael, thanks for the review. Regarding: Mikael Morin wrote: is there a reason to guard the class_pointer condition with attr.class_ok in the first conditional and with CLASS_DATA(...) != NULL in the two other ones? Not that it matters much, and in fact, I think the patch as is is good enough for committal (yes, it is a OK). I'm asking as I never know myself what is the correct, canonical way to handle the class_* hell... It's a good question what's more appropriate. My impression is that both is nearly identical; I frankly don't know whether what's the exact difference. I recall that I once had to use CLASS_DATA() != NULL to avoid a segfault. I don't remember whether it was the issue below or something different. For an expression, CLASS_DATA () != NULL has the big advantage that one avoids to walk the expression: For an expr, one needs to check expr-symtree-n.sym but also the ref tree. Thus, CLASS_DATA is much simpler than class_ok. Looking at my patch, I have a dummy argument (i.e. gfc_symbol) - and for those, one can simply access fsym-sym-attr.class_ok. Tobias
Re: FW: [v3] libstdc++/52689 testcase
This testcase is reported as failed on x86 Noticed that this testcase wasn't put in as part of the patch. Fixed as follows. tested x86/linux -benjamin
combine_conversions int-double-int
Hello, a conversion like int-double-int is just the identity, as long as double is big enough to represent all ints exactly. The most natural way I found to do this optimization is the attached: 2012-04-25 Marc Glisse marc.gli...@inria.fr PR middle-end/27139 * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT. Does the approach make sense? I don't know that code, and adding FLOAT_EXPR / FIX_TRUNC_EXPR was a bit of guesswork. The precision of double could be multiplied by log2(base), but not doing it is safe. If the approach is ok, I could extend it so int-double-long also skips the intermediate conversion. Bootstrapped and regression tested on linux-x86_64. Should I try and write a testcase for a specific target checking for specific asm instructions there, or is there a better way? (note that I can't commit) -- Marc GlisseIndex: tree-ssa-forwprop.c === --- tree-ssa-forwprop.c (revision 186761) +++ tree-ssa-forwprop.c (working copy) @@ -2403,10 +2403,11 @@ combine_conversions (gimple_stmt_iterato { gimple stmt = gsi_stmt (*gsi); gimple def_stmt; tree op0, lhs; enum tree_code code = gimple_assign_rhs_code (stmt); + enum tree_code code2; gcc_checking_assert (CONVERT_EXPR_CODE_P (code) || code == FLOAT_EXPR || code == FIX_TRUNC_EXPR); @@ -2423,11 +2424,13 @@ combine_conversions (gimple_stmt_iterato def_stmt = SSA_NAME_DEF_STMT (op0); if (!is_gimple_assign (def_stmt)) return 0; - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) + code2 = gimple_assign_rhs_code (def_stmt); + if (CONVERT_EXPR_CODE_P (code2) || code2 == FLOAT_EXPR + || code2 == FIX_TRUNC_EXPR) { tree defop0 = gimple_assign_rhs1 (def_stmt); tree type = TREE_TYPE (lhs); tree inside_type = TREE_TYPE (defop0); tree inter_type = TREE_TYPE (op0); @@ -2453,13 +2456,16 @@ combine_conversions (gimple_stmt_iterato /* In addition to the cases of two conversions in a row handled below, if we are converting something to its own type via an object of identical or wider precision, neither conversion is needed. */ if (useless_type_conversion_p (type, inside_type) - (((inter_int || inter_ptr) final_int) - || (inter_float final_float)) - inter_prec = final_prec) + (inter_int || inter_ptr) final_int) + || (inter_float final_float)) + inter_prec = final_prec) + || (inside_int final_int inter_float + REAL_MODE_FORMAT (TYPE_MODE (inter_type))-p += (int) inside_prec - !inside_unsignedp))) { gimple_assign_set_rhs1 (stmt, unshare_expr (defop0)); gimple_assign_set_rhs_code (stmt, TREE_CODE (defop0)); update_stmt (stmt); return remove_prop_source_from_use (op0) ? 2 : 1;
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
Tree level unrollers (cunrolli and cunroll) do complete unroll. At O2, both of them are turned on, but gcc does not allow any code growth -- which makes them pretty useless at O2 (very few loops qualify). The default max complete peel iteration is also too low compared with both icc and llvm. This needs to be tuned. I found that at -O3 (where tree unroll is on by default) there is quite a bit of useless unrolling. I got somewhat irritated that my printf debug loops were commonly unrolled. -Andi
Re: move default_tree_printer out of toplev.c
On Tue, 24 Apr 2012, Manuel López-Ibáñez wrote: It seems an obvious cleanup to me. OK? 2012-04-24 Manuel López-Ibáñez m...@gcc.gnu.org gcc/ * tree-pretty-print.h (default_tree_printer): Do not declare. * tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and intl.h. (default_tree_diagnostic_starter): Make static. (default_tree_printer): Move to here. Make static. (tree_diagnostics_defaults): New. * tree-diagnostic.h (default_tree_diagnostic_starter): Do not declare. * tree.c (free_lang_data): Use tree_diagnostics_defaults. Bogus changelog - you changed general_init. * toplev.c: Do not include tree-pass.h. (default_tree_printer): Move from here. (general_init): Use tree_diagnostics_defaults. Otherwise ok. Thanks, Richard.
[PATCH] Validate the HLE memory models.
From: Andi Kleen a...@linux.intel.com This is on top of Kirill's latest patch. Based on the earlier discussion. Warn and enforce reasonable memory models for HLE ACQUIRE and RELEASE. It's not useful to have a weaker compiler memory model than what the CPU implements for a transaction. With HLE_ACQUIRE model must be ACQUIRE or stronger. With HLE_RELEASE model must be RELEASE or stronger. I implemented this with a new target hook that replaces the target variable originally added in the earlier patch. Passes bootstrap and test suite on x86_64-linux gcc/: 2012-04-24 Andi Kleen a...@linux.intel.com * builtins.c (get_memmodel): Add val. Call target.memmodel_check and return new variable. * config/i386/i386.c (ix86_memmodel_check): Add. (TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK. * doc/tm.texi (TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK.. * doc/tm.texi.in (TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK. * target.def (memmodel_mask): Replace with memmodel_check. --- gcc/builtins.c |8 ++-- gcc/config/i386/i386.c | 35 +-- gcc/doc/tm.texi|7 --- gcc/doc/tm.texi.in |7 --- gcc/target.def |6 +++--- 5 files changed, 50 insertions(+), 13 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index bd66ff0..8f25086 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5338,6 +5338,7 @@ static enum memmodel get_memmodel (tree exp) { rtx op; + int val; /* If the parameter is not a constant, it's a run time value so we'll just convert it to MEMMODEL_SEQ_CST to avoid annoying runtime checking. */ @@ -5346,7 +5347,10 @@ get_memmodel (tree exp) op = expand_normal (exp); - if(INTVAL (op) ~(MEMMODEL_MASK | targetm.memmodel_mask)) + val = INTVAL (op); + if (targetm.memmodel_check) +val = targetm.memmodel_check (val); + else if (val ~MEMMODEL_MASK) { warning (OPT_Winvalid_memory_model, Unknown architecture specifier in memory model to builtin.); @@ -5361,7 +5365,7 @@ get_memmodel (tree exp) return MEMMODEL_SEQ_CST; } - return (enum memmodel) INTVAL (op); + return (enum memmodel) val; } /* Expand the __atomic_exchange intrinsic: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 0b6d6be..d17fcac 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -38967,6 +38967,37 @@ ix86_autovectorize_vector_sizes (void) return (TARGET_AVX !TARGET_PREFER_AVX128) ? 32 | 16 : 0; } +/* Validate target specific memory model bits in VAL. */ + +static unsigned int +ix86_memmodel_check (unsigned val) +{ + unsigned model = val MEMMODEL_MASK; + int strong; + + if (val ~(IX86_HLE_ACQUIRE|IX86_HLE_RELEASE|MEMMODEL_MASK) + || ((val IX86_HLE_ACQUIRE) (val IX86_HLE_RELEASE))) +{ + warning (OPT_Winvalid_memory_model, + Unknown architecture specific memory model); + return MEMMODEL_SEQ_CST; +} + strong = (model == MEMMODEL_ACQ_REL || model == MEMMODEL_SEQ_CST); + if (val IX86_HLE_ACQUIRE !(model == MEMMODEL_ACQUIRE || strong)) +{ + warning (OPT_Winvalid_memory_model, + HLE_ACQUIRE not used with ACQUIRE or stronger memory model); + return MEMMODEL_SEQ_CST | IX86_HLE_ACQUIRE; +} + if (val IX86_HLE_RELEASE !(model == MEMMODEL_RELEASE || strong)) +{ + warning (OPT_Winvalid_memory_model, + HLE_RELEASE not used with RELEASE or stronger memory model); + return MEMMODEL_SEQ_CST | IX86_HLE_RELEASE; +} + return val; +} + /* Initialize the GCC target structure. */ #undef TARGET_RETURN_IN_MEMORY #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory @@ -39066,8 +39097,8 @@ ix86_autovectorize_vector_sizes (void) #undef TARGET_FUNCTION_OK_FOR_SIBCALL #define TARGET_FUNCTION_OK_FOR_SIBCALL ix86_function_ok_for_sibcall -#undef TARGET_MEMMODEL_MASK -#define TARGET_MEMMODEL_MASK (IX86_HLE_ACQUIRE|IX86_HLE_RELEASE) +#undef TARGET_MEMMODEL_CHECK +#define TARGET_MEMMODEL_CHECK ix86_memmodel_check #ifdef HAVE_AS_TLS #undef TARGET_HAVE_TLS diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index ee0d700..4c57a9c 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11362,9 +11362,10 @@ MIPS, where add-immediate takes a 16-bit signed value, @code{TARGET_CONST_ANCHOR} is set to @samp{0x8000}. The default value is zero, which disables this optimization. @end deftypevr -@deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_MASK -Some architectures may prvide additional memory model bits. This hook -must mask such a bits. @end deftypevr +@deftypefn {Target Hook} unsigned TARGET_MEMMODEL_CHECK (unsigned @var{val}) +Validate target specific memory model mask bits. When NULL no target specific +memory model bits are allowed. +@end deftypefn @deftypevr {Target Hook} {unsigned char} TARGET_ATOMIC_TEST_AND_SET_TRUEVAL This value
Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision
On Tue, 24 Apr 2012, Ulrich Weigand wrote: Hello, PR 52633 is caused by bad interaction between two different vectorizer pattern recognition passed. A minimal test case is: void test (unsigned short *x, signed char *y) { int i; for (i = 0; i 32; i++) x[i] = (short) (y[i] 5); } built with cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp on a arm-linux-gnueabi target. Before the vectorizer, we have something like: short unsigned int D.4976; int D.4975; int D.4974; signed char D.4973; signed char * D.4972; short unsigned int * D.4970; [snip] D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; D.4975_10 = D.4974_9 5; D.4976_11 = (short unsigned int) D.4975_10; *D.4970_5 = D.4976_11; The pattern recognizer now goes through its list of patterns it tries to detect. The first successful one is vect_recog_over_widening_pattern. This will annotate the statements (via related_stmt fields): D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; -- D.4992_26 = (signed short) D.4973_8 D.4975_10 = D.4974_9 5; -- patt.16_31 = D.4992_26 5 D.4976_11 = (short unsigned int) D.4975_10; -- D.4994_32 = (short unsigned int) patt.16_31 *D.4970_5 = D.4976_11; In the next step, vect_recog_widen_shift_pattern *also* matches, and creates a new annotation for the shift statement (using a widening shift): D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; -- D.4992_26 = (signed short) D.4973_8 D.4975_10 = D.4974_9 5; [-- patt.16_31 = D.4992_26 5] -- patt.17_33 = D.4973_8 w 5 D.4976_11 = (short unsigned int) D.4975_10; -- D.4994_32 = (short unsigned int) patt.16_31 *D.4970_5 = D.4976_11; Since the original statement can only point to a single related_stmt, the statement setting patt.16_31 is now longer refered to as related_stmt by any other statement. This causes it to no longer be considered relevant for the vectorizer. However, the statement: -- D.4994_32 = (short unsigned int) patt.16_31 *is* still considered relevant. While analysing it, however, the vectorizer follows through to the def statement for patt.16_31, and gets quite confused to find that it doesn't have a vectype (because it wasn't considered by the vectorizer). The symptom is a failing assertion gcc_assert (*vectype != NULL_TREE); in vect_is_simple_use_1. Now, it seems quite unusual for multiple patterns to match for a single original statement. In fact, most pattern recognizers explicitly refuse to even consider statements that were already recognized. However, vect_recog_widen_shift_pattern makes an exception: /* This statement was also detected as over-widening operation (it can't be any other pattern, because only over-widening detects shifts). LAST_STMT is the final type demotion statement, but its related statement is shift. We analyze the related statement to catch cases: orig code: type a_t; itype res; TYPE a_T, res_T; S1 a_T = (TYPE) a_t; S2 res_T = a_T CONST; S3 res = (itype)res_T; (size of type * 2 = size of itype and size of itype * 2 = size of TYPE) code after over-widening pattern detection: S1 a_T = (TYPE) a_t; -- a_it = (itype) a_t; S2 res_T = a_T CONST; S3 res = (itype)res_T; --- LAST_STMT -- res = a_it CONST; after widen_shift: S1 a_T = (TYPE) a_t; -- a_it = (itype) a_t; - redundant S2 res_T = a_T CONST; S3 res = (itype)res_T; -- res = a_t w CONST; i.e., we replace the three statements with res = a_t w CONST. */ If everything were indeed as described in that comment, things would work out fine. However, what is described above as code after over-widening pattern detection is only one of two possible outcomes of that latter pattern; the other is the one that happens in the current test case, where we still have a final type conversion left after applying the over-widening pattern. I guess one could try to distiguish the two cases somehow and handle both; but the overall approach seems quite fragile to me; it doesn't look really maintainable to have to rely on so many details of the operation of one particular pattern detection function while writing another one, or else risk creating subtle problems like the one described above. So I was wondering why vect_recog_widen_shift_pattern tries to take advantage of an already recognized over-widening pattern. But indeed, if it does not, it will generate less efficient code in cases like the above test case: by itself vect_recog_widen_shift_pattern, would generate code to expand the char to a short, then do a widening shift resulting in an int, followed by narrowing back down to a
Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision
On Wed, Apr 25, 2012 at 10:29:36AM +0200, Richard Guenther wrote: Does this look reasonable? Any comments or suggestions appreciated! Yes, getting rid of this fragile interaction by doing more work in vect_recog_widen_shift_pattern sounds like the correct thing to do. Or give up when seeing already pattern recognized stmts when detecting different pattern, unless the current pattern recognizer is prepared to handle them (and in that case tweak everything as necessary). E.g. several pattern recognizers already start with if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) return NULL; Jakub
[PATCH, tree-optimization] Fix for PR 52868
Hi all! I'd like to post for review the patch which makes some costs adjusting in get_computation_cost_at routine in ivopts part. As mentioned in the PR changes also fix the bwaves regression from PR 52272. Also changes introduce no degradations on spec2000/2006 and EEMBC1.1/2.0(this was measured on Atom) on x86 Bootstrapped and regtested on x86. Ok to commit? Changelog: 2012-04-26 Yuri Rumyantsev yuri.rumyant...@intel.com * tree-ssa-loop-ivopts.c (get_computation_cost_at): Add cost of extra addition if cost of address difference is high enough. Thanks, Igor ivopts.patch Description: Binary data
Re: move default_tree_printer out of toplev.c
On 25 April 2012 10:27, Richard Guenther rguent...@suse.de wrote: On Tue, 24 Apr 2012, Manuel López-Ibáñez wrote: It seems an obvious cleanup to me. OK? 2012-04-24 Manuel López-Ibáñez m...@gcc.gnu.org gcc/ * tree-pretty-print.h (default_tree_printer): Do not declare. * tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and intl.h. (default_tree_diagnostic_starter): Make static. (default_tree_printer): Move to here. Make static. (tree_diagnostics_defaults): New. * tree-diagnostic.h (default_tree_diagnostic_starter): Do not declare. * tree.c (free_lang_data): Use tree_diagnostics_defaults. Bogus changelog - you changed general_init. Not sure what you mean. I changed tree.c (free_lang_data) and toplev.c (general_init). to use tree_diagnostics_defaults. What should I have said in the Changelog? Cheers, Manuel.
Re: combine_conversions int-double-int
On Wed, Apr 25, 2012 at 10:12 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, a conversion like int-double-int is just the identity, as long as double is big enough to represent all ints exactly. The most natural way I found to do this optimization is the attached: 2012-04-25 Marc Glisse marc.gli...@inria.fr PR middle-end/27139 * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT. Does the approach make sense? I don't know that code, and adding FLOAT_EXPR / FIX_TRUNC_EXPR was a bit of guesswork. The precision of double could be multiplied by log2(base), but not doing it is safe. If the approach is ok, I could extend it so int-double-long also skips the intermediate conversion. Bootstrapped and regression tested on linux-x86_64. Should I try and write a testcase for a specific target checking for specific asm instructions there, or is there a better way? Well, scanning the forwprop dump for example. Btw, I think not munging this new case into the existing CONVERT_EXPR_P code would be better - it makes the code (even) harder to understand and I'm not convinced that adding FLOAT_EXPR/FIX_TRUNC_EXPR handling does not wreck any assumptions in that code. It also seems that for DECIMAL_FLOAT_TYPE_P the transform is always valid? Richard. (note that I can't commit) -- Marc Glisse
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote: This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? Thanks, Teresa 2012-04-24 Teresa Johnson tejohn...@google.com * loop-unroll.c (loop_has_call): New function. (loop_has_FP_comp): Ditto. (compute_weighted_branches): Ditto. (max_unroll_with_branches): Ditto. (decide_unroll_constant_iterations): Add heuristic to avoid increasing branch mispredicts when unrolling. (decide_unroll_runtime_iterations): Ditto. * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains call. */ +static bool +loop_has_call(struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); You repeatedly do this and walk over all blocks. Please think about compile-time issues when writing code. This all looks sort-of target specific to me and I don't see why this very specialized patch is a good idea when unrolling does a very poor job deciding what and how much to unroll generally. Richard. + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + if (CALL_P (insn)) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. */ +static bool +loop_has_FP_comp(struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + set = single_set (insn); + if (!set) + continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Compute the number of branches in LOOP, weighted by execution counts. */ +static float +compute_weighted_branches(struct loop *loop) +{ + int header_count = loop-header-count; + unsigned i; + float n; + basic_block * body; + + /* If no profile feedback data exists, don't limit unrolling */ + if (header_count == 0) + return 0.0; + + gcc_assert (loop-latch != EXIT_BLOCK_PTR); + + body = get_loop_body (loop); + n = 0.0; + for (i = 0; i loop-num_nodes; i++) + { + if (EDGE_COUNT (body[i]-succs) = 2) + { + /* If this block is executed less frequently than the header (loop + entry), then it is weighted based on the ratio of times it is + executed compared to the header. */ + if (body[i]-count header_count) + n += ((float)body[i]-count)/header_count; + + /* When it is executed more frequently than the header (i.e. it is + in a nested inner loop), simply weight the branch at 1.0. */ + else + n += 1.0; + } + } + free (body); + + return n; +} + +/* Compute the maximum number of times LOOP can be unrolled without exceeding + a branch budget, which can increase branch mispredictions. The number of + branches is computed by weighting each branch with its expected execution + probability through the loop based on profile data. If no profile feedback + data exists, simply return the current NUNROLL factor. */ +static unsigned +max_unroll_with_branches(struct loop *loop, unsigned nunroll) +{ + struct loop *outer; + struct niter_desc *outer_desc; + int outer_niters = 1; + float weighted_outer_branches =
Re: [PATCH 04/11] Fix expansion point loc for macro-like tokens
Jason Merrill ja...@redhat.com writes: On 04/10/2012 03:42 PM, Dodji Seketeli wrote: In that case, besides returning NULL, enter_macro_context sets pfile-context-c.macro to NULL, making cpp_get_token_1 forget to set the location of the vari to the expansion point of A. This seems like a bug that should be fixed rather than worked around; we are still expanding A. Right. Below is an updated patch (with an updated introductory text) that addresses the core of the issue. Consider the test case gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c. Its interesting part is: #define A(x) vari x /* line 7. */ #define vari(x) #define B , varj int A(B) ; /* line 10. */ In its initial version, this test was being pre-processed as: # 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c # 1 build/gcc// # 1 command-line # 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c # 10 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c int # 7 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c vari , varj ; Note how int and vari are on separate lines, whereas int and , varj are on the same line. This looks like a bug to me, even independantly from the macro location tracking work. With macro location tracking turned on, the preprocessed output becomes: # 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c # 1 command-line # 1 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c # 10 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c int vari , varj ; Which, IMO, is what we'd expect. This is due to an unexpected side effect of enter_macro_context when passed a token that might look like a function-like macro at first sight, but that it eventually considers to not be a macro after all. This is the case for the vari token which looks like a macro when it is first lexed, but is eventually considered to be a normal token by enter_macro_context because it's not used as a function-like macro invocation. In that case, besides returning NULL, enter_macro_context sets pfile-context-c.macro to NULL, making cpp_get_token_1 forget to set the location of the vari to the expansion point of A. enter_macro_context sets pfile-context-c.macro to NULL in that case because funlike_invocation_p reads one token pass foo, sees that there is no '(' token, so we are not invoking the function-like parameter. It then puts the tokens (which it has read after foo) back into the tokens stream by calling _cpp_push_token_context on it, which sets pfile-context-c.macro to NULL. The fix here is to prevent funlike_invocation_p from forgetting the current macro-ness. Tested on x86_64-unknown-linux-gnu against trunk. Now this test has the same output with and without tracking locations accross macro expansions. Note that the bootstrap with -ftrack-macro-expansion exhibits other separate issues that are addressed in subsequent patches. This patch just fixes one class of problems. The patch does pass bootstrap with -ftrack-macro-expansion turned off, though. libcpp/ * macro.c (funlike_invocation_p): Don't forget macro-ness. gcc/testsuite/ * gcc.dg/debug/dwarf2/pr41445-5.c: Adjust. * gcc.dg/debug/dwarf2/pr41445-6.c: Likewise. --- gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c |5 - gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c |5 - libcpp/macro.c| 12 +++- 3 files changed, 19 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c index 03af604..d21acd5 100644 --- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c @@ -9,6 +9,9 @@ #define B , varj int A(B) ; -/* { dg-final { scan-assembler DW_TAG_variable\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\vari\[^\\r\\n\]*DW_AT_name(\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*DW_AT_)*\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\[^0-9a-fA-FxX](0x)?7\[^0-9a-fA-FxX]\[^\\r\\n\]*DW_AT_decl_line } } */ +/* We want to check that both vari and varj have the same line +number. */ + +/* { dg-final { scan-assembler DW_TAG_variable\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\vari\[^\\r\\n\]*DW_AT_name(\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*DW_AT_)*\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\[^0-9a-fA-FxX](0xa|10)\[^0-9a-fA-FxX]\[^\\r\\n\]*DW_AT_decl_line } } */ /* { dg-final { scan-assembler DW_TAG_variable\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\varj\[^\\r\\n\]*DW_AT_name(\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*DW_AT_)*\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\[^0-9a-fA-FxX](0xa|10)\[^0-9a-fA-FxX]\[^\\r\\n\]*DW_AT_decl_line } } */ /* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c index 8aa37d1..d6d79cc 100644 --- a/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c @@ -4,5 +4,8 @@ #include pr41445-5.c -/* { dg-final { scan-assembler
Re: [PATCH, tree-optimization] Fix for PR 52868
On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin izamya...@gmail.com wrote: Hi all! I'd like to post for review the patch which makes some costs adjusting in get_computation_cost_at routine in ivopts part. As mentioned in the PR changes also fix the bwaves regression from PR 52272. Also changes introduce no degradations on spec2000/2006 and EEMBC1.1/2.0(this was measured on Atom) on x86 Bootstrapped and regtested on x86. Ok to commit? I can't make sense of the patch and the comment does not help. + diff_cost = cost.cost; cost.cost /= avg_loop_niter (data-current_loop); + add_cost_val = add_cost (TYPE_MODE (ctype), data-speed); + /* Do cost correction if address cost is small enough + and difference cost is high enough. */ + if (address_p diff_cost add_cost_val + get_address_cost (symbol_present, var_present, + offset, ratio, cstepi, + TYPE_MODE (TREE_TYPE (utype)), + TYPE_ADDR_SPACE (TREE_TYPE (utype)), + speed, stmt_is_after_inc, + can_autoinc).cost = add_cost_val) +cost.cost += add_cost_val; Please explain more thoroughly. It also would seem to be better to add an extra case, as later code does /* Now the computation is in shape symbol + var1 + const + ratio * var2. (symbol/var1/const parts may be omitted). If we are looking for an address, find the cost of addressing this. */ if (address_p) return add_costs (cost, get_address_cost (symbol_present, var_present, offset, ratio, cstepi, TYPE_MODE (TREE_TYPE (utype)), TYPE_ADDR_SPACE (TREE_TYPE (utype)), speed, stmt_is_after_inc, can_autoinc)); thus refactoring the code a bit would make it possible to CSE the get_address_cost call and eventually make it clearer what the code does. Richard. Changelog: 2012-04-26 Yuri Rumyantsev yuri.rumyant...@intel.com * tree-ssa-loop-ivopts.c (get_computation_cost_at): Add cost of extra addition if cost of address difference is high enough. Thanks, Igor
Re: [PATCH] Fix VRP ICE on x cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)
On Tue, Apr 24, 2012 at 10:33:58AM +0200, Richard Guenther wrote: So like this instead? Yes! Unfortunately that didn't bootstrap, apparently double-int.h is included in gen* which aren't linked with double-int.o, and the inlines stay at -O0 or with -fkeep-inline-functions. Even guarding with #ifndef GENERATOR_FILE doesn't work, because gengtype is apparently built both with that and without. So this new version instead just prototypes them in double-int.h and defines in double-int.c. Still ok? 2012-04-25 Jakub Jelinek ja...@redhat.com PR tree-optimization/53058 * double-int.h (double_int_max_value, double_int_min_value): New prototypes. * double-int.c (double_int_max_value, double_int_min_value): New functions. * tree-vrp.c (register_edge_assert_for_2): Compare mask for LE_EXPR or GT_EXPR with double_int_max_value instead of double_int_mask. * gcc.c-torture/compile/pr53058.c: New test. --- gcc/double-int.h.jj 2012-03-29 12:01:31.623648408 +0200 +++ gcc/double-int.h2012-04-24 17:54:37.798945484 +0200 @@ -1,5 +1,5 @@ /* Operations with long integers. - Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc. + Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -242,6 +242,9 @@ double_int double_int_sext (double_int, double_int double_int_zext (double_int, unsigned); double_int double_int_mask (unsigned); +double_int double_int_max_value (unsigned int, bool); +double_int double_int_min_value (unsigned int, bool); + #define ALL_ONES (~((unsigned HOST_WIDE_INT) 0)) /* The operands of the following comparison functions must be processed --- gcc/double-int.c.jj 2012-04-02 21:40:10.0 +0200 +++ gcc/double-int.c2012-04-24 17:55:44.745723956 +0200 @@ -1,5 +1,5 @@ /* Operations with long integers. - Copyright (C) 2006, 2007, 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 2006, 2007, 2009, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -616,6 +616,26 @@ double_int_mask (unsigned prec) return mask; } +/* Returns a maximum value for signed or unsigned integer + of precision PREC. */ + +double_int +double_int_max_value (unsigned int prec, bool uns) +{ + return double_int_mask (prec - (uns ? 0 : 1)); +} + +/* Returns a minimum value for signed or unsigned integer + of precision PREC. */ + +double_int +double_int_min_value (unsigned int prec, bool uns) +{ + if (uns) +return double_int_zero; + return double_int_lshift (double_int_one, prec - 1, prec, false); +} + /* Clears the bits of CST over the precision PREC. If UNS is false, the bits outside of the precision are set to the sign bit (i.e., the PREC-th one), otherwise they are set to zero. --- gcc/tree-vrp.c.jj 2012-04-24 07:59:58.142773712 +0200 +++ gcc/tree-vrp.c 2012-04-24 17:41:10.473562899 +0200 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e INTEGRAL_TYPE_P (TREE_TYPE (name2)) IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1) prec = 2 * HOST_BITS_PER_WIDE_INT + prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val))) live_on_edge (e, name2) !has_single_use (name2)) { @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e new_val = val2; else { + double_int maxval + = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE (val))); mask = double_int_ior (tree_to_double_int (val2), mask); - if (double_int_minus_one_p (double_int_sext (mask, prec))) + if (double_int_equal_p (mask, maxval)) new_val = NULL_TREE; else new_val = double_int_to_tree (TREE_TYPE (val2), mask); --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj2012-04-24 17:41:10.475662886 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c 2012-04-24 17:41:10.475662886 +0200 @@ -0,0 +1,12 @@ +/* PR tree-optimization/53058 */ + +int a, b, c; + +void +foo () +{ + c = b 16; + if (c 32767) +c = 0; + a = b; +} Jakub
Re: combine_conversions int-double-int
First, thanks a lot for answering. On Wed, 25 Apr 2012, Richard Guenther wrote: On Wed, Apr 25, 2012 at 10:12 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, a conversion like int-double-int is just the identity, as long as double is big enough to represent all ints exactly. The most natural way I found to do this optimization is the attached: 2012-04-25 Marc Glisse marc.gli...@inria.fr PR middle-end/27139 * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT. Does the approach make sense? I don't know that code, and adding FLOAT_EXPR / FIX_TRUNC_EXPR was a bit of guesswork. The precision of double could be multiplied by log2(base), but not doing it is safe. If the approach is ok, I could extend it so int-double-long also skips the intermediate conversion. Bootstrapped and regression tested on linux-x86_64. Should I try and write a testcase for a specific target checking for specific asm instructions there, or is there a better way? Well, scanning the forwprop dump for example. No idea how that works, but with some grepping and looking at other tests it shouln't be too hard. I'll try, thanks for the pointer. Btw, I think not munging this new case into the existing CONVERT_EXPR_P code would be better - it makes the code (even) harder to understand and I'm not convinced that adding FLOAT_EXPR/FIX_TRUNC_EXPR handling does not wreck any assumptions in that code. Ok. It also seems that for DECIMAL_FLOAT_TYPE_P the transform is always valid? What do you mean? I just tested and my patch does the simplification for int-_Decimal128-int but not int-_Decimal32-int, which makes sense to me as _Decimal32 can't represent exactly every int. For _Decimal64, it doesn't do the simplification although it could, because I didn't multiply by log2(base), as I said above. I also thought of combining in vrp, where the range information could make more simplifications valid, but that seemed harder. -- Marc Glisse
Re: [PATCH 05/11] Make expand_location resolve to locus in main source file
On 04/10/2012 03:49 PM, Dodji Seketeli wrote: Apparently, quite some places in the compiler (like the C/C++ preprocessor, the debug info machinery) expect expand_location to resolve to locations that are in the main source file, even if the token at stake comes from a macro that was defined in a header somewhere. Turning on -ftrack-macro-expansion by default was triggering a lot of failures (not necessarily related to diagnostics) because expand_location resolves to spelling locations instead. Can you elaborate on these failures? For debug info I would think that the spelling location would be useful information, though I suppose without any way to unwind to the expansion context it could be a bit confusing. What is the problem for the preprocessor? For the preprocessor, consider a short excerpt of the the test gcc/testsuite/gcc.dg/cpp/avoidpaste1.c: #define f(x) x #define g #define tricky 1.0e ## -1 :: :g: :f(): :f(^): tricky As the comment in the text says, it should preprocess as: :: : : : : :^: 1.0e- 1 but it actually preprocess as: # 1 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c # 1 command-line # 1 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c # 25 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c :: : : : : :^: # 14 /home/dodji/devel/git/gcc/tmp/gcc/testsuite/gcc.dg/cpp/avoidpaste1.c 1.0e- 1 Note how the 1.0e- 1 is not on the same line as its preceding :: : : : : :^:. The problem is that the pre-processor code indirectly uses expand_location in many spots. For that kind of existing code that is not related to diagnostics, I am really not confident in changing the underlying implicit assumption of the that function which is, to return the expansion point location. I don't even know before hand where all these spots are in the code base, so auditing it is hard for me. I forgot to say that are also other weird failure like: FAIL: g++.dg/cdce3.C -std=gnu++98 scan-tree-dump cdce cdce3.C:92: note: function call is shrink-wrapped into error conditions. due to that change. That's why I prefer erring on the safe side by not changing the assumption of existing code, and provide a way to expand locations to their spelling point, just for diagnostics. I think this is already an improvement over what we previously had. And for the debug info cases, I'd propose that if we find specific examples where the unwinding to the spelling locations is better, then we'll consider using it at that moment. -- Dodji
Re: [C++ Patch] PR 39970
On 04/25/2012 01:41 AM, Jason Merrill wrote: On 04/24/2012 05:24 PM, Paolo Carlini wrote: Perhaps we aren't checking default arguments unless they're actually used; a could change that if they aren't dependent. Your reply reached the mailing list a bit mangled, could you please clarify? If the default argument isn't dependent on other template parameters, we can check immediately whether it is a suitable template argument for its parameter, including being a constant expression. Just wanted to add that we should indeed do this also to catch things like: struct blah { int member; }; template char param = blah::member class template6_blah; which currently we are also missing. Paolo.
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Tue, Apr 24, 2012 at 11:19 PM, Tom de Vries tom_devr...@mentor.com wrote: On 17/04/12 14:24, Richard Guenther wrote: On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote: On 27/01/12 21:37, Tom de Vries wrote: On 24/01/12 11:40, Richard Guenther wrote: On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); because 'result' may be not an SSA name. It might also not have a proper definition statement (if VN_INFO (result)-needs_insertion is true). So you at least need to guard things properly. Right. And that also doesn't work if the function is without lhs, such as in the new test-case pr51879-6.c. I fixed this by storing both lhs and vdef, such that I don't have to derive the vdef from the lhs. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) Doing so willl hurt performance of tail-merging in its current form. OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that value numbering as used in tail-merging has its limitations too. Do you have any ideas how to address that one? @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Removed now, that was a workaround for a bug in an earlier version of the patch, that I didn't need anymore. Bootstrapped and reg-tested on x86_64. OK for stage1? Richard, quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: ... I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and using value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. ... The test-case pr51879-6.c shows a case where improving value numbering will help tail-merging, but structural equality comparison not: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1717_7 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_8 = VDEF .MEMD.1717_7 # USE = nonlocal # CLB = nonlocal aD.1712_3 = barD.1704 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 #
Re: [PATCH] Fix VRP ICE on x cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)
On Wed, Apr 25, 2012 at 11:21 AM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Apr 24, 2012 at 10:33:58AM +0200, Richard Guenther wrote: So like this instead? Yes! Unfortunately that didn't bootstrap, apparently double-int.h is included in gen* which aren't linked with double-int.o, and the inlines stay at -O0 or with -fkeep-inline-functions. Even guarding with #ifndef GENERATOR_FILE doesn't work, because gengtype is apparently built both with that and without. So this new version instead just prototypes them in double-int.h and defines in double-int.c. Still ok? Sure. 2012-04-25 Jakub Jelinek ja...@redhat.com PR tree-optimization/53058 * double-int.h (double_int_max_value, double_int_min_value): New prototypes. * double-int.c (double_int_max_value, double_int_min_value): New functions. * tree-vrp.c (register_edge_assert_for_2): Compare mask for LE_EXPR or GT_EXPR with double_int_max_value instead of double_int_mask. * gcc.c-torture/compile/pr53058.c: New test. --- gcc/double-int.h.jj 2012-03-29 12:01:31.623648408 +0200 +++ gcc/double-int.h 2012-04-24 17:54:37.798945484 +0200 @@ -1,5 +1,5 @@ /* Operations with long integers. - Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc. + Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -242,6 +242,9 @@ double_int double_int_sext (double_int, double_int double_int_zext (double_int, unsigned); double_int double_int_mask (unsigned); +double_int double_int_max_value (unsigned int, bool); +double_int double_int_min_value (unsigned int, bool); + #define ALL_ONES (~((unsigned HOST_WIDE_INT) 0)) /* The operands of the following comparison functions must be processed --- gcc/double-int.c.jj 2012-04-02 21:40:10.0 +0200 +++ gcc/double-int.c 2012-04-24 17:55:44.745723956 +0200 @@ -1,5 +1,5 @@ /* Operations with long integers. - Copyright (C) 2006, 2007, 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 2006, 2007, 2009, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -616,6 +616,26 @@ double_int_mask (unsigned prec) return mask; } +/* Returns a maximum value for signed or unsigned integer + of precision PREC. */ + +double_int +double_int_max_value (unsigned int prec, bool uns) +{ + return double_int_mask (prec - (uns ? 0 : 1)); +} + +/* Returns a minimum value for signed or unsigned integer + of precision PREC. */ + +double_int +double_int_min_value (unsigned int prec, bool uns) +{ + if (uns) + return double_int_zero; + return double_int_lshift (double_int_one, prec - 1, prec, false); +} + /* Clears the bits of CST over the precision PREC. If UNS is false, the bits outside of the precision are set to the sign bit (i.e., the PREC-th one), otherwise they are set to zero. --- gcc/tree-vrp.c.jj 2012-04-24 07:59:58.142773712 +0200 +++ gcc/tree-vrp.c 2012-04-24 17:41:10.473562899 +0200 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e INTEGRAL_TYPE_P (TREE_TYPE (name2)) IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1) prec = 2 * HOST_BITS_PER_WIDE_INT + prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val))) live_on_edge (e, name2) !has_single_use (name2)) { @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e new_val = val2; else { + double_int maxval + = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE (val))); mask = double_int_ior (tree_to_double_int (val2), mask); - if (double_int_minus_one_p (double_int_sext (mask, prec))) + if (double_int_equal_p (mask, maxval)) new_val = NULL_TREE; else new_val = double_int_to_tree (TREE_TYPE (val2), mask); --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj 2012-04-24 17:41:10.475662886 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c 2012-04-24 17:41:10.475662886 +0200 @@ -0,0 +1,12 @@ +/* PR tree-optimization/53058 */ + +int a, b, c; + +void +foo () +{ + c = b 16; + if (c 32767) + c = 0; + a = b; +} Jakub
Re: move default_tree_printer out of toplev.c
On Wed, 25 Apr 2012, Manuel López-Ibáñez wrote: On 25 April 2012 10:27, Richard Guenther rguent...@suse.de wrote: On Tue, 24 Apr 2012, Manuel López-Ibáñez wrote: It seems an obvious cleanup to me. OK? 2012-04-24 Manuel López-Ibáñez m...@gcc.gnu.org gcc/ * tree-pretty-print.h (default_tree_printer): Do not declare. * tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and intl.h. (default_tree_diagnostic_starter): Make static. (default_tree_printer): Move to here. Make static. (tree_diagnostics_defaults): New. * tree-diagnostic.h (default_tree_diagnostic_starter): Do not declare. * tree.c (free_lang_data): Use tree_diagnostics_defaults. Bogus changelog - you changed general_init. Not sure what you mean. I changed tree.c (free_lang_data) and toplev.c (general_init). to use tree_diagnostics_defaults. What should I have said in the Changelog? Oops. Misread the patch. Sorry for the noise, Richard.
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote: void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I don't think we guarantee the above, after all, even pure functions may use __builtin_return_address (0) - it doesn't modify memory, and we happily remove pure calls, CSE the return values etc. Jakub
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 25/04/12 12:09, Jakub Jelinek wrote: On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote: void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I don't think we guarantee the above, after all, even pure functions may use __builtin_return_address (0) - it doesn't modify memory, and we happily remove pure calls, CSE the return values etc. Jakub, pure: - no effects except the return value - return value depends only on the parameters and/or global variables AFAIU, given this definition, a pure function cannot use __builtin_return_address since it would mean that the return value depends on something else than parameters and global variables. Thanks, - Tom Jakub
Re: PR c/53066 Wshadow should not warn for shadowing an extern function
On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote: On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote: Wshadow warns whenever any declaration shadows a global function declaration. This is almost always noise, since most (always?) of the time one cannot mistakenly replace a function by another variable. The false positives are too common (Linus mentions using the name 'index' when including string.h). I think the correct rule would be: warn if a variable *with pointer-to-function type* shadows a function, and warn if a nested function shadows another function, but don't warn for variables shadowing functions if the variable has any other type (because if the variable has some type that isn't a pointer-to-function, no confusion is likely without another error being given). Right. How does one check that a decl is a nested function? Cheers, Manuel.
Re: Continue strict-volatile-bitfields fixes
Hi! On Thu, 19 Apr 2012 19:46:17 +0200, I wrote: Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c: extern void abort (void); enum tree_code { BIND_EXPR, }; struct tree_common { enum tree_code code:8; }; void voidify_wrapper_expr (struct tree_common *common) { switch (common-code) { case BIND_EXPR: if (common-code != BIND_EXPR) abort (); } } The diff for -fdump-tree-all between compiling with -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins with the following, right in 20030922-1.c.003t.original: diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original --- fnsvb/20030922-1.c.003t.original2012-04-19 16:51:18.322150866 +0200 +++ fsvb/20030922-1.c.003t.original 2012-04-19 16:49:18.132088498 +0200 @@ -7,7 +7,7 @@ switch ((int) common-code) { case 0:; - if (common-code != 0) + if ((BIT_FIELD_REF *common, 32, 0 255) != 0) { abort (); } That is, for -fno-strict-volatile-bitfields the second instance of »common-code« it is a component_ref, whereas for -fstrict-volatile-bitfields it is a bit_field_ref. The former will be optimized as intended, the latter will not. So, what should be emitted in the -fstrict-volatile-bitfields case? A component_ref -- but this is what Bernd's commit r182545 (for PR51200) changed, I think? Or should later optimization stages learn to better deal with a bit_field_ref, and where would this have to happen? I figured out why this generic code is behaving differently for SH targets. I compared to ARM as well as x86, for which indeed the optimization possibility is not lost even when compiling with -fstrict-volatile-bitfields. The component_ref inside the if statement (but not the one in the switch statement) is fed into optimize_bit_field_compare where it is optimized to »BIT_FIELD_REF *common, 32, 0 255« only for SH, because get_best_mode for SH returns SImode (ARM, x86: QImode), because SH has »#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to SImode, hoping for better code generation »since it may eliminate subsequent memory access if subsequent accesses occur to other fields in the same word of the structure, but to different bytes«. (Well, the converse happens here...) After that, in optimize_bit_field_compare, for ARM and x86, »nbitsize == lbitsize«, and thus the optimization is aborted, whereas for SH it will be performed, that is, the component_ref is replaced with »BIT_FIELD_REF *common, 32, 0 255«. If I manually force SH's SImode back to QImode (that is, abort optimize_bit_field_compare), the later optimization passes work as they do for ARM and x86. Before Bernd's r182545, optimize_bit_field_compare returned early (that is, aborted this optimization attempt), because »lbitsize == GET_MODE_BITSIZE (lmode)« (8 bits). (With the current sources, lmode is VOIDmode.) Is emmitting »BIT_FIELD_REF *common, 32, 0 255« wrong in this case, or should a later optimization pass be able to figure out that »BIT_FIELD_REF *common, 32, 0 255« is in fact the same as common-code, and then be able to conflate these? Any suggestions where/how to tackle this? Grüße, Thomas pgp0SPag23wvg.pgp Description: PGP signature
Re: Symbol table 13/many: reachability code rewrite
This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53088 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53089 The first one seems just moved warning because of different gimplification order. Fixed thus. Note that the warning comes out at weird place with input_location, but that problem existed here before the patch. I've comitted the following. I am looking into the fortran coarays now. It seems that coarray registration is somehow broken, but I am not even sure how it is supposed to work. Honza Index: gcc.target/i386/pr39082-1.c === --- gcc.target/i386/pr39082-1.c (revision 186814) +++ gcc.target/i386/pr39082-1.c (working copy) @@ -13,7 +13,7 @@ extern int bar1 (union un); extern union un bar2 (int); int -foo1 (union un u) +foo1 (union un u) /* { dg-message note: the ABI of passing union with long double has changed in GCC 4.4 } */ { bar1 (u); return u.i; @@ -30,6 +30,6 @@ foo2 (void) int foo3 (int x) { - union un u = bar2 (x); /* { dg-message note: the ABI of passing union with long double has changed in GCC 4.4 } */ + union un u = bar2 (x); return u.i; } Index: ChangeLog === --- ChangeLog (revision 186814) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2012-04-25 Jan Hubicka j...@suse.cz + + PR middle-end/53088 + * gcc.target/i386/pr39082-1.c: Update warning location. + 2012-04-25 Jakub Jelinek ja...@redhat.com PR c/52880
[PING][Patch, Testsuite] fix failure in test gcc.dg/pr52283.c
PING! Here is the original post: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01235.html This patch fixes the failure in gcc.dg/pr52283.c by adding the missing dg-warning and dg-options. OK for trunk? Thanks, Greta gcc/testsuite/ChangeLog 2012-04-20 Greta Yorsh greta.yo...@arm.com * gcc.dg/pr52283.c: Add missing dg-warning and dg-options. diff --git a/gcc/testsuite/gcc.dg/pr52283.c b/gcc/testsuite/gcc.dg/pr52283.c index 33785a5..070e71a 100644 --- a/gcc/testsuite/gcc.dg/pr52283.c +++ b/gcc/testsuite/gcc.dg/pr52283.c @@ -1,6 +1,7 @@ /* Test for case labels not integer constant expressions but folding to integer constants (used in Linux kernel). */ /* { dg-do compile } */ +/* { dg-options -pedantic } */ extern unsigned int u; @@ -9,7 +10,7 @@ b (int c) { switch (c) { -case (int) (2 | ((4 8) ? 8 : u)): +case (int) (2 | ((4 8) ? 8 : u)): /* { dg-warning case label is not an integer constant expression } */ ; } }
Re: RFC reminder: an alternative -fsched-pressure implementation
Richard Sandiford wrote: Vladimir Makarov vmaka...@redhat.com writes: Taking your results for S390 and ARM with Neon into account, I guess it should be included and probably made by default for these 2 targets (for sure for s390). OK, thanks to both of you. Ulrich and Andreas: would you be happy for s390 to use this by default? I'll update the patch and install if so. I've talked to Andreas, and we agree that s390 should use this as default. If you install the base patch, we'll do the back-end change accordingly. (I'll also work with Ramana to enable it on ARM where it makes sense, probably when targeting Cortex-A cores.) Thanks again to you and Vlad for looking into this long-standing problem! Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision
Jakub Jelinek wrote: On Wed, Apr 25, 2012 at 10:29:36AM +0200, Richard Guenther wrote: Does this look reasonable? Any comments or suggestions appreciated! Yes, getting rid of this fragile interaction by doing more work in vect_recog_widen_shift_pattern sounds like the correct thing to do. Or give up when seeing already pattern recognized stmts when detecting different pattern, unless the current pattern recognizer is prepared to handle them (and in that case tweak everything as necessary). E.g. several pattern recognizers already start with if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) return NULL; Yes, we should do that, and it'll fix the ICE for sure, but -as I said in the original mail- doing *only* this will cause regressions in some cases because we no longer get the combined fix-over-widening-*and*- use-widening-shift effect ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [PR tree-optimization/52558]: RFC: questions on store data race
On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez al...@redhat.com wrote: On 04/13/12 03:46, Richard Guenther wrote: On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandezal...@redhat.com wrote: Richard. Thanks so much for reviewing and providing an alternative approach, which AFAICT provides superior results. A similar effect could be obtained by keeping a flag whether we entered the path that stored the value before the transform. Thus, do lsm = g2; // technically not needed, if you also handle loads flag = false; for (...) { if (g1) { if (flag) g2 = lsm; } else { lsm = 0; flag = true; } } if (flag) g2 = lsm; I have implemented this in the attached patch, and it seems to be generating better code than my other if-changed approach. It also avoids generating unnecessary loads that may trap. For the original testcase: int g_1 = 1; int g_2 = 0; int func_1(void) { int l; for (l = 0; l 1234; l++) { if (g_1) return l; else g_2 = 0; } return 999; } After all optimizations are done, we are now generating the following for the flags approach: new: func_1: movl g_1(%rip), %edx xorl %eax, %eax testl %edx, %edx je .L5 rep ret .L5: movl $0, g_2(%rip) movw $999, %ax ret Which is significantly better than the unmodified trunk of: func_1: movl g_1(%rip), %edx movl g_2(%rip), %eax testl %edx, %edx je .L5 movl %eax, g_2(%rip) xorl %eax, %eax ret .L5: movl $0, g_2(%rip) movl $999, %eax ret And in other less contrived cases, we generate correct code that avoids writes on code paths that did not have it. For example, in Hans register promotion example: int count; struct obj { int data; struct obj *next; } *q; void func() { struct obj *p; for (p = q; p; p = p-next) if (p-data 0) count++; } Under the new memory model we should avoid promoting count to a register and unilaterally writing to it upon exiting the loop. With the attached patch, we now generate: func: .LFB0: movq q(%rip), %rax xorl %ecx, %ecx movl count(%rip), %edx testq %rax, %rax je .L1 .L9: movl (%rax), %esi testl %esi, %esi jle .L3 addl $1, %edx movl $1, %ecx .L3: movq 8(%rax), %rax testq %rax, %rax jne .L9 testb %cl, %cl jne .L12 .L1: rep ret .L12: movl %edx, count(%rip) ret Which is as expected. I don't understand your suggestion of: lsm = g2; // technically not needed, if you also handle loads that would allow to extend this to cover the cases where the access may eventually trap (if you omit the load before the loop). Can't I just remove the lsm=g2? What's this about also handling loads? Not sure which is more efficient (you can eventually combine flag variables for different store locations, combining the if-changed compare is not so easy I guess). So far, I see better code generated with the flags approach, so... 1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2. In the PR, Richard mentions that both FRE/PRE can figure this out, but they are not run after store motion. DOM should also be able to, but apparently does not :(. Tips? Well. Schedule FRE after loop opts ... DOM is not prepared to handle loads/stores with differing VOPs - it was never updated really after the alias-improvements merge. 2. The GIMPLE_CONDs I create in this patch are currently causing problems with complex floats/doubles. do_jump is complaining that it can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex values since complex lowering has already happened (??). Is there a more canonical way of creating a GIMPLE_COND that may contain complex floats at this stage? As you are handling redundant stores you want a bitwise comparison anyway, but yes, complex compares need to be lowered. But as said, you want a bitwise comparison, not a value-comparison. You can try using I can ignore all this. I have implemented both alternatives, with a target hook as suggested, but I'm thinking of removing it altogether and just leaving the flags approach. Few comments on the patch itself. + new_bb = split_edge (ex); + then_bb = create_empty_bb (new_bb); + if (current_loops new_bb-loop_father) + add_bb_to_loop (then_bb, new_bb-loop_father); + gsi = gsi_start_bb (new_bb); + t1 = make_rename_temp (TREE_TYPE (ref-mem), NULL); Hmm, why do you need to re-load the
Re: [rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision
Richard Guenther wrote: On Tue, 24 Apr 2012, Ulrich Weigand wrote: Does this look reasonable? Any comments or suggestions appreciated! Yes, getting rid of this fragile interaction by doing more work in vect_recog_widen_shift_pattern sounds like the correct thing to do. OK, I'll try to come up with a patch along those lines. Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
[PATCH] Fix bitfield expansion (PR middle-end/52979)
Hi! This patch fixes the attached execute/ testcases. In some cases get_best_mode was ignoring bitregion_* restrictions and returned mode that overlapped into adjacent non-bitfields, or store_bit_field_1 for insv could ignore those restrictions completely. After fixing that I've run into several issues that the other parts of the patch fix, namely that get_best_mode returning VOIDmode more often now may lead to store_split_bit_field being called when it wasn't before, and when approaching the bitregion_end point we can't just keep using large units (modes), but need to decrease their size in order not to overwrite adjacent non-bitfields. The get_best_mode dropping of % align change prevents a mutual recursion, e.g. for bitregion_end == 103 it would prevent using QImode even far before the bit region boundary, even when not touching anything close to that bit region end. But the extra + (bitregion_end == 0 + || bitpos - (bitpos % unit) + unit = bitregion_end + 1)) needed to be added after that change, otherwise we'd regress on c-c++-common/simulate-thread/bitfields-2.c. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-04-25 Jakub Jelinek ja...@redhat.com PR middle-end/52979 * stor-layout.c (get_best_mode): Don't return mode with bitsize larger than maxbits. Don't compute maxbits modulo align. Also check that unit bytes long store at bitpos / unit * unit doesn't affect bits beyond bitregion_end. * expmed.c (store_bit_field_1): Avoid trying insv if OP_MODE MEM would not fit into bitregion_start ... bitregion_end + 1 bit region. (store_split_bit_field): Decrease unit close to end of bitregion_end if access is restricted in order to avoid mutual recursion. * gcc.c-torture/compile/pr52979-1.c: New test. * gcc.c-torture/execute/pr52979-1.c: New test. * gcc.c-torture/execute/pr52979-2.c: New test. --- gcc/stor-layout.c.jj2012-03-26 11:53:20.0 +0200 +++ gcc/stor-layout.c 2012-04-25 10:26:18.881631697 +0200 @@ -2624,7 +2624,7 @@ get_best_mode (int bitsize, int bitpos, if (!bitregion_end) maxbits = MAX_FIXED_MODE_SIZE; else -maxbits = (bitregion_end - bitregion_start) % align + 1; +maxbits = bitregion_end - bitregion_start + 1; /* Find the narrowest integer mode that contains the bit field. */ for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode; @@ -2645,7 +2645,10 @@ get_best_mode (int bitsize, int bitpos, (Though at least one Unix compiler ignores this problem: that on the Sequent 386 machine. */ || MIN (unit, BIGGEST_ALIGNMENT) align - || (largest_mode != VOIDmode unit GET_MODE_BITSIZE (largest_mode))) + || (largest_mode != VOIDmode unit GET_MODE_BITSIZE (largest_mode)) + || unit maxbits + || (bitregion_end + bitpos - (bitpos % unit) + unit bitregion_end + 1)) return VOIDmode; if ((SLOW_BYTE_ACCESS ! volatilep) @@ -2663,7 +2666,9 @@ get_best_mode (int bitsize, int bitpos, unit = MIN (align, BIGGEST_ALIGNMENT) unit = maxbits (largest_mode == VOIDmode - || unit = GET_MODE_BITSIZE (largest_mode))) + || unit = GET_MODE_BITSIZE (largest_mode)) + (bitregion_end == 0 + || bitpos - (bitpos % unit) + unit = bitregion_end + 1)) wide_mode = tmode; } --- gcc/expmed.c.jj 2012-04-19 11:09:13.0 +0200 +++ gcc/expmed.c2012-04-24 17:14:34.264897874 +0200 @@ -640,7 +640,13 @@ store_bit_field_1 (rtx str_rtx, unsigned !(MEM_P (op0) MEM_VOLATILE_P (op0) flag_strict_volatile_bitfields 0) ! ((REG_P (op0) || GET_CODE (op0) == SUBREG) -(bitsize + bitpos GET_MODE_BITSIZE (op_mode +(bitsize + bitpos GET_MODE_BITSIZE (op_mode))) + /* Do not use insv if the bit region is restricted and +op_mode integer at offset doesn't fit into the +restricted region. */ + !(MEM_P (op0) bitregion_end + bitnum - bitpos + GET_MODE_BITSIZE (op_mode) + bitregion_end + 1)) { struct expand_operand ops[4]; int xbitpos = bitpos; @@ -760,7 +766,7 @@ store_bit_field_1 (rtx str_rtx, unsigned || GET_MODE_BITSIZE (GET_MODE (op0)) maxbits || (op_mode != MAX_MACHINE_MODE GET_MODE_SIZE (GET_MODE (op0)) GET_MODE_SIZE (op_mode))) - bestmode = get_best_mode (bitsize, bitnum, + bestmode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end, MEM_ALIGN (op0), (op_mode == MAX_MACHINE_MODE @@ -1096,6 +1102,16 @@ store_split_bit_field (rtx op0, unsigned offset = (bitpos + bitsdone) / unit; thispos = (bitpos + bitsdone) % unit; + /*
[Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c
The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled: FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1 loops 2 The test expects 2 loops to be vectorized, while gcc successfully vectorizes 3 loops in this test using neon on arm. This patch adjusts the expected output. Fixed test passes on qemu for arm and powerpc. OK for trunk? Thanks, Greta gcc/testsuite/ChangeLog 2012-04-23 Greta Yorsh greta.yo...@arm.com * gcc.dg/vect/slp-perm-8.c (dg-final): Adjust expected number of vectorized loops for arm with neon. diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c index d211ef9..beaa96c 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c @@ -52,7 +52,9 @@ int main (int argc, const char* argv[]) return 0; } -/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target vect_perm_byte } } } */ +/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { target { vect_perm_byte arm_neon_ok } } } } */ +/* { dg-final { scan-tree-dump-times vectorized 2 loops 1 vect { target { vect_perm_byte arm_neon_ok } } } }*/ +/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target { vect_perm_byte {! arm_neon_ok } } } } } */ /* { dg-final { scan-tree-dump-times vectorizing stmts using SLP 1 vect { target vect_perm_byte } } } */ /* { dg-final { cleanup-tree-dump vect } } */
Re: Continue strict-volatile-bitfields fixes
On Wed, Apr 25, 2012 at 1:27 PM, Thomas Schwinge tho...@codesourcery.com wrote: Hi! On Thu, 19 Apr 2012 19:46:17 +0200, I wrote: Here is my current test case, reduced from gcc.dg/tree-ssa/20030922-1.c: extern void abort (void); enum tree_code { BIND_EXPR, }; struct tree_common { enum tree_code code:8; }; void voidify_wrapper_expr (struct tree_common *common) { switch (common-code) { case BIND_EXPR: if (common-code != BIND_EXPR) abort (); } } The diff for -fdump-tree-all between compiling with -fno-strict-volatile-bitfields and -fstrict-volatile-bitfields begins with the following, right in 20030922-1.c.003t.original: diff -ru fnsvb/20030922-1.c.003t.original fsvb/20030922-1.c.003t.original --- fnsvb/20030922-1.c.003t.original 2012-04-19 16:51:18.322150866 +0200 +++ fsvb/20030922-1.c.003t.original 2012-04-19 16:49:18.132088498 +0200 @@ -7,7 +7,7 @@ switch ((int) common-code) { case 0:; - if (common-code != 0) + if ((BIT_FIELD_REF *common, 32, 0 255) != 0) { abort (); } That is, for -fno-strict-volatile-bitfields the second instance of »common-code« it is a component_ref, whereas for -fstrict-volatile-bitfields it is a bit_field_ref. The former will be optimized as intended, the latter will not. So, what should be emitted in the -fstrict-volatile-bitfields case? A component_ref -- but this is what Bernd's commit r182545 (for PR51200) changed, I think? Or should later optimization stages learn to better deal with a bit_field_ref, and where would this have to happen? I figured out why this generic code is behaving differently for SH targets. I compared to ARM as well as x86, for which indeed the optimization possibility is not lost even when compiling with -fstrict-volatile-bitfields. The component_ref inside the if statement (but not the one in the switch statement) is fed into optimize_bit_field_compare where it is optimized to »BIT_FIELD_REF *common, 32, 0 255« only for SH, because get_best_mode for SH returns SImode (ARM, x86: QImode), because SH has »#define SLOW_BYTE_ACCESS 1«, and thus the original QImode is widened to SImode, hoping for better code generation »since it may eliminate subsequent memory access if subsequent accesses occur to other fields in the same word of the structure, but to different bytes«. (Well, the converse happens here...) After that, in optimize_bit_field_compare, for ARM and x86, »nbitsize == lbitsize«, and thus the optimization is aborted, whereas for SH it will be performed, that is, the component_ref is replaced with »BIT_FIELD_REF *common, 32, 0 255«. If I manually force SH's SImode back to QImode (that is, abort optimize_bit_field_compare), the later optimization passes work as they do for ARM and x86. Before Bernd's r182545, optimize_bit_field_compare returned early (that is, aborted this optimization attempt), because »lbitsize == GET_MODE_BITSIZE (lmode)« (8 bits). (With the current sources, lmode is VOIDmode.) Is emmitting »BIT_FIELD_REF *common, 32, 0 255« wrong in this case, or should a later optimization pass be able to figure out that »BIT_FIELD_REF *common, 32, 0 255« is in fact the same as common-code, and then be able to conflate these? Any suggestions where/how to tackle this? The BIT_FIELD_REF is somewhat of a red-herring. It is created by fold-const.c in optimize_bit_field_compare, code that I think should be removed completely. Or it needs to be made aware of strict-volatile bitfield and C++ memory model details. Richard. Grüße, Thomas
Re: [PING][Patch, Testsuite] fix failure in test gcc.dg/pr52283.c
On Wed, Apr 25, 2012 at 1:40 PM, Greta Yorsh greta.yo...@arm.com wrote: PING! Here is the original post: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01235.html This patch fixes the failure in gcc.dg/pr52283.c by adding the missing dg-warning and dg-options. OK for trunk? Didn't I approve this already? Richard. Thanks, Greta gcc/testsuite/ChangeLog 2012-04-20 Greta Yorsh greta.yo...@arm.com * gcc.dg/pr52283.c: Add missing dg-warning and dg-options. diff --git a/gcc/testsuite/gcc.dg/pr52283.c b/gcc/testsuite/gcc.dg/pr52283.c index 33785a5..070e71a 100644 --- a/gcc/testsuite/gcc.dg/pr52283.c +++ b/gcc/testsuite/gcc.dg/pr52283.c @@ -1,6 +1,7 @@ /* Test for case labels not integer constant expressions but folding to integer constants (used in Linux kernel). */ /* { dg-do compile } */ +/* { dg-options -pedantic } */ extern unsigned int u; @@ -9,7 +10,7 @@ b (int c) { switch (c) { - case (int) (2 | ((4 8) ? 8 : u)): + case (int) (2 | ((4 8) ? 8 : u)): /* { dg-warning case label is not an integer constant expression } */ ; } }
Re: PR c/53066 Wshadow should not warn for shadowing an extern function
On Wed, 25 Apr 2012, Manuel López-Ibáñez wrote: On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote: On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote: Wshadow warns whenever any declaration shadows a global function declaration. This is almost always noise, since most (always?) of the time one cannot mistakenly replace a function by another variable. The false positives are too common (Linus mentions using the name 'index' when including string.h). I think the correct rule would be: warn if a variable *with pointer-to-function type* shadows a function, and warn if a nested function shadows another function, but don't warn for variables shadowing functions if the variable has any other type (because if the variable has some type that isn't a pointer-to-function, no confusion is likely without another error being given). Right. How does one check that a decl is a nested function? I think you should check if the decl has function type or pointer to function type. (Nested functions are simply I think the only valid case where you can end up with one function shadowing another; has function type is the logical check rather than is a nested function.) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Fix bitfield expansion (PR middle-end/52979)
On Wed, 25 Apr 2012, Jakub Jelinek wrote: Hi! This patch fixes the attached execute/ testcases. In some cases get_best_mode was ignoring bitregion_* restrictions and returned mode that overlapped into adjacent non-bitfields, or store_bit_field_1 for insv could ignore those restrictions completely. After fixing that I've run into several issues that the other parts of the patch fix, namely that get_best_mode returning VOIDmode more often now may lead to store_split_bit_field being called when it wasn't before, and when approaching the bitregion_end point we can't just keep using large units (modes), but need to decrease their size in order not to overwrite adjacent non-bitfields. The get_best_mode dropping of % align change prevents a mutual recursion, e.g. for bitregion_end == 103 it would prevent using QImode even far before the bit region boundary, even when not touching anything close to that bit region end. But the extra + (bitregion_end == 0 + || bitpos - (bitpos % unit) + unit = bitregion_end + 1)) needed to be added after that change, otherwise we'd regress on c-c++-common/simulate-thread/bitfields-2.c. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2012-04-25 Jakub Jelinek ja...@redhat.com PR middle-end/52979 * stor-layout.c (get_best_mode): Don't return mode with bitsize larger than maxbits. Don't compute maxbits modulo align. Also check that unit bytes long store at bitpos / unit * unit doesn't affect bits beyond bitregion_end. * expmed.c (store_bit_field_1): Avoid trying insv if OP_MODE MEM would not fit into bitregion_start ... bitregion_end + 1 bit region. (store_split_bit_field): Decrease unit close to end of bitregion_end if access is restricted in order to avoid mutual recursion. * gcc.c-torture/compile/pr52979-1.c: New test. * gcc.c-torture/execute/pr52979-1.c: New test. * gcc.c-torture/execute/pr52979-2.c: New test. --- gcc/stor-layout.c.jj 2012-03-26 11:53:20.0 +0200 +++ gcc/stor-layout.c 2012-04-25 10:26:18.881631697 +0200 @@ -2624,7 +2624,7 @@ get_best_mode (int bitsize, int bitpos, if (!bitregion_end) maxbits = MAX_FIXED_MODE_SIZE; else -maxbits = (bitregion_end - bitregion_start) % align + 1; +maxbits = bitregion_end - bitregion_start + 1; /* Find the narrowest integer mode that contains the bit field. */ for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode; @@ -2645,7 +2645,10 @@ get_best_mode (int bitsize, int bitpos, (Though at least one Unix compiler ignores this problem: that on the Sequent 386 machine. */ || MIN (unit, BIGGEST_ALIGNMENT) align - || (largest_mode != VOIDmode unit GET_MODE_BITSIZE (largest_mode))) + || (largest_mode != VOIDmode unit GET_MODE_BITSIZE (largest_mode)) + || unit maxbits + || (bitregion_end +bitpos - (bitpos % unit) + unit bitregion_end + 1)) return VOIDmode; if ((SLOW_BYTE_ACCESS ! volatilep) @@ -2663,7 +2666,9 @@ get_best_mode (int bitsize, int bitpos, unit = MIN (align, BIGGEST_ALIGNMENT) unit = maxbits (largest_mode == VOIDmode - || unit = GET_MODE_BITSIZE (largest_mode))) + || unit = GET_MODE_BITSIZE (largest_mode)) +(bitregion_end == 0 + || bitpos - (bitpos % unit) + unit = bitregion_end + 1)) wide_mode = tmode; } --- gcc/expmed.c.jj 2012-04-19 11:09:13.0 +0200 +++ gcc/expmed.c 2012-04-24 17:14:34.264897874 +0200 @@ -640,7 +640,13 @@ store_bit_field_1 (rtx str_rtx, unsigned !(MEM_P (op0) MEM_VOLATILE_P (op0) flag_strict_volatile_bitfields 0) ! ((REG_P (op0) || GET_CODE (op0) == SUBREG) - (bitsize + bitpos GET_MODE_BITSIZE (op_mode + (bitsize + bitpos GET_MODE_BITSIZE (op_mode))) + /* Do not use insv if the bit region is restricted and + op_mode integer at offset doesn't fit into the + restricted region. */ + !(MEM_P (op0) bitregion_end + bitnum - bitpos + GET_MODE_BITSIZE (op_mode) +bitregion_end + 1)) { struct expand_operand ops[4]; int xbitpos = bitpos; @@ -760,7 +766,7 @@ store_bit_field_1 (rtx str_rtx, unsigned || GET_MODE_BITSIZE (GET_MODE (op0)) maxbits || (op_mode != MAX_MACHINE_MODE GET_MODE_SIZE (GET_MODE (op0)) GET_MODE_SIZE (op_mode))) - bestmode = get_best_mode (bitsize, bitnum, + bestmode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end, MEM_ALIGN (op0), (op_mode == MAX_MACHINE_MODE @@ -1096,6 +1102,16 @@ store_split_bit_field (rtx op0, unsigned offset =
Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c
On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com wrote: The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled: FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1 loops 2 The test expects 2 loops to be vectorized, while gcc successfully vectorizes 3 loops in this test using neon on arm. This patch adjusts the expected output. Fixed test passes on qemu for arm and powerpc. OK for trunk? I think the proper fix is to instead of for (i = 0; i N; i++) { input[i] = i; output[i] = 0; if (input[i] 256) abort (); } use for (i = 0; i N; i++) { input[i] = i; output[i] = 0; __asm__ volatile (); } to prevent vectorization of initialization loops. Thanks, Greta gcc/testsuite/ChangeLog 2012-04-23 Greta Yorsh greta.yo...@arm.com * gcc.dg/vect/slp-perm-8.c (dg-final): Adjust expected number of vectorized loops for arm with neon.
Re: PR c/53066 Wshadow should not warn for shadowing an extern function
On Wed, Apr 25, 2012 at 5:36 AM, Manuel López-Ibáñez lopeziba...@gmail.com wrote: On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote: On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote: Wshadow warns whenever any declaration shadows a global function declaration. This is almost always noise, since most (always?) of the time one cannot mistakenly replace a function by another variable. The false positives are too common (Linus mentions using the name 'index' when including string.h). I think the correct rule would be: warn if a variable *with pointer-to-function type* shadows a function, and warn if a nested function shadows another function, but don't warn for variables shadowing functions if the variable has any other type (because if the variable has some type that isn't a pointer-to-function, no confusion is likely without another error being given). Right. How does one check that a decl is a nested function? is DECL_CONTEXT (decl_function_context (t)) a FUNCTION_DECL?
Re: PR c/53066 Wshadow should not warn for shadowing an extern function
On Wed, Apr 25, 2012 at 6:54 AM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 Apr 2012, Manuel López-Ibáñez wrote: On 25 April 2012 00:01, Joseph S. Myers jos...@codesourcery.com wrote: On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote: Wshadow warns whenever any declaration shadows a global function declaration. This is almost always noise, since most (always?) of the time one cannot mistakenly replace a function by another variable. The false positives are too common (Linus mentions using the name 'index' when including string.h). I think the correct rule would be: warn if a variable *with pointer-to-function type* shadows a function, and warn if a nested function shadows another function, but don't warn for variables shadowing functions if the variable has any other type (because if the variable has some type that isn't a pointer-to-function, no confusion is likely without another error being given). Right. How does one check that a decl is a nested function? I think you should check if the decl has function type or pointer to function type. (Nested functions are simply I think the only valid case where you can end up with one function shadowing another; has function type is the logical check rather than is a nested function.) indeed. That should be enough to remove most of the false positives.
Re: PowerPC prologue and epilogue 6
On Wed, Apr 25, 2012 at 1:20 AM, Alan Modra amo...@gmail.com wrote: This patch adds a testcase to verify register saves and restores. I tried to write it so that it will run on all powerpc targets. From past experience it probably won't. OK to apply anyway, and fix fallout later? * gcc.target/powerpc/savres.c: New test. * gcc.target/powerpc/powerpc.exp: Run it. Okay. Thanks, David
Re: [PATCH] reload: Try alternative with swapped operands before going to the next
Andreas Krebbel wrote: 2011-11-17 Andreas Krebbel andreas.kreb...@de.ibm.com * reload.c (find_reloads): Change the loop nesting when trying an alternative with swapped operands. This is OK. Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
[PATCH] Fix testsuite fallout of last patch
Committed, tested on x86_64-unknown-linux-gnu. Richard. 2012-04-25 Richard Guenther rguent...@suse.de * gcc.target/i386/l_fma_float_5.c: Adjust. * gcc.target/i386/l_fma_double_4.c: Likewise. * gcc.target/i386/l_fma_float_2.c: Likewise. * gcc.target/i386/l_fma_float_6.c: Likewise. * gcc.target/i386/l_fma_double_1.c: Likewise. * gcc.target/i386/l_fma_double_5.c: Likewise. * gcc.target/i386/l_fma_float_3.c: Likewise. * gcc.target/i386/l_fma_double_2.c: Likewise. * gcc.target/i386/l_fma_double_6.c: Likewise. * gcc.target/i386/l_fma_float_4.c: Likewise. * gcc.target/i386/l_fma_double_3.c: Likewise. * gcc.target/i386/l_fma_float_1.c: Likewise. Index: gcc/testsuite/gcc.target/i386/l_fma_float_5.c === --- gcc/testsuite/gcc.target/i386/l_fma_float_5.c (revision 186812) +++ gcc/testsuite/gcc.target/i386/l_fma_float_5.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times vfmsub132ps 8 } } */ /* { dg-final { scan-assembler-times vfnmadd132ps 8 } } */ /* { dg-final { scan-assembler-times vfnmsub132ps 8 } } */ -/* { dg-final { scan-assembler-times vfmadd132ss 8 } } */ -/* { dg-final { scan-assembler-times vfmsub132ss 8 } } */ -/* { dg-final { scan-assembler-times vfnmadd132ss 8 } } */ -/* { dg-final { scan-assembler-times vfnmsub132ss 8 } } */ +/* { dg-final { scan-assembler-times vfmadd132ss 16 } } */ +/* { dg-final { scan-assembler-times vfmsub132ss 16 } } */ +/* { dg-final { scan-assembler-times vfnmadd132ss 16 } } */ +/* { dg-final { scan-assembler-times vfnmsub132ss 16 } } */ Index: gcc/testsuite/gcc.target/i386/l_fma_double_4.c === --- gcc/testsuite/gcc.target/i386/l_fma_double_4.c (revision 186812) +++ gcc/testsuite/gcc.target/i386/l_fma_double_4.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times vfmsub132pd 8 } } */ /* { dg-final { scan-assembler-times vfnmadd132pd 8 } } */ /* { dg-final { scan-assembler-times vfnmsub132pd 8 } } */ -/* { dg-final { scan-assembler-times vfmadd132sd 8 } } */ -/* { dg-final { scan-assembler-times vfmsub132sd 8 } } */ -/* { dg-final { scan-assembler-times vfnmadd132sd 8 } } */ -/* { dg-final { scan-assembler-times vfnmsub132sd 8 } } */ +/* { dg-final { scan-assembler-times vfmadd132sd 16 } } */ +/* { dg-final { scan-assembler-times vfmsub132sd 16 } } */ +/* { dg-final { scan-assembler-times vfnmadd132sd 16 } } */ +/* { dg-final { scan-assembler-times vfnmsub132sd 16 } } */ Index: gcc/testsuite/gcc.target/i386/l_fma_float_2.c === --- gcc/testsuite/gcc.target/i386/l_fma_float_2.c (revision 186812) +++ gcc/testsuite/gcc.target/i386/l_fma_float_2.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times vfmsub132ps 8 } } */ /* { dg-final { scan-assembler-times vfnmadd132ps 8 } } */ /* { dg-final { scan-assembler-times vfnmsub132ps 8 } } */ -/* { dg-final { scan-assembler-times vfmadd132ss 8 } } */ -/* { dg-final { scan-assembler-times vfmsub132ss 8 } } */ -/* { dg-final { scan-assembler-times vfnmadd132ss 8 } } */ -/* { dg-final { scan-assembler-times vfnmsub132ss 8 } } */ +/* { dg-final { scan-assembler-times vfmadd132ss 16 } } */ +/* { dg-final { scan-assembler-times vfmsub132ss 16 } } */ +/* { dg-final { scan-assembler-times vfnmadd132ss 16 } } */ +/* { dg-final { scan-assembler-times vfnmsub132ss 16 } } */ Index: gcc/testsuite/gcc.target/i386/l_fma_float_6.c === --- gcc/testsuite/gcc.target/i386/l_fma_float_6.c (revision 186812) +++ gcc/testsuite/gcc.target/i386/l_fma_float_6.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times vfmsub132ps 8 } } */ /* { dg-final { scan-assembler-times vfnmadd132ps 8 } } */ /* { dg-final { scan-assembler-times vfnmsub132ps 8 } } */ -/* { dg-final { scan-assembler-times vfmadd132ss 8 } } */ -/* { dg-final { scan-assembler-times vfmsub132ss 8 } } */ -/* { dg-final { scan-assembler-times vfnmadd132ss 8 } } */ -/* { dg-final { scan-assembler-times vfnmsub132ss 8 } } */ +/* { dg-final { scan-assembler-times vfmadd132ss 16 } } */ +/* { dg-final { scan-assembler-times vfmsub132ss 16 } } */ +/* { dg-final { scan-assembler-times vfnmadd132ss 16 } } */ +/* { dg-final { scan-assembler-times vfnmsub132ss 16 } } */ Index: gcc/testsuite/gcc.target/i386/l_fma_double_1.c === --- gcc/testsuite/gcc.target/i386/l_fma_double_1.c (revision 186812) +++ gcc/testsuite/gcc.target/i386/l_fma_double_1.c (working copy) @@ -16,11 +16,11 @@ /* { dg-final { scan-assembler-times vfnmadd231pd 4 } } */ /* { dg-final { scan-assembler-times vfnmsub132pd 4 } } */ /* { dg-final {
fix incorrect SRA transformation on non-integral VIEW_CONVERT argument
Hello, For the PA(1).Z := 44; assignment in the attached Ada testcase, we observe the gcc 4.5 SRA pass performing an invalid transformation, turning: struct { system__pack_48__bits_48 OBJ; } D.1432; D.1432.OBJ = D.1435; T1b.F = VIEW_CONVERT_EXPRstruct pt__point(D.1432); into: SR.12_17 = D.1435_3; T1b.F = VIEW_CONVERT_EXPRstruct pt__point(SR.12_17); where we have var_decl D.1432 type record_type 0x77fb72a0 BLK size integer_cst 0x77fac960 constant 48 and var_decl SR.12 type integer_type system__pack_48__bits_48 size integer_cst 0x77ecd870 64 type integer_type system__pack_48__bits_48___UMT size integer_cst 64 At least the change in size is problematic because the conversion outcome might differ after the replacement, in particular on big-endian targets. mainline does something slightly different with the same effects eventually (same testcase failure on sparc-solaris). The attached patch is a proposal to address this at the point where we already check for VCE in the access creation process, disqualifying scalarization for a VCE argument of non-integral size. We (AdaCore) have been using this internally for a while now. I also checked that it fixes the observable problem on sparc, then bootstrapped and regtested on i686-suse-linux. OK to commit ? Thanks in advance for your feedback, Olivier 2012-04-25 Olivier Hainque hain...@adacore.com * tree-sra.c (create_access): Additional IN_VCE argument, telling if EXPR is VIEW_CONVERT_EXPR input. Disqualify base if access size is not that of an integer mode in this case. (build_access_from_expr_1): Adjust caller. testsuite/ * gnat.dg/sra_vce[_decls].adb: New testcase. sravce.dif Description: video/dv
Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion
Gabriel Dos Reis g...@integrable-solutions.net writes: On Wed, Apr 11, 2012 at 3:46 AM, Dodji Seketeli do...@redhat.com wrote: There are various conversion related warnings that trigger on potentially dangerous uses of NULL (or __null). NULL is defined as a macro in a system header, so calling warning or warning_at on a virtual location of NULL yields no diagnostic. So the test accompanying this patch (as well as others), was failling when run with -ftrack-macro-expansion. I think it's necessary to use the location of NULL that is in the main source code (instead of, e.g, the spelling location that is in the system header where the macro is defined) in those cases. Note that for __null, we don't have the issue. I like the idea. However, you should write a separate function (get_null_ptr_cst_location?) that computes the location of the null pointer constant token and call it from where you need the location. OK. I have introduced such a new function and I gave it the slightly more generic name expansion_point_location_if_in_system_header as I suspect it can be useful for macros that are similar to NULL. I haven't spotted such macros (from test regressions) yet, though. Here is the updated patch that passes bootstrap with -ftrack-macro-expansion turned off; it also passes bootstrap with -ftrack-macro-expansion turned on with the whole series of patches I have locally on my tree. gcc/ * input.h (expansion_point_location_if_in_system_header): Declare new function. * input.c (expansion_point_location_if_in_system_header): Define it. gcc/cp/ * call.c (conversion_null_warnings): Use the new expansion_point_location_if_in_system_header. * cvt.c (build_expr_type_conversion): Likewise. * typeck.c (cp_build_binary_op): Likewise. gcc/testsuite/ * g++.dg/warn/Wconversion-null-2.C: Add testing for __null, alongside the previous testing for NULL. --- gcc/cp/call.c | 16 ++- gcc/cp/cvt.c | 16 ++- gcc/cp/typeck.c| 18 +++-- gcc/input.c| 14 ++ gcc/input.h|1 + gcc/testsuite/g++.dg/warn/Wconversion-null-2.C | 31 +++- 6 files changed, 88 insertions(+), 8 deletions(-) diff --git a/gcc/cp/call.c b/gcc/cp/call.c index f9a7f08..1dc57fc 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -5598,12 +5598,24 @@ conversion_null_warnings (tree totype, tree expr, tree fn, int argnum) if (expr == null_node TREE_CODE (totype) != BOOLEAN_TYPE ARITHMETIC_TYPE_P (totype)) { + /* The location of the warning here is most certainly the one for +the token that represented null_node in the source code. +That is either NULL or __null. If it is NULL, then that +macro is defined in a system header and, as a consequence, +warning_at won't emit any diagnostic for it. In this case, +we are going to resolve that location to the point in the +source code where the expression that triggered this error +message is, rather than the point where the NULL macro is +defined. */ + source_location loc = + expansion_point_location_if_in_system_header (input_location); + if (fn) - warning_at (input_location, OPT_Wconversion_null, + warning_at (loc, OPT_Wconversion_null, passing NULL to non-pointer argument %P of %qD, argnum, fn); else - warning_at (input_location, OPT_Wconversion_null, + warning_at (loc, OPT_Wconversion_null, converting to non-pointer type %qT from NULL, totype); } diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c index 3dab372..8defc61 100644 --- a/gcc/cp/cvt.c +++ b/gcc/cp/cvt.c @@ -1472,8 +1472,20 @@ build_expr_type_conversion (int desires, tree expr, bool complain) if (expr == null_node (desires WANT_INT) !(desires WANT_NULL)) -warning_at (input_location, OPT_Wconversion_null, - converting NULL to non-pointer type); +{ + /* The location of the warning here is the one for the +(expansion of the) NULL macro, or for __null. If it is for +NULL, then, as that that macro is defined in a system header, +warning_at won't emit any diagnostic. In this case, we are +going to resolve that location to the point in the source +code where the expression that triggered this warning is, +rather than the point where the NULL macro is defined. */ + source_location loc = + expansion_point_location_if_in_system_header (input_location); + + warning_at (loc, OPT_Wconversion_null, + converting NULL to non-pointer type); +} basetype = TREE_TYPE (expr); diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
Re: [PING][Patch, Testsuite] fix failure in test gcc.dg/pr52283.c
On Apr 25, 2012, at 4:52 AM, Richard Guenther wrote: On Wed, Apr 25, 2012 at 1:40 PM, Greta Yorsh greta.yo...@arm.com wrote: PING! Here is the original post: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01235.html This patch fixes the failure in gcc.dg/pr52283.c by adding the missing dg-warning and dg-options. OK for trunk? Didn't I approve this already? http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01361.html I think that is the right message... :-)
[PATCH] Default to -gdwarf-4
Hi! For reasonable debugging experience recent GCC versions need GDB = 7.0 for quite some time, and DWARF4 is almost 2 years old now, and offers lots of improvements over DWARF2 we still default to. So, I'd like to make -gdwarf-4 (just the version, of course -g is needed to enable debug info generation) the default, together with -fno-debug-types-section (as .debug_types isn't right now always a win, see the data in the dwz-0.1 announcement plus not all DWARF consumers can handle it yet or are gaining support only very recently (e.g. valgrind)) and -grecord-gcc-switches which is IMHO worth the few extra bytes per CU (unless every CU is compiled with different code generation affecting options usually just 4 extra bytes). In Fedora we default to this combo already for some time. Targets that have tools that are upset by any extensions that defaulted to -gno-strict-dwarf previously will now default to -gdwarf-2 as before. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-04-25 Jakub Jelinek ja...@redhat.com * common.opt (flag_debug_types_section): Default to 0. (dwarf_version): Default to 4. (dwarf_record_gcc_switches): Default to 1. (dwarf_strict): Default to 0. * toplev.c (process_options): Don't handle dwarf_strict or dwarf_version here. * config/vxworks.c (vxworks_override_options): Don't test whether dwarf_strict or dwarf_version are negative, instead test !global_options_set.x_dwarf_*. * config/darwin.c (darwin_override_options): Default to dwarf_version 2. * doc/invoke.texi: Note that -gdwarf-4, -grecord-gcc-switches and -fno-debug-types-section are now the default. --- gcc/common.opt.jj 2012-04-25 12:14:55.0 +0200 +++ gcc/common.opt 2012-04-25 12:35:33.944460663 +0200 @@ -967,7 +967,7 @@ Common Joined RejectNegative Var(common_ Map one directory name to another in debug information fdebug-types-section -Common Report Var(flag_debug_types_section) Init(1) +Common Report Var(flag_debug_types_section) Init(0) Output .debug_types section when using DWARF v4 debuginfo. ; Nonzero for -fdefer-pop: don't pop args after each function call @@ -2212,7 +2212,7 @@ Common JoinedOrMissing Negative(gdwarf-) Generate debug information in COFF format gdwarf- -Common Joined UInteger Var(dwarf_version) Init(-1) Negative(gstabs) +Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs) Generate debug information in DWARF v2 (or later) format ggdb @@ -2220,7 +2220,7 @@ Common JoinedOrMissing Generate debug information in default extended format gno-record-gcc-switches -Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(0) +Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(1) Don't record gcc command line switches in DWARF DW_AT_producer. grecord-gcc-switches @@ -2236,7 +2236,7 @@ Common JoinedOrMissing Negative(gvms) Generate debug information in extended STABS format gno-strict-dwarf -Common RejectNegative Var(dwarf_strict,0) Init(-1) +Common RejectNegative Var(dwarf_strict,0) Init(0) Emit DWARF additions beyond selected version gstrict-dwarf --- gcc/toplev.c.jj 2012-04-25 12:14:55.0 +0200 +++ gcc/toplev.c2012-04-25 12:34:38.016819701 +0200 @@ -1375,15 +1375,6 @@ process_options (void) } } - /* Unless over-ridden for the target, assume that all DWARF levels - may be emitted, if DWARF2_DEBUG is selected. */ - if (dwarf_strict 0) -dwarf_strict = 0; - - /* And select a default dwarf level. */ - if (dwarf_version 0) -dwarf_version = 2; - /* A lot of code assumes write_symbols == NO_DEBUG if the debugging level is 0. */ if (debug_info_level == DINFO_LEVEL_NONE) --- gcc/config/vxworks.c.jj 2012-04-25 12:14:55.0 +0200 +++ gcc/config/vxworks.c2012-04-25 12:34:04.868027337 +0200 @@ -1,5 +1,5 @@ /* Common VxWorks target definitions for GNU compiler. - Copyright (C) 2007, 2008, 2010 + Copyright (C) 2007, 2008, 2010, 2012 Free Software Foundation, Inc. Contributed by CodeSourcery, Inc. @@ -147,9 +147,9 @@ vxworks_override_options (void) /* Default to strict dwarf-2 to prevent potential difficulties observed with non-gdb debuggers on extensions 2. */ - if (dwarf_strict 0) + if (!global_options_set.x_dwarf_strict) dwarf_strict = 1; - if (dwarf_version 0) + if (!global_options_set.x_dwarf_version) dwarf_version = 2; } --- gcc/config/darwin.c.jj 2011-12-01 11:45:06.0 +0100 +++ gcc/config/darwin.c 2012-04-25 12:21:03.965982850 +0200 @@ -1,6 +1,6 @@ /* Functions for generic Darwin as target machine for GNU C compiler. Copyright (C) 1989, 1990, 1991, 1992, 1993, 2000, 2001, 2002, 2003, 2004, - 2005, 2006, 2007, 2008, 2009, 2010, 2011 + 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc. Contributed by Apple Computer Inc. @@ -2973,6 +2973,8
Re: [PATCH] Make sizetypes no longer sign-extending
On Tue, 24 Apr 2012, Richard Guenther wrote: On Tue, 24 Apr 2012, Richard Guenther wrote: I've been carrying this patch for quite some while now and really want to go forward with it - the problem is that while all default languages work fine after this patch Ada shows some testsuite regressions. I've had various hacks/workarounds throughout the Ada frontend for them, but lost track of what fixed what and they all felt like hacks anyway. Thus - I know the patch will add Ada testsuite regressions. But it will not break Ada bootstrap. Ada is not in the set of default languages, nor is it considered release critical. Are the Ada folks happy with helping to fix the fallout after-the-fact (I got Eric to fix the bootstrap issues that were first present - thanks for that)? I am happy to revisit my hacks/workarounds and post them, but it will be ultimatively easier to review them if you can see the FAIL for yourself (there are some workarounds/hacks posted on the mailinglist for previous attempts IIRC). Thanks for your consideration. The patch is currently under re-testing (it needs the 2nd patch below, which was already approved but back in time broke Ada bootstrap - I didn't verify if that still occurs). To followup myself - bootstrap with just the 2nd patch is still broken: /abuild/rguenther/obj2/./gcc/xgcc -B/abuild/rguenther/obj2/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include-c -g -O2 -m32 -fpic -W -Wall -gnatpg -nostdinc -m32 s-secsta.adb -o s-secsta.o s-secsta.adb:501:4: error: size of variable 'System.Secondary_Stack.Chunk' is too large Chunk : aliased Chunk_Id (1, Static_Secondary_Stack_Size); ^ make[9]: *** [s-secsta.o] Error 1 And the following is the list of regressions introduced by the combined patch set: === acats tests === FAIL: a71004a FAIL: c36204d FAIL: c36205l FAIL: c37404b FAIL: c41107a FAIL: c41204a FAIL: c43204c FAIL: c43204e FAIL: c43204f FAIL: c43204g FAIL: c43204h FAIL: c43204i FAIL: c52102a FAIL: c52102c FAIL: c64103c FAIL: c64103d FAIL: c64106a FAIL: c95087a FAIL: cc1224a FAIL: cc1311a FAIL: cc3106b FAIL: cc3224a FAIL: cd2a31a === acats Summary === # of expected passes2297 # of unexpected failures23 === gnat tests === Running target unix/ FAIL: gnat.dg/array11.adb (test for warnings, line 12) FAIL: gnat.dg/loop_optimization3.adb (test for excess errors) FAIL: gnat.dg/loop_optimization3.adb execution test FAIL: gnat.dg/object_overflow.adb (test for warnings, line 8) FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2 FAIL: gnat.dg/return3.adb scan-assembler loc 1 6 FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors) FAIL: gnat.dg/test_8bitlong_overflow.adb execution test === gnat Summary for unix/ === # of expected passes1089 # of unexpected failures8 # of expected failures 13 # of unsupported tests 2 Running target unix//-m32 FAIL: gnat.dg/array11.adb (test for warnings, line 12) FAIL: gnat.dg/loop_optimization3.adb (test for excess errors) FAIL: gnat.dg/loop_optimization3.adb execution test FAIL: gnat.dg/object_overflow.adb (test for warnings, line 8) FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2 FAIL: gnat.dg/return3.adb scan-assembler loc 1 6 FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors) FAIL: gnat.dg/test_8bitlong_overflow.adb execution test === gnat Summary for unix//-m32 === # of expected passes1089 # of unexpected failures8 # of expected failures 13 # of unsupported tests 2 === gnat Summary === # of expected passes2178 # of unexpected failures16 # of expected failures 26 # of unsupported tests 4 Most of the ACATS errors are raised STORAGE_ERROR : object too large or error: size of variable '...' is too large. The gnat testsuite adds warning: Storage_Error will be raised at run time to this. I remember one workaround (which usually involves re-setting TREE_OVERFLOW at strathegic places) fixes most of ACATS. I'll try to isolate that (but it's a hack and does not feel right). Ah, and all ACATS fails and -FAIL: gnat.dg/loop_optimization3.adb (test for excess errors) -FAIL: gnat.dg/loop_optimization3.adb execution test -FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors) -FAIL: gnat.dg/test_8bitlong_overflow.adb execution test are fixed by for example Index: trunk/gcc/stor-layout.c === ---
Re: [PATCH, PR38785] Throttle PRE at -O3
On Wed, 25 Apr 2012, Maxim Kuvyrkov wrote: OK. Gerald, does the patch for gcc-4.8/changes.html look OK? Yes, it does, thank you. Just add a and the in three places, as in A new option, the...optimization and at the...level when committing it, please. Cheers, Gerald
[PATCH 10/13] Fix location for static class members
Consider the test case g++.dg/other/offsetof5.C: #include stddef.h struct A { char c; int i; }; int j = offsetof (A, i);// { dg-warning invalid access|offsetof } template typename T struct S { T h; T i; static const int j = offsetof (S, i); // { dg-warning invalid access|offsetof } }; int k = Sint::j; // { dg-message required from here } The second warning (that involves the instantiation of the S template) is not emitted when -ftrack-macro-expansion is on. This is because during the instantiation of the member j of S template, the location that is used for the warning is the one for the DECL j (set by instantiate_decl). And that location is inaccurately set to the locus of 'offsetof', which is a macro defined in a system header, so it's discarded by the diagnostics machinery. Note that when we reach the point where we emit the warning in build_class_member_access_expr offsetof expression has long been folded, so we cannot use e.g, the location of the ')' token that would have been in the source code. So I believe the location of 'j' is the best we can get at this point. The patch below sets the location of the DECL for 'j' to what I believe is its precise location; with that, the test case passes with and without -ftrack-macro-expansion. But I had to adjust g++.dg/template/sfinae6_neg.C for that. Tested on x86_64-unknown-linux-gnu against trunk. gcc/cp * decl.c (grokdeclarator): Use the location carried by the declarator for the DECL of the static class member. gcc/testsuite/ * g++.dg/template/sfinae6_neg.C: Adjust. --- gcc/cp/decl.c |3 ++- gcc/testsuite/g++.dg/template/sfinae6_neg.C |4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 28c7cee..40818a3 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -10267,7 +10267,8 @@ grokdeclarator (const cp_declarator *declarator, { /* C++ allows static class members. All other work for this is done by grokfield. */ - decl = build_lang_decl (VAR_DECL, unqualified_id, type); + decl = build_lang_decl_loc (declarator-id_loc, + VAR_DECL, unqualified_id, type); set_linkage_for_static_data_member (decl); /* Even if there is an in-class initialization, DECL is considered undefined until an out-of-class diff --git a/gcc/testsuite/g++.dg/template/sfinae6_neg.C b/gcc/testsuite/g++.dg/template/sfinae6_neg.C index d4be5dd..9b7bdfd1 100644 --- a/gcc/testsuite/g++.dg/template/sfinae6_neg.C +++ b/gcc/testsuite/g++.dg/template/sfinae6_neg.C @@ -21,9 +21,9 @@ no_type check_is_callable2(...); templatetypename F, typename T1, typename T2 = T1 struct is_callable2 { - static const bool value = + static const bool value = // { dg-error within this context } (sizeof(check_is_callable2(typeF(), typeT1(), typeT2())) - == sizeof(yes_type)); // { dg-error within this context } + == sizeof(yes_type)); }; #define JOIN( X, Y ) DO_JOIN( X, Y ) -- Dodji
Re: combine_conversions int-double-int
On Wed, 25 Apr 2012, Richard Guenther wrote: On Wed, Apr 25, 2012 at 10:12 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, a conversion like int-double-int is just the identity, as long as double is big enough to represent all ints exactly. The most natural way I found to do this optimization is the attached: 2012-04-25 Marc Glisse marc.gli...@inria.fr PR middle-end/27139 * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT. Does the approach make sense? I don't know that code, and adding FLOAT_EXPR / FIX_TRUNC_EXPR was a bit of guesswork. The precision of double could be multiplied by log2(base), but not doing it is safe. If the approach is ok, I could extend it so int-double-long also skips the intermediate conversion. Bootstrapped and regression tested on linux-x86_64. Should I try and write a testcase for a specific target checking for specific asm instructions there, or is there a better way? Well, scanning the forwprop dump for example. Btw, I think not munging this new case into the existing CONVERT_EXPR_P code would be better - it makes the code (even) harder to understand and I'm not convinced that adding FLOAT_EXPR/FIX_TRUNC_EXPR handling does not wreck any assumptions in that code. It also seems that for DECIMAL_FLOAT_TYPE_P the transform is always valid? Richard. (note that I can't commit) Here is take 2 on this patch, which seems cleaner. Bootstrapped and regression tested. gcc/ChangeLog 2012-04-25 Marc Glisse marc.gli...@inria.fr PR middle-end/27139 * tree-ssa-forwprop.c (combine_conversions): Handle INT-FP-INT. gcc/testsuite/ChangeLog 2012-04-25 Marc Glisse marc.gli...@inria.fr PR middle-end/27139 * gcc.dg/tree-ssa/forwprop-18.c: New test. In my patch, the lines with gimple_assign_* are vaguely guessed from what is around, I don't pretend to understand them. While doing this, I noticed what I think is a mistake in a comment: --- gcc/real.c (revision 186761) +++ gcc/real.c (working copy) @@ -2814,11 +2814,11 @@ significand_size (enum machine_mode mode return 0; if (fmt-b == 10) { /* Return the size in bits of the largest binary value that can be -held by the decimal coefficient for this mode. This is one more +held by the decimal coefficient for this mode. This is one less than the number of bits required to hold the largest coefficient of this mode. */ double log2_10 = 3.3219281; return fmt-p * log2_10; } -- Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/forwprop-18.c === --- testsuite/gcc.dg/tree-ssa/forwprop-18.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/forwprop-18.c (revision 0) @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-forwprop1 } */ + +short f1(short n) +{ + return (double)n; +} +long f2(short n) +{ + return (double)n; +} + +long g1(long n) +{ + return (float)n; +} +short g2(long n) +{ + return (float)n; +} + +/* { dg-final { scan-tree-dump-times \\\(float\\\) 2 forwprop1 } } */ +/* { dg-final { scan-tree-dump-not \\\(double\\\) forwprop1 } } */ +/* { dg-final { cleanup-tree-dump forwprop1 } } */ Property changes on: testsuite/gcc.dg/tree-ssa/forwprop-18.c ___ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: tree-ssa-forwprop.c === --- tree-ssa-forwprop.c (revision 186761) +++ tree-ssa-forwprop.c (working copy) @@ -2403,10 +2403,11 @@ combine_conversions (gimple_stmt_iterato { gimple stmt = gsi_stmt (*gsi); gimple def_stmt; tree op0, lhs; enum tree_code code = gimple_assign_rhs_code (stmt); + enum tree_code code2; gcc_checking_assert (CONVERT_EXPR_CODE_P (code) || code == FLOAT_EXPR || code == FIX_TRUNC_EXPR); @@ -2423,11 +2424,13 @@ combine_conversions (gimple_stmt_iterato def_stmt = SSA_NAME_DEF_STMT (op0); if (!is_gimple_assign (def_stmt)) return 0; - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) + code2 = gimple_assign_rhs_code (def_stmt); + + if (CONVERT_EXPR_CODE_P (code2)) { tree defop0 = gimple_assign_rhs1 (def_stmt); tree type = TREE_TYPE (lhs); tree inside_type = TREE_TYPE (defop0); tree inter_type = TREE_TYPE (op0); @@ -2552,10 +2555,45 @@ combine_conversions (gimple_stmt_iterato gimple_assign_set_rhs_from_tree (gsi, tem); update_stmt (gsi_stmt (*gsi)); return 1; } } + else if (code == FIX_TRUNC_EXPR code2 == FLOAT_EXPR) +{ + tree defop0 = gimple_assign_rhs1 (def_stmt); + tree type = TREE_TYPE (lhs); + tree inside_type = TREE_TYPE (defop0); + tree inter_type = TREE_TYPE (op0); + int inside_int = INTEGRAL_TYPE_P
PATCH: PR debug/52857: DW_OP_GNU_regval_type is generated with INVALID_REGNUM
Hi, We may generate DW_OP_GNU_regval_type with INVALID_REGNUM. This patch adds an assert to make sure that we don't. OK for trunk? Thanks. H.J. --- 2012-04-06 H.J. Lu hongjiu...@intel.com PR debug/52857 * dwarf2out.c (dbx_reg_number): Assert return value != INVALID_REGNUM. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index ca88fc5..515a824 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -10167,7 +10167,9 @@ dbx_reg_number (const_rtx rtl) } #endif - return DBX_REGISTER_NUMBER (regno); + regno = DBX_REGISTER_NUMBER (regno); + gcc_assert (regno != INVALID_REGNUM); + return regno; } /* Optionally add a DW_OP_piece term to a location description expression.
[PATCH 11/13] Fix va_start related location
In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being emitted because the relevant location was inside the var_start macro defined in a system header. It can even point to a token for a builtin macro there. This patch unwinds to the first token in real source code in that case. Tested on x86_64-unknown-linux-gnu against trunk. * builtins.c (fold_builtin_next_arg): Unwinds to the first location in real source code. --- gcc/builtins.c | 16 ++-- 1 files changed, 14 insertions(+), 2 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index b47f218..ef90b25 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -12164,8 +12164,20 @@ fold_builtin_next_arg (tree exp, bool va_start_p) the default argument promotions, the behavior is undefined. */ else if (DECL_REGISTER (arg)) -warning (0, undefined behaviour when second parameter of - %va_start% is declared with %register% storage); + { + /* There is good chance the current input_location points +inside the definition of the va_start macro (perhaps on +the token for builtin) in a system header, so the warning +will not be emitted. Use the location in real source +code. */ + source_location current_location = + linemap_unwind_to_first_non_reserved_loc (line_table, input_location, + NULL); + warning_at (current_location, + 0, + undefined behaviour when second parameter of + %va_start% is declared with %register% storage); + } /* We want to verify the second parameter just once before the tree optimizers are run and then avoid keeping it in the tree, -- Dodji
Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion
On Wed, Apr 25, 2012 at 8:42 AM, Dodji Seketeli do...@redhat.com wrote: Gabriel Dos Reis g...@integrable-solutions.net writes: On Wed, Apr 11, 2012 at 3:46 AM, Dodji Seketeli do...@redhat.com wrote: There are various conversion related warnings that trigger on potentially dangerous uses of NULL (or __null). NULL is defined as a macro in a system header, so calling warning or warning_at on a virtual location of NULL yields no diagnostic. So the test accompanying this patch (as well as others), was failling when run with -ftrack-macro-expansion. I think it's necessary to use the location of NULL that is in the main source code (instead of, e.g, the spelling location that is in the system header where the macro is defined) in those cases. Note that for __null, we don't have the issue. I like the idea. However, you should write a separate function (get_null_ptr_cst_location?) that computes the location of the null pointer constant token and call it from where you need the location. OK. I have introduced such a new function and I gave it the slightly more generic name expansion_point_location_if_in_system_header as I suspect it can be useful for macros that are similar to NULL. I haven't spotted such macros (from test regressions) yet, though. Thanks. But a point of the suggestion was that we won't need the same comment/explanation duplicated over at least 3 places. Just one. All those three places deal exactly with one instance: null pointer constant. That deserves a function in and of itself, which is documented by the duplicated comments. Please make that change. Everything else is OK. thanks.
Re: [PATCH 11/13] Fix va_start related location
On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli do...@redhat.com wrote: In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being emitted because the relevant location was inside the var_start macro defined in a system header. It can even point to a token for a builtin macro there. This patch unwinds to the first token in real source code in that case. While you are at it, could you also use a non-zero value for the second argument argument to warning_at? Tested on x86_64-unknown-linux-gnu against trunk. * builtins.c (fold_builtin_next_arg): Unwinds to the first location in real source code. --- gcc/builtins.c | 16 ++-- 1 files changed, 14 insertions(+), 2 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index b47f218..ef90b25 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -12164,8 +12164,20 @@ fold_builtin_next_arg (tree exp, bool va_start_p) the default argument promotions, the behavior is undefined. */ else if (DECL_REGISTER (arg)) - warning (0, undefined behaviour when second parameter of - %va_start% is declared with %register% storage); + { + /* There is good chance the current input_location points + inside the definition of the va_start macro (perhaps on + the token for builtin) in a system header, so the warning + will not be emitted. Use the location in real source + code. */ + source_location current_location = + linemap_unwind_to_first_non_reserved_loc (line_table, input_location, + NULL); + warning_at (current_location, + 0, + undefined behaviour when second parameter of + %va_start% is declared with %register% storage); + } /* We want to verify the second parameter just once before the tree optimizers are run and then avoid keeping it in the tree, -- Dodji
Re: [PATCH 10/13] Fix location for static class members
On Wed, Apr 25, 2012 at 8:55 AM, Dodji Seketeli do...@redhat.com wrote: Consider the test case g++.dg/other/offsetof5.C: #include stddef.h struct A { char c; int i; }; int j = offsetof (A, i); // { dg-warning invalid access|offsetof } template typename T struct S { T h; T i; static const int j = offsetof (S, i); // { dg-warning invalid access|offsetof } }; int k = Sint::j; // { dg-message required from here } The second warning (that involves the instantiation of the S template) is not emitted when -ftrack-macro-expansion is on. This is because during the instantiation of the member j of S template, the location that is used for the warning is the one for the DECL j (set by instantiate_decl). And that location is inaccurately set to the locus of 'offsetof', which is a macro defined in a system header, so it's discarded by the diagnostics machinery. Note that when we reach the point where we emit the warning in build_class_member_access_expr offsetof expression has long been folded, so we cannot use e.g, the location of the ')' token that would have been in the source code. So I believe the location of 'j' is the best we can get at this point. The patch below sets the location of the DECL for 'j' to what I believe is its precise location; with that, the test case passes with and without -ftrack-macro-expansion. But I had to adjust g++.dg/template/sfinae6_neg.C for that. Tested on x86_64-unknown-linux-gnu against trunk. OK. gcc/cp * decl.c (grokdeclarator): Use the location carried by the declarator for the DECL of the static class member. gcc/testsuite/ * g++.dg/template/sfinae6_neg.C: Adjust. --- gcc/cp/decl.c | 3 ++- gcc/testsuite/g++.dg/template/sfinae6_neg.C | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 28c7cee..40818a3 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -10267,7 +10267,8 @@ grokdeclarator (const cp_declarator *declarator, { /* C++ allows static class members. All other work for this is done by grokfield. */ - decl = build_lang_decl (VAR_DECL, unqualified_id, type); + decl = build_lang_decl_loc (declarator-id_loc, + VAR_DECL, unqualified_id, type); set_linkage_for_static_data_member (decl); /* Even if there is an in-class initialization, DECL is considered undefined until an out-of-class diff --git a/gcc/testsuite/g++.dg/template/sfinae6_neg.C b/gcc/testsuite/g++.dg/template/sfinae6_neg.C index d4be5dd..9b7bdfd1 100644 --- a/gcc/testsuite/g++.dg/template/sfinae6_neg.C +++ b/gcc/testsuite/g++.dg/template/sfinae6_neg.C @@ -21,9 +21,9 @@ no_type check_is_callable2(...); templatetypename F, typename T1, typename T2 = T1 struct is_callable2 { - static const bool value = + static const bool value = // { dg-error within this context } (sizeof(check_is_callable2(typeF(), typeT1(), typeT2())) - == sizeof(yes_type)); // { dg-error within this context } + == sizeof(yes_type)); }; #define JOIN( X, Y ) DO_JOIN( X, Y ) -- Dodji
Re: fix incorrect SRA transformation on non-integral VIEW_CONVERT argument
On Wed, Apr 25, 2012 at 3:37 PM, Olivier Hainque hain...@adacore.com wrote: Hello, For the PA(1).Z := 44; assignment in the attached Ada testcase, we observe the gcc 4.5 SRA pass performing an invalid transformation, turning: struct { system__pack_48__bits_48 OBJ; } D.1432; D.1432.OBJ = D.1435; T1b.F = VIEW_CONVERT_EXPRstruct pt__point(D.1432); into: SR.12_17 = D.1435_3; T1b.F = VIEW_CONVERT_EXPRstruct pt__point(SR.12_17); where we have var_decl D.1432 type record_type 0x77fb72a0 BLK size integer_cst 0x77fac960 constant 48 and var_decl SR.12 type integer_type system__pack_48__bits_48 size integer_cst 0x77ecd870 64 type integer_type system__pack_48__bits_48___UMT size integer_cst 64 At least the change in size is problematic because the conversion outcome might differ after the replacement, in particular on big-endian targets. mainline does something slightly different with the same effects eventually (same testcase failure on sparc-solaris). The attached patch is a proposal to address this at the point where we already check for VCE in the access creation process, disqualifying scalarization for a VCE argument of non-integral size. We (AdaCore) have been using this internally for a while now. I also checked that it fixes the observable problem on sparc, then bootstrapped and regtested on i686-suse-linux. OK to commit ? Thanks in advance for your feedback, Does this really cover all problematic cases? Ah, the existing code already rejects all VIEW_CONVERT_EXPRs down in the reference chain. I think much better would be to simply disallow any toplevel VIEW_CONVERT_EXPR of BLKmode, so, something like Index: gcc/tree-sra.c === --- gcc/tree-sra.c (revision 186817) +++ gcc/tree-sra.c (working copy) @@ -1001,9 +1001,10 @@ build_access_from_expr_1 (tree expr, gim /* We need to dive through V_C_Es in order to get the size of its parameter and not the result type. Ada produces such statements. We are also - capable of handling the topmost V_C_E but not any of those buried in other - handled components. */ - if (TREE_CODE (expr) == VIEW_CONVERT_EXPR) + capable of handling the topmost V_C_E if it has a suitable mode but + not any of those buried in other handled components. */ + if (TREE_CODE (expr) == VIEW_CONVERT_EXPR + TYPE_MODE (TREE_TYPE (expr)) != BLKmode) expr = TREE_OPERAND (expr, 0); if (contains_view_convert_expr_p (expr)) Does that fix your problems, too? If so I prefer that. Thanks, Richard. Olivier 2012-04-25 Olivier Hainque hain...@adacore.com * tree-sra.c (create_access): Additional IN_VCE argument, telling if EXPR is VIEW_CONVERT_EXPR input. Disqualify base if access size is not that of an integer mode in this case. (build_access_from_expr_1): Adjust caller. testsuite/ * gnat.dg/sra_vce[_decls].adb: New testcase.
[PATCH 12/13] Adjust relevant test cases wrt -ftrack-macro-expansion=[0|2]
Even after all the patches I have already submitted for this -ftrack-macro-expansion business, some test cases where errors happens on tokens that are defined in macros see their output change in an incompatible way, when you run them with or without -ftrack-macro-expansion. I think this is expected, because the (spelling) locus inside the definition of the macro pointed to with -ftrack-macro-expansion is different from the locus of the expansion point of the macro pointed to without -ftrack-macro-expansion. In those cases this patch either adjusts the test case and forces it be run either with -ftrack-macro-expansion, or it just forces it to be run without -ftrack-macro-expansion. There are so many libstdc++ tests that were failing because of that benign issue that I preferred to just run them with -ftrack-macro-expansion diabled, after inspecting each of them to be sure there was nothing more serious underneath. I believe we can latter turn on -ftrack-macro-expansion there on a case by case basis. Boostrapped on x86_64-unknown-linux-gnu against trunk with and without -ftrack-macro-expansion turned on. gcc/testsuite/ * c-c++-common/tm/attrib-1.c: Force the test case to run without -ftrack-macro-expansion. * c-c++-common/warn-ommitted-condop.c: Likewise. * gcc.dg/assign-warn-1.c: Likewise. * gcc.dg/assign-warn-2.c: Likewise. * gcc.dg/attr-alloc_size.c: Likewise. * gcc.dg/builtin-stringop-chk-1.c: Likewise. * gcc.dg/builtin-stringop-chk-2.c: Likewise. * gcc.dg/builtin-strncat-chk-1.c: Likewise. * gcc.dg/c90-const-expr-9.c: Likewise. * gcc.dg/c99-const-expr-9.c: Likewise. * gcc.dg/cpp/direct2.c: Likewise. Adjust. * gcc.dg/cpp/direct2s.c: Likewise. * gcc/testsuite/gcc.dg/cpp/pr28709.c: Likewise. * gcc.dg/cpp/pragma-diagnostic-1.c: Likewise. * gcc.dg/dfp/composite-type.c: Likewise. * gcc.dg/uninit-6-O0.c: Adjust the test case and force it to run with -ftrack-macro-expansion * g++.dg/cpp0x/constexpr-ex3.C: Likewise. * g++.dg/cpp0x/constexpr-overflow.C: Likewise. * g++.dg/ext/cleanup-1.C: Likewise. * g++.dg/ext/gnu-inline-global-reject.C: Likewise. * g++.dg/template/sfinae10.C: Likewise. * g++.dg/tm/wrap-2.C: Likewise. * g++.dg/warn/Wconversion-real-integer.C: Likewise. * g++.dg/warn/Wsign-conversion.C: Likewise. * g++.dg/warn/multiple-overflow-warn-1.C: Likewise. * g++.old-deja/g++.mike/p10769b.C: Likewise. * g++.dg/warn/Wdouble-promotion.C: Adjust the test case and force it to run with -ftrack-macro-expansion. * libstdc++-v3/scripts/testsuite_flags.in: By default, run the test cases without -ftrack-macro-expansion. --- gcc/testsuite/c-c++-common/tm/attrib-1.c |2 +- gcc/testsuite/c-c++-common/warn-ommitted-condop.c |2 +- gcc/testsuite/g++.dg/cpp0x/constexpr-ex3.C |2 +- gcc/testsuite/g++.dg/cpp0x/constexpr-overflow.C|2 +- gcc/testsuite/g++.dg/ext/cleanup-1.C |2 +- .../g++.dg/ext/gnu-inline-global-reject.C |2 +- gcc/testsuite/g++.dg/template/sfinae10.C |2 +- gcc/testsuite/g++.dg/tm/wrap-2.C |2 +- .../g++.dg/warn/Wconversion-real-integer.C |2 +- gcc/testsuite/g++.dg/warn/Wdouble-promotion.C |6 +++--- gcc/testsuite/g++.dg/warn/Wsign-conversion.C |2 +- .../g++.dg/warn/multiple-overflow-warn-1.C |2 +- gcc/testsuite/g++.old-deja/g++.mike/p10769b.C |2 +- gcc/testsuite/gcc.dg/assign-warn-1.c |2 +- gcc/testsuite/gcc.dg/assign-warn-2.c |2 +- gcc/testsuite/gcc.dg/attr-alloc_size.c |2 +- gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c |2 +- gcc/testsuite/gcc.dg/builtin-stringop-chk-2.c |2 +- gcc/testsuite/gcc.dg/builtin-strncat-chk-1.c |2 +- gcc/testsuite/gcc.dg/c90-const-expr-9.c|2 +- gcc/testsuite/gcc.dg/c99-const-expr-9.c|2 +- gcc/testsuite/gcc.dg/cpp/direct2.c | 12 +++- gcc/testsuite/gcc.dg/cpp/direct2s.c|2 +- gcc/testsuite/gcc.dg/cpp/pr28709.c |8 +--- gcc/testsuite/gcc.dg/cpp/pragma-diagnostic-1.c |2 +- gcc/testsuite/gcc.dg/dfp/composite-type.c |2 +- gcc/testsuite/gcc.dg/uninit-6-O0.c |6 +++--- libstdc++-v3/scripts/testsuite_flags.in|2 +- 28 files changed, 42 insertions(+), 38 deletions(-) diff --git a/gcc/testsuite/c-c++-common/tm/attrib-1.c b/gcc/testsuite/c-c++-common/tm/attrib-1.c index 536aeb3..534fa0e 100644 --- a/gcc/testsuite/c-c++-common/tm/attrib-1.c +++ b/gcc/testsuite/c-c++-common/tm/attrib-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -fgnu-tm } */ +/* { dg-options -fgnu-tm -ftrack-macro-expansion=0 } */
[C++ Patch] PR 53096
Hi, this PR is about the resolution of core/1333 being unimplemented, thus we reject things like: struct foo { foo(foo) = default; // ERROR HERE }; (and this can be annoying, as explained by Eric on the reflector, for example when one has to resort to out-of-class defaulting which means non-trivial) Thus, the below elementary patch appears to work fine (I also double checked that in such cases the type remains trivial). It's all there is to it? Thanks, Paolo. / /cp 2012-04-25 Paolo Carlini paolo.carl...@oracle.com PR c++/53096 * class.c (check_bases_and_members): Implement core/1333, do not disallow defaulted in the class body special members. /testsuite 2012-04-25 Paolo Carlini paolo.carl...@oracle.com PR c++/53096 * g++.dg/cpp0x/defaulted35.C: New. * g++.dg/cpp0x/defaulted15.C: Adjust. Index: testsuite/g++.dg/cpp0x/defaulted15.C === --- testsuite/g++.dg/cpp0x/defaulted15.C(revision 186816) +++ testsuite/g++.dg/cpp0x/defaulted15.C(working copy) @@ -43,7 +43,7 @@ SA(__has_trivial_copy(E)); struct F { - F(F) = default; // { dg-error non-const } + F(F) = default; }; struct G: public F Index: testsuite/g++.dg/cpp0x/defaulted35.C === --- testsuite/g++.dg/cpp0x/defaulted35.C(revision 0) +++ testsuite/g++.dg/cpp0x/defaulted35.C(revision 0) @@ -0,0 +1,8 @@ +// PR c++/53096 +// { dg-options -std=c++0x } + +struct foo +{ + foo(foo) = default; + foo operator=(foo) = default; +}; Index: cp/class.c === --- cp/class.c (revision 186816) +++ cp/class.c (working copy) @@ -1,6 +1,7 @@ /* Functions related to building classes and their related objects. Copyright (C) 1987, 1992, 1993, 1994, 1995, 1996, 1997, 1998, - 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010, 2011 + 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010, 2011, + 2012 Free Software Foundation, Inc. Contributed by Michael Tiemann (tiem...@cygnus.com) @@ -5144,9 +5145,6 @@ check_bases_and_members (tree t) give the synthesis error. */ error (%q+D declared to take const reference, but implicit declaration would take non-const, fn); - else if (imp_const_p !fn_const_p) - error (%q+D declared to take non-const reference cannot be -defaulted in the class body, fn); } defaulted_late_check (fn); }
RE: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c
Richard Guenther wrote: On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com wrote: Richard Guenther wrote: On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com wrote: The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled: FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1 loops 2 The test expects 2 loops to be vectorized, while gcc successfully vectorizes 3 loops in this test using neon on arm. This patch adjusts the expected output. Fixed test passes on qemu for arm and powerpc. OK for trunk? I think the proper fix is to instead of for (i = 0; i N; i++) { input[i] = i; output[i] = 0; if (input[i] 256) abort (); } use for (i = 0; i N; i++) { input[i] = i; output[i] = 0; __asm__ volatile (); } to prevent vectorization of initialization loops. Actually, it looks like both arm and powerpc vectorize this initialization loop (line 31), because the control flow is hoisted outside the loop by previous optimizations. In addition, arm with neon vectorizes the second loop (line 39), but powerpc does not: 39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9; If this is the expected behaviour for powerpc, then the patch I proposed is still needed to fix the test failure on arm. Also, there would be no need to disable vectorization of the initialization loop, right? Ah, I thought that was what changed. Btw, the if () abort () tries to disable vectorization but does not succeed in doing so. Richard. Here is an updated patch. It prevents vectorization of the initialization loop, as Richard suggested, and updates the expected number of vectorized loops accordingly. This patch assumes that the second loop in main (line 39) should only be vectorized on arm with neon. The test passes for arm and powerpc. OK for trunk? Thank you, Greta gcc/testsuite/ChangeLog 2012-04-25 Greta Yorsh greta.yo...@arm.com * gcc.dg/vect/slp-perm-8.c (main): Prevent vectorization of initialization loop. (dg-final): Adjust the expected number of vectorized loops. diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c index d211ef9..aaa6cbb 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c @@ -32,8 +32,7 @@ int main (int argc, const char* argv[]) { input[i] = i; output[i] = 0; - if (input[i] 256) -abort (); + __asm__ volatile (); } for (i = 0; i N / 3; i++) @@ -52,7 +51,8 @@ int main (int argc, const char* argv[]) return 0; } -/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target vect_perm_byte } } } */ +/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target { vect_perm_byte arm_neon_ok } } } } */ +/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { target { vect_perm_byte {! arm_neon_ok } } } } } */ /* { dg-final { scan-tree-dump-times vectorizing stmts using SLP 1 vect { target vect_perm_byte } } } */ /* { dg-final { cleanup-tree-dump vect } } */
Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c
On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com wrote: Richard Guenther wrote: On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com wrote: Richard Guenther wrote: On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com wrote: The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled: FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1 loops 2 The test expects 2 loops to be vectorized, while gcc successfully vectorizes 3 loops in this test using neon on arm. This patch adjusts the expected output. Fixed test passes on qemu for arm and powerpc. OK for trunk? I think the proper fix is to instead of for (i = 0; i N; i++) { input[i] = i; output[i] = 0; if (input[i] 256) abort (); } use for (i = 0; i N; i++) { input[i] = i; output[i] = 0; __asm__ volatile (); } to prevent vectorization of initialization loops. Actually, it looks like both arm and powerpc vectorize this initialization loop (line 31), because the control flow is hoisted outside the loop by previous optimizations. In addition, arm with neon vectorizes the second loop (line 39), but powerpc does not: 39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9; If this is the expected behaviour for powerpc, then the patch I proposed is still needed to fix the test failure on arm. Also, there would be no need to disable vectorization of the initialization loop, right? Ah, I thought that was what changed. Btw, the if () abort () tries to disable vectorization but does not succeed in doing so. Richard. Here is an updated patch. It prevents vectorization of the initialization loop, as Richard suggested, and updates the expected number of vectorized loops accordingly. This patch assumes that the second loop in main (line 39) should only be vectorized on arm with neon. The test passes for arm and powerpc. OK for trunk? If arm cannot handle 9 * i then the approrpiate condition would be vect_int_mult, not arm_neon_ok. Ok with that change. Richard. Thank you, Greta gcc/testsuite/ChangeLog 2012-04-25 Greta Yorsh greta.yo...@arm.com * gcc.dg/vect/slp-perm-8.c (main): Prevent vectorization of initialization loop. (dg-final): Adjust the expected number of vectorized loops.
Re: [PATCH, tree-optimization] Fix for PR 52868
On Wed, Apr 25, 2012 at 1:14 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin izamya...@gmail.com wrote: Hi all! I'd like to post for review the patch which makes some costs adjusting in get_computation_cost_at routine in ivopts part. As mentioned in the PR changes also fix the bwaves regression from PR 52272. Also changes introduce no degradations on spec2000/2006 and EEMBC1.1/2.0(this was measured on Atom) on x86 Bootstrapped and regtested on x86. Ok to commit? I can't make sense of the patch and the comment does not help. + diff_cost = cost.cost; cost.cost /= avg_loop_niter (data-current_loop); + add_cost_val = add_cost (TYPE_MODE (ctype), data-speed); + /* Do cost correction if address cost is small enough + and difference cost is high enough. */ + if (address_p diff_cost add_cost_val + get_address_cost (symbol_present, var_present, + offset, ratio, cstepi, + TYPE_MODE (TREE_TYPE (utype)), + TYPE_ADDR_SPACE (TREE_TYPE (utype)), + speed, stmt_is_after_inc, + can_autoinc).cost = add_cost_val) + cost.cost += add_cost_val; Please explain more thoroughly. It also would seem to be better to add an extra case, as later code does For example for such code for (j=0; jM;j++) { for (i=0; iN; i++) sum += ptr-a[j][i] * ptr-c[k][i]; } we currently have following gimple on x86 target (I provided a piece of all phase output): # ivtmp.13_30 = PHI ivtmp.13_31(3), ivtmp.13_33(7) D.1748_34 = (void *) ivtmp.13_30; D.1722_7 = MEM[base: D.1748_34, offset: 0B]; D.1750_36 = ivtmp.27_28; D.1751_37 = D.1750_36 + ivtmp.13_30; -- we got non-invariant add which is not taken into account currently in cost model D.1752_38 = (void *) D.1751_37; D.1753_39 = (sizetype) k_8(D); D.1754_40 = D.1753_39 * 800; D.1723_9 = MEM[base: D.1752_38, index: D.1754_40, offset: 16000B]; ... With proposed fix we produce: # ivtmp.14_30 = PHI ivtmp.14_31(3), 0(7) D.1749_34 = (struct S *) ivtmp.25_28; D.1722_7 = MEM[base: D.1749_34, index: ivtmp.14_30, offset: 0B]; D.1750_35 = (sizetype) k_8(D); D.1751_36 = D.1750_35 * 800; D.1752_37 = ptr_6(D) + D.1751_36; D.1723_9 = MEM[base: D.1752_37, index: ivtmp.14_30, offset: 16000B]; which is more effective on platforms where address cost is cheaper than cost of addition operation. That's basically what this adjustment is for. So comment in the source code now looks as follows /* Do cost correction when address difference produces additional non-invariant add operation which is less profitable if address cost is cheaper than cost of add. */ /* Now the computation is in shape symbol + var1 + const + ratio * var2. (symbol/var1/const parts may be omitted). If we are looking for an address, find the cost of addressing this. */ if (address_p) return add_costs (cost, get_address_cost (symbol_present, var_present, offset, ratio, cstepi, TYPE_MODE (TREE_TYPE (utype)), TYPE_ADDR_SPACE (TREE_TYPE (utype)), speed, stmt_is_after_inc, can_autoinc)); thus refactoring the code a bit would make it possible to CSE the get_address_cost call and eventually make it clearer what the code does. 'offset' could be changed beetween two calls of get_address_cost so such refactoring looks useless. New patch (only the comment was changed) attached. Changelog was changed as well. Richard. Changelog: 2012-04-26 Yuri Rumyantsev yuri.rumyant...@intel.com * tree-ssa-loop-ivopts.c (get_computation_cost_at): Adjust cost model when address difference produces additional non-invariant add operation which is less profitable if address cost is cheaper than cost of add. Thanks, Igor ivopts1.patch Description: Binary data
[PATCH 13/13] Switch -ftrack-macro-expansion=2 on by default.
Hopefully closing the series, this patch switches the compiler to -ftrack-macro-expansion=2 by default. Tested and bootstrapped on x86_64-unknown-linux-gnu against trunk. libcpp/ * init.c (cpp_create_reader): Switch -ftrack-macro-expansion=2 on by default. Add comments. gcc/docs/ * cppopts.texi: Adjust for enabling -ftrack-macro-expansion=2 by default. --- gcc/doc/cppopts.texi |2 ++ libcpp/init.c|6 ++ 2 files changed, 8 insertions(+), 0 deletions(-) diff --git a/gcc/doc/cppopts.texi b/gcc/doc/cppopts.texi index 205d870..27b1095 100644 --- a/gcc/doc/cppopts.texi +++ b/gcc/doc/cppopts.texi @@ -617,6 +617,8 @@ tokens locations completely. This value is the most memory hungry. When this option is given no argument, the default parameter value is @samp{2}. +Note that -ftrack-macro-expansion=2 is activated by default. + @item -fexec-charset=@var{charset} @opindex fexec-charset @cindex character set, execution diff --git a/libcpp/init.c b/libcpp/init.c index 5fa82ca..ba1e9cd 100644 --- a/libcpp/init.c +++ b/libcpp/init.c @@ -174,6 +174,12 @@ cpp_create_reader (enum c_lang lang, hash_table *table, CPP_OPTION (pfile, warn_dollars) = 1; CPP_OPTION (pfile, warn_variadic_macros) = 1; CPP_OPTION (pfile, warn_builtin_macro_redefined) = 1; + /* By default, track locations of tokens resulting from macro + expansion. The '2' means, track the locations with the highest + accuracy. Read the comments for struct + cpp_options::track_macro_expansion to learn about the other + values. */ + CPP_OPTION (pfile, track_macro_expansion) = 2; CPP_OPTION (pfile, warn_normalize) = normalized_C; /* Default CPP arithmetic to something sensible for the host for the -- Dodji
Re: [PATCH, tree-optimization] Fix for PR 52868
On Wed, Apr 25, 2012 at 4:32 PM, Igor Zamyatin izamya...@gmail.com wrote: On Wed, Apr 25, 2012 at 1:14 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin izamya...@gmail.com wrote: Hi all! I'd like to post for review the patch which makes some costs adjusting in get_computation_cost_at routine in ivopts part. As mentioned in the PR changes also fix the bwaves regression from PR 52272. Also changes introduce no degradations on spec2000/2006 and EEMBC1.1/2.0(this was measured on Atom) on x86 Bootstrapped and regtested on x86. Ok to commit? I can't make sense of the patch and the comment does not help. + diff_cost = cost.cost; cost.cost /= avg_loop_niter (data-current_loop); + add_cost_val = add_cost (TYPE_MODE (ctype), data-speed); + /* Do cost correction if address cost is small enough + and difference cost is high enough. */ + if (address_p diff_cost add_cost_val + get_address_cost (symbol_present, var_present, + offset, ratio, cstepi, + TYPE_MODE (TREE_TYPE (utype)), + TYPE_ADDR_SPACE (TREE_TYPE (utype)), + speed, stmt_is_after_inc, + can_autoinc).cost = add_cost_val) + cost.cost += add_cost_val; Please explain more thoroughly. It also would seem to be better to add an extra case, as later code does For example for such code for (j=0; jM;j++) { for (i=0; iN; i++) sum += ptr-a[j][i] * ptr-c[k][i]; } we currently have following gimple on x86 target (I provided a piece of all phase output): # ivtmp.13_30 = PHI ivtmp.13_31(3), ivtmp.13_33(7) D.1748_34 = (void *) ivtmp.13_30; D.1722_7 = MEM[base: D.1748_34, offset: 0B]; D.1750_36 = ivtmp.27_28; D.1751_37 = D.1750_36 + ivtmp.13_30; -- we got non-invariant add which is not taken into account currently in cost model D.1752_38 = (void *) D.1751_37; D.1753_39 = (sizetype) k_8(D); D.1754_40 = D.1753_39 * 800; D.1723_9 = MEM[base: D.1752_38, index: D.1754_40, offset: 16000B]; ... With proposed fix we produce: # ivtmp.14_30 = PHI ivtmp.14_31(3), 0(7) D.1749_34 = (struct S *) ivtmp.25_28; D.1722_7 = MEM[base: D.1749_34, index: ivtmp.14_30, offset: 0B]; D.1750_35 = (sizetype) k_8(D); D.1751_36 = D.1750_35 * 800; D.1752_37 = ptr_6(D) + D.1751_36; D.1723_9 = MEM[base: D.1752_37, index: ivtmp.14_30, offset: 16000B]; which is more effective on platforms where address cost is cheaper than cost of addition operation. That's basically what this adjustment is for. If we generally miss to account for the add then why is the adjustment conditional on diff_cost add_cost and address_cost = add_cost? Is this a new heuristic or a fix for not accurately computing the cost for the stmts we generate? Richard. So comment in the source code now looks as follows /* Do cost correction when address difference produces additional non-invariant add operation which is less profitable if address cost is cheaper than cost of add. */ /* Now the computation is in shape symbol + var1 + const + ratio * var2. (symbol/var1/const parts may be omitted). If we are looking for an address, find the cost of addressing this. */ if (address_p) return add_costs (cost, get_address_cost (symbol_present, var_present, offset, ratio, cstepi, TYPE_MODE (TREE_TYPE (utype)), TYPE_ADDR_SPACE (TREE_TYPE (utype)), speed, stmt_is_after_inc, can_autoinc)); thus refactoring the code a bit would make it possible to CSE the get_address_cost call and eventually make it clearer what the code does. 'offset' could be changed beetween two calls of get_address_cost so such refactoring looks useless. New patch (only the comment was changed) attached. Changelog was changed as well. Richard. Changelog: 2012-04-26 Yuri Rumyantsev yuri.rumyant...@intel.com * tree-ssa-loop-ivopts.c (get_computation_cost_at): Adjust cost model when address difference produces additional non-invariant add operation which is less profitable if address cost is cheaper than cost of add. Thanks, Igor
Re: [C] improve missing initializers diagnostics
On Sat, Apr 21, 2012 at 4:58 AM, Manuel López-Ibáñez lopeziba...@gmail.com wrote: This patch improves missing initializers diagnostics. From: pr36446.c:13:3: warning: missing initializer [-Wmissing-field-initializers] .h = {1}, ^ pr36446.c:13:3: warning: (near initialization for ‘m0.h.b’) [-Wmissing-field-initializers] .h = {1}, ^ to: pr36446.c:13:3: warning: missing initializer for field ‘b’ of ‘struct h’ [-Wmissing-field-initializers] .h = {1}, ^ pr36446.c:3:7: note: ‘b’ declared here int b; ^ Bootstrapped/regression tested. OK? 2012-04-19 Manuel López-Ibáñez m...@gcc.gnu.org * c-typeck.c (pop_init_level): Improve diagnostics. testsuite/ * gcc.dg/m-un-2.c: Update. * gcc.dg/20011021-1.c: Update. On Linux/x86, revision 186808 gave me: FAIL: gcc.dg/20011021-1.c (test for warnings, line 34) FAIL: gcc.dg/20011021-1.c (test for warnings, line 41) FAIL: gcc.dg/20011021-1.c (test for warnings, line 44) FAIL: gcc.dg/20011021-1.c (test for excess errors) FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 27) FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 30) FAIL: gcc.dg/m-un-2.c (test for excess errors) FAIL: gcc.dg/m-un-2.c warning regression 2 (test for warnings, line 12) FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 14) FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 7) FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 8) FAIL: gcc.dg/missing-field-init-2.c (test for excess errors) Revision 186806 is OK. -- H.J.
Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion
Gabriel Dos Reis g...@integrable-solutions.net writes: Thanks. But a point of the suggestion was that we won't need the same comment/explanation duplicated over at least 3 places. Just one. All those three places deal exactly with one instance: null pointer constant. That deserves a function in and of itself, which is documented by the duplicated comments. Please make that change. Everything else is OK. thanks. I am sorry for the round trips. Please find below a patch udpated accordingly. I am bootstrapping the whole patch set, but the impacted files of this patch have built fine so far. Thanks. gcc/ * input.h (expansion_point_location_if_in_system_header): Declare new function. * input.c (expansion_point_location_if_in_system_header): Define it. gcc/cp/ * call.c (conversion_null_warnings): Use the new expansion_point_location_if_in_system_header. * cvt.c (build_expr_type_conversion): Likewise. * typeck.c (cp_build_binary_op): Likewise. gcc/testsuite/ * g++.dg/warn/Wconversion-null-2.C: Add testing for __null, alongside the previous testing for NULL. --- gcc/cp/call.c |7 - gcc/cp/cvt.c |9 +- gcc/cp/typeck.c|9 -- gcc/input.c| 20 +++ gcc/input.h|1 + gcc/testsuite/g++.dg/warn/Wconversion-null-2.C | 31 +++- 6 files changed, 69 insertions(+), 8 deletions(-) diff --git a/gcc/cp/call.c b/gcc/cp/call.c index f9a7f08..85e45c2 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -5598,12 +5598,15 @@ conversion_null_warnings (tree totype, tree expr, tree fn, int argnum) if (expr == null_node TREE_CODE (totype) != BOOLEAN_TYPE ARITHMETIC_TYPE_P (totype)) { + source_location loc = + expansion_point_location_if_in_system_header (input_location); + if (fn) - warning_at (input_location, OPT_Wconversion_null, + warning_at (loc, OPT_Wconversion_null, passing NULL to non-pointer argument %P of %qD, argnum, fn); else - warning_at (input_location, OPT_Wconversion_null, + warning_at (loc, OPT_Wconversion_null, converting to non-pointer type %qT from NULL, totype); } diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c index 3dab372..49ba38a 100644 --- a/gcc/cp/cvt.c +++ b/gcc/cp/cvt.c @@ -1472,8 +1472,13 @@ build_expr_type_conversion (int desires, tree expr, bool complain) if (expr == null_node (desires WANT_INT) !(desires WANT_NULL)) -warning_at (input_location, OPT_Wconversion_null, - converting NULL to non-pointer type); +{ + source_location loc = + expansion_point_location_if_in_system_header (input_location); + + warning_at (loc, OPT_Wconversion_null, + converting NULL to non-pointer type); +} basetype = TREE_TYPE (expr); diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c index fb2f1bc..52d264b 100644 --- a/gcc/cp/typeck.c +++ b/gcc/cp/typeck.c @@ -3838,9 +3838,12 @@ cp_build_binary_op (location_t location, || (!null_ptr_cst_p (orig_op1) !TYPE_PTR_P (type1) !TYPE_PTR_TO_MEMBER_P (type1))) (complain tf_warning)) -/* Some sort of arithmetic operation involving NULL was - performed. */ -warning (OPT_Wpointer_arith, NULL used in arithmetic); +{ + source_location loc = + expansion_point_location_if_in_system_header (input_location); + + warning_at (loc, OPT_Wpointer_arith, NULL used in arithmetic); +} switch (code) { diff --git a/gcc/input.c b/gcc/input.c index 260be7e..5f14489 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -162,6 +162,26 @@ expand_location_to_spelling_point (source_location loc) return expand_location_1 (loc, /*expansion_piont_p=*/false); } +/* If LOCATION is in a sytem header and if it's a virtual location for + a token coming from the expansion of a macro M, unwind it to the + location of the expansion point of M. Otherwise, just return + LOCATION. + + This is used for instance when we want to emit diagnostics about a + token that is located in a macro that is itself defined in a system + header -- e.g for the NULL macro. In that case, if LOCATION is + passed to diagnostics emitting functions like warning_at as is, no + diagnostic won't be emitted. */ + +source_location +expansion_point_location_if_in_system_header (source_location location) +{ + if (in_system_header_at (location)) +location = linemap_resolve_location (line_table, location, +LRK_MACRO_EXPANSION_POINT, +NULL); + return location; +} #define ONE_K 1024 #define ONE_M (ONE_K * ONE_K) diff --git a/gcc/input.h b/gcc/input.h
RE: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: 25 April 2012 15:32 To: Greta Yorsh Cc: gcc-patches@gcc.gnu.org; mikest...@comcast.net; r...@cebitec.uni- bielefeld.de; Richard Earnshaw Subject: Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp- perm-8.c On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com wrote: Richard Guenther wrote: On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com wrote: Richard Guenther wrote: On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com wrote: The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled: FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1 loops 2 The test expects 2 loops to be vectorized, while gcc successfully vectorizes 3 loops in this test using neon on arm. This patch adjusts the expected output. Fixed test passes on qemu for arm and powerpc. OK for trunk? I think the proper fix is to instead of for (i = 0; i N; i++) { input[i] = i; output[i] = 0; if (input[i] 256) abort (); } use for (i = 0; i N; i++) { input[i] = i; output[i] = 0; __asm__ volatile (); } to prevent vectorization of initialization loops. Actually, it looks like both arm and powerpc vectorize this initialization loop (line 31), because the control flow is hoisted outside the loop by previous optimizations. In addition, arm with neon vectorizes the second loop (line 39), but powerpc does not: 39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9; If this is the expected behaviour for powerpc, then the patch I proposed is still needed to fix the test failure on arm. Also, there would be no need to disable vectorization of the initialization loop, right? Ah, I thought that was what changed. Btw, the if () abort () tries to disable vectorization but does not succeed in doing so. Richard. Here is an updated patch. It prevents vectorization of the initialization loop, as Richard suggested, and updates the expected number of vectorized loops accordingly. This patch assumes that the second loop in main (line 39) should only be vectorized on arm with neon. The test passes for arm and powerpc. OK for trunk? If arm cannot handle 9 * i then the approrpiate condition would be vect_int_mult, not arm_neon_ok. It's the other way around: arm can handle this multiplication, but powerpc does not handle it for some reason. Thank you, Greta Ok with that change. Richard. Thank you, Greta gcc/testsuite/ChangeLog 2012-04-25 Greta Yorsh greta.yo...@arm.com * gcc.dg/vect/slp-perm-8.c (main): Prevent vectorization of initialization loop. (dg-final): Adjust the expected number of vectorized loops.
Re: Symbol table 13/many: reachability code rewrite
Hi, the fortran problem is caused by fact that fortran does nested funtions that are static constructors. I did not really think of that case in cgraph construction code. Fixed thus. Honza PR middle-end/53089 * cgraphunit.c (cgraph_process_new_functions): Do not enqueue new functions here. (referred_to_p): Move ahead in file to avoid forward declaration. (cgraph_finalize_function): Finalize them here. * symtab.c (dump_symtab): Dump ctors and dtors. Index: cgraphunit.c === *** cgraphunit.c(revision 186770) --- cgraphunit.c(working copy) *** cgraph_process_new_functions (void) *** 243,249 cgraph_finalize_function (fndecl, false); output = true; cgraph_call_function_insertion_hooks (node); - enqueue_node ((symtab_node) node); break; case CGRAPH_STATE_IPA: --- 243,248 *** cgraph_reset_node (struct cgraph_node *n *** 320,325 --- 319,340 cgraph_node_remove_callees (node); } + /* Return true when there are references to NODE. */ + + static bool + referred_to_p (symtab_node node) + { + int i; + struct ipa_ref *ref; + + for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref); +i++) + return true; + if (symtab_function_p (node) cgraph (node)-callers) + return true; + return false; + } + /* DECL has been parsed. Take it, queue it, compile it at the whim of the logic in effect. If NESTED is true, then our caller cannot stand to have the garbage collector run at the moment. We would need to either create *** cgraph_finalize_function (tree decl, boo *** 372,377 --- 387,397 if (!nested) ggc_collect (); + + if (cgraph_state == CGRAPH_STATE_CONSTRUCTION +(cgraph_decide_is_function_needed (node, decl) + || referred_to_p ((symtab_node)node))) + enqueue_node ((symtab_node)node); } /* Add the function FNDECL to the call graph. *** process_function_and_variable_attributes *** 1114,1135 } } - /* Return true when there are references to NODE. */ - - static bool - referred_to_p (symtab_node node) - { - int i; - struct ipa_ref *ref; - - for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref); -i++) - return true; - if (symtab_function_p (node) cgraph (node)-callers) - return true; - return false; - } - /* Mark DECL as finalized. By finalizing the declaration, frontend instruct the middle end to output the variable to asm file, if needed or externally visible. */ --- 1134,1139 Index: symtab.c === *** symtab.c(revision 186770) --- symtab.c(working copy) *** dump_symtab_base (FILE *f, symtab_node n *** 414,419 --- 414,426 fprintf (f, virtual); if (DECL_ARTIFICIAL (node-symbol.decl)) fprintf (f, artificial); + if (TREE_CODE (node-symbol.decl) == FUNCTION_DECL) + { + if (DECL_STATIC_CONSTRUCTOR (node-symbol.decl)) + fprintf (f, constructor); + if (DECL_STATIC_DESTRUCTOR (node-symbol.decl)) + fprintf (f, destructor); + } fprintf (f, \n); if (node-symbol.same_comdat_group)
[Ada] Atomic protected types
This patch cleans up the implementation of atomic protected types. No changes in behavior. Tested on x86_64-pc-linux-gnu, committed on trunk 2012-04-25 Hristian Kirtchev kirtc...@adacore.com * exp_ch9.adb: Rename Lock_Free_Sub_Type to Lock_Free_Subprogram. Remove type Subprogram_Id. Rename LF_Sub_Table to Lock_Free_Subprogram_Table. (Allow_Lock_Free_Implementation): Renamed to Allows_Lock_Free_Implementation. Update the comment on lock-free restrictions. Code clean up and restructuring. (Build_Lock_Free_Protected_Subprogram_Body): Update the profile and related comments. Code clean up and restructuring. (Build_Lock_Free_Unprotected_Subprogram_Body): Update the profile and related comments. Code clean up and restructuring. (Comp_Of): Removed. Index: exp_ch9.adb === --- exp_ch9.adb (revision 186823) +++ exp_ch9.adb (working copy) @@ -81,29 +81,24 @@ -- Lock Free Data Structure -- -- - -- A data structure used for the Lock Free (LF) implementation of protected - -- objects. Since a protected subprogram can only access a single protected - -- component in the LF implementation, this structure stores each protected - -- subprogram and its accessed protected component when the protected - -- object allows the LF implementation. - - type Lock_Free_Sub_Type is record + type Lock_Free_Subprogram is record Sub_Body : Node_Id; Comp_Id : Entity_Id; end record; + -- This data structure and its fields must be documented, ALL global + -- data structures must be documented. We never rely on guessing what + -- things mean from their names. - subtype Subprogram_Id is Nat; + -- The following table establishes a relation between a subprogram body and + -- an unique protected component referenced in this body. - -- The following table used for the Lock Free implementation of protected - -- objects maps Lock_Free_Sub_Type to Subprogram_Id. - - package LF_Sub_Table is new Table.Table ( - Table_Component_Type = Lock_Free_Sub_Type, - Table_Index_Type = Subprogram_Id, + package Lock_Free_Subprogram_Table is new Table.Table ( + Table_Component_Type = Lock_Free_Subprogram, + Table_Index_Type = Nat, Table_Low_Bound = 1, Table_Initial= 5, Table_Increment = 5, - Table_Name = LF_Sub_Table); + Table_Name = Lock_Free_Subprogram_Table); --- -- Local Subprograms -- @@ -139,9 +134,19 @@ --Decls is the list of declarations to be enhanced. --Ent is the entity for the original entry body. - function Allow_Lock_Free_Implementation (N : Node_Id) return Boolean; - -- Given a protected body N, return True if N permits a lock free - -- implementation. + function Allows_Lock_Free_Implementation (N : Node_Id) return Boolean; + -- Given a protected body N, return True if N satisfies the following list + -- of lock-free restrictions: + -- + --1) Protected type + -- May not contain entries + -- May contain only scalar components + -- Component types must support atomic compare and exchange + -- + --2) Protected subprograms + -- May not have side effects + -- May not contain loop statements or procedure calls + -- Function calls and attribute references must be static function Build_Accept_Body (Astat : Node_Id) return Node_Id; -- Transform accept statement into a block with added exception handler. @@ -189,20 +194,20 @@ -- Build subprogram declaration for previous one function Build_Lock_Free_Protected_Subprogram_Body - (N : Node_Id; - Pid : Node_Id; - N_Op_Spec : Node_Id) return Node_Id; - -- This function is used to construct the lock free version of a protected - -- subprogram when the protected type denoted by Pid allows the lock free - -- implementation. It only contains a call to the unprotected version of - -- the subprogram body. + (N : Node_Id; + Prot_Typ: Node_Id; + Unprot_Spec : Node_Id) return Node_Id; + -- N denotes a subprogram body of protected type Prot_Typ. Unprot_Spec is + -- the subprogram specification of the unprotected version of N. Transform + -- N such that it invokes the unprotected version of the body. function Build_Lock_Free_Unprotected_Subprogram_Body - (N : Node_Id; - Pid : Node_Id) return Node_Id; - -- This function is used to construct the lock free version of an - -- unprotected subprogram when the protected type denoted by Pid allows the - -- lock free implementation. + (N: Node_Id; + Prot_Typ : Node_Id) return Node_Id; + -- N denotes a subprogram body of protected type Prot_Typ. Build a version
Re: [PATCH 11/13] Fix va_start related location
Gabriel Dos Reis g...@integrable-solutions.net writes: On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli do...@redhat.com wrote: In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being emitted because the relevant location was inside the var_start macro defined in a system header. It can even point to a token for a builtin macro there. This patch unwinds to the first token in real source code in that case. While you are at it, could you also use a non-zero value for the second argument argument to warning_at? I couldn't find any obvious value for it. I am thinking maybe it would make sense to introduction a new -Wva_start to warn about possible dangerous uses of the va_start macro and use that as the second argument for the relevant warnings emitted by fold_builtin_next_arg. What do you think? In any case, this topic seems logically unrelated to the purpose of enable -ftrack-macro-expansion by default, so IMHO it would be better to do this in a separate self contain patch. Among other things, this would ease possible future back-ports. Would you agree? Thanks. -- Dodji
Re: [PATCH 08/11] Make conversion warnings work on NULL with -ftrack-macro-expansion
OK, thanks! -- Gaby
Re: [PATCH 11/13] Fix va_start related location
On Wed, Apr 25, 2012 at 10:20 AM, Dodji Seketeli do...@redhat.com wrote: Gabriel Dos Reis g...@integrable-solutions.net writes: On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli do...@redhat.com wrote: In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being emitted because the relevant location was inside the var_start macro defined in a system header. It can even point to a token for a builtin macro there. This patch unwinds to the first token in real source code in that case. While you are at it, could you also use a non-zero value for the second argument argument to warning_at? I couldn't find any obvious value for it. I am thinking maybe it would make sense to introduction a new -Wva_start to warn about possible dangerous uses of the va_start macro and use that as the second argument for the relevant warnings emitted by fold_builtin_next_arg. What do you think? or -Wvarargs? In any case, this topic seems logically unrelated to the purpose of enable -ftrack-macro-expansion by default, so IMHO it would be better to do this in a separate self contain patch. Among other things, this would ease possible future back-ports. Would you agree? OK. -- Gaby
Re: [PATCH 12/13] Adjust relevant test cases wrt -ftrack-macro-expansion=[0|2]
On Wed, Apr 25, 2012 at 9:16 AM, Dodji Seketeli do...@redhat.com wrote: Even after all the patches I have already submitted for this -ftrack-macro-expansion business, some test cases where errors happens on tokens that are defined in macros see their output change in an incompatible way, when you run them with or without -ftrack-macro-expansion. I think this is expected, because the (spelling) locus inside the definition of the macro pointed to with -ftrack-macro-expansion is different from the locus of the expansion point of the macro pointed to without -ftrack-macro-expansion. In those cases this patch either adjusts the test case and forces it be run either with -ftrack-macro-expansion, or it just forces it to be run without -ftrack-macro-expansion. There are so many libstdc++ tests that were failing because of that benign issue that I preferred to just run them with -ftrack-macro-expansion diabled, after inspecting each of them to be sure there was nothing more serious underneath. I believe we can latter turn on -ftrack-macro-expansion there on a case by case basis. Boostrapped on x86_64-unknown-linux-gnu against trunk with and without -ftrack-macro-expansion turned on. OK. -- Gaby
Re: [PATCH 13/13] Switch -ftrack-macro-expansion=2 on by default.
On Wed, Apr 25, 2012 at 9:33 AM, Dodji Seketeli do...@redhat.com wrote: Hopefully closing the series, this patch switches the compiler to -ftrack-macro-expansion=2 by default. Tested and bootstrapped on x86_64-unknown-linux-gnu against trunk. Makes sense to me; Tom? -- Gaby
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Tue, Apr 24, 2012 at 4:38 PM, Steven Bosscher stevenb@gmail.com wrote: On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote: * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. You should add documentation for these new PARAMs to doc/invoke.texi. Ok, will do. I don't really like these new PARAMs: All other loop PARAMs are based on the number of insns in a loop, or the maximum number of times a transformation is applied. Your new PARAM_MIN_ITER_UNROLL_WITH_BRANCHES is completely different, because it is a number of iterations. This makes the PARAM value feel even more arbitrary than all the other PARAMs to some extend already do... That's true, they are different in what they are checking than some of the other loop unrolling params. But I need some threshold for determining when a loop is hot enough that its unrolled branches will be executed frequently enough to train the branch predictor and also where the impact on the branch prediction in the outer region of code is less likely to matter overall. The defaults were chosen so that the new unrolling limit should only kick in for loops that are not iterating much anyway, and where the outer hot region has quite a few branches. (The only other PARAM like that is PARAM_ALIGN_LOOP_ITERATIONS, and its default value also looks quite arbitrary...) Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains call. */ +static bool +loop_has_call(struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + if (CALL_P (insn)) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. */ +static bool +loop_has_FP_comp(struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + set = single_set (insn); + if (!set) + continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) + { + free (body); + return true; So you only detect single-set FP operations where some insns stores in a float mode. It wouldn't be very difficult to just walk over all sets and look for float modes. This is also necessary e.g. for x87 sincos, as well as various insns on other machines. Your comments say you don't want to apply the new heuristic to loops containing FP operations because these loops usually benefit more from unrolling. Therefore, you should IMHO look at non-single_set() insns also here, to avoid applying the heuristics to loops containing non-single_set() FP insns. Ok, thanks for the suggestion, I will expand this for the next version of the patch. + } + } + } + free (body); + return false; +} Nit: You are calling loop_has_call and loop_has_FP_comp() twice on each loop (first for constant iterations and next for runtime iterations), I don't think that is true for loop_has_FP_comp, since it is called in decide_unroll_constant_iterations and decide_unroll_runtime_iterations just after we have checked if the loop has a constant number of iterations, and returned early depending on the result of this check and which routine we are in. So each inner loop will only reach the call to loop_has_FP_comp in one of these routines. In the case of loop_has_call, which is only called for a hot outer loop, it is true we could invoke that more than once. That would happen if a hot outer loop contains more than one nested inner loop with a small iteration count and branches that we attempt to unroll (it is called at most once per inner loop that we attempt to unroll). I thought about attempting to cache this info for the outer loop in the structure returned by get_simple_loop_desc() as you also suggest below. I was concerned that currently this returns an niter_desc structure which holds info about the # iterations, and this information doesn't fit into that category. However, I could go ahead and add it to that structure and perhaps rename the structure to something more generic like loop_desc. What do you think? The other issue is that we don't need this
Re: [PATCH 05/11] Make expand_location resolve to locus in main source file
I have re-based my tree on top of a recent tree that incorporates the changes for caret diagnostics and I had to update this patch accordingly. Here is its new version on trunk of today. It basically updates the new diagnostic_show_locus function to point to spelling locations. The patch does pass bootstrap with -ftrack-macro-expansion turned off, and passes bootstrap with the full series with -ftrack-macro-expansion turned on. gcc/ * input.c (expand_location_1): New. Takes a parameter to choose whether to resolve the location to spelling or expansion point. Was factorized from ... (expand_location): ... here. (expand_location_to_spelling_point): New. Implemented in terms of expand_location_1. * diagnostic.c (diagnostic_build_prefix): Use the new expand_location_to_spelling_point instead of expand_location. --- gcc/diagnostic.c |4 ++-- gcc/input.c | 40 +++- gcc/input.h |9 + 3 files changed, 46 insertions(+), 7 deletions(-) diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 4496803..729e865 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -214,7 +214,7 @@ diagnostic_build_prefix (diagnostic_context *context, must-not-happen }; const char *text = _(diagnostic_kind_text[diagnostic-kind]); - expanded_location s = expand_location (diagnostic-location); + expanded_location s = expand_location_to_spelling_point (diagnostic-location); if (diagnostic-override_column) s.column = diagnostic-override_column; gcc_assert (diagnostic-kind DK_LAST_DIAGNOSTIC_KIND); @@ -266,7 +266,7 @@ diagnostic_show_locus (diagnostic_context * context, || diagnostic-location = BUILTINS_LOCATION) return; - s = expand_location(diagnostic-location); + s = expand_location_to_spelling_point (diagnostic-location); line = location_get_source_line (s); if (line == NULL) return; diff --git a/gcc/input.c b/gcc/input.c index bf5fe48..e9ba301 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -32,16 +32,22 @@ struct line_maps *line_table; /* Expand the source location LOC into a human readable location. If LOC resolves to a builtin location, the file name of the readable - location is set to the string built-in. */ - -expanded_location -expand_location (source_location loc) + location is set to the string built-in. If EXPANSION_POINT_P is + TRUE and LOC is virtual, then it is resolved to the expansion + point of the involved macro. Otherwise, it is resolved to the + spelling location of the token. */ + +static expanded_location +expand_location_1 (source_location loc, + bool expansion_point_p) { expanded_location xloc; const struct line_map *map; loc = linemap_resolve_location (line_table, loc, - LRK_SPELLING_LOCATION, map); + expansion_point_p + ? LRK_MACRO_EXPANSION_POINT + : LRK_SPELLING_LOCATION, map); xloc = linemap_expand_location (line_table, map, loc); if (loc = BUILTINS_LOCATION) @@ -109,6 +115,30 @@ location_get_source_line(expanded_location xloc) return buffer; } +/* Expand the source location LOC into a human readable location. If + LOC is virtual, it resolves to the expansion point of the involved + macro. If LOC resolves to a builtin location, the file name of the + readable location is set to the string built-in. */ + +expanded_location +expand_location (source_location loc) +{ + return expand_location_1 (loc, /*expansion_point_p=*/true); +} + +/* Expand the source location LOC into a human readable location. If + LOC is virtual, it resolves to the expansion location of the + relevant macro. If LOC resolves to a builtin location, the file + name of the readable location is set to the string + built-in. */ + +expanded_location +expand_location_to_spelling_point (source_location loc) +{ + return expand_location_1 (loc, /*expansion_piont_p=*/false); +} + + #define ONE_K 1024 #define ONE_M (ONE_K * ONE_K) diff --git a/gcc/input.h b/gcc/input.h index 4b15222..f755cdf 100644 --- a/gcc/input.h +++ b/gcc/input.h @@ -39,6 +39,7 @@ extern char builtins_location_check[(BUILTINS_LOCATION extern expanded_location expand_location (source_location); extern const char * location_get_source_line(expanded_location xloc); +extern expanded_location expand_location_to_spelling_point (source_location); /* Historically GCC used location_t, while cpp used source_location. This could be removed but it hardly seems worth the effort. */ @@ -50,6 +51,14 @@ extern location_t input_location; #define LOCATION_LINE(LOC) ((expand_location (LOC)).line) #define LOCATION_COLUMN(LOC)((expand_location (LOC)).column) +#define EXPANSION_POINT_LOCATION_FILE(LOC) \ + ((expand_location_to_expansion_point (LOC)).file) +#define
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen a...@firstfloor.org wrote: tejohn...@google.com (Teresa Johnson) writes: This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? One problem with any unrolling heuristics is currently that gcc has both the tree level and the rtl level unroller. The tree one is even on at -O3. So if you tweak anything for one you have to affect both, otherwise the other may still do the wrong thing(tm). It's true that the tree level unroller could benefit from taking branch mispredict effects into account as well. But since that is only performing full unrolling of constant trip count loops I suspect that there will be additional things that need to be considered, such as whether the full unrolling enables better optimization in the surrounding code/loop. Hence I wanted to tackle that later. For some other tweaks I looked into a shared cost model some time ago. May be still needed. Yes, I think it would be good to unify some of the profitability checks between the two unrolling passes, or at least between the tree and rtl level full unrollers/peelers. Teresa -Andi -- a...@linux.intel.com -- Speaking for myself only -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Wed, Apr 25, 2012 at 2:03 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote: This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? Thanks, Teresa 2012-04-24 Teresa Johnson tejohn...@google.com * loop-unroll.c (loop_has_call): New function. (loop_has_FP_comp): Ditto. (compute_weighted_branches): Ditto. (max_unroll_with_branches): Ditto. (decide_unroll_constant_iterations): Add heuristic to avoid increasing branch mispredicts when unrolling. (decide_unroll_runtime_iterations): Ditto. * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains call. */ +static bool +loop_has_call(struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); You repeatedly do this and walk over all blocks. Please think about compile-time issues when writing code. See my response to Steven where I address this issue and mention some approaches to reducing the loop body walks. Please let me know if you have any feedback on that. This all looks sort-of target specific to me and I don't see why this very specialized patch is a good idea when unrolling does a very poor job deciding what and how much to unroll generally. I am hoping this will improve upon the job the unroller does in deciding when/how to unroll. I didn't think that it was too target specific as branch mispredictions could affect many targets. Note that there are already some much more basic checks for the branch misprediction effects in both decide_peel_simple and decide_unroll_stupid, for example: /* Do not simply peel loops with branches inside -- it increases number of mispredicts. */ if (num_loop_branches (loop) 1) { if (dump_file) fprintf (dump_file, ;; Not peeling, contains branches\n); return; } It is possible that both of these checks could be made less aggressive using the approach in this patch, which affects many more loops and hence I am trying to add some more intelligent checking of whether branch mispredicts might be triggered. Thanks, Teresa Richard. + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + if (CALL_P (insn)) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. */ +static bool +loop_has_FP_comp(struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + set = single_set (insn); + if (!set) + continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Compute the number of branches in LOOP, weighted by execution counts. */ +static float +compute_weighted_branches(struct loop *loop) +{ + int header_count = loop-header-count; + unsigned i; + float n; + basic_block * body; + + /* If no profile feedback data exists, don't limit unrolling */ + if (header_count == 0) + return 0.0; + + gcc_assert (loop-latch != EXIT_BLOCK_PTR); + + body = get_loop_body (loop); + n = 0.0; + for (i = 0; i loop-num_nodes; i++) + { + if (EDGE_COUNT (body[i]-succs) = 2) + { + /* If this block is
Re: [RFA] Update config.sub to 2012-04-18 version.
ChangeLog: * config.sub: Update to 2012-04-18 version from official repo. Thanks to everyone who answered. This patch is no in, both GCC src. -- Joel
Re: [PATCH] Default to -gdwarf-4
On 04/25/12 06:47, Jakub Jelinek wrote: 2012-04-25 Jakub Jelinek ja...@redhat.com * common.opt (flag_debug_types_section): Default to 0. (dwarf_version): Default to 4. (dwarf_record_gcc_switches): Default to 1. (dwarf_strict): Default to 0. * toplev.c (process_options): Don't handle dwarf_strict or dwarf_version here. * config/vxworks.c (vxworks_override_options): Don't test whether dwarf_strict or dwarf_version are negative, instead test !global_options_set.x_dwarf_*. * config/darwin.c (darwin_override_options): Default to dwarf_version 2. * doc/invoke.texi: Note that -gdwarf-4, -grecord-gcc-switches and -fno-debug-types-section are now the default. Ok. r~
Re: [C] improve missing initializers diagnostics
On 25 April 2012 16:46, H.J. Lu hjl.to...@gmail.com wrote: On Sat, Apr 21, 2012 at 4:58 AM, Manuel López-Ibáñez lopeziba...@gmail.com wrote: This patch improves missing initializers diagnostics. From: pr36446.c:13:3: warning: missing initializer [-Wmissing-field-initializers] .h = {1}, ^ pr36446.c:13:3: warning: (near initialization for ‘m0.h.b’) [-Wmissing-field-initializers] .h = {1}, ^ to: pr36446.c:13:3: warning: missing initializer for field ‘b’ of ‘struct h’ [-Wmissing-field-initializers] .h = {1}, ^ pr36446.c:3:7: note: ‘b’ declared here int b; ^ Bootstrapped/regression tested. OK? 2012-04-19 Manuel López-Ibáñez m...@gcc.gnu.org * c-typeck.c (pop_init_level): Improve diagnostics. testsuite/ * gcc.dg/m-un-2.c: Update. * gcc.dg/20011021-1.c: Update. On Linux/x86, revision 186808 gave me: FAIL: gcc.dg/20011021-1.c (test for warnings, line 34) FAIL: gcc.dg/20011021-1.c (test for warnings, line 41) FAIL: gcc.dg/20011021-1.c (test for warnings, line 44) FAIL: gcc.dg/20011021-1.c (test for excess errors) FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 27) FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 30) FAIL: gcc.dg/m-un-2.c (test for excess errors) FAIL: gcc.dg/m-un-2.c warning regression 2 (test for warnings, line 12) FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 14) FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 7) FAIL: gcc.dg/missing-field-init-2.c (test for warnings, line 8) FAIL: gcc.dg/missing-field-init-2.c (test for excess errors) Revision 186806 is OK. Somehow I committed a broken version of the patch. It should have been this: --- gcc/c-typeck.c (revision 186821) +++ gcc/c-typeck.c (working copy) @@ -7063,11 +7063,11 @@ pop_init_level (int implicit, struct obs if (warning_at (input_location, OPT_Wmissing_field_initializers, missing initializer for field %qD of %qT, constructor_unfilled_fields, constructor_type)) inform (DECL_SOURCE_LOCATION (constructor_unfilled_fields), - %qT declared here, constructor_unfilled_fields); + %qD declared here, constructor_unfilled_fields); } } I'll commit as soon as it finishes bootstrapping+testing. Sorry for the mistake, Manuel.
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
I think the general mechanism applies to most of the targets. What is needed is target specific parameter (branch budget) tuning which can be done separately -- there exist a way to do that already. David On Wed, Apr 25, 2012 at 2:03 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote: This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? Thanks, Teresa 2012-04-24 Teresa Johnson tejohn...@google.com * loop-unroll.c (loop_has_call): New function. (loop_has_FP_comp): Ditto. (compute_weighted_branches): Ditto. (max_unroll_with_branches): Ditto. (decide_unroll_constant_iterations): Add heuristic to avoid increasing branch mispredicts when unrolling. (decide_unroll_runtime_iterations): Ditto. * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains call. */ +static bool +loop_has_call(struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); You repeatedly do this and walk over all blocks. Please think about compile-time issues when writing code. This all looks sort-of target specific to me and I don't see why this very specialized patch is a good idea when unrolling does a very poor job deciding what and how much to unroll generally. Richard. + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + if (CALL_P (insn)) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. */ +static bool +loop_has_FP_comp(struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + set = single_set (insn); + if (!set) + continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Compute the number of branches in LOOP, weighted by execution counts. */ +static float +compute_weighted_branches(struct loop *loop) +{ + int header_count = loop-header-count; + unsigned i; + float n; + basic_block * body; + + /* If no profile feedback data exists, don't limit unrolling */ + if (header_count == 0) + return 0.0; + + gcc_assert (loop-latch != EXIT_BLOCK_PTR); + + body = get_loop_body (loop); + n = 0.0; + for (i = 0; i loop-num_nodes; i++) + { + if (EDGE_COUNT (body[i]-succs) = 2) + { + /* If this block is executed less frequently than the header (loop + entry), then it is weighted based on the ratio of times it is + executed compared to the header. */ + if (body[i]-count header_count) + n += ((float)body[i]-count)/header_count; + + /* When it is executed more frequently than the header (i.e. it is + in a nested inner loop), simply weight the branch at 1.0. */ + else + n += 1.0; + } + } + free (body); + + return n; +} + +/* Compute the maximum number of times LOOP can be unrolled without exceeding + a branch budget, which can increase branch mispredictions. The number of + branches is computed by weighting each branch with its expected execution + probability through the loop based on
Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c
On 25/04/12 15:31, Richard Guenther wrote: On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com wrote: Richard Guenther wrote: On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com wrote: Richard Guenther wrote: On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com wrote: The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled: FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1 loops 2 The test expects 2 loops to be vectorized, while gcc successfully vectorizes 3 loops in this test using neon on arm. This patch adjusts the expected output. Fixed test passes on qemu for arm and powerpc. OK for trunk? I think the proper fix is to instead of for (i = 0; i N; i++) { input[i] = i; output[i] = 0; if (input[i] 256) abort (); } use for (i = 0; i N; i++) { input[i] = i; output[i] = 0; __asm__ volatile (); } to prevent vectorization of initialization loops. Actually, it looks like both arm and powerpc vectorize this initialization loop (line 31), because the control flow is hoisted outside the loop by previous optimizations. In addition, arm with neon vectorizes the second loop (line 39), but powerpc does not: 39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9; If this is the expected behaviour for powerpc, then the patch I proposed is still needed to fix the test failure on arm. Also, there would be no need to disable vectorization of the initialization loop, right? Ah, I thought that was what changed. Btw, the if () abort () tries to disable vectorization but does not succeed in doing so. Richard. Here is an updated patch. It prevents vectorization of the initialization loop, as Richard suggested, and updates the expected number of vectorized loops accordingly. This patch assumes that the second loop in main (line 39) should only be vectorized on arm with neon. The test passes for arm and powerpc. OK for trunk? If arm cannot handle 9 * i then the approrpiate condition would be vect_int_mult, not arm_neon_ok. The issue is that arm has (well, should be marked has having) vect_char_mult. The difference in count of vectorized loops is based on that. R. Ok with that change. Richard. Thank you, Greta gcc/testsuite/ChangeLog 2012-04-25 Greta Yorsh greta.yo...@arm.com * gcc.dg/vect/slp-perm-8.c (main): Prevent vectorization of initialization loop. (dg-final): Adjust the expected number of vectorized loops.
Re: various minor obvious fixes in th track-macro-expansion code
Manuel López-Ibáñez lopeziba...@gmail.com writes: Hi Dodji, Hello Manuel, Sorry for my late reply, I was on the road. I was going to commit this as obvious, but I want to make sure that it doesn't conflict with your new track-macro-expansion patches. It can also wait until you commit all your patches. [snip] 2012-04-18 Manuel López-Ibáñez m...@gcc.gnu.org * tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Fix comment. Delete unused parameter first_exp_point_map. (virt_loc_aware_diagnostic_finalizer): Update call. libcpp/ * line-map.c (linemap_resolve_location): Synchronize comments with those in line-map.h. * include/line-map.h (linemap_resolve_location): Fix spelling in comment. This is fine by me. I'll easily rebase my tree on your changes. Git FTW. Thanks. -- Dodji
Re: Symbol table 13/many: reachability code rewrite
On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote: + /* Return true when there are references to NODE. */ + + static bool + referred_to_p (symtab_node node) + { + int i; + struct ipa_ref *ref; + + for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref); + i++) + return true; This looks odd. Don't you want to do something with ref? + if (symtab_function_p (node) cgraph (node)-callers) + return true; + return false; + } Ciao! Steven
Re: Symbol table 13/many: reachability code rewrite
On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote: + /* Return true when there are references to NODE. */ + + static bool + referred_to_p (symtab_node node) + { + int i; + struct ipa_ref *ref; + + for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref); + i++) + return true; This looks odd. Don't you want to do something with ref? No, I have just iteration API, not API to check how many refs are there and I want to see if there are any... Honza
Re: Symbol table 14/many: cleanups of cgraphunit.c
Hello, Thanks for this work! Below are a bunch of spelling fixes ;-) On Wed, Apr 25, 2012 at 6:32 PM, Jan Hubicka hubi...@ucw.cz wrote: --- 19,28 along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ ! /* This module implements main driver of compilation process. This module implements the main driver of the compilation process. The main scope of this file is to act as an interface in between ! tree based frontends and the backend. s/tree based/GENERIC based/, or leave the 'tree' bit out (is there another kind of front end?). s/frontend/front end/g s/backend/back end/g The front-end is supposed to use following functionality: s/front-end/front end/ The function can be called multiple times when multiple source level ! compilation units are combined. Is this still applicable? I thought -combine was removed. Or is this for things like Fortran modules? - cgraph_optimize ! This passes control to the back-end. Optimizations are performed and s/back-end/back end/ ! final assembler is generated. This is done in the following way. Note ! that with link time optimization the process is split into three ! stages (compile time, linktime analysis and parallel linktime as s/linktime/link time/g ! indicated bellow). ! ! Compile time: ! ! 1) Inter-procedural optimization. ! (ipa_passes) ! ! This part is further split into: ! ! a) early optimizations. These are local passes executed in ! the topological order on the callgraph. ! ! The purpose of early optimiations is to optimize away simple ! things that may otherwise confuse IP analysis. Very simple ! propagation across the callgraph is done i.e. to discover s/i.e./e.g./ ! functions without side effects and simple inlining is performed. ! ! b) early small interprocedural passes. ! ! Those are interprocedural passes executed only at compilation ! time. These include, for exmaple, transational memory lowering, s/exmaple/example/ ! unreachable code removal and other simple transformations. ! ! c) IP analysis stage. All interprocedural passes do their ! analysis. ! ! Interprocedural passes differ from small interprocedural ! passes by their ability to operate across whole program s/whole program/the whole program/, or a whole program, or whole programs. ! at linktime. Their analysis stage is performed early to ! both reduce linking times and linktime memory usage by ! not having to represent whole program in memory. ! ! d) LTO sreaming. When doing LTO, everything important gets s/sreaming/streaming/ ! streamed into the object file. ! ! Compile time and or linktime analysis stage (WPA): ! ! At linktime units gets streamed back and symbol table is s/units gets/units get/ s/and symbol/and the symbol/ ! merged. Function bodies are not streamed in and not ! available. ! e) IP propagation stage. All IP passes execute their ! IP propagation. This is done based on the earlier analysis ! without having function bodies at hand. ! f) Ltrans streaming. When doing WHOPR LTO, the program ! is partitioned and streamed into multple object files. s/multple/multiple/ ! ! Compile time and/or parallel linktime stage (ltrans) + Each of the object files is streamed back and compiled + separately. Now the function bodies becomes available + again. + + 2) Virtual clone materialization + (cgraph_materialize_clone) + + IP passes can produce copies of existing functoins (such s/functoins/functions/ + as versioned clones or inline clones) without actually + manipulating their bodies by creating virtual clones in + the callgraph. At this time the virtual clones are + turned into real functions add newline + 3) IP transformation + + All IP passes transform function bodies based on earlier + decision of the IP propagation. + + 4) late small IP passes + + Simple IP passes working within single program partition. + + 5) Expansion + (cgraph_expand_all_functions) + + At this stage functions that needs to be output into s/needs/need/ + assembler are identified and compiled in topological order s/order/order./ add newline + 6) Output of variables and aliases + Now it is known what variable references was not optimized s/references was not/references were not/ + out and thus all variables are output to the file. + +
Re: [PATCH 13/13] Switch -ftrack-macro-expansion=2 on by default.
Gaby == Gabriel Dos Reis g...@integrable-solutions.net writes: Gaby On Wed, Apr 25, 2012 at 9:33 AM, Dodji Seketeli do...@redhat.com wrote: Hopefully closing the series, this patch switches the compiler to -ftrack-macro-expansion=2 by default. Tested and bootstrapped on x86_64-unknown-linux-gnu against trunk. Gaby Makes sense to me; Tom? Yes, do it. Thanks. Tom
Re: FW: [v3] libstdc++/52689 testcase
On 25 April 2012 08:33, Igor Zamyatin wrote: This testcase is reported as failed on x86 Yep. http://gcc.gnu.org/ml/gcc-testresults/2012-04/msg02547.html Benjamin? Noticed that this testcase wasn't put in as part of the patch. Fixed as follows. tested x86/linux -benjamin
[v3] fix scoped_allocator problems
* include/std/scoped_allocator (scoped_allocator::__outermost): Do not pass non-POD to varargs function. * testsuite/20_util/scoped_allocator/1.cc: Fix test. This fixes a potential problem in scoped_allocator and fixes a broken test (and ensures it actually runs!) Tested x86_64-linux, committed to trunk, will put it on 4.7 too.
Re: Symbol table 13/many: reachability code rewrite
On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote: + /* Return true when there are references to NODE. */ + + static bool + referred_to_p (symtab_node node) + { + int i; + struct ipa_ref *ref; + + for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref); + i++) + return true; This looks odd. Don't you want to do something with ref? No, I have just iteration API, not API to check how many refs are there and I want to see if there are any... Actually it is bit too much of cutpaste. Just if (ipa_ref_list_referring_iterate (node-symbol.ref_list, 0, ref)) would work too. i will update it ;) Honza Honza
Re: Symbol table 14/many: cleanups of cgraphunit.c
Hello, Thanks for this work! Below are a bunch of spelling fixes ;-) Thanks! The function can be called multiple times when multiple source level ! compilation units are combined. Is this still applicable? I thought -combine was removed. Or is this for things like Fortran modules? It is still possible to finalize multiple times and it is used internally to finalize after aliases. No frontend is using it as far as I know. Perhaps I will drop it after cleaning up the aliases. Honza
Re: PATCH: PR debug/52857: DW_OP_GNU_regval_type is generated with INVALID_REGNUM
On 04/25/12 06:59, H.J. Lu wrote: PR debug/52857 * dwarf2out.c (dbx_reg_number): Assert return value != INVALID_REGNUM. Ok. r~
Re: Symbol table 13/many: reachability code rewrite
On 04/25/12 11:46, Jan Hubicka wrote: On Wed, Apr 25, 2012 at 5:04 PM, Jan Hubicka hubi...@ucw.cz wrote: + /* Return true when there are references to NODE. */ + + static bool + referred_to_p (symtab_node node) + { + int i; + struct ipa_ref *ref; + + for (i = 0; ipa_ref_list_referring_iterate (node-symbol.ref_list, i, ref); +i++) + return true; This looks odd. Don't you want to do something with ref? No, I have just iteration API, not API to check how many refs are there and I want to see if there are any... Actually it is bit too much of cutpaste. Just if (ipa_ref_list_referring_iterate (node-symbol.ref_list, 0, ref)) would work too. i will update it ;) And with a comment too, please. r~
[PATCH][Cilkplus] Elemental function insertion
Hello Everyone, This patch is for the Cilkplus branch affecting both C and C++ compilers. This patch will insert elemental functions when the loop is vectorized. Thanking You, Yours Sincerely, Balaji V. Iyer.diff --git a/gcc/cilk.h b/gcc/cilk.h index 27d5dd0..27ccd16 100644 --- a/gcc/cilk.h +++ b/gcc/cilk.h @@ -269,5 +269,7 @@ extern void cilk_remove_annotated_functions (rtx first); extern bool cilk_annotated_function_p (char *); extern void debug_zca_data (void); extern zca_data *get_zca_entry (int); -extern void insert_in_zca_table (zca_data); +extern void insert_in_zca_table (zca_data); +extern bool is_elem_fn (tree); +extern tree find_elem_fn_name (tree, tree, tree); #endif /* GCC_CILK_H */ diff --git a/gcc/elem-function.c b/gcc/elem-function.c index 4cc9035..42f6248 100644 --- a/gcc/elem-function.c +++ b/gcc/elem-function.c @@ -83,6 +83,58 @@ static elem_fn_info *extract_elem_fn_values (tree); static tree create_optimize_attribute (int); static tree create_processor_attribute (elem_fn_info *, tree *); +/* this is an helper function for find_elem_fn_param_type */ +static enum elem_fn_parm_type +find_elem_fn_parm_type_1 (tree fndecl, int parm_no) +{ + int ii = 0; + elem_fn_info *elem_fn_values; + + elem_fn_values = extract_elem_fn_values (fndecl); + if (!elem_fn_values) +return TYPE_NONE; + + for (ii = 0; ii elem_fn_values-no_lvars; ii++) +if (elem_fn_values-linear_location[ii] == parm_no) + return TYPE_LINEAR; + + for (ii = 0; ii elem_fn_values-no_uvars; ii++) +if (elem_fn_values-uniform_location[ii] == parm_no) + return TYPE_UNIFORM; + + return TYPE_NONE; +} + + +/* this function will return the type of a parameter in elemental function. + The choices are UNIFORM or LINEAR. */ +enum elem_fn_parm_type +find_elem_fn_parm_type (gimple stmt, tree op) +{ + tree fndecl, parm = NULL_TREE; + int ii, nargs; + enum elem_fn_parm_type return_type = TYPE_NONE; + + if (gimple_code (stmt) != GIMPLE_CALL) +return TYPE_NONE; + + fndecl = gimple_call_fndecl (stmt); + gcc_assert (fndecl); + + nargs = gimple_call_num_args (stmt); + + for (ii = 0; ii nargs; ii++) +{ + parm = gimple_call_arg (stmt, ii); + if (op == parm) + { + return_type = find_elem_fn_parm_type_1 (fndecl, 1); + return return_type; + } +} + return return_type; +} + /* this function will concatinate the suffix to the existing function decl */ static tree rename_elem_fn (tree decl, const char *suffix) @@ -108,7 +160,7 @@ rename_elem_fn (tree decl, const char *suffix) /* this function will check to see if the node is part of an function that * needs to be converted to its vector equivalent. */ -static bool +bool is_elem_fn (tree fndecl) { tree ii_tree; @@ -349,6 +401,55 @@ find_suffix (elem_fn_info *elem_fn_values, bool masked) return suffix; } +tree +find_elem_fn_name (tree old_fndecl, + tree vectype_out ATTRIBUTE_UNUSED, + tree vectype_in ATTRIBUTE_UNUSED) +{ + elem_fn_info *elem_fn_values = NULL; + tree new_fndecl = NULL_TREE, arg_type = NULL_TREE; + char *suffix = NULL; + + elem_fn_values = extract_elem_fn_values (old_fndecl); + + if (elem_fn_values) +{ + if (elem_fn_values-no_vlengths 0) + { + if (elem_fn_values-vectorlength[0] == + (int)TYPE_VECTOR_SUBPARTS (vectype_out)) + suffix = find_suffix (elem_fn_values, false); + else + return NULL_TREE; + } + else + return NULL_TREE; +} + else +return NULL_TREE; + + new_fndecl = copy_node (rename_elem_fn (old_fndecl, suffix)); + TREE_TYPE (new_fndecl) = copy_node (TREE_TYPE (old_fndecl)); + + TYPE_ARG_TYPES (TREE_TYPE (new_fndecl)) = +copy_list (TYPE_ARG_TYPES (TREE_TYPE (new_fndecl))); + + for (arg_type = TYPE_ARG_TYPES (TREE_TYPE (new_fndecl)); + arg_type arg_type != void_type_node; + arg_type = TREE_CHAIN (arg_type)) +TREE_VALUE (arg_type) = vectype_out; + + if (TREE_TYPE (TREE_TYPE (new_fndecl)) != void_type_node) +{ + TREE_TYPE (TREE_TYPE (new_fndecl)) = + copy_node (TREE_TYPE (TREE_TYPE (new_fndecl))); + TREE_TYPE (TREE_TYPE (new_fndecl)) = vectype_out; + DECL_MODE (new_fndecl) = TYPE_MODE (vectype_out); +} + + return new_fndecl; +} + /* this function wil create the elemental vector function node */ static struct cgraph_node * create_elem_fn_nodes (struct cgraph_node *node) diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 1381b53..bea2773 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3. If not see #include langhooks.h #include tree-affine.h #include params.h +#include cilk.h static struct datadep_stats { @@ -4383,8 +4384,18 @@ find_data_references_in_stmt (struct loop *nest, gimple stmt, if
[PATCH] Improve andq $0xffffffff, %reg handling (PR target/53110)
Hi! We have a splitter for reg1 = reg2 0x, but only if regnums are different. But movl %edi, %edi is a cheaper variant of andq $0x, %rdi even with the same register and doesn't clobber flags, so this patch attempts to expand it as a zero extension early. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-04-25 Jakub Jelinek ja...@redhat.com PR target/53110 * config/i386/i386.md (andmode3): For andq $0x, reg instead expand it as zero extension. --- gcc/config/i386/i386.md.jj 2012-04-25 12:14:54.0 +0200 +++ gcc/config/i386/i386.md 2012-04-25 14:50:48.708925963 +0200 @@ -7694,7 +7694,17 @@ (define_expand andmode3 (and:SWIM (match_operand:SWIM 1 nonimmediate_operand) (match_operand:SWIM 2 general_szext_operand)))] - ix86_expand_binary_operator (AND, MODEmode, operands); DONE;) +{ + if (MODEmode == DImode + GET_CODE (operands[2]) == CONST_INT + INTVAL (operands[2]) == (HOST_WIDE_INT) 0x + REG_P (operands[1])) +emit_insn (gen_zero_extendsidi2 (operands[0], +gen_lowpart (SImode, operands[1]))); + else +ix86_expand_binary_operator (AND, MODEmode, operands); + DONE; +}) (define_insn *anddi_1 [(set (match_operand:DI 0 nonimmediate_operand =r,rm,r,r) Jakub
Re: [PATCH] Improve andq $0xffffffff, %reg handling (PR target/53110)
On 04/25/12 12:14, Jakub Jelinek wrote: 2012-04-25 Jakub Jelinek ja...@redhat.com PR target/53110 * config/i386/i386.md (andmode3): For andq $0x, reg instead expand it as zero extension. Ok. r~
FW: [PATCH][Cilkplus] Elemental function insertion
Sorry, I forgot to include Changelog entries. I am attaching the updated patch. Thanks, Balaji V. Iyer. From: Iyer, Balaji V Sent: Wednesday, April 25, 2012 3:08 PM To: gcc-patches@gcc.gnu.org Subject: [PATCH][Cilkplus] Elemental function insertion Hello Everyone, This patch is for the Cilkplus branch affecting both C and C++ compilers. This patch will insert elemental functions when the loop is vectorized. Thanking You, Yours Sincerely, Balaji V. Iyer.diff --git a/gcc/ChangeLog.cilk b/gcc/ChangeLog.cilk index 523b7ac..d6b28e2 100644 --- a/gcc/ChangeLog.cilk +++ b/gcc/ChangeLog.cilk @@ -1,3 +1,19 @@ +2012-04-24 Balaji V. Iyer balaji.v.i...@intel.com + + * elem-function.c (find_elem_fn_param_type_1): New function. + (find_elem_fn_param_type): Likewise. + (find_elem_fn_name): Likewise. + (is_elem_fn): Make it unstatic. + * tree-data-ref.c (find_data_references_in_stmt): Added support for + functions that can be made to elemental functions. + * tree-vect-stmts.c (vect_get_vec_def_for_operand): Added a check for + the parameters to see whether it is uniform, linear or neither. + (vectorizable_function): Handled code to substitute regular function + with the equivalent elemental function. + (vectorizable_call): Set the function type for substituted elemental + function. + * tree.h (enum elem_fn_parm_type): New enum. + 2012-04-20 Balaji V. Iyer balaji.v.i...@intel.com * final.c (rest_of_handle_final): Moved outputing ZCA data after diff --git a/gcc/cilk.h b/gcc/cilk.h index 27d5dd0..27ccd16 100644 --- a/gcc/cilk.h +++ b/gcc/cilk.h @@ -269,5 +269,7 @@ extern void cilk_remove_annotated_functions (rtx first); extern bool cilk_annotated_function_p (char *); extern void debug_zca_data (void); extern zca_data *get_zca_entry (int); -extern void insert_in_zca_table (zca_data); +extern void insert_in_zca_table (zca_data); +extern bool is_elem_fn (tree); +extern tree find_elem_fn_name (tree, tree, tree); #endif /* GCC_CILK_H */ diff --git a/gcc/elem-function.c b/gcc/elem-function.c index 4cc9035..42f6248 100644 --- a/gcc/elem-function.c +++ b/gcc/elem-function.c @@ -83,6 +83,58 @@ static elem_fn_info *extract_elem_fn_values (tree); static tree create_optimize_attribute (int); static tree create_processor_attribute (elem_fn_info *, tree *); +/* this is an helper function for find_elem_fn_param_type */ +static enum elem_fn_parm_type +find_elem_fn_parm_type_1 (tree fndecl, int parm_no) +{ + int ii = 0; + elem_fn_info *elem_fn_values; + + elem_fn_values = extract_elem_fn_values (fndecl); + if (!elem_fn_values) +return TYPE_NONE; + + for (ii = 0; ii elem_fn_values-no_lvars; ii++) +if (elem_fn_values-linear_location[ii] == parm_no) + return TYPE_LINEAR; + + for (ii = 0; ii elem_fn_values-no_uvars; ii++) +if (elem_fn_values-uniform_location[ii] == parm_no) + return TYPE_UNIFORM; + + return TYPE_NONE; +} + + +/* this function will return the type of a parameter in elemental function. + The choices are UNIFORM or LINEAR. */ +enum elem_fn_parm_type +find_elem_fn_parm_type (gimple stmt, tree op) +{ + tree fndecl, parm = NULL_TREE; + int ii, nargs; + enum elem_fn_parm_type return_type = TYPE_NONE; + + if (gimple_code (stmt) != GIMPLE_CALL) +return TYPE_NONE; + + fndecl = gimple_call_fndecl (stmt); + gcc_assert (fndecl); + + nargs = gimple_call_num_args (stmt); + + for (ii = 0; ii nargs; ii++) +{ + parm = gimple_call_arg (stmt, ii); + if (op == parm) + { + return_type = find_elem_fn_parm_type_1 (fndecl, 1); + return return_type; + } +} + return return_type; +} + /* this function will concatinate the suffix to the existing function decl */ static tree rename_elem_fn (tree decl, const char *suffix) @@ -108,7 +160,7 @@ rename_elem_fn (tree decl, const char *suffix) /* this function will check to see if the node is part of an function that * needs to be converted to its vector equivalent. */ -static bool +bool is_elem_fn (tree fndecl) { tree ii_tree; @@ -349,6 +401,55 @@ find_suffix (elem_fn_info *elem_fn_values, bool masked) return suffix; } +tree +find_elem_fn_name (tree old_fndecl, + tree vectype_out ATTRIBUTE_UNUSED, + tree vectype_in ATTRIBUTE_UNUSED) +{ + elem_fn_info *elem_fn_values = NULL; + tree new_fndecl = NULL_TREE, arg_type = NULL_TREE; + char *suffix = NULL; + + elem_fn_values = extract_elem_fn_values (old_fndecl); + + if (elem_fn_values) +{ + if (elem_fn_values-no_vlengths 0) + { + if (elem_fn_values-vectorlength[0] == + (int)TYPE_VECTOR_SUBPARTS (vectype_out)) + suffix = find_suffix (elem_fn_values, false); + else + return NULL_TREE; + } + else + return
Re: [PATCH] Validate the HLE memory models.
On 04/25/12 01:29, Andi Kleen wrote: gcc/: 2012-04-24 Andi Kleen a...@linux.intel.com * builtins.c (get_memmodel): Add val. Call target.memmodel_check and return new variable. * config/i386/i386.c (ix86_memmodel_check): Add. (TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK. * doc/tm.texi (TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK.. * doc/tm.texi.in (TARGET_MEMMODEL_MASK): Replace with TARGET_MEMMODEL_CHECK. * target.def (memmodel_mask): Replace with memmodel_check. This looks much better to me than the mask solution. + int val; +ix86_memmodel_check (unsigned val) Please use HOST_WIDE_INT throughout, otherwise you'll fail to warn for truly silly parameters such as 0x1__. r~