Re: [PATCH] Fix PR 33512 Folding of x & ((~x) | y) into x & y on the tree level
On Mon, Apr 23, 2012 at 11:21 PM, Andrew Pinski andrew.pin...@caviumnetworks.com wrote: On Mon, Apr 23, 2012 at 2:41 AM, Richard Guenther richard.guent...@gmail.com wrote: On Sat, Apr 21, 2012 at 6:06 AM, Andrew Pinski andrew.pin...@caviumnetworks.com wrote: This time with the patch and a description of what the bug was. The problem was that defcodefor_name does not always set arg1 and arg2. This fixes it so they are always set to NULL if they don't exist. Ok with ... + if (code1 == SSA_NAME) + { + def = SSA_NAME_DEF_STMT (name); + + if (def && is_gimple_assign (def) + && can_propagate_from (def)) + { + code1 = gimple_assign_rhs_code (def); + arg11 = gimple_assign_rhs1 (def); + arg21 = gimple_assign_rhs2 (def); + arg31 = gimple_assign_rhs2 (def); + } + } ... recursing here instead. Recursing how? Or do you mean when code1 is an SSA_NAME do a recursive call so that we can get some simple copy-prop happening? The current code does not do that though. Ah, this is all pre-existing code. But yes, to get simple copy-prop happening (which is what this code seems to do, but just a single level). The patch is ok as-is; you can improve on top of it if you like. Thanks, Richard. Thanks, Andrew Thanks, Richard. Thanks, Andrew On Fri, Apr 20, 2012 at 9:05 PM, Andrew Pinski andrew.pin...@caviumnetworks.com wrote: On Thu, Jan 19, 2012 at 3:13 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jan 19, 2012 at 10:00 AM, Andrew Pinski andrew.pin...@caviumnetworks.com wrote: On Tue, Jan 17, 2012 at 1:38 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Jan 17, 2012 at 8:06 AM, Andrew Pinski andrew.pin...@caviumnetworks.com wrote: Hi, This adds the folding of x & ((~x) | y) into x & y on the tree level via fold-const.c. This is already partly done on the RTL level, but it would be a good thing for the tree level also. OK for 4.8 (yes, I know we have not branched yet, but I thought I would send it out so I don't forget about it)?
Bootstrapped and tested on x86_64-linux-gnu with no regressions. Can you instead patch tree-ssa-forwprop.c:simplify_bitwise_binary? Yes, and here is a new patch which also adds optimizing x | ((~x) & y) to x | y. It also adds optimizing x & (x | y) to x and x | (x & y) to x in tree-ssa-forwprop.c:simplify_bitwise_binary, since it was an easy extension on top of the ~x case (well, I implemented the one without the ~ first). I did not remove those foldings from fold-const.c though. Also I was thinking maybe this belongs in reassociate, though I don't see how to do it. I still have plans to create that piecewise gimple_fold (see my proposal from early last year) that would be the container for this kind of pattern matching. It would then be usable from reassoc as well (but reassoc has the issue of only collecting one kind of op, so its simplification wouldn't trigger reliably on these). http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01099.html OK for 4.8, once in stage 1? Again bootstrapped and tested on x86_64-linux-gnu with no regressions. Ok. Here is an updated patch which fixes a bug which I found while doing the treecombine branch. I rewrote defcodefor_name so more than SSA_NAMEs can be passed to it. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski ChangeLog: ChangeLog: * tree-ssa-forwprop.c (defcodefor_name): New function. (simplify_bitwise_binary): Use defcodefor_name instead of manually. Simplify (X | Y) & X to X and (X & Y) | X to X. Simplify (~X | Y) & X to X & Y and (~X & Y) | X to X | Y. testsuite/ChangeLog: * gcc.dg/tree-ssa/andor-3.c: New testcase. * gcc.dg/tree-ssa/andor-4.c: New testcase. * gcc.dg/tree-ssa/andor-5.c: New testcase. Thanks, Richard. Thanks, Andrew Pinski ChangeLog: * tree-ssa-forwprop.c (defcodefor_name): New function. (simplify_bitwise_binary): Use defcodefor_name. Simplify (X | Y) & X to X and (X & Y) | X to X. Simplify (~X | Y) & X to X & Y and (~X & Y) | X to X | Y.
testsuite/ChangeLog: * gcc.dg/tree-ssa/andor-3.c: New testcase. * gcc.dg/tree-ssa/andor-4.c: New testcase. * gcc.dg/tree-ssa/andor-5.c: New testcase. Thanks, Richard. Thanks, Andrew Pinski ChangeLog: * fold-const.c (fold_binary_loc case BIT_AND_EXPR): Add folding of x & (~x | y) into x & y. testsuite/ChangeLog: * gcc.dg/tree-ssa/andor-3.c: New testcase.
Re: [PATCH] Fix VRP ICE on x << cst1; if (x cmp cst2) (PR tree-optimization/53058)
On Tue, Apr 24, 2012 at 1:01 AM, Jakub Jelinek ja...@redhat.com wrote: Hi! My PR52533 patch fixed unsigned comparisons, but not signed ones where the maximum is different. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-04-23 Jakub Jelinek ja...@redhat.com PR tree-optimization/53058 * tree-vrp.c (register_edge_assert_for_2): Compare mask for LE_EXPR or GT_EXPR with TYPE_MAX_VALUE of TREE_TYPE (val) instead of all ones in the precision. * gcc.c-torture/compile/pr53058.c: New test. --- gcc/tree-vrp.c.jj 2012-04-23 11:11:21.0 +0200 +++ gcc/tree-vrp.c 2012-04-23 16:08:43.752462433 +0200 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e && INTEGRAL_TYPE_P (TREE_TYPE (name2)) && IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1) && prec <= 2 * HOST_BITS_PER_WIDE_INT + && prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val))) && live_on_edge (e, name2) && !has_single_use (name2)) { @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e new_val = val2; else { + double_int maxval + = tree_to_double_int (TYPE_MAX_VALUE (TREE_TYPE (val))); I don't like using TYPE_MAX_VALUE in VRP - at least please use vrp_val_max. For enums this can be not what you expect(?) Please consider adding a double_int_max_value (unsigned prec, bool sign) and double_int_min_value. Thanks, Richard. mask = double_int_ior (tree_to_double_int (val2), mask); - if (double_int_minus_one_p (double_int_sext (mask, prec))) + if (double_int_equal_p (mask, maxval)) new_val = NULL_TREE; else new_val = double_int_to_tree (TREE_TYPE (val2), mask); --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj 2012-04-23 15:53:41.489982650 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c 2012-04-23 15:53:13.0 +0200 @@ -0,0 +1,12 @@ +/* PR tree-optimization/53058 */ + +int a, b, c; + +void +foo () +{ + c = b << 16; + if (c > 32767) + c = 0; + a = b; +} Jakub
[PATCH] Fix VRP ICE on x << cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)
On Tue, Apr 24, 2012 at 08:05:35AM +0200, Richard Guenther wrote: On Tue, Apr 24, 2012 at 1:01 AM, Jakub Jelinek ja...@redhat.com wrote: I don't like using TYPE_MAX_VALUE in VRP - at least please use vrp_val_max. For enums this can be not what you expect(?) Please consider adding a double_int_max_value (unsigned prec, bool sign) and double_int_min_value. So like this instead? 2012-04-24 Jakub Jelinek ja...@redhat.com PR tree-optimization/53058 * double-int.h (double_int_max_value, double_int_min_value): New functions. * tree-vrp.c (register_edge_assert_for_2): Compare mask for LE_EXPR or GT_EXPR with double_int_max_value instead of double_int_mask. * gcc.c-torture/compile/pr53058.c: New test. --- gcc/double-int.h.jj 2012-04-19 11:09:13.0 +0200 +++ gcc/double-int.h 2012-04-24 08:42:42.655756406 +0200 @@ -1,5 +1,5 @@ /* Operations with long integers. - Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc. + Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -280,6 +280,26 @@ double_int_equal_p (double_int cst1, dou return cst1.low == cst2.low && cst1.high == cst2.high; } +/* Returns a maximum value for signed or unsigned integer + of precision PREC. */ + +static inline double_int +double_int_max_value (unsigned int prec, bool uns) +{ + return double_int_mask (prec - (uns ? 0 : 1)); +} + +/* Returns a minimum value for signed or unsigned integer + of precision PREC. */ + +static inline double_int +double_int_min_value (unsigned int prec, bool uns) +{ + if (uns) + return double_int_zero; + return double_int_lshift (double_int_one, prec - 1, prec, false); +} + /* Legacy interface with decomposed high/low parts. 
*/ --- gcc/tree-vrp.c.jj 2012-04-23 11:11:21.0 +0200 +++ gcc/tree-vrp.c 2012-04-24 09:06:29.116624564 +0200 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e && INTEGRAL_TYPE_P (TREE_TYPE (name2)) && IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1) && prec <= 2 * HOST_BITS_PER_WIDE_INT + && prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val))) && live_on_edge (e, name2) && !has_single_use (name2)) { @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e new_val = val2; else { + double_int maxval + = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE (val))); mask = double_int_ior (tree_to_double_int (val2), mask); - if (double_int_minus_one_p (double_int_sext (mask, prec))) + if (double_int_equal_p (mask, maxval)) new_val = NULL_TREE; else new_val = double_int_to_tree (TREE_TYPE (val2), mask); --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj 2012-04-23 15:53:41.489982650 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c 2012-04-23 15:53:13.0 +0200 @@ -0,0 +1,12 @@ +/* PR tree-optimization/53058 */ + +int a, b, c; + +void +foo () +{ + c = b << 16; + if (c > 32767) + c = 0; + a = b; +} Jakub
RE: [H8300] Use braced strings in MD
Hi, Great. I approved it this morning and it looks like your account was created soon thereafter. Thanks for the approval. My account has been created. Add yourself to the MAINTAINERS file with write-after-approval privileges. Done. http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00705.html Looks good. Please install. Similarly for the v850 changes. The h8300.md and v850.md changes are committed on trunk at the following links respectively: http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00696.html http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00699.html Thanks for your help. Regards, Naveen
[PATCH] Fix PR53098
This fixes PR53098. Committed as obvious. Richard. 2012-04-24 Richard Guenther rguent...@suse.de PR tree-optimization/53098 * tree-vect-loop.c (vect_analyze_loop_operations): Fixup comparison sign. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 186751) +++ gcc/tree-vect-loop.c (working copy) @@ -1411,7 +1411,7 @@ vect_analyze_loop_operations (loop_vec_i if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor)) || ((max_niter = max_stmt_executions_int (loop)) != -1 - && max_niter < vectorization_factor)) + && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor)) { if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) fprintf (vect_dump, "not vectorized: iteration count too small.");
Re: [PATCH] Fix VRP ICE on x cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)
On Tue, Apr 24, 2012 at 9:10 AM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Apr 24, 2012 at 08:05:35AM +0200, Richard Guenther wrote: On Tue, Apr 24, 2012 at 1:01 AM, Jakub Jelinek ja...@redhat.com wrote: I don't like using TYPE_MAX_VALUE in VRP - at least please use vrp_val_max. For enums this can be not what you expect(?) Please consider adding a double_int_max_value (unsigned prec, bool sign) and double_int_min_value. So like this instead? Yes! Thanks, Richard. 2012-04-24 Jakub Jelinek ja...@redhat.com PR tree-optimization/53058 * double-int.h (double_int_max_value, double_int_min_value): New functions. * tree-vrp.c (register_edge_assert_for_2): Compare mask for LE_EXPR or GT_EXPR with double_int_max_value instead of double_int_mask. * gcc.c-torture/compile/pr53058.c: New test. --- gcc/double-int.h.jj 2012-04-19 11:09:13.0 +0200 +++ gcc/double-int.h 2012-04-24 08:42:42.655756406 +0200 @@ -1,5 +1,5 @@ /* Operations with long integers. - Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc. + Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -280,6 +280,26 @@ double_int_equal_p (double_int cst1, dou return cst1.low == cst2.low && cst1.high == cst2.high; } +/* Returns a maximum value for signed or unsigned integer + of precision PREC. */ + +static inline double_int +double_int_max_value (unsigned int prec, bool uns) +{ + return double_int_mask (prec - (uns ? 0 : 1)); +} + +/* Returns a minimum value for signed or unsigned integer + of precision PREC. */ + +static inline double_int +double_int_min_value (unsigned int prec, bool uns) +{ + if (uns) + return double_int_zero; + return double_int_lshift (double_int_one, prec - 1, prec, false); +} + /* Legacy interface with decomposed high/low parts. 
*/ --- gcc/tree-vrp.c.jj 2012-04-23 11:11:21.0 +0200 +++ gcc/tree-vrp.c 2012-04-24 09:06:29.116624564 +0200 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e && INTEGRAL_TYPE_P (TREE_TYPE (name2)) && IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1) && prec <= 2 * HOST_BITS_PER_WIDE_INT + && prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val))) && live_on_edge (e, name2) && !has_single_use (name2)) { @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e new_val = val2; else { + double_int maxval + = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE (val))); mask = double_int_ior (tree_to_double_int (val2), mask); - if (double_int_minus_one_p (double_int_sext (mask, prec))) + if (double_int_equal_p (mask, maxval)) new_val = NULL_TREE; else new_val = double_int_to_tree (TREE_TYPE (val2), mask); --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj 2012-04-23 15:53:41.489982650 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c 2012-04-23 15:53:13.0 +0200 @@ -0,0 +1,12 @@ +/* PR tree-optimization/53058 */ + +int a, b, c; + +void +foo () +{ + c = b << 16; + if (c > 32767) + c = 0; + a = b; +} Jakub
[Committed] Fix bootstrap issue in tree-ssa-forwprop.c
I tested my two patches independently and forgot to test them together, and one used a variable which I removed in the other. Anyway, this patch changes the code to be correct and fixes the bootstrap issue. Committed as obvious after building stage1. Thanks, Andrew Pinski ChangeLog: 2012-04-24 Andrew Pinski apin...@cavium.com * tree-ssa-forwprop.c (simplify_bitwise_binary): Don't directly use def1/def2. Index: ChangeLog === --- ChangeLog (revision 186756) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2012-04-24 Andrew Pinski apin...@cavium.com + + * tree-ssa-forwprop.c (simplify_bitwise_binary): + Don't directly use def1/def2. + 2012-04-24 Richard Guenther rguent...@suse.de PR tree-optimization/53098 Index: tree-ssa-forwprop.c === --- tree-ssa-forwprop.c (revision 186755) +++ tree-ssa-forwprop.c (working copy) @@ -1913,10 +1913,10 @@ simplify_bitwise_binary (gimple_stmt_ite /* Simplify (A & B) OP0 (C & B) to (A OP0 C) & B. */ if (def1_code == def2_code && def1_code == BIT_AND_EXPR - && operand_equal_for_phi_arg_p (gimple_assign_rhs2 (def1), - gimple_assign_rhs2 (def2))) + && operand_equal_for_phi_arg_p (def1_arg2, + def2_arg2)) { - tree b = gimple_assign_rhs2 (def1); + tree b = def1_arg2; tree a = def1_arg1; tree c = def2_arg1; tree inner = fold_build2 (code, TREE_TYPE (arg2), a, c);
Re: [Fixinclude]: Fix typo and default to twoprocess on VMS
On Apr 18, 2012, at 8:23 PM, Bruce Korb wrote: Hi, When I approved a patch in 2008, there was a typo. I didn't notice and it was fixed by removing a formatting element. Your patch corrects the error. Please apply your changes to active branches. Thank you! Regards, Bruce Thanks, committed on the trunk. Tristan.
[PATCH] Record vectorized loop epilogue loop max number of iterations
I have two patches which also record the maximum number of loop iterations for the epilogue loop the vectorizer creates. The first one reverts an old patch that made us re-use that epilogue loop also for the unvectorized case (when versioning for alignment or aliasing). Thus it creates one more loop copy (though it then properly separates the epilogue with very few iterations from the unvectorized loops with the original number of iterations). The second patch simply only records the maximum number of iterations when we do not re-use that loop body for the unvectorized version. Numbers on the single testcase I tried this on show the second version has slightly better code-size behavior. I was testing this on the single-file leslie3d SPEC CPU 2006 testcase whose file-size regressed the most when enabling array-prefetching by default on certain AMD models. The numbers are as follows, base flags -Ofast -funroll-loops -fpeel-loops -march=barcelona.

                           unpatched  patch #1  patch #2
                              564423    391623    386151
-fno-prefetch-loop-arrays     366247    308711    303783
-fwhole-program               471481    326944    322925

The difference is barely noticeable, but the testcase is surely special. Both patches were bootstrapped and tested on x86_64-unknown-linux-gnu (the first one still needs some assembler scan adjustments in the x86 testsuite). I'm leaning towards using the first patch, as only that will enable us to more easily dis-entangle the various loops generated by the vectorizer and eventually optimize them more (like unroll the epilogue, re-order and re-structure the cost model, alignment and alias checks, etc.). But for wider testing coverage (and also to get some runtime numbers, which should show just noise for both patches ...) I'm going to apply patch two first and later revert to patch one. Thanks, Richard. 2012-04-24 Richard Guenther rguent...@suse.de * tree-vectorizer.h (vect_loop_versioning): Adjust prototype. * tree-vect-loop.c (vect_transform_loop): Adjust. 
* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Record the maximum number of iterations for the epilogue loop. (vect_loop_versioning): Remove case re-using the peeled epilogue loop. Index: gcc/tree-vectorizer.h === *** gcc/tree-vectorizer.h (revision 186709) --- gcc/tree-vectorizer.h (working copy) *** extern LOC vect_loop_location; *** 807,813 in tree-vect-loop-manip.c. */ extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree); extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); ! extern void vect_loop_versioning (loop_vec_info, bool, tree *, gimple_seq *); extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *, tree, gimple_seq); extern void vect_do_peeling_for_alignment (loop_vec_info); --- 807,813 in tree-vect-loop-manip.c. */ extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree); extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); ! extern void vect_loop_versioning (loop_vec_info); extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *, tree, gimple_seq); extern void vect_do_peeling_for_alignment (loop_vec_info); Index: gcc/tree-vect-loop.c === *** gcc/tree-vect-loop.c (revision 186709) --- gcc/tree-vect-loop.c (working copy) *** vect_transform_loop (loop_vec_info loop_ *** 5229,5235 unsigned int nunits; tree cond_expr = NULL_TREE; gimple_seq cond_expr_stmt_list = NULL; - bool do_peeling_for_loop_bound; gimple stmt, pattern_stmt; gimple_seq pattern_def_seq = NULL; gimple_stmt_iterator pattern_def_si = gsi_start (NULL); --- 5229,5234 *** vect_transform_loop (loop_vec_info loop_ *** 5244,5260 if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo)) vect_do_peeling_for_alignment (loop_vinfo); - do_peeling_for_loop_bound - = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) - || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) - && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0) - || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)); - if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo) 
|| LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) ! vect_loop_versioning (loop_vinfo, ! !do_peeling_for_loop_bound, ! cond_expr, cond_expr_stmt_list); /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a compile time constant), or it is a constant that doesn't divide by the --- 5243,5251 if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo)) vect_do_peeling_for_alignment
Re: [PATCH] Support for known unknown alignment
Hi, On Mon, Apr 23, 2012 at 03:30:19PM +0200, Richard Guenther wrote: On Mon, 23 Apr 2012, Martin Jambor wrote: Hi, On Mon, Apr 23, 2012 at 12:50:51PM +0200, Richard Guenther wrote: On Fri, 20 Apr 2012, Martin Jambor wrote: Hi, two days ago I talked to Richi on IRC about the functions we have to determine the expected alignment of objects and pointers, and he suggested that get_object_alignment_1 and get_pointer_alignment_1 should return whether the alignment is actually known and return the actual alignment in a reference variable (as they currently return bitpos). ... This looks all good apart from + void + set_ptr_info_alignment (struct ptr_info_def *pi, unsigned int align, + unsigned int misalign) + { + gcc_checking_assert (align != 0); Why this assert? If we assert something then we should assert that misalign & ~(align - 1) == 0 and that align is a power-of-two or 0. Because the patch makes the zero value special, meaning we do not know, and my intention was that this should be announced using mark_ptr_info_alignment_unknown, mainly to prevent bugs when somebody miscalculated the alignment. Without the assert, the subsystem would silently accept zero as its internal flag, which would not ICE or miscompile stuff, but perhaps unnecessarily pessimistic alignment assumptions would be used. Moreover, the smallest sensible alignment is 1, the misalign component is also byte-based (after all, they describe pointers) and zero alignment does not really have any meaning, or does it? Hmm, indeed (given the docs mention ptr & (align - 1)). I'll add a FIXME to the place where CCP previously attempted setting alignment zero, but I won't have time to investigate that anytime soon. Well, CCP simply tracks known bits and derives the alignment value from that. If tem & -tem computes as zero that means val->mask.low is all zeros. Thus the if (align == 1) check should have been if (align <= 1) from the start. No fixme necessary I think. 
Asserting that alignment is a power of two and that misalign is not bigger than align are good ideas. + pi->align = align; + pi->misalign = misalign; + } + + /* If pointer described by PI has known alignment, increase its known +misalignment by INCREMENT modulo its current alignment. */ + + void + increment_ptr_info_misalignment (struct ptr_info_def *pi, + unsigned int increment) + { Hmm, I'd call it adjust_ptr_info_misalignment instead. Fine with me. Ok with those changes. Unfortunately, the testsuite results from ppc64 look like there are some problems. Hopefully they are not too complex or I'll have to postpone this for a few weeks as I have more pressing tasks to do now. The testsuite differences I got on Friday were probably noise; tonight (on an updated svn tree) I did not get any on ppc64-linux, x86_64-linux or i686-linux. Considering that and the OK above, I'm going to bootstrap and test also on sparc64-linux and ia64-linux and if those pass too, I'll commit the patch tomorrow, unless there are any objections. Of course, I have renamed increment_ptr_info_misalignment to adjust_ptr_info_misalignment and changed the asserts at the beginning of set_ptr_info_alignment to: gcc_checking_assert (align != 0); gcc_assert ((align & (align - 1)) == 0); gcc_assert ((misalign & ~(align - 1)) == 0); Those were the only changes; the whole updated patch is below. Thanks, Martin 2012-04-23 Martin Jambor mjam...@suse.cz * builtins.c (get_object_alignment_1): Return whether we can determine the alignment or conservatively assume byte alignment. Return the alignment by reference. Use get_pointer_alignment_1 for dereference alignment. (get_pointer_alignment_1): Return whether we can determine the alignment or conservatively assume byte alignment. Return the alignment by reference. Use get_ptr_info_alignment to get SSA name alignment. (get_object_alignment): Update call to get_object_alignment_1. 
(get_object_or_type_alignment): Likewise, fall back to type alignment only when it returned false. (get_pointer_alignment): Update call to get_pointer_alignment_1. * fold-const.c (get_pointer_modulus_and_residue): Update call to get_object_alignment_1. * ipa-prop.c (ipa_modify_call_arguments): Update call to get_pointer_alignment_1. * tree-sra.c (build_ref_for_offset): Likewise, fall back to the type of MEM_REF or TARGET_MEM_REF only when it returns false. * tree-ssa-ccp.c (get_value_from_alignment): Update call to get_object_alignment_1. (ccp_finalize): Use set_ptr_info_alignment. * tree.h (get_object_alignment_1): Update declaration.
Re: [PATCH] genattrtab: avoid NULL-deref on error
Richard Guenther wrote: On Sat, Apr 21, 2012 at 5:53 PM, Jim Meyering j...@meyering.net wrote: I see that no one responded when I posted this in February. Ok to commit, now? Ok. Thanks. Pushed.
[Patch,avr,trunk,4.7]: Fix PR53065
This implements HARD_REGNO_CALL_PART_CLOBBERED. Ok to apply? Johann PR target/53065 * config/avr/avr.h (HARD_REGNO_CALL_PART_CLOBBERED): New hook define. Index: config/avr/avr.h === --- config/avr/avr.h (revision 186600) +++ config/avr/avr.h (working copy) @@ -394,6 +394,11 @@ enum reg_class { #define REGNO_OK_FOR_INDEX_P(NUM) 0 +#define HARD_REGNO_CALL_PART_CLOBBERED(REGNO, MODE)\ + (((REGNO) < 18 && (REGNO) + GET_MODE_SIZE (MODE) > 18) \ + || ((REGNO) < REG_Y && (REGNO) + GET_MODE_SIZE (MODE) > REG_Y) \ + || ((REGNO) < REG_Z && (REGNO) + GET_MODE_SIZE (MODE) > REG_Z)) + #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true #define STACK_PUSH_CODE POST_DEC
[PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch
Hello, here is a port of Android support patch (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7 branch. Is it OK? Bootstrap and check passed on linux-x86_64. Thanks, Ilya --- 2012-04-24 Enkovich Ilya ilya.enkov...@intel.com * config/i386/linux-common.h: New. * config.gcc: Add i386/linux-common.h before all i386/linux.h and i386/linux64.h usages. * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New. (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC. (GNU_USER_TARGET_CC1_SPEC): New. (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC. (GNU_USER_TARGET_MATHFILE_SPEC): New. (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC. * config/i386/gnu-user64.h: Likewise. mandroid-4_7.patch Description: Binary data
[PATCH] Make sizetypes no longer sign-extending
I've been carrying this patch for quite some while now and really want to go forward with it - the problem is that while all default languages work fine after this patch, Ada shows some testsuite regressions. I've had various hacks/workarounds throughout the Ada frontend for them, but lost track of what fixed what and they all felt like hacks anyway. Thus - I know the patch will add Ada testsuite regressions. But it will not break Ada bootstrap. Ada is not in the set of default languages, nor is it considered release critical. Are the Ada folks happy with helping to fix the fallout after-the-fact (I got Eric to fix the bootstrap issues that were first present - thanks for that)? I am happy to revisit my hacks/workarounds and post them, but it will be ultimately easier to review them if you can see the FAIL for yourself (there are some workarounds/hacks posted on the mailing list for previous attempts IIRC). Thanks for your consideration. The patch is currently under re-testing (it needs the 2nd patch below, which was already approved but back in time broke Ada bootstrap - I didn't verify if that still occurs). Richard. 2012-03-06 Richard Guenther rguent...@suse.de * fold-const.c (div_if_zero_remainder): sizetypes no longer sign-extend. (int_const_binop_1): New worker for int_const_binop with overflowable parameter. Pass it through to force_fit_type_double. (int_const_binop): Wrap around int_const_binop_1 with overflowable equal to one. (size_binop_loc): Call int_const_binop_1 with overflowable equal to minus one, forcing overflow detection for even unsigned types. (extract_muldiv_1): Remove bogus TYPE_IS_SIZETYPE special-casing. (fold_binary_loc): Call try_move_mult_to_index with signed offset. * stor-layout.c (initialize_sizetypes): sizetypes no longer sign-extend. (layout_type): For zero-sized arrays ignore overflow on the size calculations. * tree-ssa-ccp.c (bit_value_unop_1): Likewise. (bit_value_binop_1): Likewise. * tree.c (double_int_to_tree): Likewise. 
(double_int_fits_to_tree_p): Likewise. (force_fit_type_double): Likewise. (host_integerp): Likewise. (int_fits_type_p): Likewise. * varasm.c (output_constructor_regular_field): Sign-extend the field-offset to cater for negative offsets produced by the Ada frontend. * omp-low.c (extract_omp_for_data): Convert the loop step to signed for pointer adjustments. * g++.dg/tree-ssa/pr19807.C: Adjust. Index: trunk/gcc/fold-const.c === *** trunk.orig/gcc/fold-const.c 2012-03-06 12:41:10.0 +0100 --- trunk/gcc/fold-const.c 2012-03-06 14:46:25.0 +0100 *** div_if_zero_remainder (enum tree_code co *** 191,199 does the correct thing for POINTER_PLUS_EXPR where we want a signed division. */ uns = TYPE_UNSIGNED (TREE_TYPE (arg2)); - if (TREE_CODE (TREE_TYPE (arg2)) == INTEGER_TYPE - && TYPE_IS_SIZETYPE (TREE_TYPE (arg2))) - uns = false; quo = double_int_divmod (tree_to_double_int (arg1), tree_to_double_int (arg2), --- 191,196 *** int_binop_types_match_p (enum tree_code *** 935,942 to produce a new constant. Return NULL_TREE if we don't know how to evaluate CODE at compile-time. */ ! tree ! int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2) { double_int op1, op2, res, tmp; tree t; --- 932,940 to produce a new constant. Return NULL_TREE if we don't know how to evaluate CODE at compile-time. */ ! static tree ! int_const_binop_1 (enum tree_code code, const_tree arg1, const_tree arg2, ! int overflowable) { double_int op1, op2, res, tmp; tree t; *** int_const_binop (enum tree_code code, co *** 1078,1090 return NULL_TREE; } ! t = force_fit_type_double (TREE_TYPE (arg1), res, 1, ((!uns || is_sizetype) && overflow) | TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2)); return t; } /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new constant. We assume ARG1 and ARG2 have the same data type, or at least are the same kind of constant and the same machine mode. Return zero if --- 1076,1094 return NULL_TREE; } ! 
t = force_fit_type_double (TREE_TYPE (arg1), res, overflowable, ((!uns || is_sizetype) && overflow) | TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2)); return t; } + tree + int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2) + { + return int_const_binop_1 (code, arg1, arg2, 1); + } + /* Combine two constants ARG1 and ARG2
Re: [fortran, patch] Allow displaying backtraces from user code
Hi guys, (coming back to an old patch proposed by FX some time ago ...) 2012/3/3 FX fxcoud...@gmail.com: I think that this approach is a mistake. The patch starts us down a slippery slope. Why not export all the nonstandard intrinsics? In addition, the _gfortran_ prefix was used to separate the libgfortran namespace from userspace. Providing a means to circumvent this separation seems to be asking for more PRs. Well, the means exists. All _gfortran_* functions can already be called; they're part of libgfortran's public (and versioned) API. I'm just saying adding a simple backtrace function to that list is useful, and documenting it too. Yes, I agree that this is useful, and in that sense the patch is ok from my side ... I would rather see a new intrinsic procedure added to gfortran. The standard does not prevent a vendor from offering additional intrinsic procedures not found in the standard. I just think multiplying vendor intrinsics makes our life harder. I'd like to hear others' opinions on this issue, but I'll implement the new intrinsic if that's the consensus. ... but I also think that having an intrinsic function for it would be useful, so that one can just call it without the detour via ISO_C_BINDING. Note that ifort also has a vendor intrinsic for this, called TRACEBACKQQ, so we're in good company. Cheers, Janus
Re: [google/integration] Extend C++11 UDLs to be compatible with inttypes.h macros (issue6104051)
On Mon, Apr 23, 2012 at 18:16, Ollie Wild a...@google.com wrote: If the trunk change isn't approved today, I'll probably go ahead and check in *that* patch to google/integration. The main difference relative to this is that the option name has been changed to -Wliteral-suffix. *That* part has been approved, so hopefully any additional adjustments will be minor. Sounds good. Thanks. Diego.
Re: [Patch,avr,trunk,4.7]: Fix PR53065
2012/4/24 Georg-Johann Lay a...@gjlay.de: This implements HARD_REGNO_CALL_PART_CLOBBERED. Ok to apply? Johann PR target/53065 * config/avr/avr.h (HARD_REGNO_CALL_PART_CLOBBERED): New hook define. Please commit. Denis.
Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch
On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Hello, here is a port of Android support patch (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7 branch. Is it OK? This kind of re-org is not appropriate for the branch. Thanks, Richard. Bootstrap and check passed on linux-x86_64. Thanks, Ilya --- 2012-04-24 Enkovich Ilya ilya.enkov...@intel.com * config/i386/linux-common.h: New. * config.gcc: Add i386/linux-common.h before all i386/linux.h and i386/linux64.h usages. * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New. (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC. (GNU_USER_TARGET_CC1_SPEC): New. (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC. (GNU_USER_TARGET_MATHFILE_SPEC): New. (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC. * config/i386/gnu-user64.h: Likewise.
Re: [PATCH] Make sizetypes no longer sign-extending
On 24/04/12 14:03, Richard Guenther wrote: /* Size types *are* sign extended. */ ! sign_extended_type = !TYPE_UNSIGNED (type); looks like you failed to update the comment. R.
Re: [PATCH] Make sizetypes no longer sign-extending
On Tue, 24 Apr 2012, Richard Earnshaw wrote: On 24/04/12 14:03, Richard Guenther wrote: /* Size types *are* sign extended. */ ! sign_extended_type = !TYPE_UNSIGNED (type); looks like you failed to update the comment. Indeed. Fixed. Thanks, Richard.
Re: [PATCH v2] ARM: Use different linker path for hardfloat ABI
On 23/04/12 22:36, Michael Hope wrote: On 24 April 2012 03:35, Richard Earnshaw rearn...@arm.com wrote: On 22/04/12 23:20, Michael Hope wrote: Change the dynamic linker path for ARM hard float executables. Matches the path discussed and agreed on last week[1]. Carlos will follow up with the matching patch to GLIBC[2]. I'm happy to if he's busy. OK for trunk? -- Michael [1] http://sourceware.org/ml/libc-ports/2012-04/msg00060.html [2] http://sourceware.org/ml/libc-ports/2012-04/msg00064.html 2012-04-23 Michael Hope michael.h...@linaro.org * config/arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_HARD_FLOAT): Define. (GLIBC_DYNAMIC_LINKER_SOFT_FLOAT): Define. (GLIBC_DYNAMIC_LINKER): Redefine to use the hard float path. diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h index 80bd825..3ddf812 100644 --- a/gcc/config/arm/linux-eabi.h +++ b/gcc/config/arm/linux-eabi.h @@ -62,7 +62,11 @@ /* Use ld-linux.so.3 so that it will be possible to run classic GNU/Linux binaries on an EABI system. */ #undef GLIBC_DYNAMIC_LINKER -#define GLIBC_DYNAMIC_LINKER /lib/ld-linux.so.3 +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT /lib/ld-linux.so.3 +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT /lib/ld-linux-armhf.so.3 +#define GLIBC_DYNAMIC_LINKER \ + %{mfloat-abi=hard: GLIBC_DYNAMIC_LINKER_HARD_FLOAT } \ +%{!mfloat-abi=hard: GLIBC_DYNAMIC_LINKER_SOFT_FLOAT } /* At this point, bpabi.h will have clobbered LINK_SPEC. We want to use the GNU/Linux version, not the generic BPABI version. 
*/ I think this should handle having the hard-float variant as the default more gracefully, so something like: -#define GLIBC_DYNAMIC_LINKER /lib/ld-linux.so.3 +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT /lib/ld-linux.so.3 +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT /lib/ld-linux-armhf.so.3 +#if TARGET_DEFAULT_FLOAT_ABI == ARM_FLOAT_ABI_HARD +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_HARD_FLOAT +#else +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_SOFT_FLOAT +#endif +#define GLIBC_DYNAMIC_LINKER \ + %{mfloat-abi=hard: GLIBC_DYNAMIC_LINKER_HARD_FLOAT } \ +%{mfloat-abi=soft*: GLIBC_DYNAMIC_LINKER_SOFT_FLOAT } \ +%{!mfloat-abi=*: GLIBC_DYNAMIC_LINKER_DEFAULT } But I haven't tested any of that. It tests just fine when setting the float ABI at configure time or explicitly. Updated patch is below. I prefer the original, shorter version but the new one is easier for someone else to understand. Your version would work for configuration using --with-float-abi, but not for built-in setting with hard-float as default. OK for trunk? OK. -- Michael 2012-04-24 Michael Hope michael.h...@linaro.org Richard Earnshaw rearn...@arm.com * config/arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_SOFT_FLOAT): Define. (GLIBC_DYNAMIC_LINKER_HARD_FLOAT): Define. (GLIBC_DYNAMIC_LINKER_DEFAULT): Define. (GLIBC_DYNAMIC_LINKER): Redefine to use the hard float path. diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h index 80bd825..2ace6f0 100644 --- a/gcc/config/arm/linux-eabi.h +++ b/gcc/config/arm/linux-eabi.h @@ -62,7 +62,17 @@ /* Use ld-linux.so.3 so that it will be possible to run classic GNU/Linux binaries on an EABI system. 
*/ #undef GLIBC_DYNAMIC_LINKER -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.3" +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "/lib/ld-linux.so.3" +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT "/lib/ld-linux-armhf.so.3" +#if TARGET_DEFAULT_FLOAT_ABI == ARM_FLOAT_ABI_HARD +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_HARD_FLOAT +#else +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_SOFT_FLOAT +#endif +#define GLIBC_DYNAMIC_LINKER \ + "%{mfloat-abi=hard:" GLIBC_DYNAMIC_LINKER_HARD_FLOAT "} \ +%{mfloat-abi=soft*:" GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "} \ +%{!mfloat-abi=*:" GLIBC_DYNAMIC_LINKER_DEFAULT "}" /* At this point, bpabi.h will have clobbered LINK_SPEC. We want to use the GNU/Linux version, not the generic BPABI version. */
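The spec above selects the loader in three steps: an explicit -mfloat-abi=hard or -mfloat-abi=soft* picks the matching path, and anything else falls back to the configured default. As a plain-C model of that selection logic (the function name and arguments are invented for this sketch; this is not GCC code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINKER_SOFT "/lib/ld-linux.so.3"
#define LINKER_HARD "/lib/ld-linux-armhf.so.3"

/* Model of the GLIBC_DYNAMIC_LINKER spec: an explicit float ABI wins,
   otherwise the built-in default (e.g. from configure time) decides.  */
static const char *
pick_dynamic_linker (const char *float_abi, int default_is_hard)
{
  if (float_abi != NULL && strcmp (float_abi, "hard") == 0)
    return LINKER_HARD;                      /* %{mfloat-abi=hard:...}  */
  if (float_abi != NULL && strncmp (float_abi, "soft", 4) == 0)
    return LINKER_SOFT;                      /* %{mfloat-abi=soft*:...} */
  return default_is_hard ? LINKER_HARD : LINKER_SOFT;  /* %{!mfloat-abi=*:...} */
}
```

This mirrors why the second version handles a hard-float default gracefully: only the fallback branch consults the default, so explicit options always win.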
Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch
On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Hello, here is a port of Android support patch (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7 branch. Is it OK? This kind of re-org is not appropriate for the branch. Could you please specify what exactly is wrong with it? Do you mean adding linux-common.h? Ilya Thanks, Richard. Bootstrap and check passed on linux-x86_64. Thanks, Ilya --- 2012-04-24 Enkovich Ilya ilya.enkov...@intel.com * config/i386/linux-common.h: New. * config.gcc: Add i386/linux-common.h before all i386/linux.h and i386/linux64.h usages. * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New. (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC. (GNU_USER_TARGET_CC1_SPEC): New. (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC. (GNU_USER_TARGET_MATHFILE_SPEC): New. (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC. * config/i386/gnu-user64.h: Likewise.
Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch
On Tue, Apr 24, 2012 at 3:44 PM, Ilya Enkovich enkovich@gmail.com wrote: On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Hello, here is a port of Android support patch (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7 branch. Is it OK? This kind of re-org is not appropriate for the branch. Could you please specify what exactly is wrong with it? Do you mean adding linux-common.h? The 4.7 branch is restricted to fixing regression bugs. This does not appear to fix a bug nor a regression but adds a feature. Richard. Ilya Thanks, Richard. Bootstrap and check passed on linux-x86_64. Thanks, Ilya --- 2012-04-24 Enkovich Ilya ilya.enkov...@intel.com * config/i386/linux-common.h: New. * config.gcc: Add i386/linux-common.h before all i386/linux.h and i386/linux64.h usages. * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New. (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC. (GNU_USER_TARGET_CC1_SPEC): New. (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC. (GNU_USER_TARGET_MATHFILE_SPEC): New. (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC. * config/i386/gnu-user64.h: Likewise.
Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch
On Tue, Apr 24, 2012 at 3:44 PM, Ilya Enkovich enkovich@gmail.com wrote: On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Hello, here is a port of Android support patch (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7 branch. Is it OK? This kind of re-org is not appropriate for the branch. Could you please specify what exactly is wrong with it? Do you mean adding linux-common.h? The 4.7 branch is restricted to fixing regression bugs. This does not appear to fix a bug nor a regression but adds a feature. OK, thanks! Ilya Richard. Ilya Thanks, Richard. Bootstrap and check passed on linux-x86_64. Thanks, Ilya --- 2012-04-24 Enkovich Ilya ilya.enkov...@intel.com * config/i386/linux-common.h: New. * config.gcc: Add i386/linux-common.h before all i386/linux.h and i386/linux64.h usages. * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New. (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC. (GNU_USER_TARGET_CC1_SPEC): New. (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC. (GNU_USER_TARGET_MATHFILE_SPEC): New. (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC. * config/i386/gnu-user64.h: Likewise.
[PATCH] Fix release checking
Committed as obvious. Richard. 2012-04-12 Richard Guenther rguent...@suse.de * tree-if-conv.c (main_tree_if_conversion): Move bb under ENABLE_CHECKING. Index: gcc/tree-if-conv.c === --- gcc/tree-if-conv.c (revision 186760) +++ gcc/tree-if-conv.c (working copy) @@ -1779,7 +1779,6 @@ main_tree_if_conversion (void) struct loop *loop; bool changed = false; unsigned todo = 0; - basic_block bb; if (number_of_loops () <= 1) return 0; @@ -1796,8 +1795,11 @@ main_tree_if_conversion (void) free_dominance_info (CDI_POST_DOMINATORS); #ifdef ENABLE_CHECKING - FOR_EACH_BB (bb) -gcc_assert (!bb->aux); + { +basic_block bb; +FOR_EACH_BB (bb) + gcc_assert (!bb->aux); + } #endif return todo;
Re: [PATCH] Make sizetypes no longer sign-extending
On Tue, 24 Apr 2012, Richard Guenther wrote: I've been carrying this patch for quite a while now and really want to go forward with it - the problem is that while all default languages work fine after this patch, Ada shows some testsuite regressions. I've had various hacks/workarounds throughout the Ada frontend for them, but lost track of what fixed what, and they all felt like hacks anyway. Thus - I know the patch will add Ada testsuite regressions. But it will not break Ada bootstrap. Ada is not in the set of default languages, nor is it considered release critical. Are the Ada folks happy with helping to fix the fallout after-the-fact (I got Eric to fix the bootstrap issues that were first present - thanks for that)? I am happy to revisit my hacks/workarounds and post them, but it will ultimately be easier to review them if you can see the FAIL for yourself (there are some workarounds/hacks posted on the mailing list for previous attempts IIRC). Thanks for your consideration. The patch is currently under re-testing (it needs the 2nd patch below, which was already approved but at the time broke Ada bootstrap - I didn't verify if that still occurs).
To follow up on my own mail - bootstrap with just the 2nd patch is still broken: /abuild/rguenther/obj2/./gcc/xgcc -B/abuild/rguenther/obj2/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -m32 -fpic -W -Wall -gnatpg -nostdinc -m32 s-secsta.adb -o s-secsta.o s-secsta.adb:501:4: error: size of variable 'System.Secondary_Stack.Chunk' is too large Chunk : aliased Chunk_Id (1, Static_Secondary_Stack_Size); ^ make[9]: *** [s-secsta.o] Error 1 And the following is the list of regressions introduced by the combined patch set: === acats tests === FAIL: a71004a FAIL: c36204d FAIL: c36205l FAIL: c37404b FAIL: c41107a FAIL: c41204a FAIL: c43204c FAIL: c43204e FAIL: c43204f FAIL: c43204g FAIL: c43204h FAIL: c43204i FAIL: c52102a FAIL: c52102c FAIL: c64103c FAIL: c64103d FAIL: c64106a FAIL: c95087a FAIL: cc1224a FAIL: cc1311a FAIL: cc3106b FAIL: cc3224a FAIL: cd2a31a === acats Summary === # of expected passes 2297 # of unexpected failures 23 === gnat tests === Running target unix/ FAIL: gnat.dg/array11.adb (test for warnings, line 12) FAIL: gnat.dg/loop_optimization3.adb (test for excess errors) FAIL: gnat.dg/loop_optimization3.adb execution test FAIL: gnat.dg/object_overflow.adb (test for warnings, line 8) FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2 FAIL: gnat.dg/return3.adb scan-assembler loc 1 6 FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors) FAIL: gnat.dg/test_8bitlong_overflow.adb execution test === gnat Summary for unix/ === # of expected passes 1089 # of unexpected failures 8 # of expected failures 13 # of unsupported tests 2 Running target unix//-m32 FAIL: gnat.dg/array11.adb (test for warnings, line 12) FAIL: gnat.dg/loop_optimization3.adb (test for excess errors) FAIL: gnat.dg/loop_optimization3.adb execution test FAIL: gnat.dg/object_overflow.adb (test for warnings, line 8) FAIL:
gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2 FAIL: gnat.dg/return3.adb scan-assembler loc 1 6 FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors) FAIL: gnat.dg/test_8bitlong_overflow.adb execution test === gnat Summary for unix//-m32 === # of expected passes 1089 # of unexpected failures 8 # of expected failures 13 # of unsupported tests 2 === gnat Summary === # of expected passes 2178 # of unexpected failures 16 # of expected failures 26 # of unsupported tests 4 Most of the ACATS errors are raised STORAGE_ERROR : object too large or error: size of variable '...' is too large. The gnat testsuite adds warning: Storage_Error will be raised at run time to this. I remember one workaround (which usually involves re-setting TREE_OVERFLOW at strategic places) fixes most of ACATS. I'll try to isolate that (but it's a hack and does not feel right). Richard. 2012-03-06 Richard Guenther rguent...@suse.de * fold-const.c (div_if_zero_remainder): Sizetypes no longer sign-extend. (int_const_binop_1): New worker for int_const_binop with overflowable parameter. Pass it through to force_fit_type_double. (int_const_binop): Wrap around int_const_binop_1 with overflowable equal to one. (size_binop_loc): Call int_const_binop_1 with overflowable equal to minus one, forcing overflow detection even for unsigned types. (extract_muldiv_1): Remove bogus TYPE_IS_SIZETYPE
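As background for this thread: the whole question is whether a narrow size constant widens by sign- or zero-extension, which changes the value the middle end computes with. A minimal C illustration of the two behaviors (not GCC internals, just the arithmetic):

```c
#include <assert.h>
#include <stdint.h>

/* Widen the low 8 bits of X to 64 bits.  Sign-extending treats 0xff
   as -1; zero-extending treats it as 255.  Sizetypes are unsigned
   types that historically widened like the signed case, which is the
   inconsistency the patch removes.  */
static int64_t
widen8 (uint8_t x, int sign_extend)
{
  if (sign_extend)
    return (int64_t) (int8_t) x;
  return (int64_t) x;
}
```

The Ada fallout above comes from places that relied on the old "all-ones means -1" reading of large size constants, which now overflow instead.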
[PATCH] Fix PR53085
This fixes redundant store elimination in FRE/PRE to preserve volatile stores. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to the trunk and the branch. Richard. 2012-04-24 Richard Guenther rguent...@suse.de PR tree-optimization/53085 * tree-ssa-pre.c (eliminate): Do not eliminate volatile redundant stores. * g++.dg/torture/pr53085.C: New testcase. Index: gcc/tree-ssa-pre.c === *** gcc/tree-ssa-pre.c (revision 186761) --- gcc/tree-ssa-pre.c (working copy) *** eliminate (void) *** 4409,4414 --- 4409,4415 has the same value number as its rhs. If so, the store is dead. */ else if (gimple_assign_single_p (stmt) + && !gimple_has_volatile_ops (stmt) && !is_gimple_reg (gimple_assign_lhs (stmt)) && (TREE_CODE (rhs) == SSA_NAME || is_gimple_min_invariant (rhs))) Index: gcc/testsuite/g++.dg/torture/pr53085.C === *** gcc/testsuite/g++.dg/torture/pr53085.C (revision 0) --- gcc/testsuite/g++.dg/torture/pr53085.C (revision 0) *** *** 0 --- 1,17 + // { dg-do compile } + // { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } + // { dg-options "-fdump-tree-optimized" } + + class aa{ + void f(); + private: + volatile int a; + }; + + void aa::f(){ + a=1; + a=1; + } + + // { dg-final { scan-tree-dump-times "a ={v} 1;" 2 "optimized" } } + // { dg-final { cleanup-tree-dump "optimized" } }
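The semantics the fix preserves: each access to a volatile object is an observable side effect, so the second identical store in the testcase may not be treated as redundant. A plain-C sketch of that intent, with the "observer" modeled as a counter rather than real memory-mapped hardware (names are illustrative):

```c
#include <assert.h>

static volatile int a;          /* stands in for a device register */
static int stores_observed;     /* models an external observer of the writes */

/* Each call models one volatile store in source order; a correct
   optimizer must emit both stores even though they write the same
   value, which is exactly what the PRE fix guarantees.  */
static void
store_a (int v)
{
  a = v;
  ++stores_observed;
}
```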
Go patch committed: Reject invalid composite literals
This patch to the Go frontend adds a check to reject composite literals that the language syntax forbids. The parser was able to parse some cases unambiguously because it already knew that some name was a type, but since that cannot always be known, that parse is not permitted. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian diff -r 91db663343df go/parse.cc --- a/go/parse.cc Mon Apr 23 22:54:27 2012 -0700 +++ b/go/parse.cc Tue Apr 24 07:50:51 2012 -0700 @@ -2865,7 +2865,16 @@ { if (this->peek_token()->is_op(OPERATOR_LCURLY)) { - if (is_parenthesized) + if (!may_be_composite_lit) + { + Type* t = ret->type(); + if (t->named_type() != NULL + || t->forward_declaration_type() != NULL) + error_at(start_loc, + _("parentheses required around this composite literal " + "to avoid parsing ambiguity")); + } + else if (is_parenthesized) error_at(start_loc, "cannot parenthesize type in composite literal"); ret = this->composite_lit(ret->type(), 0, ret->location());
Re: [C/C++] Do not pretty-print expressions with caret diagnostics for wrong function call
On 04/22/2012 03:49 PM, Manuel López-Ibáñez wrote: On 22 April 2012 21:20, Jason Merrill ja...@redhat.com wrote: When we aren't printing the expression, please print the type instead; in a template it might not be clear what the type of the expression we're complaining about works out to be. I can do that. Should I also change it in the C FE? I think it wouldn't hurt. Jason
Re: PowerPC prologue and epilogue 4
On Tue, Apr 17, 2012 at 11:13 AM, Alan Modra amo...@gmail.com wrote: This provides some protection against misuse of r0, r11 and r12. I found it useful when enabling out-of-line saves for large frames. ;-) * config/rs6000/rs6000.c (START_USE, END_USE, NOT_INUSE): Define. (rs6000_emit_prologue): Use the above to catch register overlap. This patch is okay. Thanks, David
Re: PowerPC prologue and epilogue 5
On Thu, Apr 19, 2012 at 11:36 AM, Alan Modra amo...@gmail.com wrote: On Thu, Apr 19, 2012 at 08:00:15PM +0930, Alan Modra wrote: On Wed, Apr 18, 2012 at 12:45:16AM +0930, Alan Modra wrote: This enables out-of-line save and restore for large frames, and for ABI_AIX when using the static chain. Further testing revealed two problems when compiling nested functions. 1) The logic I had for cr_save_regno is wrong, resulting in one of my NOT_INUSE asserts triggering. Fixed in this revised patch. Bootstrapped etc. powerpc-linux. 2) In some cases the prologue uses in-line saves while the epilogue uses out-of-line restores. This can lead to restoring regs that haven't been saved. This turned out to be a pre-existing problem, patch in PR50340. * config/rs6000/rs6000.c (rs6000_savres_strategy): Allow out-of-line save/restore for large frames. Don't disable out-of-line saves on ABI_AIX when using static chain reg. (rs6000_emit_prologue): Adjust cr_save_regno on ABI_AIX to not clobber static chain reg, and tweak for out-of-line gpr saves that use r1. This patch is okay. Thanks, David
Re: [C++ Patch] PR 52363
OK, thanks! Jason
Re: [PATCH, powerpc] Fix PR47197
2012-04-24 Bill Schmidt wschm...@linux.vnet.ibm.com PR target/47197 * config/rs6000/rs6000-c.c (fully_fold_convert): New function. (altivec_build_resolved_builtin): Call fully_fold_convert. 2012-04-24 Bill Schmidt wschm...@linux.vnet.ibm.com PR target/47197 * gcc.target/powerpc/pr47197.c: New test. Okay. Thanks, David
[PATCH, powerpc] Fix PR47197
This fixes an error wherein a nontrivial expression passed to an Altivec built-in results in an ICE, following Joseph Myers's suggested approach in the bugzilla. Bootstrapped and tested with no new regressions on powerpc64-unknown-linux-gnu. Ok for trunk? Thanks, Bill gcc: 2012-04-24 Bill Schmidt wschm...@linux.vnet.ibm.com PR target/47197 * config/rs6000/rs6000-c.c (fully_fold_convert): New function. (altivec_build_resolved_builtin): Call fully_fold_convert. gcc/testsuite: 2012-04-24 Bill Schmidt wschm...@linux.vnet.ibm.com PR target/47197 * gcc.target/powerpc/pr47197.c: New test. Index: gcc/testsuite/gcc.target/powerpc/pr47197.c === --- gcc/testsuite/gcc.target/powerpc/pr47197.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr47197.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-maltivec" } */ + +/* Compile-only test to ensure that expressions can be passed to + Altivec builtins without error. */ + +#include <altivec.h> + +void func(unsigned char *buf, unsigned len) +{ +vec_dst(buf, (len >= 256 ? 0 : len) | 512, 2); +} Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c (revision 186761) +++ gcc/config/rs6000/rs6000-c.c (working copy) @@ -3421,6 +3421,22 @@ rs6000_builtin_type_compatible (tree t, int id) } +/* In addition to calling fold_convert for EXPR of type TYPE, also + call c_fully_fold to remove any C_MAYBE_CONST_EXPRs that could be + hiding there (PR47197). */ + +static tree +fully_fold_convert (tree type, tree expr) +{ + tree result = fold_convert (type, expr); + bool maybe_const = true; + + if (!c_dialect_cxx ()) +result = c_fully_fold (result, false, &maybe_const); + + return result; + } + /* Build a tree for a function call to an Altivec non-overloaded builtin. The overloaded builtin that matched the types and args is described by DESC. The N arguments are given in ARGS, respectively.
@@ -3470,18 +3486,18 @@ altivec_build_resolved_builtin (tree *args, int n, break; case 1: call = build_call_expr (impl_fndecl, 1, - fold_convert (arg_type[0], args[0])); + fully_fold_convert (arg_type[0], args[0])); break; case 2: call = build_call_expr (impl_fndecl, 2, - fold_convert (arg_type[0], args[0]), - fold_convert (arg_type[1], args[1])); + fully_fold_convert (arg_type[0], args[0]), + fully_fold_convert (arg_type[1], args[1])); break; case 3: call = build_call_expr (impl_fndecl, 3, - fold_convert (arg_type[0], args[0]), - fold_convert (arg_type[1], args[1]), - fold_convert (arg_type[2], args[2])); + fully_fold_convert (arg_type[0], args[0]), + fully_fold_convert (arg_type[1], args[1]), + fully_fold_convert (arg_type[2], args[2])); break; default: gcc_unreachable ();
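For reference, the argument in the new test is ordinary integer arithmetic once folded; the ICE came purely from a C_MAYBE_CONST_EXPR wrapper surviving into the middle end. A standalone model of the expression's value (the relational operator is assumed to be >=, since the archive stripped it, and dst_control is an invented name):

```c
#include <assert.h>

/* Model of the vec_dst argument from the testcase: clamp LEN to zero
   when it is 256 or more, then OR in the constant 512.  */
static unsigned
dst_control (unsigned len)
{
  return (len >= 256 ? 0 : len) | 512;
}
```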
Re: default to strict dwarf-2 on powerpc-vxworks
On Fri, Apr 20, 2012 at 09:57:24AM +0200, Jakub Jelinek wrote: Looks good to me, but IMHO we should go further at this point, and default to dwarf_version 4 unless overridden (and override it on Darwin too, due to its tools not being capable of handling it), probably together with making -fno-debug-types-section the default (since while DWARF 4 DW_FORM_* etc. support has been in gdb/libdw/valgrind/etc. for quite some time, e.g. .debug_types support is very recent, and furthermore on lots of programs/DSOs .debug_types actually increases size instead of decreasing it. If a post-linking utility is used, undoing the extra duplication caused by .debug_types (yeah, I know Cary said he'll work on it) is going to be very hard. I've noticed also lots of intra-CU redundancies with the current state of .debug_types, e.g. lots of DW_TAG_{const,pointer,reference}_type DIEs with the same DW_FORM_ref_sig8 hash). Jason, are you ok with Olivier's patch? Talked to Jason about this on IRC last night, he is ok with that, so your patch is ok for trunk. Jakub
Re: PR c/44774 -Werror=pedantic
On 04/22/2012 03:38 PM, Manuel López-Ibáñez wrote: Then, let's say we have one of such options. For example, let's call it -Wx for this example. If the behaviour is consistent with other group options like -Wall, then: -Wx is enabled by default (like now) -Wno-pedantic does not disable -Wx (like now) -Wno-all doesn't disable the group elements? I'd think it should, or we should warn that it's a useless option. -Werror=pedantic implies -Werror=x ? (to mimic -pedantic-errors) I would think so. For example, -Wmain is enabled by default but also by -Wall and -pedantic. However, -Werror=all does not enable -Werror=main. Is this a bug or the desired behaviour? Sounds like a bug to me. -Wno-error=pedantic -Werror implies -Werror=x or -Wno-error=x? The documentation only says that -Wno-error= does not imply anything. This should work the same as -Wno-error=all does for errors that are part of the all group. I think the bit about -Wno-error not implying anything just means that -Wno-error=foo does not imply either -Wfoo or -Wno-foo. Jason
Re: PR c/44774 -Werror=pedantic
On 24 April 2012 17:53, Jason Merrill ja...@redhat.com wrote: On 04/22/2012 03:38 PM, Manuel López-Ibáñez wrote: Then, let's say we have one of such options. For example, let's call it -Wx for this example. If the behaviour is consistent with other group options like -Wall, then: -Wx is enabled by default (like now) -Wno-pedantic does not disable -Wx (like now) -Wno-all doesn't disable the group elements? I'd think it should, or we should warn that it's a useless option. -Wall -Wno-all = no effect. However, -Wno-all -Wuninitialized = -Wuninitialized (I think this is intended) -Wuninitialized -Wno-all = -Wno-uninitialized (I think this is a bug) For example, -Wmain is enabled by default but also by -Wall and -pedantic. However, -Werror=all does not enable -Werror=main. Is this a bug or the desired behaviour? Sounds like a bug to me. It seems there are a lot of bugs to be fixed before we can even consider to make -Werror=pedantic equivalent to -pedantic-errors. -Wno-error=pedantic -Werror implies -Werror=x or -Wno-error=x? The documentation only says that -Wno-error= does not imply anything. This should work the same as -Wno-error=all does for errors that are part of the all group. -Wuninitialized -Wno-error=all -Werror = -Werror=uninitialized but I think this may be a bug and -Wno-error=all is simply ignored as -Werror=all is. Anyway, the only reliable way to solve all these bugs is to encode group options in the .opt files, and this seems to be non-trivial. http://gcc.gnu.org/PR53072 If anyone wishes to give it a try, you are very welcome. Cheers, Manuel.
[Patch, ARM][0/2] Preparation for Epilogue in RTL
The following patches perform code reorganization in preparation for epilogue generation in RTL. [1/2] move the code of the special predicates load_multiple_operation and store_multiple_operation into a separate function ldm_stm_operation_p [2/2] generalize ldm_stm_operation_p No regression on qemu for arm-none-eabi neon softfp arm/thumb. Ok for trunk? Thanks, Greta
Go patch committed: Fix order of evaluation
This patch to the Go frontend fixes the order of evaluation for a statement like m[0] = len(m) when m is a map. I added a test case for this to the master testsuite. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian diff -r 5b5853844f00 go/gogo.cc --- a/go/gogo.cc Tue Apr 24 07:52:37 2012 -0700 +++ b/go/gogo.cc Tue Apr 24 09:22:09 2012 -0700 @@ -2216,12 +2216,18 @@ Expression::traverse(&init, &find_eval_ordering); } - if (find_eval_ordering.size() <= 1) -{ - // If there is only one expression with a side-effect, we can - // leave it in place. - return TRAVERSE_CONTINUE; -} + size_t c = find_eval_ordering.size(); + if (c == 0) +return TRAVERSE_CONTINUE; + + // If there is only one expression with a side-effect, we can + // usually leave it in place. However, for an assignment statement, + // we need to evaluate an expression on the right hand side before + // we evaluate any index expression on the left hand side, so for + // that case we always move the expression. Otherwise we mishandle + // m[0] = len(m) where m is a map. + if (c == 1 && s->classification() != Statement::STATEMENT_ASSIGNMENT) +return TRAVERSE_CONTINUE; bool is_thunk = s->thunk_statement() != NULL; for (Find_eval_ordering::const_iterator p = find_eval_ordering.begin();
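The hazard the patch fixes can be modeled without Go: when the store itself can grow the container, the right-hand side must be captured before the store mutates it. A plain-C sketch with an invented fixed-capacity map stand-in (not frontend code):

```c
#include <assert.h>

/* Stand-in for a map: keys 0..LEN-1 are present.  */
struct toy_map { int len; int vals[8]; };

/* Models the fixed evaluation order for m[key] = len(m): capture the
   RHS first, then perform the (possibly growing) insertion.  The
   buggy order would read len after the insertion and store 1 into an
   initially empty map instead of 0.  */
static int
assign_len (struct toy_map *m, int key)
{
  int rhs = m->len;        /* evaluate len(m) before the store */
  if (key >= m->len)       /* the store may insert a new key */
    m->len = key + 1;
  m->vals[key] = rhs;
  return m->vals[key];
}
```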
[Patch, ARM][1/2] add ldm_stm_operation_p
Move the code of the special predicates load_multiple_operation and store_multiple_operation into a separate function. No change in functionality. gcc/ChangeLog 2012-04-24 Ian Bolton ian.bolton at arm.com Sameera Deshpande sameera.deshpande at arm.com Greta Yorsh greta.yorsh at arm.com * config/arm/arm-protos.h (ldm_stm_operation_p): New declaration. * config/arm/arm.c (ldm_stm_operation_p): New function. * config/arm/predicates.md (load_multiple_operation): Update predicate. (store_multiple_operation): Likewise. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 900d09a..7da0e90 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -62,6 +62,7 @@ extern bool arm_legitimize_reload_address (rtx *, enum machine_mode, int, int, extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int, int); extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int); +extern bool ldm_stm_operation_p (rtx, bool); extern int arm_const_double_rtx (rtx); extern int neg_const_double_rtx_ok_for_fpa (rtx); extern int vfp3_const_double_rtx (rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e5779ce..74f4abf 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -10138,6 +10138,150 @@ adjacent_mem_locations (rtx a, rtx b) return 0; } +/* Return true if OP is a valid load or store multiple operation. LOAD is true + for load operations, false for store operations. + The pattern we are trying to match for load is: + [(SET (R_d0) (MEM (PLUS (addr) (offset)))) + (SET (R_d1) (MEM (PLUS (addr) (offset + <reg_increment>)))) + : + : + (SET (R_dn) (MEM (PLUS (addr) (offset + n * <reg_increment>)))) + ] + where + 1. If offset is 0, first insn should be (SET (R_d0) (MEM (src_addr))). + 2. REGNO (R_d0) < REGNO (R_d1) < ... < REGNO (R_dn). + 3. If consecutive is TRUE, then for kth register being loaded, + REGNO (R_dk) = REGNO (R_d0) + k. + The pattern for store is similar.
*/ +bool +ldm_stm_operation_p (rtx op, bool load) +{ + HOST_WIDE_INT count = XVECLEN (op, 0); + rtx reg, mem, addr; + unsigned regno; + HOST_WIDE_INT i = 1, base = 0, offset = 0; + rtx elt; + bool addr_reg_in_reglist = false; + bool update = false; + int reg_increment; + int offset_adj; + + reg_increment = 4; + offset_adj = 0; + + if (count <= 1 + || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET + || (load && !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj))))) +return false; + + /* Check if this is a write-back. */ + elt = XVECEXP (op, 0, offset_adj); + if (GET_CODE (SET_SRC (elt)) == PLUS) +{ + i++; + base = 1; + update = true; + + /* The offset adjustment must be the number of registers being + popped times the size of a single register. */ + if (!REG_P (SET_DEST (elt)) + || !REG_P (XEXP (SET_SRC (elt), 0)) + || (REGNO (SET_DEST (elt)) != REGNO (XEXP (SET_SRC (elt), 0))) + || !CONST_INT_P (XEXP (SET_SRC (elt), 1)) + || INTVAL (XEXP (SET_SRC (elt), 1)) != + ((count - 1 - offset_adj) * reg_increment)) +return false; +} + + i = i + offset_adj; + base = base + offset_adj; + /* Perform a quick check so we don't blow up below.
*/ + if (count <= i) +return false; + + elt = XVECEXP (op, 0, i - 1); + if (GET_CODE (elt) != SET) +return false; + + if (load) +{ + reg = SET_DEST (elt); + mem = SET_SRC (elt); +} + else +{ + reg = SET_SRC (elt); + mem = SET_DEST (elt); +} + + if (!REG_P (reg) || !MEM_P (mem)) +return false; + + regno = REGNO (reg); + addr = XEXP (mem, 0); + if (GET_CODE (addr) == PLUS) +{ + if (!CONST_INT_P (XEXP (addr, 1))) + return false; + + offset = INTVAL (XEXP (addr, 1)); + addr = XEXP (addr, 0); +} + + if (!REG_P (addr)) +return false; + + for (; i < count; i++) +{ + elt = XVECEXP (op, 0, i); + if (GET_CODE (elt) != SET) +return false; + + if (load) +{ + reg = SET_DEST (elt); + mem = SET_SRC (elt); +} + else +{ + reg = SET_SRC (elt); + mem = SET_DEST (elt); +} + + if (!REG_P (reg) + || GET_MODE (reg) != SImode + || REGNO (reg) <= regno + || !MEM_P (mem) + || GET_MODE (mem) != SImode + || ((GET_CODE (XEXP (mem, 0)) != PLUS + || !rtx_equal_p (XEXP (XEXP (mem, 0), 0), addr) + || !CONST_INT_P (XEXP (XEXP (mem, 0), 1)) + || (INTVAL (XEXP (XEXP (mem, 0), 1)) != + offset + (i - base) * reg_increment)) + && (!REG_P (XEXP (mem, 0)) + || offset + (i - base) * reg_increment != 0))) +return false; + + regno = REGNO (reg); + if (regno ==
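Stripped of the write-back bookkeeping, the invariant the predicate enforces over the register list is simple: register numbers strictly increase and memory offsets advance by reg_increment per element, with at least two elements. A standalone C sketch of just that invariant (valid_multiple_op is an invented helper, not GCC code):

```c
#include <assert.h>
#include <stddef.h>

/* Check that REGNOS strictly increase and OFFSETS step by
   REG_INCREMENT, as ldm_stm_operation_p requires of the SETs inside
   an LDM/STM PARALLEL.  */
static int
valid_multiple_op (const unsigned *regnos, const long *offsets,
                   size_t n, long reg_increment)
{
  if (n < 2)                    /* LDM/STM need at least two registers */
    return 0;
  for (size_t i = 1; i < n; i++)
    if (regnos[i] <= regnos[i - 1]
        || offsets[i] != offsets[0] + (long) i * reg_increment)
      return 0;
  return 1;
}
```

The real function additionally tracks the base register, a possible write-back SET, and whether the base appears in the register list; the sketch covers only the per-element checks.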
[Patch, ARM][2/2] generalize ldm_stm_operation_p
Generalize ldm_stm_operation_p with additional parameters that will be used by epilogue patterns: * machine mode to support both SImode and DFmode registers * flag to request consecutive registers in the register list * flag to indicate whether PC in the register list gcc/ChangeLog 2012-04-24 Ian Bolton ian.bolton at arm.com Sameera Deshpande sameera.deshpande at arm.com Greta Yorsh greta.yorsh at arm.com * config/arm/arm-protos.h (ldm_stm_operation_p): New parameters. * config/arm/arm.c (ldm_stm_operation_p): New parameters. * config/arm/predicates.md (load_multiple_operation): Add arguments. (store_multiple_operation): Likewise.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 753e109..efb5b9f 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -62,7 +62,8 @@ extern bool arm_legitimize_reload_address (rtx *, enum machine_mode, int, int, extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int, int); extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int); -extern bool ldm_stm_operation_p (rtx, bool); +extern bool ldm_stm_operation_p (rtx, bool, enum machine_mode mode, + bool, bool); extern int arm_const_double_rtx (rtx); extern int neg_const_double_rtx_ok_for_fpa (rtx); extern int vfp3_const_double_rtx (rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 4216d05..5477de2 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -10139,7 +10139,9 @@ adjacent_mem_locations (rtx a, rtx b) } /* Return true if OP is a valid load or store multiple operation. LOAD is true - for load operations, false for store operations. + for load operations, false for store operations. CONSECUTIVE is true + if the register numbers in the operation must be consecutive in the register + bank. RETURN_PC is true if value is to be loaded in PC. 
The pattern we are trying to match for load is: [(SET (R_d0) (MEM (PLUS (addr) (offset)))) (SET (R_d1) (MEM (PLUS (addr) (offset + <reg_increment>)))) @@ -10154,20 +10156,31 @@ adjacent_mem_locations (rtx a, rtx b) REGNO (R_dk) = REGNO (R_d0) + k. The pattern for store is similar. */ bool -ldm_stm_operation_p (rtx op, bool load) +ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode, + bool consecutive, bool return_pc) { HOST_WIDE_INT count = XVECLEN (op, 0); rtx reg, mem, addr; unsigned regno; + unsigned first_regno; HOST_WIDE_INT i = 1, base = 0, offset = 0; rtx elt; bool addr_reg_in_reglist = false; bool update = false; int reg_increment; int offset_adj; + int regs_per_val; - reg_increment = 4; - offset_adj = 0; + /* If not in SImode, then registers must be consecutive + (e.g., VLDM instructions for DFmode). */ + gcc_assert ((mode == SImode) || consecutive); + /* Setting return_pc for stores is illegal. */ + gcc_assert (!return_pc || load); + + /* Set up the increments and the regs per val based on the mode. */ + reg_increment = GET_MODE_SIZE (mode); + regs_per_val = reg_increment / 4; + offset_adj = return_pc ? 1 : 0; if (count <= 1 || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET @@ -10195,9 +10208,11 @@ ldm_stm_operation_p (rtx op, bool load) i = i + offset_adj; base = base + offset_adj; - /* Perform a quick check so we don't blow up below. */ - if (count <= i) -return false; + /* Perform a quick check so we don't blow up below. If only one reg is loaded, + success depends on the type: VLDM can do just one reg, + LDM must do at least two.
*/ + if ((count <= i) && (mode == SImode)) + return false; elt = XVECEXP (op, 0, i - 1); if (GET_CODE (elt) != SET) @@ -10218,6 +10233,7 @@ ldm_stm_operation_p (rtx op, bool load) return false; regno = REGNO (reg); + first_regno = regno; addr = XEXP (mem, 0); if (GET_CODE (addr) == PLUS) { @@ -10249,10 +10265,13 @@ } if (!REG_P (reg) - || GET_MODE (reg) != SImode + || GET_MODE (reg) != mode || REGNO (reg) <= regno + || (consecutive + && (REGNO (reg) != + (unsigned int) (first_regno + regs_per_val * (i - base)))) || !MEM_P (mem) - || GET_MODE (mem) != SImode + || GET_MODE (mem) != mode || ((GET_CODE (XEXP (mem, 0)) != PLUS || !rtx_equal_p (XEXP (XEXP (mem, 0), 0), addr) || !CONST_INT_P (XEXP (XEXP (mem, 0), 1)) diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md index 20a64ec..428f9e0 100644 --- a/gcc/config/arm/predicates.md +++ b/gcc/config/arm/predicates.md @@ -380,13 +380,17 @@ (define_special_predicate "load_multiple_operation" (match_code "parallel") { - return ldm_stm_operation_p (op,
move default_tree_printer out of toplev.c
It seems an obvious cleanup to me. OK? 2012-04-24 Manuel López-Ibáñez m...@gcc.gnu.org gcc/ * tree-pretty-print.h (default_tree_printer): Do not declare. * tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and intl.h. (default_tree_diagnostic_starter): Make static. (default_tree_printer): Move to here. Make static. (tree_diagnostics_defaults): New. * tree-diagnostic.h (default_tree_diagnostic_starter): Do not declare. * tree.c (free_lang_data): Use tree_diagnostics_defaults. * toplev.c: Do not include tree-pass.h. (default_tree_printer): Move from here. (general_init): Use tree_diagnostics_defaults. move-tree-default-printer.diff Description: Binary data
[Patch, ARM] rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue
Rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue. In preparation for epilogue generation in RTL and anyway it's the right name for this function. Ok for trunk? Thanks, Greta gcc/ChangeLog 2012-04-24 Ian Bolton ian.bolton at arm.com Sameera Deshpande sameera.deshpande at arm.com Greta Yorsh greta.yorsh at arm.com * config/arm/arm-protos.h (thumb_unexpanded_epilogue): Rename to... (thumb1_unexpanded_epilogue): ...this. * config/arm/arm.c (thumb_unexpanded_epilogue): Rename to... (thumb1_unexpanded_epilogue): ...this. * config/arm/arm.md (thumb_unexpanded_epilogue): Rename to... (thumb1_unexpanded_epilogue): ...this.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 21a89aa..0c4bb96 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -178,7 +178,7 @@ extern int arm_float_words_big_endian (void); /* Thumb functions. */ extern void arm_init_expanders (void); -extern const char *thumb_unexpanded_epilogue (void); +extern const char *thumb1_unexpanded_epilogue (void); extern void thumb1_expand_prologue (void); extern void thumb1_expand_epilogue (void); extern const char *thumb1_output_interwork (void); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 5d9cbc5..0b7f3c8 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -22032,7 +22032,7 @@ thumb1_extra_regs_pushed (arm_stack_offsets *offsets, bool for_prologue) /* The bits which aren't usefully expanded as rtl. */ const char * -thumb_unexpanded_epilogue (void) +thumb1_unexpanded_epilogue (void) { arm_stack_offsets *offsets; int regno; diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 4f6d965..ed33c9b 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -10673,7 +10673,7 @@ if (TARGET_32BIT) return arm_output_epilogue (NULL); else /* TARGET_THUMB1 */ -return thumb_unexpanded_epilogue (); +return thumb1_unexpanded_epilogue (); ; Length is absolute worst case [(set_attr length 44)
[rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision
Hello, PR 52633 is caused by bad interaction between two different vectorizer pattern recognition passes. A minimal test case is: void test (unsigned short *x, signed char *y) { int i; for (i = 0; i < 32; i++) x[i] = (short) (y[i] << 5); } built with cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp on an arm-linux-gnueabi target. Before the vectorizer, we have something like: short unsigned int D.4976; int D.4975; int D.4974; signed char D.4973; signed char * D.4972; short unsigned int * D.4970; [snip] D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; D.4975_10 = D.4974_9 << 5; D.4976_11 = (short unsigned int) D.4975_10; *D.4970_5 = D.4976_11; The pattern recognizer now goes through its list of patterns it tries to detect. The first successful one is vect_recog_over_widening_pattern. This will annotate the statements (via related_stmt fields): D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; -- D.4992_26 = (signed short) D.4973_8 D.4975_10 = D.4974_9 << 5; -- patt.16_31 = D.4992_26 << 5 D.4976_11 = (short unsigned int) D.4975_10; -- D.4994_32 = (short unsigned int) patt.16_31 *D.4970_5 = D.4976_11; In the next step, vect_recog_widen_shift_pattern *also* matches, and creates a new annotation for the shift statement (using a widening shift): D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; -- D.4992_26 = (signed short) D.4973_8 D.4975_10 = D.4974_9 << 5; [-- patt.16_31 = D.4992_26 << 5] -- patt.17_33 = D.4973_8 w<< 5 D.4976_11 = (short unsigned int) D.4975_10; -- D.4994_32 = (short unsigned int) patt.16_31 *D.4970_5 = D.4976_11; Since the original statement can only point to a single related_stmt, the statement setting patt.16_31 is no longer referred to as related_stmt by any other statement. This causes it to no longer be considered relevant for the vectorizer. However, the statement: -- D.4994_32 = (short unsigned int) patt.16_31 *is* still considered relevant.
While analysing it, however, the vectorizer follows through to the def statement for patt.16_31, and gets quite confused to find that it doesn't have a vectype (because it wasn't considered by the vectorizer). The symptom is a failing assertion gcc_assert (*vectype != NULL_TREE); in vect_is_simple_use_1. Now, it seems quite unusual for multiple patterns to match for a single original statement. In fact, most pattern recognizers explicitly refuse to even consider statements that were already recognized. However, vect_recog_widen_shift_pattern makes an exception: /* This statement was also detected as over-widening operation (it can't be any other pattern, because only over-widening detects shifts). LAST_STMT is the final type demotion statement, but its related statement is shift. We analyze the related statement to catch cases: orig code: type a_t; itype res; TYPE a_T, res_T; S1 a_T = (TYPE) a_t; S2 res_T = a_T << CONST; S3 res = (itype)res_T; (size of type * 2 <= size of itype and size of itype * 2 <= size of TYPE) code after over-widening pattern detection: S1 a_T = (TYPE) a_t; -- a_it = (itype) a_t; S2 res_T = a_T << CONST; S3 res = (itype)res_T; --- LAST_STMT -- res = a_it << CONST; after widen_shift: S1 a_T = (TYPE) a_t; -- a_it = (itype) a_t; - redundant S2 res_T = a_T << CONST; S3 res = (itype)res_T; -- res = a_t w<< CONST; i.e., we replace the three statements with res = a_t w<< CONST. */ If everything were indeed as described in that comment, things would work out fine. However, what is described above as code after over-widening pattern detection is only one of two possible outcomes of that latter pattern; the other is the one that happens in the current test case, where we still have a final type conversion left after applying the over-widening pattern.
I guess one could try to distinguish the two cases somehow and handle both; but the overall approach seems quite fragile to me; it doesn't look really maintainable to have to rely on so many details of the operation of one particular pattern detection function while writing another one, or else risk creating subtle problems like the one described above. So I was wondering why vect_recog_widen_shift_pattern tries to take advantage of an already recognized over-widening pattern. But indeed, if it does not, it will generate less efficient code in cases like the above test case: by itself, vect_recog_widen_shift_pattern would generate code to expand the char to a short, then do a widening shift resulting in an int, followed by narrowing back down to a short. However, even so, it might actually be preferable to just handle such cases within vect_recog_widen_shift_pattern itself. Indeed, the routine already looks for another
[PATCH, rtl-optimization]: Enhance post-reload compare elimination pass to handle arithmetic operations with implicit extensions
Hello! Back to converting x86 to the post-reload compare elimination pass. Arithmetic operations in x86_64 can implicitly zero extend the result, and set flags according to the non-extended result. The following testcase should exercise both features: --cut here-- void foo (long int, int); void test (unsigned int a, unsigned int b) { long int x = a + b; int f = a + b > 0; foo (x, f); } --cut here-- However, with the pre-reload implementation, gcc was able to merge operation + compare, while ignoring the implicit extension: test: addl %esi, %edi # 8 *addsi_2/3 setne %sil # 19 *setcc_qi movl %edi, %edi # 11 *zero_extendsidi2_rex64/1 movzbl %sil, %esi # 20 *zero_extendqisi2 jmp foo # 14 *sibcall Unpatched post-reload compare elimination was able to merge operation + zero extension (please note that the attached WIP target patch is needed): test: addl %esi, %edi # 7 addsi_1_zext/2 xorl %esi, %esi # 22 *movsi_xor testl %edi, %edi # 23 *cmpsi_ccno_1/1 setne %sil # 24 *setcc_qi_slp jmp foo # 14 *sibcall And patched post-reload compare elimination finally produces optimal code: test: addl %esi, %edi # 7 *addsi_2_zext/2 setne %sil # 19 *setcc_qi movzbl %sil, %esi # 20 *zero_extendqisi2 jmp foo # 14 *sibcall Where the operation looks like: #(insn:TI 7 9 19 2 (parallel [ #(set (reg:DI 5 di) #(zero_extend:DI (plus:SI (reg/v:SI 4 si [orig:64 b ] [64]) #(reg/v:SI 5 di [orig:63 a ] [63])))) #(set (reg:CCZ 17 flags) #(compare:CCZ (plus:SI (reg/v:SI 4 si [orig:64 b ] [64]) #(reg/v:SI 5 di [orig:63 a ] [63])) #(const_int 0 [0]))) #]) cmpadd.c:5 261 {*addsi_2_zext} # (expr_list:REG_DEAD (reg/v:SI 4 si [orig:64 b ] [64]) #(nil))) addl %esi, %edi # 7 *addsi_2_zext/2 [length = 2] The attached patch teaches the post-reload compare optimization how to handle arithmetic operations with implicit extensions.
The input is: (insn 7 4 8 2 (parallel [ (set (reg:DI 5 di) (zero_extend:DI (plus:SI (reg/v:SI 4 si [orig:64 b ] [64]) (reg/v:SI 5 di [orig:63 a ] [63])))) (clobber (reg:CC 17 flags)) ]) cmpadd.c:5 253 {addsi_1_zext} (nil)) (insn 8 7 9 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 5 di [orig:59 D.1714 ] [59]) (const_int 0 [0]))) cmpadd.c:6 2 {*cmpsi_ccno_1} (nil)) It simply checks that the REGNOs of the registers are the same; the mode of the compared register (against zero!) is checked against the mode of the inner part of the extension instruction. 2012-04-24 Uros Bizjak ubiz...@gmail.com * compare-elim.c (try_eliminate_compare): Also handle operands with implicit extensions. Patch is lightly tested on x86_64-pc-linux-gnu, together with the attached WIP patch. Opinions? Since it looks quite safe, is it OK for mainline? Uros. Index: compare-elim.c === --- compare-elim.c (revision 186721) +++ compare-elim.c (working copy) @@ -563,10 +563,26 @@ Validate that PREV_CLOBBER itself does in fact refer to IN_A. Do recall that we've already validated the shape of PREV_CLOBBER. */ x = XVECEXP (PATTERN (insn), 0, 0); - if (!rtx_equal_p (SET_DEST (x), in_a)) + if (rtx_equal_p (SET_DEST (x), in_a)) +cmp_src = SET_SRC (x); + + /* Also check operations with implicit extensions, e.g.: + [(set (reg:DI) + (zero_extend:DI (plus:SI (reg:SI) (reg:SI)))) + (set (reg:CCZ flags) + (compare:CCZ + (plus:SI (reg:SI) (reg:SI)) + (const_int 0)))] */ + else if (REG_P (SET_DEST (x)) + && REG_P (in_a) + && REGNO (SET_DEST (x)) == REGNO (in_a) + && (GET_CODE (SET_SRC (x)) == ZERO_EXTEND + || GET_CODE (SET_SRC (x)) == SIGN_EXTEND) + && GET_MODE (XEXP (SET_SRC (x), 0)) == GET_MODE (in_a)) +cmp_src = XEXP (SET_SRC (x), 0); + else return false; - cmp_src = SET_SRC (x); - + /* Determine if we ought to use a different CC_MODE here.
*/ flags = maybe_select_cc_mode (cmp, cmp_src, cmp->in_b); if (flags == NULL) Index: config/i386/i386.c === --- config/i386/i386.c (revision 186721) +++ config/i386/i386.c (working copy) @@ -17861,19 +17861,31 @@ emit_insn (gen_rtx_SET (VOIDmode, dest, x)); } -/* Return TRUE or FALSE depending on whether the first SET in INSN - has source and destination with matching CC modes, and that the +/* Return TRUE or FALSE depending on whether the first SET from COMPARE + in INSN has source and destination with matching CC modes, and that the
[testsuite,4.7,committed]: fix gcc.c-torture/compile/pr52891-2.c
Committed as obvious. http://gcc.gnu.org/viewcvs?view=revision&revision=186775 Johann PR testsuite/52641 PR tree-optimization/52891 * gcc.c-torture/compile/pr52891-2.c: Fix test for 16-bit int.
Re: [PR tree-optimization/52558]: RFC: questions on store data race
On 04/13/12 03:46, Richard Guenther wrote: On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez al...@redhat.com wrote: Richard. Thanks so much for reviewing and providing an alternative approach, which AFAICT provides superior results. A similar effect could be obtained by keeping a flag whether we entered the path that stored the value before the transform. Thus, do lsm = g2; // technically not needed, if you also handle loads flag = false; for (...) { if (g1) { if (flag) g2 = lsm; } else { lsm = 0; flag = true; } } if (flag) g2 = lsm; I have implemented this in the attached patch, and it seems to be generating better code than my other if-changed approach. It also avoids generating unnecessary loads that may trap. For the original testcase: int g_1 = 1; int g_2 = 0; int func_1(void) { int l; for (l = 0; l < 1234; l++) { if (g_1) return l; else g_2 = 0; } return 999; } After all optimizations are done, we are now generating the following for the flags approach: new: func_1: movl g_1(%rip), %edx xorl %eax, %eax testl %edx, %edx je .L5 rep ret .L5: movl $0, g_2(%rip) movw $999, %ax ret Which is significantly better than the unmodified trunk of: func_1: movl g_1(%rip), %edx movl g_2(%rip), %eax testl %edx, %edx je .L5 movl %eax, g_2(%rip) xorl %eax, %eax ret .L5: movl $0, g_2(%rip) movl $999, %eax ret And in other less contrived cases, we generate correct code that avoids writes on code paths that did not have it. For example, in Hans' register promotion example: int count; struct obj { int data; struct obj *next; } *q; void func() { struct obj *p; for (p = q; p; p = p->next) if (p->data > 0) count++; } Under the new memory model we should avoid promoting count to a register and unilaterally writing to it upon exiting the loop.
With the attached patch, we now generate: func: .LFB0: movq q(%rip), %rax xorl %ecx, %ecx movl count(%rip), %edx testq %rax, %rax je .L1 .L9: movl (%rax), %esi testl %esi, %esi jle .L3 addl $1, %edx movl $1, %ecx .L3: movq 8(%rax), %rax testq %rax, %rax jne .L9 testb %cl, %cl jne .L12 .L1: rep ret .L12: movl %edx, count(%rip) ret Which is as expected. I don't understand your suggestion of: lsm = g2; // technically not needed, if you also handle loads that would allow to extend this to cover the cases where the access may eventually trap (if you omit the load before the loop). Can't I just remove the lsm=g2? What's this about also handling loads? Not sure which is more efficient (you can eventually combine flag variables for different store locations, combining the if-changed compare is not so easy I guess). So far, I see better code generated with the flags approach, so... 1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2. In the PR, Richard mentions that both FRE/PRE can figure this out, but they are not run after store motion. DOM should also be able to, but apparently does not :(. Tips? Well. Schedule FRE after loop opts ... DOM is not prepared to handle loads/stores with differing VOPs - it was never updated really after the alias-improvements merge. 2. The GIMPLE_CONDs I create in this patch are currently causing problems with complex floats/doubles. do_jump is complaining that it can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex values since complex lowering has already happened (??). Is there a more canonical way of creating a GIMPLE_COND that may contain complex floats at this stage? As you are handling redundant stores you want a bitwise comparison anyway, but yes, complex compares need to be lowered. But as said, you want a bitwise comparison, not a value-comparison. You can try using I can ignore all this.
I have implemented both alternatives, with a target hook as suggested, but I'm thinking of removing it altogether and just leaving the flags approach. Few comments on the patch itself. + new_bb = split_edge (ex); + then_bb = create_empty_bb (new_bb); + if (current_loops && new_bb->loop_father) + add_bb_to_loop (then_bb, new_bb->loop_father); + gsi = gsi_start_bb (new_bb); + t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL); Hmm, why do you need to re-load the value? Don't you have both the value as it was loaded before the loop and the new value (the lsm tmp reg) already? Why not simply remember the SSA
Re: move default_tree_printer out of toplev.c
OK. Jason
Re: [PATCH, rtl-optimization]: Enhance post-reload compare elimination pass to handle arithmetic operations with implicit extensions
On Tue, Apr 24, 2012 at 6:56 PM, Uros Bizjak ubiz...@gmail.com wrote: Back to converting x86 to post-reload compare elimination pass. Arithmetic operations in x86_64 can implicitly zero extend the result, and set flags according to the non-extended result. Following testcase should exercise both features: 2012-04-24 Uros Bizjak ubiz...@gmail.com * compare-elim.c (try_eliminate_compare): Also handle operands with implicit extensions. Patch is lightly tested on x86_64-pc-linux-gnu, together with attached WIP patch. Opinions? Since it looks quite safe, is it OK for mainline? The full bootstrap and regression test passed on x86_64-pc-linux-gnu {,-m32} with attached x86 WIP patch (the updated patch disables some interfering peephole2s). I am confident that compare-elim.c change doesn't cause regressions, so this message is a formal request for a patch inclusion. BTW: This is also the first time x86 bootstrapped with enabled post-reload compare elimination pass. Uros. Index: i386.c === --- i386.c (revision 186721) +++ i386.c (working copy) @@ -17861,19 +17861,32 @@ ix86_split_copysign_var (rtx operands[]) emit_insn (gen_rtx_SET (VOIDmode, dest, x)); } -/* Return TRUE or FALSE depending on whether the first SET in INSN - has source and destination with matching CC modes, and that the +/* Return TRUE or FALSE depending on whether the first SET from COMPARE + in INSN has source and destination with matching CC modes, and that the CC mode is at least as constrained as REQ_MODE. 
*/ bool ix86_match_ccmode (rtx insn, enum machine_mode req_mode) { - rtx set; + rtx pat, set; enum machine_mode set_mode; + int i; - set = PATTERN (insn); - if (GET_CODE (set) == PARALLEL) -set = XVECEXP (set, 0, 0); + pat = PATTERN (insn); + if (GET_CODE (pat) == PARALLEL) +{ + set = NULL_RTX; + for (i = 0; i < XVECLEN (pat, 0); i++) + { + set = XVECEXP (pat, 0, i); + if (GET_CODE (set) == SET + && GET_CODE (SET_SRC (set)) == COMPARE) + break; + } +} + else +set = pat; + gcc_assert (GET_CODE (set) == SET); gcc_assert (GET_CODE (SET_SRC (set)) == COMPARE); @@ -39090,6 +39103,8 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_FIXED_CONDITION_CODE_REGS ix86_fixed_condition_code_regs #undef TARGET_CC_MODES_COMPATIBLE #define TARGET_CC_MODES_COMPATIBLE ix86_cc_modes_compatible +#undef TARGET_FLAGS_REGNUM +#define TARGET_FLAGS_REGNUM FLAGS_REG #undef TARGET_MACHINE_DEPENDENT_REORG #define TARGET_MACHINE_DEPENDENT_REORG ix86_reorg Index: i386.md === --- i386.md (revision 186769) +++ i386.md (working copy) @@ -5808,14 +5808,14 @@ (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]) (define_insn "*add<mode>_2" - [(set (reg FLAGS_REG) + [(set (match_operand:SWI 0 "nonimmediate_operand" "=r,rm,r") + (plus:SWI + (match_operand:SWI 1 "nonimmediate_operand" "%0,0,r") + (match_operand:SWI 2 "general_operand" "g,ri,0"))) + (set (reg FLAGS_REG) (compare - (plus:SWI - (match_operand:SWI 1 "nonimmediate_operand" "%0,0,r") - (match_operand:SWI 2 "general_operand" "g,ri,0")) - (const_int 0))) - (set (match_operand:SWI 0 "nonimmediate_operand" "=r,rm,r") - (plus:SWI (match_dup 1) (match_dup 2)))] "ix86_match_ccmode (insn, CCGOCmode) && ix86_binary_operator_ok (PLUS, <MODE>mode, operands)" { @@ -5857,13 +5857,14 @@ ;; See comment for addsi_1_zext why we do use nonimmediate_operand (define_insn "*addsi_2_zext" - [(set (reg FLAGS_REG) + [(set (match_operand:DI 0 "register_operand" "=r,r") + (zero_extend:DI + (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r") +
(match_operand:SI 2 "x86_64_general_operand" "rme,0")))) + (set (reg FLAGS_REG) (compare - (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r") - (match_operand:SI 2 "x86_64_general_operand" "rme,0")) - (const_int 0))) - (set (match_operand:DI 0 "register_operand" "=r,r") - (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))] + (plus:SI (match_dup 1) (match_dup 2)) + (const_int 0)))] "TARGET_64BIT && ix86_match_ccmode (insn, CCGOCmode) && ix86_binary_operator_ok (PLUS, SImode, operands)" { @@ -6090,7 +6091,7 @@ (match_operand:SWI 2 "general_operand" "g,0")) (const_int 0))) (clobber (match_scratch:SWI 0 "=r,r"))] - "ix86_match_ccmode (insn, CCGOCmode) + "0 && ix86_match_ccmode (insn, CCGOCmode) && !(MEM_P (operands[1]) && MEM_P (operands[2]))" { switch (get_attr_type (insn)) @@ -17257,7 +17258,7 @@ (clobber (reg:CC FLAGS_REG))]) (set (match_dup 1) (match_dup 0)) (set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))] -
Re: thoughts on libatomic
On Mon, 2012-04-23 at 12:29 -0700, Richard Henderson wrote: On 04/16/12 11:15, Torvald Riegel wrote: - for acq/rel memorders, we don't need seq_cst fences whenever the atomicity is implemented with a lock (hostconfig.c:pre_barrier/post_barrier) Err.. where? This also seems to conflict with... I thought on the fallback paths that use locks. Locks typically use acq/rel semantics, so for the particular location, we don't need pre/post barrier code to be run. Or maybe it didn't happen, and the one below is correct. Sorry for the potential mix-up. I can look at this again after PTO. - protect_start|end: - BUG: those are called without additional post|pre_barrier calls in cas_n.c, exch_n.c, fop_n.c, load_n.c, store_n.c The lock membar guarantees are too weak if memorder for the atomic access is seq_cst, and if you use more than one lock. (assumes that lock implementation uses typical acq/rel barriers). ... this. I don't see what multiple locks has to do with seq vs acqrel. If we don't run seq_cst membars when falling back to locks, then seq_cst operations on two locations ordered by acq/rel locks will not be forced to be in one total order (which seq_cst requires). They will be ordered per lock, but not globally. - config/posix/lock.c: - I'm not sure that WATCH_SIZE should almost certainly be the same as the cacheline size. Why? Which assumptions about the program? See above. Plus I assumed that if we have to take a lock, that's got to dominate over any cache ping-pong that would be played. Perhaps the cache ping-pong dominates (and this assumes that the lock is contended). But perhaps the cache ping-pong wouldn't be worse than the contention potentially introduced from a too-coarse-grained lock? - Do locks need to care about priority inversion? At the moment I'm leaving this with the pthread implementation. pthreads offers PI-safe locks too. But I also don't know whether we need this.
Re: [PATCH, rtl-optimization]: Enhance post-reload compare elimination pass to handle arithmetic operations with implicit extensions
On 04/24/12 09:56, Uros Bizjak wrote: 2012-04-24 Uros Bizjak ubiz...@gmail.com * compare-elim.c (try_eliminate_compare): Also handle operands with implicit extensions. Ok. r~
Re: RFC reminder: an alternative -fsched-pressure implementation
Vladimir Makarov vmaka...@redhat.com writes: On 04/23/2012 11:42 AM, Ulrich Weigand wrote: Vladimir Makarov wrote: I have mixed feelings about the patch. I've tried it on SPEC2000 on x86/x86-64 and ARM. The model algorithm generates bigger code, up to 3.5% (SPECFP on x86), 2% (SPECFP on x86-64), and 0.23% (SPECFP on ARM), in comparison with the current algorithm. It is slower too. Although the difference is quite insignificant on Corei7, the compiler speed slowdown reaches 0.4% on SPECFP2000 on arm. The algorithm also generates slower code on x86 (1.5% on SPECINT and 5% on SPECFP2000) and practically the same average code on x86-64 and ARM (I've tried only SPECINT on ARM). Was this using NEON on ARM? The biggest benefits we saw involved vector code ... Sorry, no. On 04/17/2012 04:29 AM, Richard Sandiford wrote: Anyway, given those results and your mixed feelings, I think it would be best to drop the patch. It's a lot of code to carry around, and its ad-hoc nature would make it hard to change in future. There must be better ways of achieving the same thing. It is up to you, Richard. I don't mind if you commit it into the trunk. There is something in your approach too. If it were possible to get the best of the two methods, we could see a big improvement. On s390, the patch is pretty much a straight improvement across the line: 0.5% on SPECint, 1.7% on SPECfp overall, with the best single improvement in GemsFDTD (9.3%), and no regression beyond usual noise levels. That is a really big improvement. Given that, and the impressive improvements we saw on some ARM benchmarks involving vector code (up to 70%), it would really be a pity to see the patch going nowhere ... Is there anything we can do to help make the patch more acceptable? Any suggestions welcome ... I did not object to the patch. As I wrote Richard, it is up to him to decide to commit the patch or not (it still is).
I had mixed results on x86-64 -- some better and some worse (but with average the same) with pretty big variations. Those big variations mean that people could use this for fine-tuning their programs. Taking your results for S390 and ARM with Neon into account, I guess it should be included and probably made by default for these 2 targets (for sure for s390). OK, thanks to both of you. Ulrich and Andreas: would you be happy for s390 to use this by default? I'll update the patch and install if so. Richard
libgo patch committed: Only define PathMax if PATH_MAX is defined
This patch to libgo only defines syscall.PathMax if PATH_MAX is defined in the system header files. Bootstrapped on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian diff -r 4692bc350a0a libgo/mksysinfo.sh --- a/libgo/mksysinfo.sh Tue Apr 24 09:23:40 2012 -0700 +++ b/libgo/mksysinfo.sh Tue Apr 24 12:11:02 2012 -0700 @@ -275,8 +275,9 @@ sed -e 's/^\(const \)__\(PC[^= ]*\)\(.*\)$/\1\2 = __\2/' >> ${OUT} # The PATH_MAX constant. -grep '^const _PATH_MAX ' gen-sysinfo.go | +if grep '^const _PATH_MAX ' gen-sysinfo.go > /dev/null 2>&1; then echo 'const PathMax = _PATH_MAX' >> ${OUT} +fi # epoll constants. grep '^const _EPOLL' gen-sysinfo.go |
Re: [Patch, Fortran] PR 52196 add -Wrealloc-lhs(-all)
On 19/04/2012 14:02, Tobias Burnus wrote: Updated patch enclosed. On 02/14/2012 12:42 PM, Tobias Burnus wrote: in order to gain an overview for our code whether the recent RESHAPE (and friends) bug affects us and to determine for which assignment a reallocation happens, useful to mitigate performance issues, I added -Wrealloc-lhs and -Wrealloc-lhs-all. The flag -Wrealloc-lhs is the more useful flag: It's about arrays of intrinsic types, which are more likely to appear in hot loops than other types of reallocatable variables such as derived types or (scalar) character variables with deferred length. On 02/27/2012 09:59 PM, Mikael Morin wrote: In turn, the warning might be printed even if at the end no realloc code is generated or present with -O1. Can it be caused by the frontend not going in the realloc-lhs functions in some cases? Especially, it seems that there is a missing codimension/coindexed condition guarding the warning if I compare to the flag_realloc_lhs conditions in trans-expr.c I would rather move the warnings to a function and call it in the places where we really generate the extra code, like it's done for -Warray-temporaries. Two months later I have finally worked on the patch again and followed the suggestion and added the checks to trans-expr.c. Build and regtested on x86-64-linux. OK for the trunk? Looks good. OK. Thanks. Mikael
Re: [Patch, Fortran] PR52864 - fix actual/formal checks
On 12/04/2012 17:23, Tobias Burnus wrote: This patch is a kind of follow-up to the other one for the same PR - though this one is for a separate test case; it is not a regression and it's about actual/formal checks. When trying to fix the rejects-valid bug, I realized that one function was never reached, as a call to expr.c's gfc_check_vardef_context is done before. I made some cleanup and added some code to ensure pointer CLASS are correctly handled. I am not positive that the removed code is unreachable, but I failed to produce reachable code and also the test suite passed. Thus, this patch removes a rejects-valid bug, an accepts-invalid bug, cleans up the code a bit and adds a test case for existing checks (besides testing the bug fixes). Build and regtested on x86-64-linux. OK for the trunk? Hello, is there a reason to guard the class_pointer condition with attr.class_ok in the first conditional and with CLASS_DATA(...) != NULL in the two other ones? Not that it matters much, and in fact, I think the patch as is is good enough for committal (yes, it is OK). I'm asking as I never know myself what is the correct, canonical way to handle the class_* hell... Thanks Mikael
libgo patch committed: Remove race in use of ../testdata
This patch to the libgo testsuite driver removes a race in the use of the ../testdata directory. This is PR go/52462. Ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian diff -r 0b5618e33848 libgo/testsuite/gotest --- a/libgo/testsuite/gotest Tue Apr 24 12:11:37 2012 -0700 +++ b/libgo/testsuite/gotest Tue Apr 24 13:10:27 2012 -0700 @@ -136,18 +136,20 @@ mkdir $DIR cd $DIR +mkdir test +cd test if test "$keep" = "false"; then - trap "cd ..; rm -rf $DIR" 0 1 2 3 14 15 + trap "cd ../..; rm -rf $DIR" 0 1 2 3 14 15 else - trap "cd ..; echo Keeping $DIR" 0 1 2 3 14 15 + trap "cd ../..; echo Keeping $DIR" 0 1 2 3 14 15 fi case "$srcdir" in /*) ;; *) - srcdir=../$srcdir + srcdir=../../$srcdir ;; esac @@ -158,7 +160,7 @@ /*) ;; *) - basedir=../$basedir + basedir=../../$basedir ;; esac @@ -189,10 +191,10 @@ b=`basename $f` rm -f $b cp $basedir/$f $b - elif test -f ../$f; then + elif test -f ../../$f; then b=`basename $f` rm -f $b - cp ../$f $b + cp ../../$f $b else echo "file $f not found" 1>&2 exit 1 @@ -224,10 +226,10 @@ b=`basename $f` rm -f $b cp $basedir/$f $b - elif test -f ../$f; then + elif test -f ../../$f; then b=`basename $f` rm -f $b - cp ../$f $b + cp ../../$f $b else echo "file $f not found" 1>&2 exit 1 @@ -455,27 +457,27 @@ exit $status ;; xyes) - rm -rf ../testsuite/*.o + rm -rf ../../testsuite/*.o files=`echo *` for f in $files; do if test "$f" = "_obj" || test "$f" = "_test"; then continue fi - rm -rf ../testsuite/$f + rm -rf ../../testsuite/$f if test -f $f; then - cp $f ../testsuite/ + cp $f ../../testsuite/ else - ln -s ../$DIR/$f ../testsuite/ + ln -s ../$DIR/test/$f ../../testsuite/ fi done - cd ../testsuite + cd ../../testsuite rm -rf _obj _test mkdir _obj _test if test "$testname" != ""; then GOTESTNAME="$testname" export GOTESTNAME fi - $MAKE check RUNTESTFLAGS="$RUNTESTFLAGS" GOTEST_TMPDIR=$DIR + $MAKE check RUNTESTFLAGS="$RUNTESTFLAGS" GOTEST_TMPDIR=$DIR/test # Useful when using make check-target-libgo cat libgo.log >> libgo-all.log cat libgo.sum >> libgo-all.sum
[C++ Patch] PR 39970
Hi, I'm trying to resolve this rather old PR, where we don't reject uses of the dot operator in default template arguments, eg: class blah { int member; }; blah global; template<int param = global.member> class template_blah; I'm enforcing the check by adding to cp_parser a new in_template_default_argument_p which can be set in cp_parser_default_argument. Then it's just a matter of reading it (together with integral_constant_expression_p, otherwise we can make mistakes in the argument of, eg, sizeof or decltype) exactly when parsing `.', `->', `.*' and `->*', in order to have a precise location for the error. Is this the way we can fix this, or can we do better? Tested x86_64-linux. Thanks, Paolo. // /cp 2012-04-24 Paolo Carlini paolo.carl...@oracle.com PR c++/39970 * parser.h (cp_parser::in_template_default_argument_p): Add. * parser.c (cp_parser_new): Initialize the latter. (cp_parser_default_argument): Set the latter to true when parsing the default argument for a non-type template parameter. (cp_parser_primary_expression): When in_template_default_argument_p && integral_constant_expression_p reject pointer-to-member operators. (cp_parser_postfix_expression): Likewise for class member access operators. /testsuite 2012-04-24 Paolo Carlini paolo.carl...@oracle.com PR c++/39970 * g++.dg/parse/template27.C: New. 
Index: testsuite/g++.dg/parse/template27.C === --- testsuite/g++.dg/parse/template27.C (revision 0) +++ testsuite/g++.dg/parse/template27.C (revision 0) @@ -0,0 +1,22 @@ +// PR c++/39970 + +struct blah { int member; }; +blah global; +blah* pglobal; + +template <int param = global.member> // { dg-error "class member access operator" } +class template1_blah; + +template <int* param = &global.member> // { dg-error "class member access operator" } +class template2_blah; + +template <int param = pglobal->member> // { dg-error "class member access operator" } +class template3_blah; + +int blah::* pm = &blah::member; + +template <int param = global.*pm> // { dg-error "pointer-to-member operator" } +class template4_blah; + +template <int param = pglobal->*pm> // { dg-error "pointer-to-member operator" } +class template5_blah; Index: cp/parser.c === --- cp/parser.c (revision 186780) +++ cp/parser.c (working copy) @@ -503,6 +503,8 @@ cp_debug_parser (FILE *file, cp_parser *parser) cp_debug_print_flag (file, "Local names and 'this' forbidden in current context", parser->local_variables_forbidden_p); + cp_debug_print_flag (file, "In template default argument", + parser->in_template_default_argument_p); cp_debug_print_flag (file, "In unbraced linkage specification", parser->in_unbraced_linkage_specification_p); cp_debug_print_flag (file, "Parsing a declarator", @@ -3278,6 +3280,10 @@ cp_parser_new (void) /* Local variable names are not forbidden. */ parser->local_variables_forbidden_p = false; + /* We are not processing the default argument for + a non-type template parameter. */ + parser->in_template_default_argument_p = false; + /* We are not processing an `extern "C"' declaration. 
*/ parser->in_unbraced_linkage_specification_p = false; @@ -4266,6 +4272,14 @@ cp_parser_primary_expression (cp_parser *parser, return error_mark_node; id_expr_token = token; token = cp_lexer_peek_token (parser->lexer); + if (parser->in_template_default_argument_p + && parser->integral_constant_expression_p + && (token->type == CPP_DOT_STAR || token->type == CPP_DEREF_STAR)) + { + error_at (token->location, + "pointer-to-member operator is not allowed here"); + return error_mark_node; + } done = (token->type != CPP_OPEN_SQUARE && token->type != CPP_OPEN_PAREN && token->type != CPP_DOT @@ -5776,6 +5790,11 @@ cp_parser_postfix_expression (cp_parser *parser, b postfix-expression -> template [opt] id-expression postfix-expression -> pseudo-destructor-name */ + if (parser->in_template_default_argument_p + && parser->integral_constant_expression_p) + error_at (token->location, + "class member access operator is not allowed here"); + /* Consume the `.' or `->' operator. */ cp_lexer_consume_token (parser->lexer); @@ -17372,6 +17391,7 @@ cp_parser_default_argument (cp_parser *parser, boo tree default_argument = NULL_TREE; bool saved_greater_than_is_operator_p; bool saved_local_variables_forbidden_p; + bool saved_in_template_default_argument_p; bool non_constant_p, is_direct_init; /* Make sure that PARSER->GREATER_THAN_IS_OPERATOR_P is
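The four operators the new check rejects inside a non-type template parameter's default argument are, of course, ordinary valid C++ everywhere else. A runnable sketch of what each one does at run time (the names mirror the test-case; the values are made up):

```cpp
#include <cassert>

struct blah { int member; };

// Plain member access and pointer-to-member access -- legal in ordinary
// expressions, rejected by the patch only inside a non-type template
// parameter's default argument.
int demo ()
{
  blah global = { 7 };
  blah *pglobal = &global;
  int blah::*pm = &blah::member;  // pointer-to-member

  int a = global.member;    // '.'   class member access
  int b = pglobal->member;  // '->'  class member access
  int c = global.*pm;       // '.*'  pointer-to-member operator
  int d = pglobal->*pm;     // '->*' pointer-to-member operator
  return a + b + c + d;
}
```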
Re: Propose to add a new argument MULTILIB_REQUIRED in multilib framework
On Tue, 17 Apr 2012, Terry Guo wrote: 2012-04-17 Terry Guo terry@arm.com * Makefile.in (s-mlib): Add new argument MULTILIB_REQUIRED. * genmultilib (MULTILIB_REQUIRED): New. You need to add documentation in fragments.texi for MULTILIB_REQUIRED. -- Joseph S. Myers jos...@codesourcery.com
Re: default to strict dwarf-2 on powerpc-vxworks
On Apr 24, 2012, at 17:49 , Jakub Jelinek wrote: Talked to Jason about this on IRC last night, he is ok with that, so your patch is ok for trunk. Just committed - Thanks :-)
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 17/04/12 14:24, Richard Guenther wrote: On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote: On 27/01/12 21:37, Tom de Vries wrote: On 24/01/12 11:40, Richard Guenther wrote: On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); because 'result' may be not an SSA name. 
It might also not have a proper definition statement (if VN_INFO (result)->needs_insertion is true). So you at least need to guard things properly. Right. And that also doesn't work if the function is without lhs, such as in the new test-case pr51879-6.c. I fixed this by storing both lhs and vdef, such that I don't have to derive the vdef from the lhs. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) Doing so will hurt performance of tail-merging in its current form. OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that value numbering as used in tail-merging has its limitations too. Do you have any ideas how to address that one? @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)) + && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST) + || (vuse && SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Removed now; that was a workaround for a bug in an earlier version of the patch that I didn't need anymore. Bootstrapped and reg-tested on x86_64. OK for stage1? Richard, quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: ... I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal, using value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. 
Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. ... The test-case pr51879-6.c shows a case where improving value numbering will help tail-merging, but structural equality comparison not: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1717_7 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_8 = VDEF .MEMD.1717_7 # USE = nonlocal # CLB = nonlocal aD.1712_3 = barD.1704 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1717_9 = VDEF .MEMD.1717_6(D) # USE =
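For reference, the transformation the patch enables on the first test-case in this thread is behavior-preserving: both arms of the if call bar (7), so once the call results are value-numbered equal, tail merging collapses the two blocks into one call. A sketch of the before/after shapes, with stub definitions of bar and baz (my assumptions, chosen only to make the effect observable):

```cpp
#include <cassert>

static int calls;                          // counts how often bar runs
static int bar (int x) { ++calls; return x * 2; }
static int last;                           // records what baz received
static void baz (int a) { last = a; }

// Before tail merging: two identical non-const calls, one per arm.
void foo_before (int y)
{
  int a;
  if (y == 6)
    a = bar (7);
  else
    a = bar (7);
  baz (a);
}

// After tail merging of the two blocks: a single call, no branch.
void foo_after (int y)
{
  (void) y;
  baz (bar (7));
}
```

Both versions call bar exactly once per invocation and pass the same value to baz, which is why the merge is valid.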
Re: [C++ Patch] PR 39970
Hi, We should already reject that because it isn't a constant expression; I don't think we want to bother checking specifically for this case in the parser as well. Sure we reject this kind of code at instantiation time. I suspect submitter bothered filing a PR because other front-end also enforce the check in the parser, eg, EDG, but certainly I don't have a strong opinion. Perhaps we aren't checking default arguments unless they're actually used; a could change that if they aren't dependent. Your reply reached the mailing list a bit mangled, could you please clarify? Thanks! Paolo.
[PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? Thanks, Teresa 2012-04-24 Teresa Johnson tejohn...@google.com * loop-unroll.c (loop_has_call): New function. (loop_has_FP_comp): Ditto. (compute_weighted_branches): Ditto. (max_unroll_with_branches): Ditto. (decide_unroll_constant_iterations): Add heuristic to avoid increasing branch mispredicts when unrolling. (decide_unroll_runtime_iterations): Ditto. * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains a call. */ +static bool +loop_has_call (struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; i++) +{ + bb = body[i]; + + FOR_BB_INSNS (bb, insn) +{ + if (CALL_P (insn)) +{ + free (body); + return true; +} +} +} + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. */ +static bool +loop_has_FP_comp (struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; i++) +{ + bb = body[i]; + + FOR_BB_INSNS (bb, insn) +{ + set = single_set (insn); + if (!set) +continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) +{ + free (body); + return true; +} +} +} + free (body); + return false; +} + +/* Compute the number of branches in LOOP, weighted by execution counts. */ +static float +compute_weighted_branches (struct loop *loop) +{ + int header_count = loop->header->count; + unsigned i; + float n; + basic_block *body; + + /* If no profile feedback data exists, don't limit unrolling. */ + if (header_count == 0) +return 0.0; + + gcc_assert (loop->latch != EXIT_BLOCK_PTR); + + body = get_loop_body (loop); + n = 0.0; + for (i = 0; i < loop->num_nodes; i++) +{ + if (EDGE_COUNT (body[i]->succs) >= 2) +{ + /* If this block is executed less frequently than the header (loop + entry), then it is weighted based on the ratio of times it is + executed compared to the header. */ + if (body[i]->count < header_count) +n += ((float)body[i]->count)/header_count; + + /* When it is executed more frequently than the header (i.e. it is + in a nested inner loop), simply weight the branch at 1.0. */ + else +n += 1.0; +} +} + free (body); + + return n; +} + +/* Compute the maximum number of times LOOP can be unrolled without exceeding + a branch budget, which can increase branch mispredictions. The number of + branches is computed by weighting each branch with its expected execution + probability through the loop based on profile data. If no profile feedback + data exists, simply return the current NUNROLL factor. */ +static unsigned +max_unroll_with_branches (struct loop *loop, unsigned nunroll) +{ + struct loop *outer; + struct niter_desc *outer_desc; + int outer_niters = 1; + float weighted_outer_branches = 0.0; + float weighted_num_branches = compute_weighted_branches (loop); + + /* If there was no profile feedback data, weighted_num_branches will be 0.0 + and we won't limit unrolling. If the weighted_num_branches is at most 1.0, + also don't limit unrolling as the back-edge branch will not be duplicated. */ + if (weighted_num_branches <= 1.0) +return nunroll; + + /* Walk up the loop tree until we find a hot outer loop in which the current + loop is nested. At that point we will compute the number
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Tue, Apr 24, 2012 at 2:26 PM, Teresa Johnson tejohn...@google.com wrote: This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? Thanks, Teresa 2012-04-24 Teresa Johnson tejohn...@google.com * loop-unroll.c (loop_has_call): New function. (loop_has_FP_comp): Ditto. (compute_weighted_branches): Ditto. (max_unroll_with_branches): Ditto. (decide_unroll_constant_iterations): Add heuristic to avoid increasing branch mispredicts when unrolling. (decide_unroll_runtime_iterations): Ditto. * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains call. */ +static bool +loop_has_call(struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + if (CALL_P (insn)) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. 
*/ +static bool +loop_has_FP_comp(struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + set = single_set (insn); + if (!set) + continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Compute the number of branches in LOOP, weighted by execution counts. */ +static float +compute_weighted_branches(struct loop *loop) +{ + int header_count = loop-header-count; + unsigned i; + float n; + basic_block * body; + + /* If no profile feedback data exists, don't limit unrolling */ + if (header_count == 0) + return 0.0; + + gcc_assert (loop-latch != EXIT_BLOCK_PTR); + + body = get_loop_body (loop); + n = 0.0; + for (i = 0; i loop-num_nodes; i++) + { + if (EDGE_COUNT (body[i]-succs) = 2) + { + /* If this block is executed less frequently than the header (loop + entry), then it is weighted based on the ratio of times it is + executed compared to the header. */ + if (body[i]-count header_count) + n += ((float)body[i]-count)/header_count; Please don't introduce more floating point usage into the compiler since it could change between different hosts (sse vs x87 for an example). Maybe use a fixed point multiply of 1000 (note use a macro for this special value though) like what is used in the rest of the predict code. Thanks, Andrew Pinski + + /* When it is executed more frequently than the header (i.e. it is + in a nested inner loop), simply weight the branch at 1.0. */ + else + n += 1.0; + } + } + free (body); + + return n; +} + +/* Compute the maximum number of times LOOP can be unrolled without exceeding + a branch budget, which can increase branch mispredictions. The number of + branches is computed by weighting each branch with its expected execution + probability through the loop based on profile data. 
If no profile feedback + data exists, simply return the current NUNROLL factor. */ +static unsigned +max_unroll_with_branches(struct loop *loop, unsigned nunroll) +{ + struct loop *outer; + struct niter_desc *outer_desc; + int outer_niters = 1; + float weighted_outer_branches =
Re: [C PATCH] Fix -Woverride-init (PR c/52880)
On Fri, 20 Apr 2012, Jakub Jelinek wrote: 2012-04-19 Jakub Jelinek ja...@redhat.com PR c/52880 * c-typeck.c (set_nonincremental_init, set_nonincremental_init_from_string): Pass true instead of false as IMPLICIT to add_pending_init. * gcc.dg/pr52880.c: New test. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: Go patch committed: Add explicit checks for division zero and overflow
On Fri, 20 Apr 2012, Ian Lance Taylor wrote: In Go, a division overflow is supposed to wrap. On x86 processors it would raise a SIGFPE signal leading to a panic. This patch fixes that problem as well. It's a pity that this change is Go-specific. For -fwrapv, INT_MIN / -1 and INT_MIN % -1 should both have defined modulo semantics for C and C++ (similarly of course for all signed integer types, not just int). Ideally of course the semantics for this would be encoded properly in GENERIC / GIMPLE, but until then it would make sense to have a global flag, set by both -fwrapv and the Go front end, to enable those checks in the language-independent compiler. See bug 30484. -- Joseph S. Myers jos...@codesourcery.com
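The one case that overflows is INT_MIN / -1 (and INT_MIN % -1): on x86 the idiv instruction raises SIGFPE, while Go and -fwrapv want the quotient to wrap back to INT_MIN and the remainder to be 0. A source-level sketch of the explicit check the front end inserts (the real patch emits this at the GENERIC/GIMPLE level, and the function names here are mine):

```cpp
#include <cassert>
#include <climits>

// Division with wrap-on-overflow semantics: guard the single case
// (divisor == -1) where hardware division can trap, and compute the
// negation in unsigned arithmetic so it wraps instead of overflowing.
int wrapping_div (int a, int b)
{
  if (b == -1)
    return (int) (0u - (unsigned) a);  // -a modulo 2^32: INT_MIN maps to itself
  return a / b;
}

int wrapping_mod (int a, int b)
{
  if (b == -1)
    return 0;                          // a % -1 is 0 for every a
  return a % b;
}
```

Only the b == -1 path differs from plain / and %, which is why the check is cheap enough to insert unconditionally.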
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
Resending my response in plain text so it will go through to gcc-patches... On Tue, Apr 24, 2012 at 2:36 PM, Teresa Johnson tejohn...@google.com wrote: On Tue, Apr 24, 2012 at 2:30 PM, Andrew Pinski pins...@gmail.com wrote: On Tue, Apr 24, 2012 at 2:26 PM, Teresa Johnson tejohn...@google.com wrote: This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? Thanks, Teresa 2012-04-24 Teresa Johnson tejohn...@google.com * loop-unroll.c (loop_has_call): New function. (loop_has_FP_comp): Ditto. (compute_weighted_branches): Ditto. (max_unroll_with_branches): Ditto. (decide_unroll_constant_iterations): Add heuristic to avoid increasing branch mispredicts when unrolling. (decide_unroll_runtime_iterations): Ditto. * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains call. 
*/ +static bool +loop_has_call(struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + if (CALL_P (insn)) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. */ +static bool +loop_has_FP_comp(struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i loop-num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + set = single_set (insn); + if (!set) + continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Compute the number of branches in LOOP, weighted by execution counts. */ +static float +compute_weighted_branches(struct loop *loop) +{ + int header_count = loop-header-count; + unsigned i; + float n; + basic_block * body; + + /* If no profile feedback data exists, don't limit unrolling */ + if (header_count == 0) + return 0.0; + + gcc_assert (loop-latch != EXIT_BLOCK_PTR); + + body = get_loop_body (loop); + n = 0.0; + for (i = 0; i loop-num_nodes; i++) + { + if (EDGE_COUNT (body[i]-succs) = 2) + { + /* If this block is executed less frequently than the header (loop + entry), then it is weighted based on the ratio of times it is + executed compared to the header. */ + if (body[i]-count header_count) + n += ((float)body[i]-count)/header_count; Please don't introduce more floating point usage into the compiler since it could change between different hosts (sse vs x87 for an example). Maybe use a fixed point multiply of 1000 (note use a macro for this special value though) like what is used in the rest of the predict code. Ok, got it. I will address this in the next version of the patch. 
Thanks, Teresa Thanks, Andrew Pinski + + /* When it is executed more frequently than the header (i.e. it is + in a nested inner loop), simply weight the branch at 1.0. */ + else + n += 1.0; + } + } + free (body); + + return n; +} + +/* Compute the maximum number of times LOOP can be unrolled without exceeding + a branch budget, which
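Andrew's suggestion amounts to replacing the float ratio with an integer scaled by a fixed base, much like REG_BR_PROB_BASE elsewhere in the predictor. A minimal sketch of the reformulation on a toy model (arrays stand in for basic blocks; the macro name BRANCH_WEIGHT_SCALE and the value 1000 are assumptions, not from the patch):

```cpp
#include <cassert>
#include <cstddef>

#define BRANCH_WEIGHT_SCALE 1000  /* hypothetical fixed-point base */

// Fixed-point variant of compute_weighted_branches: counts[i] is the
// execution count of block i, nsuccs[i] its successor-edge count, and
// counts[0] the loop header.  Returns the weighted branch count times
// BRANCH_WEIGHT_SCALE, so no host floating point is involved.
long weighted_branches_fixed (const long *counts, const int *nsuccs, size_t n)
{
  long header_count = counts[0];
  if (header_count == 0)
    return 0;                    // no profile feedback: don't limit unrolling
  long w = 0;
  for (size_t i = 0; i < n; i++)
    if (nsuccs[i] >= 2)
      w += counts[i] < header_count
           ? counts[i] * BRANCH_WEIGHT_SCALE / header_count  // colder block
           : BRANCH_WEIGHT_SCALE;                            // cap at 1.0
  return w;
}
```

The result is deterministic across hosts, which is the point of the review comment: the same inputs always give the same integer, regardless of whether the host uses SSE or x87 arithmetic.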
Re: [C] improve missing initializers diagnostics
On Sat, 21 Apr 2012, Manuel López-Ibáñez wrote: 2012-04-19 Manuel López-Ibáñez m...@gcc.gnu.org * c-typeck.c (pop_init_level): Improve diagnostics. testsuite/ * gcc.dg/m-un-2.c: Update. * gcc.dg/20011021-1.c: Update. OK. The strange "near initialization" formulation came in with a gcc2 merge in 1999 (r24879), replacing previous diagnostics that did identify the field directly in the message, and never really made sense to me; I think naming the field directly in the diagnostic is better for all such diagnostics. -- Joseph S. Myers jos...@codesourcery.com
Re: Symbol table 14/many: remove rest of reachability code
On Mon, Apr 23, 2012 at 2:07 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, this patch removes remaining traces of the code handling needed and reachability flags and replaces last use of it in cgraph_mark_reachable_node by pointer set. It is nice to have 60k of patch that essentially just removes things. Bootstrapped/regtested x86_64-linux, will commit it after rebuilding Mozilla. Honza * lto-symtab.c (lto_cgraph_replace_node): Do not call mark_reahcable_node. * cgraph.c (cgraph_remove_node): Do not clear reachable. (cgraph_mark_reachable_node): Remove. (cgraph_mark_force_output_node): Do not set reachable. (dump_cgraph_node): Do not dump reachable. (cgraph_create_virtual_clone): Do not set reachable. * cgraph.h (cgraph_node): Remove reachable flag. (varpool_node): Remove reachable flag. (cgraph_mark_if_needed): Remove. (cgraph_mark_reachable_node): Remove. * tree-emutls.c (ipa_lower_emutls): Do not check reachable. * cgraphunit.c (cgraph_finalize_function): Do not mark node as reachable. (cgraph_add_new_function): Likewise. (cgraph_mark_if_needed): Remove. (cgraph_analyze_function): Do not set target as reachable. (process_function_and_variable_attributes): Do not care about dllexport. (cgraph_analyze_functions): Do not set reachable flags. (cgraph_mark_functions_to_output): Do not check reachability. (cgraph_copy_node_for_versioning): Do not set reachable flag. (dbxout_expand_expr): Update. * c-decl.c (merge_decls): Do not track changed externs. * ipa.c: Include pointer-set.h (enqueue_cgraph_node): Use reachable pointer set. (process_references): Likewise. (cgraph_remove_unreachable_nodes): Likewise. (whole_program_function_and_variable_visibility): Do not recompute reachable. * trans-mem.c (ipa_tm_execute): Do not check reachable flag. This patch caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53106 -- H.J.
set the correct block info for the call stmt in fnsplit (issue6111050)
Hi, In split_function() (ipa-split.c), for the newly created call stmt, its block is set to the outermost scope, i.e. DECL_INITIAL(current_function_decl). When we inline this partially outlined function, we create the new block based on the block for the call stmt (in expand_call_inline()). So the new block for the inlined body is in parallel to the function top level scope. This bad block structure will later result in a bad stack layout. This patch fixes the issue by setting the correct block number. It starts with the split_point entry bb to find the block stmt in the outlined region. The entry_bb may be an empty BB so we need to follow the CFG until we find a non-empty bb. My early patch tried to use the block info from the first non-empty bb in the outlined region. But that failed bootstrap as some of the stmts (assignment stmts) do not have the block info (bug?). So in this patch I traverse all the stmts in the bb until getting the block info. Tested with gcc bootstrap. Thanks, 2012-04-24 Rong Xu x...@google.com * ipa-split.c (split_function): Set the correct block for the call statement. Index: ipa-split.c === --- ipa-split.c (revision 186634) +++ ipa-split.c (working copy) @@ -948,7 +948,7 @@ int num = 0; struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl); basic_block return_bb = find_return_bb (); - basic_block call_bb; + basic_block call_bb, bb; gimple_stmt_iterator gsi; gimple call; edge e; @@ -958,6 +958,7 @@ gimple last_stmt = NULL; unsigned int i; tree arg; + tree split_loc_block = NULL; if (dump_file) { @@ -1072,6 +1073,33 @@ } } + /* Find the block location of the split region. 
*/ + bb = split_point->entry_bb; + do +{ + bool found = false; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + if (is_gimple_debug (gsi_stmt (gsi))) +continue; + if ((split_loc_block = gimple_block (gsi_stmt (gsi)))) +{ + found = true; + break; +} +} + if (found) +break; + + /* If we reach here, this bb should be an empty unconditional + or fall-through branch. We continue with the succ bb. */ + bb = single_succ (bb); +} + while (bb && bitmap_bit_p (split_point->split_bbs, bb->index)); + + gcc_assert (split_loc_block); + /* Now create the actual clone. */ rebuild_cgraph_edges (); node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip, @@ -1115,7 +1143,7 @@ VEC_replace (tree, args_to_pass, i, arg); } call = gimple_build_call_vec (node->decl, args_to_pass); - gimple_set_block (call, DECL_INITIAL (current_function_decl)); + gimple_set_block (call, split_loc_block); /* We avoid address being taken on any variable used by split part, so return slot optimization is always possible. Moreover this is -- This patch is available for review at http://codereview.appspot.com/6111050
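The search loop in the patch is a generic pattern: scan each block's statements for the first one carrying scope (block) info, skipping debug statements, and fall through empty blocks to their single successor. A toy model of that traversal (the structures are stand-ins of my own, not GCC's basic_block/gimple types, and the split_bbs membership test is omitted):

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins: each statement may or may not carry block scope info
// (0 = none), and each bb has at most one successor.
struct toy_stmt { bool is_debug; int block; };
struct toy_bb { std::vector<toy_stmt> stmts; toy_bb *succ; };

// Mirror of the patch's search: skip debug stmts, return the first
// block value found, following single successors through empty
// (or debug-only) blocks.
int find_split_block (toy_bb *bb)
{
  while (bb)
    {
      for (const toy_stmt &s : bb->stmts)
        if (!s.is_debug && s.block != 0)
          return s.block;
      bb = bb->succ;  // nothing here: fall through to the successor
    }
  return 0;           // corresponds to the gcc_assert firing
}
```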
Re: PR c/53066 Wshadow should not warn for shadowing an extern function
On Sun, 22 Apr 2012, Manuel López-Ibáñez wrote: -Wshadow warns whenever any declaration shadows a global function declaration. This is almost always noise, since most (always?) of the time one cannot mistakenly replace a function by another variable. The false positives are too common (Linus mentions using the name 'index' when including string.h). I think the correct rule would be: warn if a variable *with pointer-to-function type* shadows a function, and warn if a nested function shadows another function, but don't warn for variables shadowing functions if the variable has any other type (because if the variable has some type that isn't a pointer-to-function, no confusion is likely without another error being given). -- Joseph S. Myers jos...@codesourcery.com
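The proposed rule separates two shadowing shapes; a runnable illustration (the declarations and names are mine, chosen to mirror the 'index' example):

```cpp
#include <cassert>

static int twice (int x) { return 2 * x; }

int demo ()
{
  // Benign shadowing: an int named after a library function (like an
  // 'int index' after including <string.h>).  Under the proposed rule
  // this would no longer warn -- it can't be confused with a call target.
  int index = 3;

  // Confusable shadowing: a pointer-to-function shadowing a function of
  // the same name.  Under the proposed rule this one would still warn,
  // because 'twice (...)' now silently calls through the local pointer.
  int (*twice) (int) = ::twice;

  return twice (index);
}
```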
Re: set the correct block info for the call stmt in fnsplit (issue6111050)
On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu x...@google.com wrote: Hi, In split_function() (ipa-split.c), for the newly created call stmt, its block is set to the outermost scope, i.e. DECL_INITIAL(current_function_decl). When we inline this partially outlined function, we create the new block based on the block for the call stmt (in expand_call_inline()). So the new block for the inlined body is in parallel to the function top level scope. This bad block structure will late result in a bad stack layout. We do not depend on the block structure any more when dealing with stack layout for variables in GCC 4.7.0 and above. I am not saying your patch is incorrect or not needed. Just it will not have an effect on variable stack layout. Did you have a testcase for the bad stack layout issue? Or was it too hard to produce one because the size matters? This patch fixes the issue by setting the correct block number. It starts with the split_point entry bb to find the block stmt in the outlined region. The entry_bb maybe an empty BB so we need to follow the CFG until we find a non-empty bb. My early patch tried to use the block info from the first non-empty bb in the outlined regision. But that failed bootstrap as some of the stmts (assignment stmt) do not have the block info (bug?). So in this patch I traverse all the stmts in the bb until getting the block info. Tested with gcc bootstap. On which target? Thanks, Andrew Pinski Thanks, 2012-04-24 Rong Xu x...@google.com * ipa-split.c (split_function): set the correct block for the call statement. 
Index: ipa-split.c === --- ipa-split.c (revision 186634) +++ ipa-split.c (working copy) @@ -948,7 +948,7 @@ int num = 0; struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl); basic_block return_bb = find_return_bb (); - basic_block call_bb; + basic_block call_bb, bb; gimple_stmt_iterator gsi; gimple call; edge e; @@ -958,6 +958,7 @@ gimple last_stmt = NULL; unsigned int i; tree arg; + tree split_loc_block = NULL; if (dump_file) { @@ -1072,6 +1073,33 @@ } } + /* Find the block location of the split region. */ + bb = split_point->entry_bb; + do + { + bool found = false; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + if (is_gimple_debug (gsi_stmt (gsi))) + continue; + if ((split_loc_block = gimple_block (gsi_stmt (gsi)))) + { + found = true; + break; + } + } + if (found) + break; + + /* If we reach here, this bb should be an empty unconditional + or fall-through branch. We continue with the succ bb. */ + bb = single_succ (bb); + } + while (bb && bitmap_bit_p (split_point->split_bbs, bb->index)); + + gcc_assert (split_loc_block); + /* Now create the actual clone. */ rebuild_cgraph_edges (); node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip, @@ -1115,7 +1143,7 @@ VEC_replace (tree, args_to_pass, i, arg); } call = gimple_build_call_vec (node->decl, args_to_pass); - gimple_set_block (call, DECL_INITIAL (current_function_decl)); + gimple_set_block (call, split_loc_block); /* We avoid address being taken on any variable used by split part, so return slot optimization is always possible. Moreover this is -- This patch is available for review at http://codereview.appspot.com/6111050
Re: set the correct block info for the call stmt in fnsplit (issue6111050)
Please find the test case in the attachment. It shows the issue in the google-4_6 branch: -c -O2 -fno-strict-aliasing ss.C -fdump-rtl-expand-all. In the rtl-expand dump, trianglevertices and one of the gtest_ar are in the same partition. The problem was found in the ARM compiler, but we managed to reproduce it on x86. -Rong On Tue, Apr 24, 2012 at 3:02 PM, Andrew Pinski pins...@gmail.com wrote: On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu x...@google.com wrote: Hi, In split_function() (ipa-split.c), for the newly created call stmt, its block is set to the outermost scope, i.e. DECL_INITIAL(current_function_decl). When we inline this partially outlined function, we create the new block based on the block for the call stmt (in expand_call_inline()). So the new block for the inlined body is in parallel to the function top level scope. This bad block structure will late result in a bad stack layout. We do not depend on the block structure any more when dealing with stack layout for variables in GCC 4.7.0 and above. I am not saying your patch is incorrect or not needed. Just it will not have an effect on variable stack layout. Did you have a testcase for the bad stack layout issue? Or was it too hard to produce one because the size matters? This patch fixes the issue by setting the correct block number. It starts with the split_point entry bb to find the block stmt in the outlined region. The entry_bb maybe an empty BB so we need to follow the CFG until we find a non-empty bb. My early patch tried to use the block info from the first non-empty bb in the outlined regision. But that failed bootstrap as some of the stmts (assignment stmt) do not have the block info (bug?). So in this patch I traverse all the stmts in the bb until getting the block info. Tested with gcc bootstap. On which target? Thanks, Andrew Pinski Thanks, 2012-04-24 Rong Xu x...@google.com * ipa-split.c (split_function): set the correct block for the call statement. 
Index: ipa-split.c === --- ipa-split.c (revision 186634) +++ ipa-split.c (working copy) @@ -948,7 +948,7 @@ int num = 0; struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl); basic_block return_bb = find_return_bb (); - basic_block call_bb; + basic_block call_bb, bb; gimple_stmt_iterator gsi; gimple call; edge e; @@ -958,6 +958,7 @@ gimple last_stmt = NULL; unsigned int i; tree arg; + tree split_loc_block = NULL; if (dump_file) { @@ -1072,6 +1073,33 @@ } } + /* Find the block location of the split region. */ + bb = split_point->entry_bb; + do + { + bool found = false; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + if (is_gimple_debug (gsi_stmt (gsi))) + continue; + if ((split_loc_block = gimple_block (gsi_stmt (gsi)))) + { + found = true; + break; + } + } + if (found) + break; + + /* If we reach here, this bb should be an empty unconditional + or fall-through branch. We continue with the succ bb. */ + bb = single_succ (bb); + } + while (bb && bitmap_bit_p (split_point->split_bbs, bb->index)); + + gcc_assert (split_loc_block); + /* Now create the actual clone. */ rebuild_cgraph_edges (); node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip, @@ -1115,7 +1143,7 @@ VEC_replace (tree, args_to_pass, i, arg); } call = gimple_build_call_vec (node->decl, args_to_pass); - gimple_set_block (call, DECL_INITIAL (current_function_decl)); + gimple_set_block (call, split_loc_block); /* We avoid address being taken on any variable used by split part, so return slot optimization is always possible. 
Moreover this is -- This patch is available for review at http://codereview.appspot.com/6111050 namespace testing { class Test { virtual void TestBody() = 0; }; class AssertionResult { public: AssertionResult(const AssertionResult& other); operator bool() const { return success_; } private: bool success_; void operator=(AssertionResult const &); }; namespace internal { class TestFactoryBase { public: virtual ::testing::Test* CreateTest() = 0; }; template <class TestClass> class TestFactoryImpl : public TestFactoryBase { public: virtual ::testing::Test* CreateTest() { return new TestClass; } }; char* MakeAndRegisterTestInfo2( const char* test_case_name, TestFactoryBase* factory); AssertionResult CmpHelperEQ(const char* expected_expression, const char* actual_expression, long long expected, long long actual); AssertionResult CmpHelperNE( const char* expr, const char* expr2, long long val1, long long val2);
Re: strengthen protection against REG_EQUIV/EQUAL on !REG set
Olivier Hainque hain...@adacore.com writes: *** /tmp/rkQ7Ep_emit-rtl.c 2012-04-12 11:19:51.830104512 +0200 --- gcc/emit-rtl.c 2012-04-11 22:39:11.323103686 +0200 *** set_unique_reg_note (rtx insn, enum reg_ *** 4955,4960 --- 4955,4975 if (GET_CODE (datum) == ASM_OPERANDS) return NULL_RTX; + /* Don't add REG_EQUAL/REG_EQUIV notes on a single_set destination which + isn't a REG or a SUBREG of REG. Such notes are invalid, could lead + to bogus assumptions downstream (e.g. that a (set MEM) couldn't trap), + and many callers just don't care checking. Note that we might have + single_set (insn) == 0 here, typically from reload attaching REG_EQUAL + to USEs for inheritance processing purposes. */ + { + rtx set = single_set (insn); + + if (set && ! (REG_P (SET_DEST (set)) + || (GET_CODE (SET_DEST (set)) == SUBREG + && REG_P (SUBREG_REG (SET_DEST (set)))))) + return NULL_RTX; + } + STRICT_LOW_PART is OK too. I like the idea, but I think we're in danger of having too many functions check the set. Further up set_unique_reg_note we have: /* Don't add REG_EQUAL/REG_EQUIV notes if the insn has multiple sets (some callers assume single_set means the insn only has one set, when in fact it means the insn only has one *useful* set). */ if (GET_CODE (PATTERN (insn)) == PARALLEL && multiple_sets (insn)) { gcc_assert (!note); return NULL_RTX; } And set_dst_reg_note has: rtx set = single_set (insn); if (set && SET_DEST (set) == dst) return set_unique_reg_note (insn, kind, datum); Would be nice to use a single function that knows about the extra constraints here. Maybe something like the attached? I'm deliberately requiring the SET to be the first rtx in the PARALLEL. I'm not entirely happy with the optabs.c code: if (! rtx_equal_p (SET_DEST (set), target) /* For a STRICT_LOW_PART, the REG_NOTE applies to what is inside it. */ && (GET_CODE (SET_DEST (set)) != STRICT_LOW_PART || ! 
rtx_equal_p (XEXP (SET_DEST (set), 0), target))) return 1; either; the note is always GET_MODE (target), so surely this should be checking for STRICT_LOW_PART before applying rtx_equal_p? Maybe set_for_reg_notes should return the reg too, and we'd just apply rtx_equal_p to that. Richard gcc/ * rtl.h (set_for_reg_notes): Declare. * emit-rtl.c (set_for_reg_notes): New function. (set_unique_reg_note): Use it. * optabs.c (add_equal_note): Likewise. Index: gcc/rtl.h === --- gcc/rtl.h 2012-04-24 22:55:36.002967164 +0100 +++ gcc/rtl.h 2012-04-24 22:59:34.150966581 +0100 @@ -1879,6 +1879,7 @@ extern enum machine_mode choose_hard_reg bool); /* In emit-rtl.c */ +extern rtx set_for_reg_notes (rtx); extern rtx set_unique_reg_note (rtx, enum reg_note, rtx); extern rtx set_dst_reg_note (rtx, enum reg_note, rtx, rtx); extern void set_insn_deleted (rtx); Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c 2012-04-24 22:55:36.002967164 +0100 +++ gcc/emit-rtl.c 2012-04-24 23:00:13.677966484 +0100 @@ -4944,6 +4944,45 @@ force_next_line_note (void) last_location = -1; } +/* Notes like REG_EQUAL and REG_EQUIV refer to a set in an instruction. + Return the set in INSN that such notes describe, or NULL if the notes + have no meaning for INSN. */ + +rtx +set_for_reg_notes (rtx insn) +{ + rtx pat, reg; + + if (!INSN_P (insn)) +return NULL_RTX; + + pat = PATTERN (insn); + if (GET_CODE (pat) == PARALLEL) +{ + /* We do not use single_set because that ignores SETs of unused +registers. REG_EQUAL and REG_EQUIV notes really do require the +PARALLEL to have a single SET. */ + if (multiple_sets (insn)) + return NULL_RTX; + pat = XVECEXP (pat, 0, 0); +} + + if (GET_CODE (pat) != SET) +return NULL_RTX; + + reg = SET_DEST (pat); + + /* Notes apply to the contents of a STRICT_LOW_PART. */ + if (GET_CODE (reg) == STRICT_LOW_PART) +reg = XEXP (reg, 0); + + /* Check that we have a register. 
*/ + if (!(REG_P (reg) || GET_CODE (reg) == SUBREG)) +return NULL_RTX; + + return pat; +} + /* Place a note of KIND on insn INSN with DATUM as the datum. If a note of this type already exists, remove it first. */ @@ -4956,39 +4995,26 @@ set_unique_reg_note (rtx insn, enum reg_ { case REG_EQUAL: case REG_EQUIV: - /* Don't add REG_EQUAL/REG_EQUIV notes if the insn -has multiple sets (some callers assume single_set -means the insn only has one set, when in fact it -means the insn only has one * useful * set). */ - if (GET_CODE (PATTERN (insn)) == PARALLEL
Re: Added better handling of outdated profile data. (issue 5989046)
Ping? Here is the formatted ChangeLog in case you want a summary of what I did: * gcc/ipa-inline.c (edge_badness): Make sure profile is valid before using it to compute badness. * gcc/predict.c (maybe_hot_frequency_p): Ditto. (cgraph_maybe_hot_edge_p): Ditto. (maybe_hot_edge_p): Ditto. (probably_never_executed_bb_p): Ditto. (compute_function_frequency): Ditto. * gcc/profile.c (compute_branch_probabilities): Return without setting the profile read flag if get_exec_counts returns NULL. * gcc/tree.h: Added macro for accessing profile status. The patch is in the original email and on the codereview issue (http://codereview.appspot.com/5989046/).
Re: PowerPC prologue and epilogue 6
On Sat, Apr 21, 2012 at 2:48 AM, Alan Modra amo...@gmail.com wrote: This patch adds out-of-line vector saves and restores. To do this I made some infrastructure changes to various functions like rs6000_emit_savres_rtx that currently take boolean parameters (savep, gpr, and lr). Rather than add yet another boolean to specify vector regs, I chose to lump them all together in a bitmask. This made the patch a little larger but overall is a better interface, I think. I also revert a change I made in http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01014.html to always use r11 as a frame reg whenever abiv4 emits out-of-line saves. Code quality in functions with small frames is better without that particular change. This however meant some changes are required later when setting up pointer regs for gpr and fpr out-of-line saves. What else is here? Improved register selection when saving vrsave in the prologue and when restoring cr in the epilogue, allowing better scheduling. A fix to rs6000_output_function_prologue to output the correct .extern for ELF, then deciding we don't need such things anyway. And various other little code cleanups. Bootstrapped and regression tested powerpc-linux. gcc/ * config/rs6000/rs6000.c (SAVE_INLINE_VRS, REST_INLINE_VRS, V_SAVE_INLINE, SAVRES_LR, SAVRES_SAVE, SAVRES_REG, SAVRES_GPR, SAVRES_FPR, SAVRES_VR): Define. (no_global_regs_above): Delete. (no_global_regs): New function. (rs6000_savres_strategy): Handle vector regs. Use proper lr_save_p value for load multiple test. (savres_routine_syms): Increase size. (rs6000_savres_routine_name, rs6000_savres_routine_sym, ptr_regno_for_savres, rs6000_emit_savres_rtx): Pass in int selector rather than a number of boolean flags. Update all callers. (rs6000_savres_routine_name): Generate vector save/restore names. (rs6000_savres_routine_sym): Handle vector regs. Delete forward decl. (ptr_regno_for_savres, rs6000_emit_savres_rtx): Likewise. 
(rs6000_emit_prologue): Delete saving_FPRs_inline, saving_GPRs_inline and using_store_multiple. Expand uses. Don't always use r11 as frame reg when needed for out-of-line saves. Set up initial offset for out-of-line vector saves when buying stack frame. Handle pointer reg setup for out-of-line fp save. Emit call to out-of-line vector save function. Choose r11 or r12 for vrsave reg when available for better scheduling. (rs6000_output_function_prologue): Don't emit .extern for ELF. (rs6000_emit_epilogue): Choose a better frame reg when restoring from back-chain to suit out-of-line vector restore functions. Emit call to out-of-line vector restore function. Adjust register used for cr restore. Tweak pointer register setup for gpr restore. * config/rs6000/rs6000.h (FIRST_SAVED_GP_REGNO): Take into account FIXED_R13. * config/rs6000/sysv4.h (FP_SAVE_INLINE, GP_SAVE_INLINE): Simplify. (V_SAVE_INLINE): Define. * config/rs6000/altivec.md (save_vregs_*, restore_vregs_*): New insns. libgcc/ * config/rs6000/crtsavevr.S: New file. * config/rs6000/crtrestvr.S: New file. * config/rs6000/t-savresfgpr: Build the above. * config/rs6000/t-netbsd: Likewise. This patch is okay with the macro usage fix. Thanks, David
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote: * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param. (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto. You should add documentation for these new PARAMs to doc/invoke.texi. I don't really like these new PARAMs: All other loop PARAMs are based on the number of insns in a loop, or the maximum number of times a transformation is applied. Your new PARAM_MIN_ITER_UNROLL_WITH_BRANCHES is completely different, because it is a number of iterations. This makes the PARAM value feel even more arbitrary than all the other PARAMs to some extent already do... (The only other PARAM like that is PARAM_ALIGN_LOOP_ITERATIONS, and its default value also looks quite arbitrary...) Index: loop-unroll.c === --- loop-unroll.c (revision 186783) +++ loop-unroll.c (working copy) @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc basic_block); static rtx get_expansion (struct var_to_expand *); +/* Determine whether LOOP contains call. */ +static bool +loop_has_call(struct loop *loop) +{ + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + if (CALL_P (insn)) + { + free (body); + return true; + } + } + } + free (body); + return false; +} + +/* Determine whether LOOP contains floating-point computation. */ +static bool +loop_has_FP_comp(struct loop *loop) +{ + rtx set, dest; + basic_block *body, bb; + unsigned i; + rtx insn; + + body = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; i++) + { + bb = body[i]; + + FOR_BB_INSNS (bb, insn) + { + set = single_set (insn); + if (!set) + continue; + + dest = SET_DEST (set); + if (FLOAT_MODE_P (GET_MODE (dest))) + { + free (body); + return true; So you only detect single-set FP operations where the insn stores in a float mode. It wouldn't be very difficult to just walk over all sets and look for float modes. 
This is also necessary e.g. for x87 sincos, as well as various insns on other machines. Your comments say you don't want to apply the new heuristic to loops containing FP operations because these loops usually benefit more from unrolling. Therefore, you should IMHO look at non-single_set() insns also here, to avoid applying the heuristics to loops containing non-single_set() FP insns. + } + } + } + free (body); + return false; +} Nit: You are calling loop_has_call and loop_has_FP_comp() twice on each loop (first for constant iterations and next for runtime iterations), maybe you can fuse the functions and cache the results (e.g. with two bitmaps, or put it in the loop description and retrieve it with get_simple_loop_desc). Actually num_loop_branches() could/should also be cached. I realize that the loop body walks are probably not very expensive (and compile time probably isn't a concern if you're using profile driven optimizations) but they do all add up... +/* Compute the number of branches in LOOP, weighted by execution counts. */ +static float +compute_weighted_branches(struct loop *loop) The floating point thing was already mentioned by Andrew. You can use integer math instead (for examples, look for BB_FREQ_MAX e.g. in average_num_loop_insns()). + while (loop_outer(outer)) + { + outer_desc = get_simple_loop_desc (outer); + + if (outer_desc->const_iter) + outer_niters *= outer_desc->niter; + else if (outer->header->count) + outer_niters *= expected_loop_iterations (outer); + + weighted_outer_branches = compute_weighted_branches (outer); Can you delay this computation of weighted_outer_branches call to ... + /* Should have been checked by caller. */ + gcc_assert(PARAM_VALUE (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES) != -1); Should never even happen. You have set the minimum acceptable value to 0. 
If you managed to test this code with PARAM_MIN_ITER_UNROLL_WITH_BRANCHES==-1, I'd like to know how (if you can do it from the command line, there is a bug in the handling of acceptable PARAM values :-) + /* If the outer loop has enough iterations to be considered hot, then + we can stop our upwards loop tree traversal and examine the current + outer loop. */ + if (outer_niters >= PARAM_VALUE (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES)) + { + /* Assume that any call will cause the branch budget to be exceeded, + and that we can't unroll the current loop without increasing + mispredicts. */ +
Re: [C++ Patch] PR 39970
On 04/24/2012 05:24 PM, Paolo Carlini wrote: Perhaps we aren't checking default arguments unless they're actually used; a could change that if they aren't dependent. Your reply reached the mailing list a bit mangled, could you please clarify? If the default argument isn't dependent on other template parameters, we can check immediately whether it is a suitable template argument for its parameter, including being a constant expression. That ought to give an appropriate diagnostic for this testcase. Jason
Re: Support for Runtime CPU type detection via builtins (issue5754058)
Hi, Thanks for all the comments. I have made all the changes as mentioned and submitted the patch. Summary of changes made: * Add support for AVX * Fix documentation in extend.texi * Make it thread-safe according to H.J.'s comments. I have attached the patch. Boot-strapped and checked for test parity with pristine build. * config/i386/i386.c (build_processor_model_struct): New function. (make_var_decl): New function. (fold_builtin_cpu): New function. (ix86_fold_builtin): New function. (make_cpu_type_builtin): New function. (ix86_init_platform_type_builtins): New function. (ix86_expand_builtin): Expand new builtins by folding them. (ix86_init_builtins): Make new builtins to detect CPU type. (TARGET_FOLD_BUILTIN): New macro. (IX86_BUILTIN_CPU_INIT): New enum value. (IX86_BUILTIN_CPU_IS): New enum value. (IX86_BUILTIN_CPU_SUPPORTS): New enum value. * config/i386/i386-builtin-types.def: New function type. * testsuite/gcc.target/builtin_target.c: New testcase. * doc/extend.texi: Document builtins. * libgcc/config/i386/i386-cpuinfo.c: New file. * libgcc/config/i386/t-cpuinfo: New file. * libgcc/config.host: Include t-cpuinfo. * libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model. Thanks, -Sri. On Mon, Apr 23, 2012 at 5:46 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Apr 23, 2012 at 1:43 PM, Sriraman Tallam tmsri...@google.com wrote: On Mon, Apr 23, 2012 at 12:16 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Apr 23, 2012 at 6:59 PM, Sriraman Tallam tmsri...@google.com wrote: i386 maintainers - Is this patch ok? Has the community reached the consensus on how this kind of functionality has to be implemented? I have followed the discussion a bit, but IIRC, there was no clear decision. Without this decision, I am not able to review the _implementation_ of agreed functionality for x86 target. (I apologize if I have missed the decision, please point me to the discussion in this case.) 
The discussions are here: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01446.html and follow-ups to this. I am not sure about consensus, but the important points raised were: 1) Constructor ordering: What if some constructors fire before cpu_indicator_init?, which determines the CPU. I addressed this problem by making the priority of cpu_indicator_init to be the highest possible. Still, IFUNC initializers will fire before and they have to explicitly call __builtin_cpu_init() before checking the CPU type. 2) Reducing the number of builtins : It is only two now. * config/i386/i386.c (build_processor_features_struct): New function. (build_processor_model_struct): New function. (make_var_decl): New function. (get_field_from_struct): New function. (fold_builtin_target): New function. (ix86_fold_builtin): New function. (ix86_expand_builtin): Expand new builtins by folding them. (make_cpu_type_builtin): New functions. (ix86_init_platform_type_builtins): Make the new builtins. (ix86_init_builtins): Make new builtins to detect CPU type. (TARGET_FOLD_BUILTIN): New macro. (IX86_BUILTIN_CPU_INIT): New enum value. (IX86_BUILTIN_CPU_IS): New enum value. (IX86_BUILTIN_CPU_SUPPORTS): New enum value. * config/i386/i386-builtin-types.def: New function type. * testsuite/gcc.target/builtin_target.c: New testcase. * doc/extend.texi: Document builtins. * libgcc/config/i386/i386-cpuinfo.c: New file. * libgcc/config/i386/t-cpuinfo: New file. * libgcc/config.host: Include t-cpuinfo. * libgcc/config/i386/libgcc-glibc.ver: Version symbols __cpu_model and __cpu_features. The patch is OK. I guess that AVX is left as an exercise for a x86 maintainer ;) I will add AVX, and make all the changes. 3 more comments: 1. Your implementation isn't thread-safe. With my suggestion of if (_cpu_model.__cpu_vendor) return 0; you should change get_intel_cpu to return processor_vendor and set _cpu_model.__cpu_vendor after setting up all other bits. 2. 
Bitfields struct __processor_features +{ + unsigned int __cpu_cmov : 1; + unsigned int __cpu_mmx : 1; + unsigned int __cpu_popcnt : 1; + unsigned int __cpu_sse : 1; + unsigned int __cpu_sse2 : 1; + unsigned int __cpu_sse3 : 1; + unsigned int __cpu_ssse3 : 1; + unsigned int __cpu_sse4_1 : 1; + unsigned int __cpu_sse4_2 : 1; +} __cpu_features; may not be thread-safe since only char/short/int/long long stores are atomic and it isn't extensible. You can use unsigned int __cpu_features[1]; to make it extensible and store atomic. 3. You may move unsigned int __cpu_features[1]; into struct __processor_model to only export one symbol instead of 2. -- H.J. 2012-04-24 Sriraman Tallam tmsri...@google.com This patch adds new
Re: Support for Runtime CPU type detection via builtins (issue5754058)
On Tue, Apr 24, 2012 at 5:10 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I have made all the changes as mentioned and submitted the patch. Summary of changes made: * Add support for AVX * Fix documentation in extend.texi * Make it thread-safe according to H.J.'s comments. I have attached the patch. Boot-strapped and checked for test parity with pristine build. * config/i386/i386.c (build_processor_model_struct): New function. (make_var_decl): New function. (fold_builtin_cpu): New function. (ix86_fold_builtin): New function. (make_cpu_type_builtin): New function. (ix86_init_platform_type_builtins): New function. (ix86_expand_builtin): Expand new builtins by folding them. (ix86_init_builtins): Make new builtins to detect CPU type. (TARGET_FOLD_BUILTIN): New macro. (IX86_BUILTIN_CPU_INIT): New enum value. (IX86_BUILTIN_CPU_IS): New enum value. (IX86_BUILTIN_CPU_SUPPORTS): New enum value. * config/i386/i386-builtin-types.def: New function type. * testsuite/gcc.target/builtin_target.c: New testcase. * doc/extend.texi: Document builtins. * libgcc/config/i386/i386-cpuinfo.c: New file. * libgcc/config/i386/t-cpuinfo: New file. * libgcc/config.host: Include t-cpuinfo. * libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model. + /* This function needs to run just once. */ + if (__cpu_model.__cpu_vendor) +return 0; + + /* Assume cpuid insn present. Run in level 0 to get vendor id. */ + if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx)) +return -1; If __get_cpuid_output returns non-zero, it will be called repeatedly. I think you should set __cpu_model.__cpu_vendor to non-zero in this case. Otherwise, it looks good to me. Thanks. -- H.J.
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
tejohn...@google.com (Teresa Johnson) writes: This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk? One problem with any unrolling heuristics is currently that gcc has both the tree level and the rtl level unroller. The tree one is even on at -O3. So if you tweak anything for one you have to affect both, otherwise the other may still do the wrong thing(tm). For some other tweaks I looked into a shared cost model some time ago. May be still needed. -Andi -- a...@linux.intel.com -- Speaking for myself only
[v3] libstdc++/52689 testcase
Noticed that this testcase wasn't put in as part of the patch. Fixed as follows. tested x86/linux -benjamin 2012-04-24 Benjamin Kosnik b...@redhat.com PR libstdc++/52689 * testsuite/17_intro/static.cc: New. diff --git a/libstdc++-v3/testsuite/17_intro/static.cc b/libstdc++-v3/testsuite/17_intro/static.cc new file mode 100644 index 000..99362f5 --- /dev/null +++ b/libstdc++-v3/testsuite/17_intro/static.cc @@ -0,0 +1,31 @@ +// { dg-do link } +// { dg-require-effective-target static } +// { dg-options "-static -std=gnu++11" } + +// Copyright (C) 2012 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// libstdc++/52689 static linking fails +#include <iostream> +#include <locale> + +int main() +{ + std::locale c = std::locale::global(); + std::cout << "i am old-skool\n"; + return 0; +}
Re: [C++ Patch] PR 39970
On 04/25/2012 01:41 AM, Jason Merrill wrote: On 04/24/2012 05:24 PM, Paolo Carlini wrote: Perhaps we aren't checking default arguments unless they're actually used; a could change that if they aren't dependent. Your reply reached the mailing list a bit mangled, could you please clarify? If the default argument isn't dependent on other template parameters, we can check immediately whether it is a suitable template argument for its parameter, including being a constant expression. That ought to give an appropriate diagnostic for this testcase. Ok, thanks, I'll see what I can do. Paolo.
Re: Support for Runtime CPU type detection via builtins (issue5754058)
On Tue, Apr 24, 2012 at 5:24 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Apr 24, 2012 at 5:10 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I have made all the changes as mentioned and submiited the patch. Summary of changes made: * Add support for AVX * Fix documentation in extend.texi * Make it thread-safe according to H.J.'s comments. I have attached the patch. Boot-strapped and checked for test parity with pristine build. * config/i386/i386.c (build_processor_model_struct): New function. (make_var_decl): New function. (fold_builtin_cpu): New function. (ix86_fold_builtin): New function. (make_cpu_type_builtin): New function. (ix86_init_platform_type_builtins): New function. (ix86_expand_builtin): Expand new builtins by folding them. (ix86_init_builtins): Make new builtins to detect CPU type. (TARGET_FOLD_BUILTIN): New macro. (IX86_BUILTIN_CPU_INIT): New enum value. (IX86_BUILTIN_CPU_IS): New enum value. (IX86_BUILTIN_CPU_SUPPORTS): New enum value. * config/i386/i386-builtin-types.def: New function type. * testsuite/gcc.target/builtin_target.c: New testcase. * doc/extend.texi: Document builtins. * libgcc/config/i386/i386-cpuinfo.c: New file. * libgcc/config/i386/t-cpuinfo: New file. * libgcc/config.host: Include t-cpuinfo. * libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model. + /* This function needs to run just once. */ + if (__cpu_model.__cpu_vendor) + return 0; + + /* Assume cpuid insn present. Run in level 0 to get vendor id. */ + if (!__get_cpuid_output (0, eax, ebx, ecx, edx)) + return -1; If __get_cpuid_output returns non-zero, it will be called repeatedly. I think you should set __cpu_model.__cpu_vendor to non-zero in this case. Done now. 2012-04-24 Sriraman Tallam tmsri...@google.com * libgcc/config/i386/i386-cpuinfo.c: Set __cpu_vendor always. 
Index: libgcc/config/i386/i386-cpuinfo.c === --- libgcc/config/i386/i386-cpuinfo.c (revision 186789) +++ libgcc/config/i386/i386-cpuinfo.c (working copy) @@ -256,16 +256,25 @@ __cpu_indicator_init (void) /* Assume cpuid insn present. Run in level 0 to get vendor id. */ if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx)) -return -1; +{ + __cpu_model.__cpu_vendor = VENDOR_OTHER; + return -1; +} vendor = ebx; max_level = eax; if (max_level < 1) -return -1; +{ + __cpu_model.__cpu_vendor = VENDOR_OTHER; + return -1; +} if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx)) -return -1; +{ + __cpu_model.__cpu_vendor = VENDOR_OTHER; + return -1; +} model = (eax >> 4) & 0x0f; family = (eax >> 8) & 0x0f; Thanks, -Sri. Otherwise, it looks good to me. Thanks. -- H.J.
Re: Support for Runtime CPU type detection via builtins (issue5754058)
On Tue, Apr 24, 2012 at 7:06 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Apr 24, 2012 at 5:24 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Apr 24, 2012 at 5:10 PM, Sriraman Tallam tmsri...@google.com wrote:

Hi,

Thanks for all the comments. I have made all the changes as mentioned and submitted the patch. Summary of changes made:

* Add support for AVX
* Fix documentation in extend.texi
* Make it thread-safe according to H.J.'s comments.

I have attached the patch. Bootstrapped and checked for test parity with a pristine build.

* config/i386/i386.c (build_processor_model_struct): New function.
(make_var_decl): New function.
(fold_builtin_cpu): New function.
(ix86_fold_builtin): New function.
(make_cpu_type_builtin): New function.
(ix86_init_platform_type_builtins): New function.
(ix86_expand_builtin): Expand new builtins by folding them.
(ix86_init_builtins): Make new builtins to detect CPU type.
(TARGET_FOLD_BUILTIN): New macro.
(IX86_BUILTIN_CPU_INIT): New enum value.
(IX86_BUILTIN_CPU_IS): New enum value.
(IX86_BUILTIN_CPU_SUPPORTS): New enum value.
* config/i386/i386-builtin-types.def: New function type.
* testsuite/gcc.target/builtin_target.c: New testcase.
* doc/extend.texi: Document builtins.
* libgcc/config/i386/i386-cpuinfo.c: New file.
* libgcc/config/i386/t-cpuinfo: New file.
* libgcc/config.host: Include t-cpuinfo.
* libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model.

+  /* This function needs to run just once.  */
+  if (__cpu_model.__cpu_vendor)
+    return 0;
+
+  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
+  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
+    return -1;

If __get_cpuid_output returns non-zero, it will be called repeatedly. I think you should set __cpu_model.__cpu_vendor to non-zero in this case.

Done now.

2012-04-24  Sriraman Tallam  tmsri...@google.com

	* libgcc/config/i386/i386-cpuinfo.c: Set __cpu_vendor always.
Index: libgcc/config/i386/i386-cpuinfo.c
===================================================================
--- libgcc/config/i386/i386-cpuinfo.c	(revision 186789)
+++ libgcc/config/i386/i386-cpuinfo.c	(working copy)
@@ -256,16 +256,25 @@ __cpu_indicator_init (void)
 
   /* Assume cpuid insn present. Run in level 0 to get vendor id. */
   if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
-    return -1;
+    {
+      __cpu_model.__cpu_vendor = VENDOR_OTHER;
+      return -1;
+    }
 
   vendor = ebx;
   max_level = eax;
 
   if (max_level < 1)
-    return -1;
+    {
+      __cpu_model.__cpu_vendor = VENDOR_OTHER;
+      return -1;
+    }
 
   if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
-    return -1;
+    {
+      __cpu_model.__cpu_vendor = VENDOR_OTHER;
+      return -1;
+    }
 
   model = (eax >> 4) & 0x0f;
   family = (eax >> 8) & 0x0f;

Thanks,

Should you also handle AVX2?

--
H.J.
libgo patch committed: Fix net package for Solaris
This patch to libgo fixes the net package to work correctly on Solaris. The main fix is to the select code, handling the case where one goroutine closes a file descriptor while another goroutine is waiting for I/O on the descriptor. Bootstrapped and ran Go testsuite on sparc-sun-solaris2.11. Committed to mainline and 4.7 branch.

Ian

diff -r 19c80c28161e libgo/Makefile.am
--- a/libgo/Makefile.am	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/Makefile.am	Tue Apr 24 20:10:20 2012 -0700
@@ -654,9 +654,9 @@
 else
 if LIBGO_IS_SOLARIS
 go_net_cgo_file = go/net/cgo_linux.go
-go_net_sock_file = go/net/sock_linux.go
-go_net_sockopt_file = go/net/sockopt_linux.go
-go_net_sockoptip_file = go/net/sockoptip_linux.go
+go_net_sock_file = go/net/sock_solaris.go
+go_net_sockopt_file = go/net/sockopt_bsd.go
+go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_solaris.go
 else
 if LIBGO_IS_FREEBSD
 go_net_cgo_file = go/net/cgo_bsd.go

diff -r 19c80c28161e libgo/Makefile.in
--- a/libgo/Makefile.in	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/Makefile.in	Tue Apr 24 20:10:20 2012 -0700
@@ -1019,17 +1019,17 @@
 @LIBGO_IS_LINUX_TRUE@go_net_cgo_file = go/net/cgo_linux.go
 @LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sock_file = go/net/sock_bsd.go
 @LIBGO_IS_FREEBSD_TRUE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sock_file = go/net/sock_bsd.go
-@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sock_file = go/net/sock_linux.go
+@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sock_file = go/net/sock_solaris.go
 @LIBGO_IS_IRIX_TRUE@@LIBGO_IS_LINUX_FALSE@go_net_sock_file = go/net/sock_linux.go
 @LIBGO_IS_LINUX_TRUE@go_net_sock_file = go/net/sock_linux.go
 @LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockopt_file = go/net/sockopt_bsd.go
 @LIBGO_IS_FREEBSD_TRUE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockopt_file = go/net/sockopt_bsd.go
-@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockopt_file = go/net/sockopt_linux.go
+@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockopt_file = go/net/sockopt_bsd.go
 @LIBGO_IS_IRIX_TRUE@@LIBGO_IS_LINUX_FALSE@go_net_sockopt_file = go/net/sockopt_linux.go
 @LIBGO_IS_LINUX_TRUE@go_net_sockopt_file = go/net/sockopt_linux.go
 @LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_netbsd.go
 @LIBGO_IS_FREEBSD_TRUE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_freebsd.go
-@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockoptip_file = go/net/sockoptip_linux.go
+@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_solaris.go
 @LIBGO_IS_IRIX_TRUE@@LIBGO_IS_LINUX_FALSE@go_net_sockoptip_file = go/net/sockoptip_linux.go
 @LIBGO_IS_LINUX_TRUE@go_net_sockoptip_file = go/net/sockoptip_linux.go
 @LIBGO_IS_LINUX_FALSE@go_net_sendfile_file = go/net/sendfile_stub.go

diff -r 19c80c28161e libgo/go/net/dial_test.go
--- a/libgo/go/net/dial_test.go	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/go/net/dial_test.go	Tue Apr 24 20:10:20 2012 -0700
@@ -130,7 +130,7 @@
 		n = 1000
 	}
 	switch runtime.GOOS {
-	case "darwin", "freebsd", "openbsd", "windows":
+	case "darwin", "freebsd", "openbsd", "solaris", "windows":
 		// Non-Linux systems take a long time to figure
 		// out that there is nothing listening on localhost.
 		n = 100

diff -r 19c80c28161e libgo/go/net/fd_select.go
--- a/libgo/go/net/fd_select.go	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/go/net/fd_select.go	Tue Apr 24 20:10:20 2012 -0700
@@ -102,7 +102,27 @@
 			break
 		}
 	}
-	if e != nil {
+	if e == syscall.EBADF {
+		// Some file descriptor has been closed.
+		tmpReadFds = syscall.FdSet{}
+		tmpWriteFds = syscall.FdSet{}
+		n = 0
+		for i := 0; i < p.maxFd+1; i++ {
+			if syscall.FDIsSet(i, &p.readFds) {
+				var s syscall.Stat_t
+				if syscall.Fstat(i, &s) == syscall.EBADF {
+					syscall.FDSet(i, &tmpReadFds)
+					n++
+				}
+			} else if syscall.FDIsSet(i, &p.writeFds) {
+				var s syscall.Stat_t
+				if syscall.Fstat(i, &s) == syscall.EBADF {
+					syscall.FDSet(i, &tmpWriteFds)
+					n++
+				}
+			}
+		}
+	} else if e != nil {
 		return -1, 0, os.NewSyscallError("select", e)
 	}
 	if n == 0 {

diff -r 19c80c28161e libgo/go/net/multicast_test.go
--- a/libgo/go/net/multicast_test.go	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/go/net/multicast_test.go	Tue Apr 24 20:10:20 2012 -0700
@@ -46,7 +46,7 @@
 // listener with same address family, same group address and same port.
 func TestMulticastListener(t *testing.T) {
 	switch runtime.GOOS {
-	case "netbsd", "openbsd", "plan9", "windows":
+	case "netbsd",
libgo patch committed: Fix crypto/rand test to read all data
This patch to libgo fixes the crypto/rand test to read all the data that it needs. This should fix PR 52341. Bootstrapped and ran crypto/rand test on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch.

Ian

diff -r c4931780d05a libgo/go/crypto/rand/rand_test.go
--- a/libgo/go/crypto/rand/rand_test.go	Tue Apr 24 21:24:41 2012 -0700
+++ b/libgo/go/crypto/rand/rand_test.go	Tue Apr 24 21:38:23 2012 -0700
@@ -7,6 +7,7 @@
 import (
 	"bytes"
 	"compress/flate"
+	"io"
 	"testing"
 )
 
@@ -16,9 +17,9 @@
 		n = 1e5
 	}
 	b := make([]byte, n)
-	n, err := Read(b)
+	n, err := io.ReadFull(Reader, b)
 	if n != len(b) || err != nil {
-		t.Fatalf("Read(buf) = %d, %s", n, err)
+		t.Fatalf("ReadFull(buf) = %d, %s", n, err)
 	}
 	var z bytes.Buffer
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen a...@firstfloor.org wrote:

tejohn...@google.com (Teresa Johnson) writes:

This patch adds heuristics to limit unrolling in loops with branches that may increase branch mispredictions. It affects loops that are not frequently iterated, and that are nested within a hot region of code that already contains many branch instructions. Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems. This improves performance of an internal search indexing benchmark by close to 2% on all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback, where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are neutral. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk?

One problem with any unrolling heuristics is currently that gcc has both the tree-level and the RTL-level unroller. The tree one is even on at -O3. So if you tweak anything for one you have to affect both, otherwise the other may still do the wrong thing(tm).

The tree-level unrollers (cunrolli and cunroll) do complete unrolling. At -O2, both of them are turned on, but gcc does not allow any code growth -- which makes them pretty useless at -O2 (very few loops qualify). The default max complete peel iteration count is also too low compared with both icc and llvm. This needs to be tuned.

David

For some other tweaks I looked into a shared cost model some time ago. May be still needed.

-Andi

-- a...@linux.intel.com -- Speaking for myself only
Re: PowerPC prologue and epilogue 6
On Tue, Apr 24, 2012 at 07:19:42PM -0400, David Edelsohn wrote: This patch is okay with the macro usage fix.

Thanks, series 2 to 6 committed as 186796, 186797, 186798, 186799, 186800. I noticed after I committed the lot that 186797 has some duplicated lines (harmless), corrected in 186798, and 186799 kept the old cr_save_regno assignment (again harmless), corrected in 186800. A result of merge conflicts. I normally start from a clean source tree, apply a patch as posted, commit, repeat. This time I had a series of directories with the cumulative patches applied. Bad idea unless you use mf to resolve conflicts.

This patch adds a testcase to verify register saves and restores. I tried to write it so that it will run on all powerpc targets. From past experience it probably won't. OK to apply anyway, and fix fallout later?

* gcc.target/powerpc/savres.c: New test.
* gcc.target/powerpc/powerpc.exp: Run it.

Index: gcc/testsuite/gcc.target/powerpc/powerpc.exp
===================================================================
--- gcc/testsuite/gcc.target/powerpc/powerpc.exp	(revision 186800)
+++ gcc/testsuite/gcc.target/powerpc/powerpc.exp	(working copy)
@@ -37,5 +37,14 @@
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
 	$DEFAULT_CFLAGS
 
+set SAVRES_TEST_OPTS [list -Os -O2 {-Os -mno-multiple} {-O2 -mno-multiple}]
+set alti ""
+if [check_vmx_hw_available] {
+    set alti "-maltivec"
+}
+torture-init
+set-torture-options $SAVRES_TEST_OPTS
+gcc-dg-runtest [list $srcdir/$subdir/savres.c] $alti
+
 # All done.
 dg-finish
Index: gcc/testsuite/gcc.target/powerpc/savres.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/savres.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/savres.c	(revision 0)
@@ -0,0 +1,1158 @@
+/* { dg-do run } */
+/* { dg-options "-fno-inline -fomit-frame-pointer" } */
+
+/* -fno-inline -maltivec -m32/-m64 -mmultiple/no-multiple -Os/-O2.  */
+#ifndef NO_BODY
+#define abort() __builtin_abort ()
+#define vec_all_eq(v1,v2) __builtin_vec_vcmpeq_p (2, v1, v2)
+#define SET(T,R,V) register T R __asm__ (#R) = V
+#define SET_GPR(R,V) SET (long, R, V)
+#define SET_FPR(R,V) SET (double, R, V)
+#define SET_VR(R,V) SET (__attribute__ ((vector_size (16))) int, R, V)
+#define SET_CR(R,V) __asm__ __volatile__ ("mtcrf %0,%1" : : "n" (1<<(7-R)), "r" (V<<(4*(7-R))) : "cr" #R)
+#define TRASH_GPR(R) SET_GPR (R, 0)
+#define TRASH_FPR(R) SET_FPR (R, 0)
+#define TRASH_VR(R) SET_VR (R, val0)
+#define TRASH_CR(R) SET_CR (R, 0)
+#define TRASH_SOME_GPR TRASH_GPR (r30); TRASH_GPR (r31)
+#define TRASH_SOME_FPR TRASH_FPR (fr28); TRASH_FPR (fr31)
+#define TRASH_SOME_VR TRASH_VR (v26); TRASH_VR (v27); TRASH_VR (v31)
+#define TRASH_SOME_CR TRASH_CR (2)
+#define TRASH_ALL_GPR TRASH_GPR (r14); TRASH_GPR (r15); TRASH_GPR (r16); TRASH_GPR (r17); TRASH_GPR (r18); TRASH_GPR (r19); TRASH_GPR (r20); TRASH_GPR (r21); TRASH_GPR (r22); TRASH_GPR (r23); TRASH_GPR (r24); TRASH_GPR (r25); TRASH_GPR (r26); TRASH_GPR (r27); TRASH_GPR (r28); TRASH_GPR (r29); TRASH_GPR (r30); TRASH_GPR (r31)
+#define TRASH_ALL_FPR TRASH_FPR (fr14); TRASH_FPR (fr15); TRASH_FPR (fr16); TRASH_FPR (fr17); TRASH_FPR (fr18); TRASH_FPR (fr19); TRASH_FPR (fr20); TRASH_FPR (fr21); TRASH_FPR (fr22); TRASH_FPR (fr23); TRASH_FPR (fr24); TRASH_FPR (fr25); TRASH_FPR (fr26); TRASH_FPR (fr27); TRASH_FPR (fr28); TRASH_FPR (fr29); TRASH_FPR (fr30); TRASH_FPR (fr31)
+#define TRASH_ALL_VR TRASH_VR (v20); TRASH_VR (v21); TRASH_VR (v22); TRASH_VR (v23); TRASH_VR (v24); TRASH_VR (v25); TRASH_VR (v26); TRASH_VR (v27); TRASH_VR (v28); TRASH_VR (v29); TRASH_VR (v30); TRASH_VR (v31)
+#define TRASH_ALL_CR TRASH_CR (2); TRASH_CR (3); TRASH_CR (4)
+#define USE_SOME_GPR __asm__ __volatile__ ("#%0 %1" : : "r" (r30), "r" (r31))
+#define USE_SOME_FPR __asm__ __volatile__ ("#%0 %1" : : "f" (fr28), "f" (fr31))
+#define USE_SOME_VR __asm__ __volatile__ ("#%0 %1 %2" : : "v" (v26), "v" (v27), "v" (v31))
+#define USE_SOME_CR
+#define USE_ALL_GPR __asm__ __volatile__ ("#%0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : : "r" (r14), "r" (r15), "r" (r16), "r" (r17), "r" (r18), "r" (r19), "r" (r20), "r" (r21), "r" (r22), "r" (r23), "r" (r24), "r" (r25), "r" (r26), "r" (r27), "r" (r28), "r" (r29), "r" (r30), "r" (r31))
+#define USE_ALL_FPR __asm__ __volatile__ ("#%0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : : "f" (fr14), "f" (fr15), "f" (fr16), "f" (fr17), "f" (fr18), "f" (fr19), "f" (fr20), "f" (fr21), "f" (fr22), "f" (fr23), "f" (fr24), "f" (fr25), "f" (fr26), "f" (fr27), "f" (fr28), "f" (fr29), "f" (fr30), "f" (fr31))
+#define USE_ALL_VR __asm__ __volatile__ ("#%0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11" : : "v" (v20), "v" (v21), "v" (v22), "v" (v23), "v" (v24), "v" (v25), "v" (v26), "v" (v27), "v" (v28), "v" (v29), "v" (v30), "v" (v31))
+#define USE_ALL_CR
+
+#define INIT_GPR SET_GPR (r14, 14); SET_GPR (r15, 15); SET_GPR (r16, 16); SET_GPR (r17, 17); SET_GPR (r18, 18); SET_GPR (r19, 19); SET_GPR (r20, 20); SET_GPR (r21, 21); SET_GPR (r22, 22); SET_GPR (r23, 23);
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue 6099055)
http://codereview.appspot.com/6099055/diff/1/loop-unroll.c
File loop-unroll.c (right):

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode156
loop-unroll.c:156: static bool
An empty line here.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode182
loop-unroll.c:182: static bool
Add an empty line after the comment.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode213
loop-unroll.c:213: /* Compute the number of branches in LOOP, weighted by execution counts. */
Same here.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode215
loop-unroll.c:215: compute_weighted_branches(struct loop *loop)
Missing space. Similarly for other functions.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode255
loop-unroll.c:255: data exists, simply return the current NUNROLL factor. */
Same here.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode281
loop-unroll.c:281: while (loop_outer(outer))
Should the check start from the current loop? Or the handling of the current loop should be peeled out -- max_unroll_factor = branch_budget / weighted_num_branches

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode317
loop-unroll.c:317: return (PARAM_VALUE (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET) -
The number of branches may be larger than the budget -- leading to overflow -- need to guard against it.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode318
loop-unroll.c:318: weighted_outer_branches)/(weighted_num_branches - 1) + 1;
Should it continue walking up the loop tree and find the minimal unroll factor?

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode747
loop-unroll.c:747: loop_has_FP_comp(loop)
Missing space.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode749
loop-unroll.c:749: desc->niter < (unsigned) PARAM_VALUE (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES))
Line too long.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode1057
loop-unroll.c:1057: loop_has_FP_comp(loop)
Refactor the check into a helper function?

http://codereview.appspot.com/6099055/diff/1/params.def
File params.def (right):

http://codereview.appspot.com/6099055/diff/1/params.def#newcode314
Missing comment.

http://codereview.appspot.com/6099055/diff/1/params.def#newcode319
Missing comment.

http://codereview.appspot.com/6099055/