Re: [Google] Recompute function frequency after calculating branch probability
Ok. Thanks,
David

On Sun, Apr 7, 2013 at 8:07 PM, Dehao Chen <de...@google.com> wrote:

Hi,

This patch updates the function frequency after calculating branch
probability. This is important because a cold function could be promoted
to hot after ipa-inline.

Bootstrapped and passed gcc regression tests. Okay for google-4_7?

Thanks,
Dehao

--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2877,7 +2877,10 @@ rebuild_frequencies (void)
   else if (profile_status == PROFILE_READ)
     {
       if (flag_auto_profile)
-	afdo_calculate_branch_prob ();
+	{
+	  afdo_calculate_branch_prob ();
+	  compute_function_frequency ();
+	}
       counts_to_freqs ();
     }
   else
Re: [PATCH] Fix PR48182
On Fri, Apr 05, 2013 at 03:00:43PM -0600, Jeff Law wrote:
On 04/05/2013 02:50 PM, Jakub Jelinek wrote:
On Fri, Apr 05, 2013 at 02:42:19PM -0600, Jeff Law wrote:

? I must be missing something, the change causes an early bail out from
try_crossjump_to_edge. We don't want to raise the min to 0 as that
doesn't allow the user to turn on this specific transformation.

The condition is

  if (nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS))
    return false; // aka don't crossjump

So, the smaller the N in --param min-crossjump-insns=N is, the more
likely we crossjump. Thus N=0 should mean that it is most likely we
crossjump, and as N=1 requires that at least one insn matches, N=0 would
mean that even zero insns can match. If we for --param
min-crossjump-insns=0 always return false, it means we never crossjump,
so it is least likely that we crossjump, which corresponds to the
largest possible N, not the smallest one.

Yes, the smaller the N, the more likely we are to crossjump; of course
the value 0 would make no sense (I'm clearly out of practice on reviews
:-). Yea, changing the min value in params.def to 1 would be a better
way to fix it. Consider that patch pre-approved.

Ok, thanks. I'll apply this one. Regtest/bootstrap pending.

2013-04-08  Marek Polacek  <pola...@redhat.com>

	PR rtl-optimization/48182
	* params.def (PARAM_MIN_CROSSJUMP_INSNS): Increase the minimum
	value to 1.

--- gcc/params.def.mp	2013-04-08 08:38:48.515263034 +0200
+++ gcc/params.def	2013-04-08 08:39:10.444340238 +0200
@@ -433,7 +433,7 @@ DEFPARAM(PARAM_MAX_CROSSJUMP_EDGES,
 DEFPARAM(PARAM_MIN_CROSSJUMP_INSNS,
	 "min-crossjump-insns",
	 "The minimum number of matching instructions to consider for crossjumping",
-	 5, 0, 0)
+	 5, 1, 0)
 
 /* The maximum number expansion factor when copying basic blocks.  */
 DEFPARAM(PARAM_MAX_GROW_COPY_BB_INSNS,

	Marek
Re: [PATCH] Fix PR48182
On Mon, Apr 08, 2013 at 08:48:22AM +0200, Marek Polacek wrote:

Yea, changing the min value in params.def to 1 would be a better way to
fix it. Consider that patch pre-approved.

Ok, thanks. I'll apply this one. Regtest/bootstrap pending.

Thanks. Also ok for 4.8.

2013-04-08  Marek Polacek  <pola...@redhat.com>

	PR rtl-optimization/48182
	* params.def (PARAM_MIN_CROSSJUMP_INSNS): Increase the minimum
	value to 1.

--- gcc/params.def.mp	2013-04-08 08:38:48.515263034 +0200
+++ gcc/params.def	2013-04-08 08:39:10.444340238 +0200
@@ -433,7 +433,7 @@ DEFPARAM(PARAM_MAX_CROSSJUMP_EDGES,
 DEFPARAM(PARAM_MIN_CROSSJUMP_INSNS,
	 "min-crossjump-insns",
	 "The minimum number of matching instructions to consider for crossjumping",
-	 5, 0, 0)
+	 5, 1, 0)
 
 /* The maximum number expansion factor when copying basic blocks.  */
 DEFPARAM(PARAM_MAX_GROW_COPY_BB_INSNS,

	Jakub
Re: [Patch, fortran, 4.9] Use bool type instead gfc_try
PING (now in plain text mode so that the lists will accept the message, hopefully. $#% gmail improvements.) On Fri, Mar 22, 2013 at 8:58 AM, Janne Blomqvist blomqvist.ja...@gmail.com wrote: On Thu, Mar 21, 2013 at 11:31 PM, Janne Blomqvist blomqvist.ja...@gmail.com wrote: Updated patch which in addition does the above transformations as well. .. and here is the actual patch (thanks Bernhard!) -- Janne Blomqvist -- Janne Blomqvist
[Committed] S/390: Fix pr48335 testsuite fails
Hi,

I've committed the attached patch fixing the following testsuite fails
on s390 (-march=z196):

FAIL: gcc.dg/pr48335-2.c (internal compiler error)
FAIL: gcc.dg/pr48335-2.c (test for excess errors)
FAIL: gcc.dg/pr48335-3.c (internal compiler error)
FAIL: gcc.dg/pr48335-3.c (test for excess errors)

I've also committed the fix to the 4.8 branch since it is a regression
from 4.7.

Bye,

-Andreas-

2013-04-08  Andreas Krebbel  <andreas.kreb...@de.ibm.com>

	* config/s390/s390.c (s390_expand_insv): Only accept insertions
	within mode size.

 gcc/config/s390/s390.c |    3 +++
 1 file changed, 3 insertions(+)

Index: gcc/config/s390/s390.c
===================================================================
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** s390_expand_insv (rtx dest, rtx op1, rtx
*** 4648,4653 ****
--- 4648,4656 ----
    int smode_bsize, mode_bsize;
    rtx op, clobber;
  
+   if (bitsize + bitpos > GET_MODE_SIZE (mode))
+     return false;
+ 
    /* Generate INSERT IMMEDIATE (IILL et al).  */
    /* (set (ze (reg)) (const_int)).  */
    if (TARGET_ZARCH
[PATCH] Another ldist testcase
Hi!

I was curious whether we don't miscompile the following testcase on the
4.8 branch (-1+0i matches integer_all_onesp), but apparently we don't,
because TYPE_PRECISION on the COMPLEX_TYPE is 0. Anyway, I'd like to
check this into trunk/4.8 branch, ok?

2013-04-08  Jakub Jelinek  <ja...@redhat.com>

	* gcc.c-torture/execute/pr56837.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr56837.c.jj	2013-02-13 21:50:57.150673158 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr56837.c	2013-04-08 10:23:44.941870778 +0200
@@ -0,0 +1,21 @@
+extern void abort (void);
+_Complex int a[1024];
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    a[i] = -1;
+}
+
+int
+main ()
+{
+  int i;
+  foo ();
+  for (i = 0; i < 1024; i++)
+    if (a[i] != -1)
+      abort ();
+  return 0;
+}

	Jakub
Re: [Patch, fortran, 4.9] Use bool type instead gfc_try
Janne Blomqvist wrote:
On Thu, Mar 21, 2013 at 11:31 PM, Janne Blomqvist
<blomqvist.ja...@gmail.com> wrote:

Updated patch which in addition does the above transformations as well.
.. and here is the actual patch (thanks Bernhard!)

http://gcc.gnu.org/ml/fortran/2013-03/msg00108.html

Thanks for the update and sorry for the delay. The patch idea as such is
okay. However, the patch isn't.

+ if (!gfc_notify_std(GFC_STD_F2003, "Noninteger exponent in an initialization expression at %L", &op2->where))

Missing space before the "("; additionally, the line is way too long.
That's actually an issue throughout the whole file. Additionally, the
reformatting caused split strings like "Noninteger exponent in ", which
is quite ugly.

If you fix those issues, and update the patch for the newly added code
(which presumably added a few FAILUREs), the patch is okay.

It is, indeed, most of the time helpful as it shortens the code without
losing its clarity. (Only at a few places I found FAILURE/SUCCESS a tad
clearer.) Thanks for the patch.

For nicer looking code, you could also do:

* Remove the trailing space for "+ return false;" (that's the only
  modified line with a trailing space).

* Change
    + if (!gfc_resolve_expr(e)
    +     || !gfc_specification_expr(e))
    +   return false;
  to
    if (!gfc_resolve_expr(e) || !gfc_specification_expr(e))
      return false;

* Ditto for:
    + if (t && b->expr1 != NULL
  and a few more.

Tobias
[PATCH] Adjust g++.dg/vect/slp-pr56812.cc
This adjusts g++.dg/vect/slp-pr56812.cc for targets that cannot handle
HW misaligned vector loads. Tested on x86_64-unknown-linux-gnu,
confirmed by Andreas that it helps ia64 / powerpc.

Richard.

2013-04-08  Richard Biener  <rguent...@suse.de>

	* g++.dg/vect/slp-pr56812.cc: Adjust.

Index: gcc/testsuite/g++.dg/vect/slp-pr56812.cc
===================================================================
--- gcc/testsuite/g++.dg/vect/slp-pr56812.cc	(revision 197480)
+++ gcc/testsuite/g++.dg/vect/slp-pr56812.cc	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_hw_misalign } */
 /* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model" } */
 
 class mydata {
Re: [committed] Fix GCC bootstrap on hppa*-*-hpux* using HP cat
On Sat, 6 Apr 2013, John David Anglin wrote: The patch fixes PR other/55274 and we now generate a non empty map file. As noted in the PR, this problem causes a hang when bootstrap is done using HP cat. Tested on hppa64-hp-hpux11.11 and hppa2.0w-hp-hpux11.11. Committed to trunk and 4.8. Richard, would it be ok to apply to the 4.7 branch? This is a 4.7 regression. Sure. Thanks, Richard.
[PATCH] Adjust gfortran.dg/vect/fast-math-pr37021.f90
To require vect_double. Committed.

Richard.

2013-04-08  Richard Biener  <rguent...@suse.de>

	* gfortran.dg/vect/fast-math-pr37021.f90: Adjust.

Index: gcc/testsuite/gfortran.dg/vect/fast-math-pr37021.f90
===================================================================
--- gcc/testsuite/gfortran.dg/vect/fast-math-pr37021.f90	(revision 197568)
+++ gcc/testsuite/gfortran.dg/vect/fast-math-pr37021.f90	(working copy)
@@ -1,4 +1,5 @@
 ! { dg-do compile }
+! { dg-require-effective-target vect_double }
 subroutine to_product_of(self,a,b,a1,a2)
   complex(kind=8) :: self (:)
Re: [PATCH] Another ldist testcase
On Mon, 8 Apr 2013, Jakub Jelinek wrote:

Hi!

I was curious whether we don't miscompile the following testcase on the
4.8 branch (-1+0i matches integer_all_onesp), but apparently we don't,
because TYPE_PRECISION on the COMPLEX_TYPE is 0. Anyway, I'd like to
check this into trunk/4.8 branch, ok?

Ok.

Thanks,
Richard.

2013-04-08  Jakub Jelinek  <ja...@redhat.com>

	* gcc.c-torture/execute/pr56837.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr56837.c.jj	2013-02-13 21:50:57.150673158 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr56837.c	2013-04-08 10:23:44.941870778 +0200
@@ -0,0 +1,21 @@
+extern void abort (void);
+_Complex int a[1024];
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    a[i] = -1;
+}
+
+int
+main ()
+{
+  int i;
+  foo ();
+  for (i = 0; i < 1024; i++)
+    if (a[i] != -1)
+      abort ();
+  return 0;
+}

	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Re: [PATCH] MEM_REF clobber handling fixes/improvements (PR c++/34949, PR c++/50243)
On Thu, 4 Apr 2013, Jakub Jelinek wrote:

Hi!

The vt3.C testcase (from PR34949) ICEd, because sink_clobbers sunk a
MEM_REF[SSA_NAME] clobber from a landing pad which that SSA_NAME
definition dominated to an outer one which wasn't dominated by the
definition. As neither the ehcleanup nor ehdisp passes keep dominance
info current and they perform changes that invalidate it, I can't
unfortunately do cheaply a dominated_by_p check, so the patch just
throws away all MEM_REF[SSA_NAME] clobbers if SSA_NAME isn't a default
def, which is valid everywhere. As sink_clobbers is only done on
otherwise empty bbs, and typically the clobbers are preceded by some
stores which are to be DSEd if unneeded and after DSEing aren't really
needed anymore, this doesn't seem to hurt much.

The patch also improves optimize_clobbers, so that it only removes any
clobbers if the bb is actually empty (except for clobbers, resx, maybe
debug stmts or __builtin_stack_restore); that way needed clobbers are
kept around until they are used by DSE.

Also, MEM_REF[SSA_NAME] clobbers aren't useful very late in the
optimization pipeline, but could cause some SSA_NAMEs to be considered
unnecessarily live (especially if they are considered live across EH
edges it is undesirable); such clobbers are mainly useful during
DSE1/DSE2, but at expansion are completely ignored (unlike VAR_DECL
clobbers, which are also used for the stack layout decisions), so the
patch removes all those MEM_REF[SSA_NAME] clobbers shortly after dse2
(in the fab pass).

Bootstrapped/regtested on x86_64-linux and i686-linux; on libstdc++ I
saw some code size reduction with the patch. Ok for trunk?

Ok.

Thanks,
Richard.

2013-04-04  Jakub Jelinek  <ja...@redhat.com>

	PR c++/34949
	PR c++/50243
	* tree-eh.c (optimize_clobbers): Only remove clobbers if bb
	doesn't contain anything but clobbers, at most one
	__builtin_stack_restore, optionally debug stmts and final resx,
	and if it has at least one incoming EH edge.  Don't check for
	SSA_NAME on LHS of a clobber.
	(sink_clobbers): Don't check for SSA_NAME on LHS of a clobber.
	Instead of moving clobbers with MEM_REF LHS with SSA_NAME address
	which isn't default definition, remove them.
	(unsplit_eh, cleanup_empty_eh): Use single_{pred,succ}_{p,edge}
	instead of EDGE_COUNT comparisons or EDGE_{PRED,SUCC}.
	* tree-ssa-ccp.c (execute_fold_all_builtins): Remove clobbers
	with MEM_REF LHS with SSA_NAME address.

	* g++.dg/opt/vt3.C: New test.
	* g++.dg/opt/vt4.C: New test.

--- gcc/tree-eh.c.jj	2013-03-26 10:03:55.0 +0100
+++ gcc/tree-eh.c	2013-04-04 13:44:27.718982776 +0200
@@ -3230,14 +3230,48 @@ static void
 optimize_clobbers (basic_block bb)
 {
   gimple_stmt_iterator gsi = gsi_last_bb (bb);
+  bool any_clobbers = false;
+  bool seen_stack_restore = false;
+  edge_iterator ei;
+  edge e;
+
+  /* Only optimize anything if the bb contains at least one clobber,
+     ends with resx (checked by caller), optionally contains some
+     debug stmts or labels, or at most one __builtin_stack_restore
+     call, and has an incoming EH edge.  */
   for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
       if (is_gimple_debug (stmt))
	continue;
-      if (!gimple_clobber_p (stmt)
-	  || TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
-	return;
+      if (gimple_clobber_p (stmt))
+	{
+	  any_clobbers = true;
+	  continue;
+	}
+      if (!seen_stack_restore
+	  && gimple_call_builtin_p (stmt, BUILT_IN_STACK_RESTORE))
+	{
+	  seen_stack_restore = true;
+	  continue;
+	}
+      if (gimple_code (stmt) == GIMPLE_LABEL)
+	break;
+      return;
+    }
+  if (!any_clobbers)
+    return;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (e->flags & EDGE_EH)
+      break;
+  if (e == NULL)
+    return;
+  gsi = gsi_last_bb (bb);
+  for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+      if (!gimple_clobber_p (stmt))
+	continue;
       unlink_stmt_vdef (stmt);
       gsi_remove (&gsi, true);
       release_defs (stmt);
@@ -3278,8 +3312,7 @@ sink_clobbers (basic_block bb)
	continue;
       if (gimple_code (stmt) == GIMPLE_LABEL)
	break;
-      if (!gimple_clobber_p (stmt)
-	  || TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+      if (!gimple_clobber_p (stmt))
	return 0;
       any_clobbers = true;
     }
@@ -3292,11 +3325,27 @@ sink_clobbers (basic_block bb)
   for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
+      tree lhs;
       if (is_gimple_debug (stmt))
	continue;
       if (gimple_code (stmt) == GIMPLE_LABEL)
	break;
       unlink_stmt_vdef (stmt);
+      lhs = gimple_assign_lhs (stmt);
+      /* Unfortunately we don't have
Re: [patch] Fix node weight updates during ipa-cp (issue7812053)
On Fri, Apr 5, 2013 at 4:18 PM, Teresa Johnson tejohn...@google.com wrote: On Thu, Mar 28, 2013 at 2:27 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Mar 27, 2013 at 6:22 PM, Teresa Johnson tejohn...@google.com wrote: I found that the node weight updates on cloned nodes during ipa-cp were leading to incorrect/insane weights. Both the original and new node weight computations used truncating divides, leading to a loss of total node weight. I have fixed this by making both rounding integer divides. Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? I'm sure we can outline a rounding integer divide inline function on gcov_type. To gcov-io.h, I suppose. Otherwise this looks ok to me. Thanks. I went ahead and worked on outlining this functionality. In the process of doing so, I discovered that there was already a method in basic-block.h to do part of this: apply_probability(), which does the rounding divide by REG_BR_PROB_BASE. There is a related function combine_probabilities() that takes 2 int probabilities instead of a gcov_type and an int probability. I decided to use apply_probability() in ipa-cp, and add a new macro GCOV_COMPUTE_SCALE to basic-block.h to compute the scale factor/probability via a rounding divide. So the ipa-cp changes I made use both GCOV_COMPUTE_SCALE and apply_probability. I then went through all the code to look for instances where we were computing scale factors/probabilities and performing scaling. I found a mix of existing uses of apply/combine_probabilities, uses of RDIV, inlined rounding divides, and truncating divides. I think it would be good to unify all of this. As a first step, I replaced all inline code sequences that were already doing rounding divides to compute scale factors/probabilities or do the scaling, to instead use the appropriate helper function/macro described above. For these locations, there should be no change to behavior. 
There are a number of places where there are truncating divides right
now. Since changing those may impact the resulting behavior, for this
patch I simply added a comment as to which helper they should use. As
soon as this patch goes in I am planning to change those to use the
appropriate helper and test performance, and then will send that patch
for review.

So for this patch, the only place where behavior is changed is in ipa-cp,
which was my original change. New patch is attached. Bootstrapped (both
bootstrap and profiledbootstrap) and tested on x86-64-unknown-linux-gnu.
Ok for trunk?

Ok.

Thanks,
Richard.

Thanks,
Teresa

Thanks,
Richard.

2013-03-27  Teresa Johnson  <tejohn...@google.com>

	* ipa-cp.c (update_profiling_info): Perform rounding integer
	division when updating weights instead of truncating.
	(update_specialized_profile): Ditto.

Index: ipa-cp.c
===================================================================
--- ipa-cp.c	(revision 197118)
+++ ipa-cp.c	(working copy)
@@ -2588,14 +2588,18 @@ update_profiling_info (struct cgraph_node *orig_no
 
   for (cs = new_node->callees; cs; cs = cs->next_callee)
     if (cs->frequency)
-      cs->count = cs->count * (new_sum * REG_BR_PROB_BASE
-			       / orig_node_count) / REG_BR_PROB_BASE;
+      cs->count = (cs->count
+		   * ((new_sum * REG_BR_PROB_BASE + orig_node_count/2)
+		      / orig_node_count)
+		   + REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
     else
       cs->count = 0;
 
   for (cs = orig_node->callees; cs; cs = cs->next_callee)
-    cs->count = cs->count * (remainder * REG_BR_PROB_BASE
-			     / orig_node_count) / REG_BR_PROB_BASE;
+    cs->count = (cs->count
+		 * ((remainder * REG_BR_PROB_BASE + orig_node_count/2)
+		    / orig_node_count)
+		 + REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
 
   if (dump_file)
     dump_profile_updates (orig_node, new_node);
@@ -2627,14 +2631,19 @@ update_specialized_profile (struct cgraph_node *ne
 
   for (cs = new_node->callees; cs; cs = cs->next_callee)
     if (cs->frequency)
-      cs->count += cs->count * redirected_sum / new_node_count;
+      cs->count += (cs->count
+		    * ((redirected_sum * REG_BR_PROB_BASE
+			+ new_node_count/2) / new_node_count)
+		    + REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
     else
       cs->count = 0;
 
   for (cs = orig_node->callees; cs; cs = cs->next_callee)
     {
-      gcov_type dec = cs->count * (redirected_sum * REG_BR_PROB_BASE
-				   / orig_node_count) / REG_BR_PROB_BASE;
+      gcov_type dec = (cs->count
+		       * ((redirected_sum * REG_BR_PROB_BASE
+			   + orig_node_count/2) / orig_node_count)
+		       + REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
       if (dec < cs->count)
	cs->count -= dec;
       else
Re: [patch tree-ssa-structalias.c]: Small finding in find_func_aliases function
On Fri, Apr 5, 2013 at 9:30 PM, Jeff Law <l...@redhat.com> wrote:
On 04/05/2013 02:29 AM, Kai Tietz wrote:

Hello,

while debugging I made the finding that in find_func_aliases, rhsop
might be used as NULL for gimple_assign_single_p items. It should use
for the gimple_assign_single_p case the rhs1 item directly as the
argument to pass to get_constraint_for_rhs.

ChangeLog

2013-04-05  Kai Tietz

	* tree-ssa-structalias.c (find_func_aliases): Special-case
	gimple_assign_single_p handling.

Ok for apply?

Yes. OK for the trunk. Do you have a testcase?

He can't, because the analysis is wrong. A GIMPLE_SINGLE_RHS statement
has exactly two operands, thus rhsop is always gimple_assign_rhs1 (). So
the patch only un-CSEs gimple_assign_rhs1 (). The is_gimple_assign ()
case can surely be re-worked to be easier to read, but the patch doesn't
improve things. Please revert it.

Thanks,
Richard.

jeff
Re: [patch tree-ssa-structalias.c]: Small finding in find_func_aliases function
I haven't even applied it. Kai
Re: [RFA] [PATCH] Minor improvement to canonicalization of COND_EXPR for gimple
On Sat, Apr 6, 2013 at 1:13 PM, Jeff Law <l...@redhat.com> wrote:

The tree combiner/forward propagator is missing opportunities to
collapse sequences like this:

  _15 = _12 ^ _14;
  if (_15 != 0)

Into:

  if (_12 != _14)

The tree combiner/forward propagator builds this tree:

  x ^ y

Then passes it to canonicalize_cond_expr_cond. That is not suitable for
the condition in a gimple COND_EXPR, so canonicalize_cond_expr_cond
returns NULL. Thus combine_cond_expr_cond decides the tree it created
isn't useful and throws it away.

This patch changes canonicalize_cond_expr_cond to rewrite x ^ y into
x != y. The net result being the tree combiner/forward propagator is
able to perform the desired simplification, eliminating the
BIT_XOR_EXPR.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu. As you
can see from the testcase, these kinds of sequences show up when
compiling gcc itself. OK for the trunk?

Nice. Ok.

Thanks,
Richard.

commit 809408a4bde6dfbaf62c5bda9ab7ae6c4447d984
Author: Jeff Law <l...@redhat.com>
Date:   Sat Apr 6 05:11:17 2013 -0600

	* gimple.c (canonicalize_cond_expr_cond): Rewrite x ^ y into x != y.

	* gcc.dg/tree-ssa/forwprop-25.c: New test

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b8a6900..44797cc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2013-04-06  Jeff Law  <l...@redhat.com>
+
+	* gimple.c (canonicalize_cond_expr_cond): Rewrite x ^ y into
+	x != y.
+
 2013-04-03  Jeff Law  <l...@redhat.com>
 
 	* Makefile.in (lra-constraints.o): Depend on $(OPTABS_H).
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 785c2f0..cdb6f24 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2958,7 +2958,11 @@ canonicalize_cond_expr_cond (tree t)
       t = build2 (TREE_CODE (top0), TREE_TYPE (t),
		  TREE_OPERAND (top0, 0), TREE_OPERAND (top0, 1));
     }
-
+  /* For x ^ y use x != y.  */
+  else if (TREE_CODE (t) == BIT_XOR_EXPR)
+    t = build2 (NE_EXPR, TREE_TYPE (t),
+		TREE_OPERAND (t, 0), TREE_OPERAND (t, 1));
+
   if (is_gimple_condexpr (t))
     return t;
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index dc0b745..601ca66 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2013-04-06  Jeff Law  <l...@redhat.com>
+
+	* gcc.dg/tree-ssa/forwprop-25.c: New test
+
 2013-04-03  Jeff Law  <l...@redhat.com>
 
 	PR tree-optimization/56799
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-25.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-25.c
new file mode 100644
index 000..cf0c504
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-25.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1" } */
+
+struct rtx_def;
+typedef struct rtx_def *rtx;
+typedef const struct rtx_def *const_rtx;
+enum machine_mode
+{
+  MAX_MACHINE_MODE,
+  NUM_MACHINE_MODES = MAX_MACHINE_MODE
+};
+extern const char *const mode_name[NUM_MACHINE_MODES];
+enum mode_class
+{ MODE_RANDOM, MODE_CC, MODE_INT, MODE_PARTIAL_INT, MODE_FRACT, MODE_UFRACT,
+  MODE_ACCUM, MODE_UACCUM, MODE_FLOAT, MODE_DECIMAL_FLOAT, MODE_COMPLEX_INT,
+  MODE_COMPLEX_FLOAT, MODE_VECTOR_INT, MODE_VECTOR_FRACT,
+  MODE_VECTOR_UFRACT, MODE_VECTOR_ACCUM, MODE_VECTOR_UACCUM,
+  MODE_VECTOR_FLOAT, MAX_MODE_CLASS };
+extern const unsigned char mode_class[NUM_MACHINE_MODES];
+extern const unsigned short mode_precision[NUM_MACHINE_MODES];
+struct rtx_def
+{
+  __extension__ enum machine_mode mode:8;
+};
+void
+convert_move (rtx to, rtx from, int unsignedp)
+{
+  enum machine_mode to_mode = ((enum machine_mode) (to)->mode);
+  enum machine_mode from_mode = ((enum machine_mode) (from)->mode);
+  ((void)
+   (!((mode_precision[from_mode] != mode_precision[to_mode])
+      || ((((enum mode_class) mode_class[from_mode]) == MODE_DECIMAL_FLOAT) !=
+	  (((enum mode_class) mode_class[to_mode]) == MODE_DECIMAL_FLOAT))) ?
+    fancy_abort ("/home/gcc/virgin-gcc/gcc/expr.c", 380, __FUNCTION__),
+    0 : 0));
+}
+
+/* { dg-final { scan-tree-dump "Replaced.*!=.*with.*!=.*" "forwprop1"} } */
+/* { dg-final { cleanup-tree-dump "forwprop1" } } */
[PATCH, libstdc++]: Update alpha baseline_symbols.txt
Hello! 2013-04-08 Uros Bizjak ubiz...@gmail.com * config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Update. Tested on alphaev68-pc-linux-gnu. OK for mainline SVN? Uros. Index: config/abi/post/alpha-linux-gnu/baseline_symbols.txt === --- config/abi/post/alpha-linux-gnu/baseline_symbols.txt(revision 197551) +++ config/abi/post/alpha-linux-gnu/baseline_symbols.txt(working copy) @@ -543,6 +543,7 @@ FUNC:_ZNKSt17__gnu_cxx_ldbl1289money_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE8__do_putES4_bRSt8ios_basewd@@GLIBCXX_LDBL_3.4 FUNC:_ZNKSt17__gnu_cxx_ldbl1289money_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE9_M_insertILb0EEES4_S4_RSt8ios_basewRKSbIwS3_SaIwEE@@GLIBCXX_LDBL_3.4 FUNC:_ZNKSt17__gnu_cxx_ldbl1289money_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE9_M_insertILb1EEES4_S4_RSt8ios_basewRKSbIwS3_SaIwEE@@GLIBCXX_LDBL_3.4 +FUNC:_ZNKSt17bad_function_call4whatEv@@GLIBCXX_3.4.18 FUNC:_ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strEv@@GLIBCXX_3.4 FUNC:_ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE5rdbufEv@@GLIBCXX_3.4 FUNC:_ZNKSt18basic_stringstreamIwSt11char_traitsIwESaIwEE3strEv@@GLIBCXX_3.4 @@ -732,6 +733,8 @@ FUNC:_ZNKSt7num_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE6do_putES3_RSt8ios_basewm@@GLIBCXX_3.4 FUNC:_ZNKSt7num_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE6do_putES3_RSt8ios_basewx@@GLIBCXX_3.4 FUNC:_ZNKSt7num_putIwSt19ostreambuf_iteratorIwSt11char_traitsIwEEE6do_putES3_RSt8ios_basewy@@GLIBCXX_3.4 +FUNC:_ZNKSt8__detail20_Prime_rehash_policy11_M_next_bktEm@@GLIBCXX_3.4.18 +FUNC:_ZNKSt8__detail20_Prime_rehash_policy14_M_need_rehashEmmm@@GLIBCXX_3.4.18 FUNC:_ZNKSt8bad_cast4whatEv@@GLIBCXX_3.4.9 FUNC:_ZNKSt8ios_base7failure4whatEv@@GLIBCXX_3.4 FUNC:_ZNKSt8messagesIcE18_M_convert_to_charERKSs@@GLIBCXX_3.4 @@ -1353,6 +1356,7 @@ FUNC:_ZNSt11regex_errorD0Ev@@GLIBCXX_3.4.15 FUNC:_ZNSt11regex_errorD1Ev@@GLIBCXX_3.4.15 FUNC:_ZNSt11regex_errorD2Ev@@GLIBCXX_3.4.15 
+FUNC:_ZNSt11this_thread11__sleep_forENSt6chrono8durationIlSt5ratioILl1ELl1NS1_IlS2_ILl1ELl10@@GLIBCXX_3.4.18 FUNC:_ZNSt12__basic_fileIcE2fdEv@@GLIBCXX_3.4 FUNC:_ZNSt12__basic_fileIcE4fileEv@@GLIBCXX_3.4.1 FUNC:_ZNSt12__basic_fileIcE4openEPKcSt13_Ios_Openmodei@@GLIBCXX_3.4 @@ -1635,6 +1639,11 @@ FUNC:_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEt@@GLIBCXX_3.4 FUNC:_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEx@@GLIBCXX_3.4 FUNC:_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEy@@GLIBCXX_3.4 +FUNC:_ZNSt13random_device14_M_init_pretr1ERKSs@@GLIBCXX_3.4.18 +FUNC:_ZNSt13random_device16_M_getval_pretr1Ev@@GLIBCXX_3.4.18 +FUNC:_ZNSt13random_device7_M_finiEv@@GLIBCXX_3.4.18 +FUNC:_ZNSt13random_device7_M_initERKSs@@GLIBCXX_3.4.18 +FUNC:_ZNSt13random_device9_M_getvalEv@@GLIBCXX_3.4.18 FUNC:_ZNSt13runtime_errorC1ERKSs@@GLIBCXX_3.4 FUNC:_ZNSt13runtime_errorC2ERKSs@@GLIBCXX_3.4 FUNC:_ZNSt13runtime_errorD0Ev@@GLIBCXX_3.4 @@ -2393,14 +2402,17 @@ FUNC:_ZNVSt9__atomic011atomic_flag5clearESt12memory_order@@GLIBCXX_3.4.11 FUNC:_ZSt10unexpectedv@@GLIBCXX_3.4 FUNC:_ZSt11_Hash_bytesPKvmm@@CXXABI_1.3.5 +FUNC:_ZSt13get_terminatev@@GLIBCXX_3.4.19 FUNC:_ZSt13set_terminatePFvvE@@GLIBCXX_3.4 FUNC:_ZSt14__convert_to_vIdEvPKcRT_RSt12_Ios_IostateRKP15__locale_struct@@GLIBCXX_3.4 FUNC:_ZSt14__convert_to_vIeEvPKcRT_RSt12_Ios_IostateRKP15__locale_struct@@GLIBCXX_3.4 FUNC:_ZSt14__convert_to_vIfEvPKcRT_RSt12_Ios_IostateRKP15__locale_struct@@GLIBCXX_3.4 FUNC:_ZSt14__convert_to_vIgEvPKcRT_RSt12_Ios_IostateRKP15__locale_struct@@GLIBCXX_LDBL_3.4 +FUNC:_ZSt14get_unexpectedv@@GLIBCXX_3.4.19 FUNC:_ZSt14set_unexpectedPFvvE@@GLIBCXX_3.4 FUNC:_ZSt15_Fnv_hash_bytesPKvmm@@CXXABI_1.3.5 FUNC:_ZSt15future_categoryv@@GLIBCXX_3.4.15 +FUNC:_ZSt15get_new_handlerv@@GLIBCXX_3.4.19 FUNC:_ZSt15set_new_handlerPFvvE@@GLIBCXX_3.4 FUNC:_ZSt15system_categoryv@@GLIBCXX_3.4.11 FUNC:_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@@GLIBCXX_3.4.9 @@ -2678,6 +2690,7 @@ FUNC:__cxa_guard_release@@CXXABI_1.3 
FUNC:__cxa_pure_virtual@@CXXABI_1.3 FUNC:__cxa_rethrow@@CXXABI_1.3 +FUNC:__cxa_thread_atexit@@CXXABI_1.3.7 FUNC:__cxa_throw@@CXXABI_1.3 FUNC:__cxa_tm_cleanup@@CXXABI_TM_1 FUNC:__cxa_vec_cctor@@CXXABI_1.3 @@ -2724,6 +2737,7 @@ OBJECT:0:CXXABI_1.3.4 OBJECT:0:CXXABI_1.3.5 OBJECT:0:CXXABI_1.3.6 +OBJECT:0:CXXABI_1.3.7 OBJECT:0:CXXABI_LDBL_1.3 OBJECT:0:CXXABI_TM_1 OBJECT:0:GLIBCXX_3.4 @@ -2736,6 +2750,8 @@ OBJECT:0:GLIBCXX_3.4.15 OBJECT:0:GLIBCXX_3.4.16 OBJECT:0:GLIBCXX_3.4.17 +OBJECT:0:GLIBCXX_3.4.18 +OBJECT:0:GLIBCXX_3.4.19 OBJECT:0:GLIBCXX_3.4.2 OBJECT:0:GLIBCXX_3.4.3 OBJECT:0:GLIBCXX_3.4.4
Re: C: Add new warning -Wunprototyped-calls
On Sat, Apr 6, 2013 at 11:50 PM, Andreas Schwab <sch...@linux-m68k.org> wrote:
Tobias Burnus <bur...@net-b.de> writes:

gcc.dg/Wunprototyped-calls.c:13:3: warning: call to function ‘g’ without
a real prototype [-Wunprototyped-calls]

What is a real prototype?

One reason I didn't bother to upstream that patch is language-lawyer
legalese ... We want to catch

  int foo ();
  int bar (T x) { return foo (x); }
  int foo (U) { ... }

that is, calling foo () from a context where the definition or
declaration with argument specification is not visible. This causes the
C frontend to apply varargs promotion rules to all arguments, which may
differ from the promotion rules that would be applied when a real
prototype was visible at the point of the function call. I'd just say
"without a prototype".

int foo (); or just foo (); is specified as part of 6.7.5.3/14 as "The
empty list in a function declarator that is not part of a definition of
that function specifies that no information about the number or types of
the parameters is supplied." (this appears mostly in K&R style programs
where the "T D ( identifier-list(opt) )" form is valid). I am not sure
that GCC doing varargs-style promotions for calls with only this kind of
declarator is valid, or if the program would be rejected by K&R (and
only the GCC extension of varargs functions without a first named
argument makes us do what we do ...). The patch was implemented while
hunting down miscompiles in either X or ghostscript (I don't
remember...).

Richard.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [patch] update documentation for SEQUENCE
On Sun, Apr 7, 2013 at 12:04 AM, Steven Bosscher <stevenb@gmail.com> wrote:

Hello,

The existing documentation for SEQUENCE still states it is used for
DEFINE_EXPAND sequences. I think I wasn't even hacking GCC when that
practice was abandoned, and in the mean time some other uses of SEQUENCE
have appeared in the compiler. So, a long-overdue documentation update.

OK for trunk?

Ok.

Thanks,
Richard.

Ciao!
Steven

	* doc/rtl.texi (sequence): Rewrite documentation to match the
	current use of SEQUENCE rtl objects.
	* rtl.def (SEQUENCE): Likewise.

Index: doc/rtl.texi
===================================================================
--- doc/rtl.texi	(revision 197532)
+++ doc/rtl.texi	(working copy)
@@ -3099,18 +3099,11 @@ side-effects.
 @findex sequence
 @item (sequence [@var{insns} @dots{}])
-Represents a sequence of insns.  Each of the @var{insns} that appears
-in the vector is suitable for appearing in the chain of insns, so it
-must be an @code{insn}, @code{jump_insn}, @code{call_insn},
-@code{code_label}, @code{barrier} or @code{note}.
+Represents a sequence of insns.  If a @code{sequence} appears in the
+chain of insns, then each of the @var{insns} that appears in the sequence
+must be suitable for appearing in the chain of insns, i.e. must satisfy
+the @code{INSN_P} predicate.
 
-A @code{sequence} RTX is never placed in an actual insn during RTL
-generation.  It represents the sequence of insns that result from a
-@code{define_expand} @emph{before} those insns are passed to
-@code{emit_insn} to insert them in the chain of insns.  When actually
-inserted, the individual sub-insns are separated out and the
-@code{sequence} is forgotten.
-
 After delay-slot scheduling is completed, an insn and all the insns that
 reside in its delay slots are grouped together into a @code{sequence}.
 The insn requiring the delay slot is the first insn in the vector;
@@ -3123,6 +3116,19 @@ the effect of the insns in the delay slots.  In su
 the branch and should be executed only if the branch is taken; otherwise
 the insn should be executed only if the branch is not taken.
 @xref{Delay Slots}.
+
+Some back ends also use @code{sequence} objects for purposes other than
+delay-slot groups.  This is not supported in the common parts of the
+compiler, which treat such sequences as delay-slot groups.
+
+DWARF2 Call Frame Address (CFA) adjustments are sometimes also expressed
+using @code{sequence} objects as the value of a @code{RTX_FRAME_RELATED_P}
+note.  This only happens if the CFA adjustments cannot be easily derived
+from the pattern of the instruction to which the note is attached.  In
+such cases, the value of the note is used instead of best-guessing the
+semantics of the instruction.  The back end can attach notes containing
+a @code{sequence} of @code{set} patterns that express the effect of the
+parent instruction.
 @end table
 
 These expression codes appear in place of a side effect, as the body of

Index: rtl.def
===================================================================
--- rtl.def	(revision 197533)
+++ rtl.def	(working copy)
@@ -102,10 +102,24 @@ DEF_RTL_EXPR(EXPR_LIST, "expr_list", "ee", RTX_EXT
    The insns are represented in print by their uids.  */
 DEF_RTL_EXPR(INSN_LIST, "insn_list", "ue", RTX_EXTRA)
 
-/* SEQUENCE appears in the result of a `gen_...' function
-   for a DEFINE_EXPAND that wants to make several insns.
-   Its elements are the bodies of the insns that should be made.
-   `emit_insn' takes the SEQUENCE apart and makes separate insns.  */
+/* SEQUENCE is used in late passes of the compiler to group insns for
+   one reason or another.
+
+   For example, after delay slot filling, branch instructions with filled
+   delay slots are represented as a SEQUENCE of length 1 + n_delay_slots,
+   with the branch instruction in XEXPVEC(seq, 0, 0) and the instructions
+   occupying the delay slots in the remaining XEXPVEC slots.
+
+   Another place where a SEQUENCE may appear, is in REG_FRAME_RELATED_EXPR
+   notes, to express complex operations that are not obvious from the insn
+   to which the REG_FRAME_RELATED_EXPR note is attached.  In this usage of
+   SEQUENCE, the sequence vector slots do not hold real instructions but
+   only pseudo-instructions that can be translated to DWARF CFA expressions.
+
+   Some back ends also use SEQUENCE to group insns in bundles.
+
+   Much of the compiler infrastructure is not prepared to handle SEQUENCE
+   objects.  Only passes after pass_free_cfg are expected to handle them.  */
 DEF_RTL_EXPR(SEQUENCE, "sequence", "E", RTX_EXTRA)
 
 /* Represents a non-global base address.  This is only used in alias.c.  */
Re: [RFA][PATCH] Improve VRP of COND_EXPR_CONDs
On Sat, Apr 6, 2013 at 2:48 PM, Jeff Law l...@redhat.com wrote: Given something like this: bb 6: _23 = changed_17 ^ 1; _12 = (_Bool) _23; if (_12 != 0) goto bb 10; else goto bb 7; Assume _23 and changed_17 have integer types wider than a boolean, but VRP has determined they have a range [0..1]. We should be turning that into: bb 6: _23 = changed_17 ^ 1; _12 = (_Bool) _23; if (_23 != 0) goto bb 10; else goto bb 7; Note the change in the conditional. This also makes the statement _12 = (_Bool) _23 dead which should be eliminated by DCE. This kind of thing happens regularly in GCC itself and is fixed by the attached patch. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the trunk? commit fd82eea6f208bb12646e0e0e307fb86f043c1649 Author: Jeff Law l...@redhat.com Date: Sat Apr 6 06:46:58 2013 -0600 * tree-vrp.c (simplify_cond_using_ranges): Simplify test of boolean when the boolean was created by converting a wider object which had a boolean range. * gcc.dg/tree-ssa/vrp87.c: New test diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 44797cc..d34ecde 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,9 @@ 2013-04-06 Jeff Law l...@redhat.com + * tree-vrp.c (simplify_cond_using_ranges): Simplify test of boolean + when the boolean was created by converting a wider object which + had a boolean range. + * gimple.c (canonicalize_cond_expr_cond): Rewrite x ^ y into x != y. 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 601ca66..6ed8af2 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,7 @@ 2013-04-06 Jeff Law l...@redhat.com + * gcc.dg/tree-ssa/vrp87.c: New test + * gcc.dg/tree-ssa/forwprop-25.c: New test 2013-04-03 Jeff Law l...@redhat.com diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c new file mode 100644 index 000..7feff81 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c @@ -0,0 +1,81 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp2-details -fdump-tree-cddce2-details" } */ + +struct bitmap_head_def; +typedef struct bitmap_head_def *bitmap; +typedef const struct bitmap_head_def *const_bitmap; + + +typedef unsigned long BITMAP_WORD; +typedef struct bitmap_element_def +{ + struct bitmap_element_def *next; + unsigned int indx; + BITMAP_WORD bits[((128 + (8 * 8 * 1u) - 1) / (8 * 8 * 1u))]; +} bitmap_element; + + + + + + +typedef struct bitmap_head_def +{ + bitmap_element *first; + +} bitmap_head; + + + +static __inline__ unsigned char +bitmap_elt_ior (bitmap dst, bitmap_element * dst_elt, + bitmap_element * dst_prev, const bitmap_element * a_elt, + const bitmap_element * b_elt, unsigned char changed) +{ + + if (a_elt) +{ + + if (!changed && dst_elt) + { + changed = 1; + } +} + else +{ + changed = 1; +} + return changed; +} + +unsigned char +bitmap_ior_into (bitmap a, const_bitmap b) +{ + bitmap_element *a_elt = a->first; + const bitmap_element *b_elt = b->first; + bitmap_element *a_prev = ((void *) 0); + unsigned char changed = 0; + + while (b_elt) +{ + + if (!a_elt || a_elt->indx == b_elt->indx) + changed = bitmap_elt_ior (a, a_elt, a_prev, a_elt, b_elt, changed); + else if (a_elt->indx < b_elt->indx) + changed = 1; + b_elt = b_elt->next; + + +} + + return changed; +} + +/* Verify that VRP simplified an if statement.
*/ +/* { dg-final { scan-tree-dump "Folded into: if.*" "vrp2"} } */ +/* Verify that DCE after VRP2 eliminates a dead conversion + to a (_Bool). */ +/* { dg-final { scan-tree-dump "Deleting.*_Bool.*;" "cddce2"} } */ +/* { dg-final { cleanup-tree-dump "vrp2" } } */ +/* { dg-final { cleanup-tree-dump "cddce2" } } */ + diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 250a506..d76cead 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -8584,6 +8584,43 @@ simplify_cond_using_ranges (gimple stmt) } } + /* If we have a comparison of a SSA_NAME boolean against + a constant (which obviously must be [0..1]). See if the + SSA_NAME was set by a type conversion where the source + of the conversion is another SSA_NAME with a range [0..1]. + + If so, we can replace the SSA_NAME in the comparison with + the RHS of the conversion. This will often make the type + conversion dead code which DCE will clean up. */ + if (TREE_CODE (op0) == SSA_NAME + && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE Use (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE || (INTEGRAL_TYPE_P (TREE_TYPE (op0)) && TYPE_PRECISION (TREE_TYPE (op0)) == 1)) to catch some more cases. + && is_gimple_min_invariant (op1)) In
Re: [PATCH v3]IPA: fixing inline fail report caused by overwritable functions.
On Mon, Apr 8, 2013 at 4:47 AM, Zhouyi Zhou zhouzho...@gmail.com wrote: When inlining fails because the callee is overwritable, gcc will not report it in the dump file (triggered by -fdump-tree-einline) as it does for other non-inlinable cases. This patch corrects this. Regtested/bootstrapped on x86_64-linux. Can you trigger this message to show up with -Winline before/after the patch? Can you please add a testcase then? Thanks, Richard. ChangeLog: 2013-04-08 Zhouyi Zhou yizhouz...@ict.ac.cn * cif-code.def (OVERWRITABLE): Correct the comment for overwritable functions. * ipa-inline.c (can_inline_edge_p): Let the dump mechanism report the inline failure caused by overwritable functions. Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c(revision 197549) +++ gcc/ipa-inline.c(working copy) @@ -266,7 +266,7 @@ can_inline_edge_p (struct cgraph_edge *e else if (avail <= AVAIL_OVERWRITABLE) { e->inline_failed = CIF_OVERWRITABLE; - return false; + inlinable = false; } else if (e->call_stmt_cannot_inline_p) { Index: gcc/cif-code.def === --- gcc/cif-code.def(revision 197549) +++ gcc/cif-code.def(working copy) @@ -48,7 +48,7 @@ DEFCIFCODE(REDEFINED_EXTERN_INLINE, /* Function is not inlinable. */ DEFCIFCODE(FUNCTION_NOT_INLINABLE, N_("function not inlinable")) -/* Function is not overwritable. */ +/* Function is overwritable. */ DEFCIFCODE(OVERWRITABLE, N_("function body can be overwritten at link time")) /* Function is not an inlining candidate. */
Re: [Patch, fortran, 4.9] Use bool type instead gfc_try
On 08/04/2013 10:34, Tobias Burnus wrote: Janne Blomqvist wrote: On Thu, Mar 21, 2013 at 11:31 PM, Janne Blomqvist blomqvist.ja...@gmail.com wrote: Updated patch which in addition does the above transformations as well. .. and here is the actual patch (thanks Bernhard!) http://gcc.gnu.org/ml/fortran/2013-03/msg00108.html Thanks for the update and sorry for the delay. The patch idea as such is okay. However, the patch isn't. [... formatting problems ...] there is also a SUCCESS_EXIT_CODE changed to true_EXIT_CODE. I think that's unintended. Mikael
Re: patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1
On Fri, Apr 5, 2013 at 2:34 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: Richard, There has been something that has bothered me about your proposal for the storage manager and i think i can now characterize that problem. Say i want to compute the expression (a + b) / c converting from tree values, using wide-int as the engine and then storing the result in a tree. (A very common operation for the various simplifiers in gcc.) In my version of wide-int, where there is only the stack allocated fixed size allocation for the data, the compiler arranges for 6 instances of wide-int that are statically allocated on the stack when the function is entered. There would be 3 copies of the precision and data to get things started, one allocation of a variable sized object at the end when the INT_CST is built and one copy to put it back. As i have argued, these copies are of negligible size. In your world, to get things started, you would do 3 pointer copies to get the values out of the tree to set the expression leaves but then you will call the allocator 3 times to get space to hold the intermediate nodes before you get to pointer copy the result back into the result cst which still needs an allocation to build it. I am assuming that we can play the same game at the tree level that we do at the rtl level where we do 1 variable sized allocation to get the entire INT_CST rather than doing 1 fixed sized allocation and 1 variable sized one. Even if we take the simpler example of a + b, you still lose. The cost of the extra allocation and its subsequent recovery is more than my copies. In fact, even in the simplest case of someone going from a HWI thru wide_int into tree, you have 2 allocations vs my 1. Just to clarify, my code wouldn't handle tree a, b, c; tree res = (a + b) / c; transparently.
The most complex form of the above that I think would be reasonable to handle would be tree a, b, c; wide_int wires = (wi (a) + b) / c; tree res = build_int_cst (TREE_TYPE (a), wires); and the code as posted would even require you to specify the return type of operator+ and operator/ explicitely like wide_int wires = (wi (a).operator+wi_embed_var (b)).operator/wi_embed_var (c); but as I said I just didn't bother to decide that the return type is always of wide_int variable-len-storage kind. Now, the only real allocation that happens is done by build_int_cst. There is one wide_int on the stack to hold the a + b result and one separate wide_int to hold wires (it's literally written in the code). There are no pointer copies involved in the end - the result from converting a tree to a wide_inttree-storage is the original 'tree' pointer itself, thus a register. I just do not see the cost savings and if there are no cost savings, you certainly cannot say that having these templates is simpler than not having the templates. I think you are missing the point - by abstracting away the storage you don't necessarily need to add the templates. But you open up a very easy route for doing so and you make the operations _trivially_ work on the tree / RTL storage with no overhead in generated code and minimal overhead in the amount of code in GCC itself. In my prototype the overhead of adding 'tree' support is to place class wi_tree_int_cst { tree cst; public: void construct (tree c) { cst = c; } const HOST_WIDE_INT *storage() const { return reinterpret_cast HOST_WIDE_INT *(TREE_INT_CST (cst)); } unsigned len() const { return 2; } }; template class wi_traits tree { public: typedef wide_int wi_tree_int_cst wi_t; wi_traits(tree t) { wi_tree_int_cst ws; ws.construct (t); w.construct (ws); } wi_t* operator-() { return w; } private: wi_t w; }; into tree.h. Richard. 
Kenny On 04/02/2013 11:04 AM, Richard Biener wrote: On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck zad...@naturalbridge.com wrote: This patch contains a large number of the changes requested by Richi. It does not contain any of the changes that he requested to abstract the storage layer. That suggestion appears to be quite unworkable. I of course took this claim as a challenge ... with the following result. It is of course quite workable ;) The attached patch implements the core wide-int class and three storage models (fixed size for things like plain HWI and double-int, variable size similar to how your wide-int works and an adaptor for the double-int as contained in trees). With that you can now do HOST_WIDE_INT wi_test (tree x) { // template argument deduction doesn't do the magic we want it to do // to make this kind of implicit conversions work // overload resolution considers this kind of conversions so we // need some magic that combines both ... but seeding the overload // set with some instantiations doesn't seem to be possible :/ // wide_int w = x + 1; wide_int w; w += x; w += 1; // template argument
Re: Comments on the suggestion to use infinite precision math for wide int.
On Sun, Apr 7, 2013 at 7:16 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: Richard, You advocate that I should be using an infinite precision representation and I advocate a finite precision representation where the precision is taken from the context. I would like to make the case for my position here, in a separate thread, because the other thread is just getting too messy. At both the tree level and the rtl level you have a type (mode is just bad rep for types) and both of those explicitly have precisions. The semantics of the programming languages that we implement define, or at least recommend, that most operations be done in a precision that is implementation dependent (or like java a particular machine independent precision). Each hardware platform specifies exactly how every operation is done. I will admit that infinite precision is more esthetically pleasing than what i have done, but exact precision matches the needs of these clients. The problem is that the results from infinite precision arithmetic differ in many significant ways from finite precision math. And the number of places where you have to inject a precision to get the expected answer, ultimately makes the infinite precision representation unattractive. As I said on Thursday, whenever you do operations that do not satisfy the requirements of a mathematical ring (add sub and mul are in a ring, divide, shift and comparisons are not) you run the risk of getting a result that is not what would have been obtained with either a strict interpretation of the semantics or the machine. Intuitively any operation that looks at the bits above the precision does not qualify as an operation that works in a ring. The poster child for operations that do not belong to a ring is division. For my example, I am using 4 bit integers because it makes the examples easy, but similar examples exist for any fixed precision. 
Consider 8 * 10 / 4 in an infinite precision world the result is 20, but in a 4 bit precision world the answer is 0. another example is to ask if -10 * 10 is less than 0? again you get a different answer with infinite precision. I would argue that if i declare a variable of type uint32 and scale my examples i have the right to expect the compiler to produce the same result as the machine would. While C and C++ may have enough wiggle room in their standards so that this is just an unexpected, but legal, result as opposed to being wrong, everyone will hate you (us) if we do this. Furthermore, Java explicitly does not allow this (not that anyone actually uses gcj). I do not know enough about go, ada and fortran to say how it would effect them. In looking at the double-int class, the only operation that does not fit in a ring that is done properly is shifting. There we explicitly pass in the precision. The reason that we rarely see this kind of problem even though double-int implements 128 bit infinite precision is that currently very little of the compiler actually uses infinite precision in a robust way. In a large number of places, the code looks like: if (TYPE_PRECISION (TREE_TYPE (...)) HOST_BITS_PER_WIDE_INT) do something using inline operators. else either do not do something or use const-double, such code clears out most of these issues before the two passes that embrace infinite precision get a chance to do much damage. However, my patch at the rtl level gets rid of most of this kind of code and replaces it with calls to wide-int that currently uses only operations within the precision. I assume that if i went down the infinite precision road at the tree level, that all of this would come to the surface very quickly. I prefer to not change my rep and not have to deal with this later. Add, subtract, multiply and the logicals are all safe. But divide, remainder, and all of the comparisons need explicit precisions. 
In addition, operations like clz, ctz and clrsb need precisions. In total about half of the functions would need a precision passed in. My point is that once you have to start passing in the precision for all of those operations, it seems to be cleaner to get the precision from the leaves of the tree as I currently do. Once you buy into the math in a particular precision world, a lot of the other issues that you raise are just settled. Asking how to extend a value beyond its precision is like asking what the universe was like before the big bang. It is just something you do not need to know. I understand that you would like to have functions like x + 1 work, and so do I. I just could not figure out how to make them have unsurprising semantics. In particular, g++ did not seem to be happy with me defining two plus operators, one for each of signed and unsigned HWIs. It seems like if someone explicitly added a wide_int and an unsigned HWI that they had a right to have the unsigned hwi not be sign
[C++ Patch] PR 56871
Hi, this seems an easy issue: we aren't allowing an explicit specialization differing from the template declaration with respect to the constexpr specifier. Tested x86_64-linux. Thanks, Paolo. // /cp 2013-04-08 Paolo Carlini paolo.carl...@oracle.com PR c++/56871 * decl.c (validate_constexpr_redeclaration): Allow an explicit specialization to be different wrt the constexpr specifier. /testsuite 2013-04-08 Paolo Carlini paolo.carl...@oracle.com PR c++/56871 * g++.dg/cpp0x/constexpr-specialization.C: New. Index: cp/decl.c === --- cp/decl.c (revision 197572) +++ cp/decl.c (working copy) @@ -1203,6 +1203,14 @@ validate_constexpr_redeclaration (tree old_decl, t = DECL_DECLARED_CONSTEXPR_P (new_decl); return true; } + /* 7.1.5 [dcl.constexpr] + Note: An explicit specialization can differ from the template + declaration with respect to the constexpr specifier. */ + if (TREE_CODE (old_decl) == FUNCTION_DECL + && TREE_CODE (new_decl) == FUNCTION_DECL + && ! DECL_TEMPLATE_SPECIALIZATION (old_decl) + && DECL_TEMPLATE_SPECIALIZATION (new_decl)) +return true; error ("redeclaration %qD differs in %<constexpr%>", new_decl); error ("from previous declaration %q+D", old_decl); return false; Index: testsuite/g++.dg/cpp0x/constexpr-specialization.C === --- testsuite/g++.dg/cpp0x/constexpr-specialization.C (revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-specialization.C (working copy) @@ -0,0 +1,12 @@ +// PR c++/56871 +// { dg-options "-std=c++11" } + +template<typename T> constexpr int foo(T); +template<> int foo(int); +template<> int foo(int);// { dg-error "previous" } +template<> constexpr int foo(int); // { dg-error "redeclaration" } + +template<typename T> int bar(T); +template<> constexpr int bar(int); +template<> constexpr int bar(int); // { dg-error "previous" } +template<> int bar(int);// { dg-error "redeclaration" }
Re: [patch, AVR] Add new ATmega*RFR* devices
As Georg-Johann Lay wrote: Joerg Wunsch wrote: The attached patch adds the new ATmega*RFR* devices to AVR-GCC. [...] Supply the auto generated files, too. Cf. t-avr, avr-mcus.def etc. OK, thanks for the reminder. Here is the updated patch. -- Joerg Wunsch * Development engineer, Dresden, Germany Atmel Automotive GmbH, Theresienstrasse 2, D-74027 Heilbronn Geschaeftsfuehrung: Steven A. Laub, Stephen Cumming Amtsgericht Stuttgart, Registration HRB 106594 ChangeLog entry: 2013-04-08 Joerg Wunsch joerg.wun...@atmel.com * gcc/config/avr/avr-mcus.def: Add ATmega644RFR2, ATmega128RFR2, ATmega1284RFR2, ATmega256RFR2, ATmega2564RFR2; remove non-existent ATmega64RFA2 * gcc/doc/avr-mmcu.texi: Regenerate. * gcc/config/avr/avr-tables.opt: Regenerate. * gcc/config/avr/t-multilib: Regenerate. Index: gcc/config/avr/avr-mcus.def === --- gcc/config/avr/avr-mcus.def (Revision 197562) +++ gcc/config/avr/avr-mcus.def (Arbeitskopie) @@ -229,8 +229,8 @@ AVR_MCU (atmega64c1, ARCH_AVR5, __AVR_ATmega64C1__,0, 0, 0x0100, 1, m64c1) AVR_MCU (atmega64m1, ARCH_AVR5, __AVR_ATmega64M1__,0, 0, 0x0100, 1, m64m1) AVR_MCU (atmega64hve, ARCH_AVR5, __AVR_ATmega64HVE__, 0, 0, 0x0100, 1, m64hve) -AVR_MCU (atmega64rfa2, ARCH_AVR5, __AVR_ATmega64RFA2__, 0, 0, 0x0200, 1, m64rfa2) AVR_MCU (atmega64rfr2, ARCH_AVR5, __AVR_ATmega64RFR2__, 0, 0, 0x0200, 1, m64rfr2) +AVR_MCU (atmega644rfr2,ARCH_AVR5, __AVR_ATmega644RFR2__, 0, 0, 0x0200, 1, m644rfr2) AVR_MCU (atmega32hvb, ARCH_AVR5, __AVR_ATmega32HVB__, 0, 0, 0x0100, 1, m32hvb) AVR_MCU (atmega32hvbrevb, ARCH_AVR5, __AVR_ATmega32HVBREVB__, 0, 0, 0x0100, 1, m32hvbrevb) AVR_MCU (atmega16hva2, ARCH_AVR5, __AVR_ATmega16HVA2__, 0, 0, 0x0100, 1, m16hva2) @@ -262,6 +262,8 @@ AVR_MCU (atmega1284, ARCH_AVR51, __AVR_ATmega1284__, 0, 0, 0x0100, 2, m1284) AVR_MCU (atmega1284p, ARCH_AVR51, __AVR_ATmega1284P__, 0, 0, 0x0100, 2, m1284p) AVR_MCU (atmega128rfa1,ARCH_AVR51, __AVR_ATmega128RFA1__,0, 0, 0x0200, 2, m128rfa1) +AVR_MCU (atmega128rfr2,ARCH_AVR51, __AVR_ATmega128RFR2__,0, 
0, 0x0200, 2, m128rfr2) +AVR_MCU (atmega1284rfr2, ARCH_AVR51, __AVR_ATmega1284RFR2__, 0, 0, 0x0200, 2, m1284rfr2) AVR_MCU (at90can128, ARCH_AVR51, __AVR_AT90CAN128__, 0, 0, 0x0100, 2, can128) AVR_MCU (at90usb1286, ARCH_AVR51, __AVR_AT90USB1286__, 0, 0, 0x0100, 2, usb1286) AVR_MCU (at90usb1287, ARCH_AVR51, __AVR_AT90USB1287__, 0, 0, 0x0100, 2, usb1287) @@ -269,6 +271,8 @@ AVR_MCU (avr6, ARCH_AVR6, NULL,0, 0, 0x0200, 4, m2561) AVR_MCU (atmega2560, ARCH_AVR6, __AVR_ATmega2560__,0, 0, 0x0200, 4, m2560) AVR_MCU (atmega2561, ARCH_AVR6, __AVR_ATmega2561__,0, 0, 0x0200, 4, m2561) +AVR_MCU (atmega256rfr2,ARCH_AVR6, __AVR_ATmega256RFR2__, 0, 0, 0x0200, 4, m256rfr2) +AVR_MCU (atmega2564rfr2, ARCH_AVR6, __AVR_ATmega2564RFR2__,0, 0, 0x0200, 4, m2564rfr2) /* Xmega, 16K = Flash 64K, RAM = 64K */ AVR_MCU (avrxmega2,ARCH_AVRXMEGA2, NULL, 0, 0, 0x2000, 1, x32a4) AVR_MCU (atxmega16a4, ARCH_AVRXMEGA2, __AVR_ATxmega16A4__, 0, 0, 0x2000, 1, x16a4) Index: gcc/doc/avr-mmcu.texi === --- gcc/doc/avr-mmcu.texi (Revision 197562) +++ gcc/doc/avr-mmcu.texi (Arbeitskopie) @@ -38,15 +38,15 @@ @item avr5 ``Enhanced'' devices with 16@tie{}KiB up to 64@tie{}KiB of program memory. 
-@*@var{mcu}@tie{}= @code{ata5790}, @code{ata5790n}, @code{ata5795}, @code{atmega16}, @code{atmega16a}, @code{atmega16hva}, @code{atmega16hva}, @code{atmega16hva2}, @code{atmega16hva2}, @code{atmega16hvb}, @code{atmega16hvb}, @code{atmega16hvbrevb}, @code{atmega16m1}, @code{atmega16m1}, @code{atmega16u4}, @code{atmega16u4}, @code{atmega161}, @code{atmega162}, @code{atmega163}, @code{atmega164a}, @code{atmega164p}, @code{atmega164pa}, @code{atmega165}, @code{atmega165a}, @code{atmega165p}, @code{atmega165pa}, @code{atmega168}, @code{atmega168a}, @code{atmega168p}, @code{atmega168pa}, @code{atmega169}, @code{atmega169a}, @code{atmega169p}, @code{atmega169pa}, @code{atmega26hvg}, @code{atmega32}, @code{atmega32a}, @code{atmega32a}, @code{atmega32c1}, @code{atmega32c1}, @code{atmega32hvb}, @code{atmega32hvb}, @code{atmega32hvbrevb}, @code{atmega32m1}, @code{atmega32m1}, @code{atmega32u4}, @code{atmega32u4}, @code{atmega32u6}, @code{atmega32u6}, @code{atmega323}, @code{atmega324a}, @code{atmega324p}, @code{atmega324pa}, @code{atmega325}, @code{atmega325a}, @code{atmega325p}, @code{atmega3250}, @code{atmega3250a}, @code{atmega3250p}, @code{atmega3250pa}, @code{atmega328},
[PATCH] Fix PR48762
This patch prevents two "Invalid read of size 8" and one "Invalid write of size 8" warnings when cc1 is run under valgrind. What happens here is that we firstly allocate 0B ebb_data.path = XNEWVEC (struct branch_path, PARAM_VALUE (PARAM_MAX_CSE_PATH_LENGTH)); (in fact, XNEWVEC always allocates at least 1B--but still it's not enough), then in cse_find_path we have (path_size is 0) if (path_size == 0) data->path[path_size++].bb = first_bb; so we immediately have the invalid write, and moreover path_size increments, thus we call cse_find_path again, then we get the invalid reads. So fixed by guarding the write with PARAM_MAX_CSE_PATH_LENGTH > 0. Alternatively, we can bump the minimum of that param, as usual ;) Bootstrapped/regtested on x86_64-linux, ok for trunk/4.8? 2013-04-08 Marek Polacek pola...@redhat.com PR tree-optimization/48762 * cse.c (cse_find_path): Require PARAM_MAX_CSE_PATH_LENGTH to be > 0. --- gcc/cse.c.mp2013-04-08 13:19:15.082670099 +0200 +++ gcc/cse.c 2013-04-08 13:19:29.014713914 +0200 @@ -6166,7 +6166,7 @@ cse_find_path (basic_block first_bb, str } /* If the path was empty from the beginning, construct a new path. */ - if (path_size == 0) + if (path_size == 0 && PARAM_VALUE (PARAM_MAX_CSE_PATH_LENGTH) > 0) data->path[path_size++].bb = first_bb; else { Marek
[build] Use -z ignore instead of --as-needed on Solaris
While the Solaris linker doesn't support the --as-needed/--no-as-needed options (yet), it long has provided the equivalent -z ignore/-z record options. This patch makes use of them, avoiding unnecessary dependencies on libgcc_s.so.1. Bootstrapped without regressions on i386-pc-solaris2.11 (and checking that many dependencies on libgcc_s.so.1 in runtime libraries are gone that were flagged as unused by ldd -u) and x86_64-unknown-linux-gnu (gcc/specs unchanged, make check still running). Ok for mainline if it passes? Thanks. Rainer 2013-04-05 Rainer Orth r...@cebitec.uni-bielefeld.de * configure.ac (gcc_cv_ld_as_needed): Set gcc_cv_ld_as_needed_option, gcc_cv_no_as_needed_option. Use -z ignore, -z record on *-*-solaris2*. (HAVE_LD_AS_NEEDED): Update comment. (LD_AS_NEEDED_OPTION, LD_NO_AS_NEEDED_OPTION): Define. * configure: Regenerate. * config.in: Regenerate. * gcc.c (init_gcc_specs) [USE_LD_AS_NEEDED]: Use LD_AS_NEEDED_OPTION, LD_NO_AS_NEEDED_OPTION. * config/sol2.h [HAVE_LD_AS_NEEDED] (USE_LD_AS_NEEDED): Define. * doc/tm.texi.in (USE_LD_AS_NEEDED): Allow for --as-needed equivalents. Fix markup. * doc/tm.texi: Regenerate. # HG changeset patch # Parent 602ad5b6c5e29819082e386836c33220c78ae4b7 Use -z ignore instead of --as-needed on Solaris diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h --- a/gcc/config/sol2.h +++ b/gcc/config/sol2.h @@ -181,6 +181,11 @@ along with GCC; see the file COPYING3. %(link_arch) \ %{Qy:} %{!Qn:-Qy} +/* Use --as-needed/-z ignore -lgcc_s for eh support. */ +#ifdef HAVE_LD_AS_NEEDED +#define USE_LD_AS_NEEDED 1 +#endif + #ifdef USE_GLD /* Solaris 11 build 135+ implements dl_iterate_phdr. GNU ld needs --eh-frame-hdr to create the required .eh_frame_hdr sections. 
*/ diff --git a/gcc/configure.ac b/gcc/configure.ac --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -4538,6 +4538,8 @@ AC_MSG_RESULT($gcc_cv_ld_eh_gc_sections_ AC_CACHE_CHECK(linker --as-needed support, gcc_cv_ld_as_needed, [gcc_cv_ld_as_needed=no +gcc_cv_ld_as_needed_option='--as-needed' +gcc_cv_ld_no_as_needed_option='--no-as-needed' if test $in_tree_ld = yes ; then if test $gcc_cv_gld_major_version -eq 2 -a $gcc_cv_gld_minor_version -ge 16 -o $gcc_cv_gld_major_version -gt 2 \ && test $in_tree_ld_is_elf = yes; then @@ -4547,12 +4549,25 @@ elif test x$gcc_cv_ld != x; then # Check if linker supports --as-needed and --no-as-needed options if $gcc_cv_ld --help 2>/dev/null | grep as-needed > /dev/null; then gcc_cv_ld_as_needed=yes + else + case $target in + # Solaris 2 ld always supports -z ignore/-z record. + *-*-solaris2*) + gcc_cv_ld_as_needed=yes + gcc_cv_ld_as_needed_option="-z ignore" + gcc_cv_ld_no_as_needed_option="-z record" + ;; + esac fi fi ]) if test x$gcc_cv_ld_as_needed = xyes; then AC_DEFINE(HAVE_LD_AS_NEEDED, 1, -[Define if your linker supports --as-needed and --no-as-needed options.]) +[Define if your linker supports --as-needed/--no-as-needed or equivalent options.]) + AC_DEFINE_UNQUOTED(LD_AS_NEEDED_OPTION, "$gcc_cv_ld_as_needed_option", +[Define to the linker option to ignore unused dependencies.]) + AC_DEFINE_UNQUOTED(LD_NO_AS_NEEDED_OPTION, "$gcc_cv_ld_no_as_needed_option", +[Define to the linker option to keep unused dependencies.]) fi case $target:$tm_file in diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -260,7 +260,8 @@ line, but, unlike @code{LIBGCC_SPEC}, it @defmac USE_LD_AS_NEEDED A macro that controls the modifications to @code{LIBGCC_SPEC} mentioned in @code{REAL_LIBGCC_SPEC}.
If nonzero, a spec will be -generated that uses --as-needed and the shared libgcc in place of the +generated that uses @option{--as-needed} or equivalent options and the +shared @file{libgcc} in place of the static exception handler library, when linking without any of @code{-static}, @code{-static-libgcc}, or @code{-shared-libgcc}. @end defmac diff --git a/gcc/gcc.c b/gcc/gcc.c --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -1361,7 +1361,8 @@ init_gcc_specs (struct obstack *obstack, %{!static:%{!static-libgcc: #if USE_LD_AS_NEEDED %{!shared-libgcc:, - static_name, --as-needed , shared_name, --no-as-needed + static_name, LD_AS_NEEDED_OPTION , + shared_name, LD_NO_AS_NEEDED_OPTION } %{shared-libgcc:, shared_name, %{!shared: , static_name, } -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, ARM, iWMMXT] PR target/54338 - Include IWMMXT_GR_REGS in ALL_REGS
ChangeLog 2013-04-02 Xinyu Qi x...@marvell.com PR target/54338 * config/arm/arm.h (REG_CLASS_CONTENTS): Include IWMMXT_GR_REGS in ALL_REGS. Thanks, Xinyu Thanks, now applied to trunk. For the future, please consider creating patches at the top level directory. It makes it easier for application by someone else :). regards Ramana
[PATCH] Assorted dump/debug fixes for the vectorizer
The following avoids the excessive verboseness of get_vectype_* and leaves better traces of the original stmt in the vectorizer temporary names by preserving their SSA name version. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard.

2013-04-08  Richard Biener  rguent...@suse.de

	* gimple-pretty-print.c (debug_gimple_stmt): Do not print extra newline.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Dump
	determined vector type.
	(vect_analyze_data_refs): Likewise.
	(vect_get_new_vect_var): Adjust.
	(vect_create_destination_var): Preserve SSA name versions.
	* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Do not
	dump anything here.
	* gfortran.dg/vect/fast-math-mgrid-resid.f: Adjust.

Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 197486)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -84,7 +84,6 @@ DEBUG_FUNCTION void
 debug_gimple_stmt (gimple gs)
 {
   print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
-  fprintf (stderr, "\n");
 }

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 197486)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -409,6 +409,12 @@ vect_determine_vectorization_factor (loo
 	}

       STMT_VINFO_VECTYPE (stmt_info) = vectype;
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
+	}
     }

   /* The vectorization factor is according to the smallest

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 197486)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -3206,6 +3206,17 @@ vect_analyze_data_refs (loop_vec_info lo
	    }
	  return false;
	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "got vectype for stmt: ");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+				 STMT_VINFO_VECTYPE (stmt_info));
+	    }
+	}

       /* Adjust the minimal vectorization factor according to the
	  vector type.  */
@@ -3293,13 +3304,13 @@ vect_get_new_vect_var (tree type, enum v
   switch (var_kind)
   {
   case vect_simple_var:
-    prefix = "vect_";
+    prefix = "vect";
     break;
   case vect_scalar_var:
-    prefix = "stmp_";
+    prefix = "stmp";
     break;
   case vect_pointer_var:
-    prefix = "vect_p";
+    prefix = "vectp";
     break;
   default:
     gcc_unreachable ();
@@ -3307,7 +3318,7 @@ vect_get_new_vect_var (tree type, enum v

   if (name)
     {
-      char* tmp = concat (prefix, name, NULL);
+      char* tmp = concat (prefix, "_", name, NULL);
       new_vect_var = create_tmp_reg (type, tmp);
       free (tmp);
     }
@@ -3836,7 +3847,8 @@ tree
 vect_create_destination_var (tree scalar_dest, tree vectype)
 {
   tree vec_dest;
-  const char *new_name;
+  const char *name;
+  char *new_name;
   tree type;
   enum vect_var_kind kind;
@@ -3845,10 +3857,13 @@ vect_create_destination_var (tree scalar

   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);

-  new_name = get_name (scalar_dest);
-  if (!new_name)
-    new_name = "var_";
+  name = get_name (scalar_dest);
+  if (name)
+    asprintf (&new_name, "%s_%u", name, SSA_NAME_VERSION (scalar_dest));
+  else
+    asprintf (&new_name, "_%u", SSA_NAME_VERSION (scalar_dest));
   vec_dest = vect_get_new_vect_var (type, kind, new_name);
+  free (new_name);

   return vec_dest;
 }

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 197486)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -6094,30 +6094,10 @@ get_vectype_for_scalar_type_and_size (tr
     return NULL_TREE;

   vectype = build_vector_type (scalar_type, nunits);
-  if (dump_enabled_p ())
-    {
-      dump_printf_loc (MSG_NOTE, vect_location,
-		       "get vectype with %d units of type ", nunits);
-      dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
-    }
-
-  if (!vectype)
-    return NULL_TREE;
-
-  if (dump_enabled_p ())
-    {
-      dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
-      dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
-    }

   if (!VECTOR_MODE_P (TYPE_MODE (vectype))
       && !INTEGRAL_MODE_P (TYPE_MODE (vectype)))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mode not supported by target.");
-      return NULL_TREE;
-    }
+    return
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/08/2013 06:46 AM, Richard Biener wrote: On Sun, Apr 7, 2013 at 7:16 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: Richard, You advocate that I should be using an infinite precision representation and I advocate a finite precision representation where the precision is taken from the context. I would like to make the case for my position here, in a separate thread, because the other thread is just getting too messy. At both the tree level and the rtl level you have a type (mode is just a bad rep for types) and both of those explicitly have precisions. The semantics of the programming languages that we implement define, or at least recommend, that most operations be done in a precision that is implementation dependent (or, like Java, a particular machine-independent precision). Each hardware platform specifies exactly how every operation is done. I will admit that infinite precision is more esthetically pleasing than what I have done, but exact precision matches the needs of these clients. The problem is that the results from infinite precision arithmetic differ in many significant ways from finite precision math, and the number of places where you have to inject a precision to get the expected answer ultimately makes the infinite precision representation unattractive. As I said on Thursday, whenever you do operations that do not satisfy the requirements of a mathematical ring (add, sub and mul are in a ring; divide, shift and comparisons are not) you run the risk of getting a result that is not what would have been obtained with either a strict interpretation of the semantics or the machine. Intuitively, any operation that looks at the bits above the precision does not qualify as an operation that works in a ring. The poster child for operations that do not belong to a ring is division. For my example, I am using 4-bit integers because it makes the examples easy, but similar examples exist for any fixed precision.
Consider 8 * 10 / 4: in an infinite precision world the result is 20, but in a 4-bit precision world the answer is 0. Another example is to ask whether -10 * 10 is less than 0; again you get a different answer with infinite precision. I would argue that if I declare a variable of type uint32 and scale my examples, I have the right to expect the compiler to produce the same result as the machine would. While C and C++ may have enough wiggle room in their standards so that this is just an unexpected, but legal, result as opposed to being wrong, everyone will hate you (us) if we do this. Furthermore, Java explicitly does not allow this (not that anyone actually uses gcj). I do not know enough about go, ada and fortran to say how it would affect them. In looking at the double-int class, the only operation that does not fit in a ring that is done properly is shifting. There we explicitly pass in the precision. The reason that we rarely see this kind of problem even though double-int implements 128-bit infinite precision is that currently very little of the compiler actually uses infinite precision in a robust way. In a large number of places, the code looks like: "if (TYPE_PRECISION (TREE_TYPE (...)) <= HOST_BITS_PER_WIDE_INT) do something using inline operators; else either do not do something or use const-double". Such code clears out most of these issues before the two passes that embrace infinite precision get a chance to do much damage. However, my patch at the rtl level gets rid of most of this kind of code and replaces it with calls to wide-int that currently uses only operations within the precision. I assume that if I went down the infinite precision road at the tree level, all of this would come to the surface very quickly. I prefer not to change my rep and not have to deal with this later. Add, subtract, multiply and the logicals are all safe. But divide, remainder, and all of the comparisons need explicit precisions.
In addition, operations like clz, ctz and clrsb need precisions. In total, about half of the functions would need a precision passed in. My point is that once you have to start passing in the precision for all of those operations, it seems cleaner to get the precision from the leaves of the tree as I currently do. Once you buy into the math-in-a-particular-precision world, a lot of the other issues that you raise are just settled. Asking how to extend a value beyond its precision is like asking what the universe was like before the big bang. It is just something you do not need to know. I understand that you would like to have functions like x + 1 work, and so do I. I just could not figure out how to make them have unsurprising semantics. In particular, g++ did not seem to be happy with me defining two plus operators, one for each of signed and unsigned HWIs. It seems like if someone explicitly added a wide_int and an unsigned HWI that they had a right to have the unsigned HWI not be sign extended. But if you can show
[PATCH][ARM] Improve code generation for anddi3
Hi all,

When compiling:

unsigned long long muld (unsigned long long X, unsigned long long Y)
{
  unsigned long long mask = 0xffffffffull;
  return (X & mask) * (Y & mask);
}

we get a suboptimal sequence:

	stmfd	sp!, {r4, r5}
	mvn	r4, #0
	mov	r5, #0
	and	r0, r0, r4
	and	r3, r3, r5
	and	r1, r1, r5
	and	r2, r2, r4
	mul	r3, r0, r3
	mla	r3, r2, r1, r3
	umull	r0, r1, r0, r2
	ldmfd	sp!, {r4, r5}
	add	r1, r3, r1
	bx	lr

This patch improves that situation by changing the anddi3 insn into an insn_and_split and simplifying the SImode ands. Also, the NEON version is merged with the non-NEON one. This allows us to generate just:

	umull	r0, r1, r2, r0
	bx	lr

for the above code.

Regtested arm-none-eabi on qemu. Ok for trunk?

Thanks,
Kyrill

gcc/ChangeLog
2013-04-08  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm.c (const_ok_for_dimode_op): Handle AND case.
	* config/arm/arm.md (*anddi3_insn): Change to insn_and_split.
	* config/arm/constraints.md (De): New constraint.
	* config/arm/neon.md (anddi3_neon): Delete.
	(neon_vand<mode>): Expand to standard anddi3 pattern.
	* config/arm/predicates.md (imm_for_neon_inv_logic_operand):
	Move earlier in the file.
	(neon_inv_logic_op2): Likewise.
	(arm_anddi_operand_neon): New predicate.

gcc/testsuite/ChangeLog
2013-04-08  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* gcc.target/arm/anddi3-opt.c: New test.
	* gcc.target/arm/anddi3-opt2.c: Likewise.

anddi3_new.patch
Description: Binary data
Re: Comments on the suggestion to use infinite precision math for wide int.
It may be interesting to look at what we have done in Ada with regard to overflow in intermediate expressions. Briefly, we allow specification of three modes:

- all intermediate arithmetic is done in the base type, with overflow signalled if an intermediate value is outside this range;

- all intermediate arithmetic is done in the widest integer type, with overflow signalled if an intermediate value is outside this range;

- all intermediate arithmetic uses an infinite precision arithmetic package built for this purpose.

In the second and third cases we do range analysis that allows smaller intermediate precision if we know it's safe. We also allow separate specification of the mode inside and outside assertions (e.g. preconditions and postconditions), since in the latter you often want to regard integers as mathematical, not subject to intermediate overflow.
Re: C: Add new warning -Wunprototyped-calls
Richard Biener richard.guent...@gmail.com writes: "when a real prototype was visible" How is that different from a prototype?

Andreas.
--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: C: Add new warning -Wunprototyped-calls
On Mon, Apr 8, 2013 at 3:05 PM, Andreas Schwab sch...@linux-m68k.org wrote: Richard Biener richard.guent...@gmail.com writes: "when a real prototype was visible" How is that different from a prototype? It's different from the case where a K&R definition was seen and thus type information is present via that mechanism. We don't want to warn in that case. As I suggested, the warning should just print "without a prototype", but "prototype" here means that a definition before the call is enough to make us happy (as opposed to -Wstrict-prototypes, which warns about function definitions without a previous prototype; we want to warn about calls to functions without a definition or a prototype). Any better suggestion? Richard.

Andreas.
--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/08/2013 04:56 AM, Florian Weimer wrote: On 04/07/2013 07:16 PM, Kenneth Zadeck wrote: The poster child for operations that do not belong to a ring is division. For my example, I am using 4 bit integers because it makes the examples easy, but similar examples exist for any fixed precision. Consider 8 * 10 / 4 in an infinite precision world the result is 20, but in a 4 bit precision world the answer is 0. I think you mean 4 instead of 20. oops another example is to ask if -10 * 10 is less than 0? again you get a different answer with infinite precision. Actually, for C/C++, you don't, because of undefined signed overflow (at least with default compiler flags). But similar examples with unsigned types exist, so this point isn't too relevant. I would argue that if i declare a variable of type uint32 and scale my examples i have the right to expect the compiler to produce the same result as the machine would. In my very, very limited experience, the signed/unsigned mismatch is more confusing. With infinite precision, this confusion would not arise (but adjustment would be needed to get limited-precision results, as you write). With finite precision, you either need separate types for signed/unsigned, or separate operations. I come from a world where people write code where they expect full control of the horizontal and the vertical when they program. Hank Warren, the author of Hacker's Delight, is in my group, and a huge number of those tricks require understanding what is going on in the machine. If the compiler decides that it wants to do things differently, you are dead. While C and C++ may have enough wiggle room in their standards so that this is just an unexpected, but legal, result as opposed to being wrong, everyone will hate you (us) if we do this. Furthermore, Java explicitly does not allow this (not that anyone actually uses gcj).
I do not know enough about go, Go specifies two's-complement signed arithmetic and does not automatically promote to int (i.e., it performs arithmetic in the type, and mixed arguments are not supported). Go constant arithmetic is infinite precision. ada and fortran to say how it would affect them. Ada requires trapping arithmetic for signed integers. Currently, this is implemented in the front end. Arithmetic happens in the base range of a type (which is symmetric around zero and chosen to correspond to a machine type). Ada allows omitting intermediate overflow checks as long as you produce the infinite precision result (or raise an overflow exception). I think this applies to Ada constant arithmetic as well. (GNAT has a mode where comparisons are computed with infinite precision, which is extremely useful for writing bounds checking code.) Considering the range of different arithmetic operations we need to support, I'm not convinced that the ring model is appropriate. I will answer this in Robert's email.
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 9:15 AM, Kenneth Zadeck wrote: I think this applies to Ada constant arithmetic as well. Ada constant arithmetic (at compile time) is always infinite precision (for float as well as for integer).
Re: RFC: color diagnostics markers
On Fri, Apr 05, 2013 at 11:51:43PM +0200, Manuel López-Ibáñez wrote: In this patch the default is never, because for some reason auto triggers colorization during regression testing. I have not found a

That reason is obvious, dejagnu (expect?) creates pseudo terminals, so isatty is true; we'd need to just use -fno-diagnostics-color by default for the testsuite (IMHO not a big deal). Anyway, I've kept the default as never for now, but am sending my review comments in the form of a new diff, which fixes formatting, avoids memory leaks and changes it to introduce more color names (for caret, locus, quoted text) and change the default note color (for some color compatibility with clang: bold green is there used for caret lines; for notes they use bold black apparently, but that doesn't work too well on white-on-black terminals).

Right now the patch is unfinished, because there is no support for the new "%[locus]%s:%d:%d%[]" style diagnostic strings (where %[locus] and %[] stand for switching to the locus color and resetting the color back) in the -Wformat code (and gettext). I'm wondering if, instead of the %[colorname] and %[], it wouldn't be better to just have some %r or whatever letter isn't taken yet which would consume a const char * color name from va_arg, and some other letter with no argument that would do the color reset. Ideas for the best unused letters for that? Perhaps then -Wformat support for it would be easier. I.e. instead of:

  pp_printf ("%[locus]%s:%d:%d%[]", loc.file, loc.line, loc.column);

one would write:

  pp_printf ("%r%s:%d:%d%R", "locus", loc.file, loc.line, loc.column);

	Jakub

--- gcc/opts.c.jj	2013-03-05 07:00:46.847494476 +0100
+++ gcc/opts.c	2013-04-08 14:29:20.592412422 +0200
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "flags.h"
 #include "params.h"
 #include "diagnostic.h"
+#include "diagnostic-color.h"
 #include "opts-diagnostic.h"
 #include "insn-attr-common.h"
 #include "common/common-target.h"
@@ -1497,6 +1498,11 @@ common_handle_option (struct gcc_options
       dc->show_caret = value;
       break;

+    case OPT_fdiagnostics_color_:
+      pp_show_color (dc->printer)
+	= colorize_init ((diagnostic_color_rule_t) value);
+      break;
+
     case OPT_fdiagnostics_show_option:
       dc->show_option_requested = value;
       break;
--- gcc/Makefile.in.jj	2013-04-04 15:03:29.285380160 +0200
+++ gcc/Makefile.in	2013-04-08 14:44:47.076155748 +0200
@@ -1465,7 +1465,7 @@ OBJS = \

 # Objects in libcommon.a, potentially used by all host binaries and with
 # no target dependencies.
-OBJS-libcommon = diagnostic.o pretty-print.o intl.o input.o version.o
+OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o input.o version.o

 # Objects in libcommon-target.a, used by drivers and by the core
 # compiler and containing target-dependent code.
@@ -2668,11 +2668,12 @@ fold-const.o : fold-const.c $(CONFIG_H)
 	$(GIMPLE_H) realmpfr.h $(TREE_FLOW_H)
 diagnostic.o : diagnostic.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
 	version.h $(DEMANGLE_H) $(INPUT_H) intl.h $(BACKTRACE_H) $(DIAGNOSTIC_H) \
-	diagnostic.def
+	diagnostic.def diagnostic-color.h
+diagnostic-color.o : diagnostic-color.c $(CONFIG_H) $(SYSTEM_H) diagnostic-color.h
 opts.o : opts.c $(OPTS_H) $(OPTIONS_H) $(DIAGNOSTIC_CORE_H) $(CONFIG_H) $(SYSTEM_H) \
 	coretypes.h $(DUMPFILE_H) $(TM_H) \
 	$(DIAGNOSTIC_H) insn-attr-common.h intl.h $(COMMON_TARGET_H) \
-	$(FLAGS_H) $(PARAMS_H) opts-diagnostic.h
+	$(FLAGS_H) $(PARAMS_H) opts-diagnostic.h diagnostic-color.h
 opts-global.o : opts-global.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
 	$(DIAGNOSTIC_H) $(OPTS_H) $(FLAGS_H) $(GGC_H) $(TREE_H) langhooks.h \
 	$(TM_H) $(RTL_H) $(DBGCNT_H) debug.h $(LTO_STREAMER_H) output.h \
@@ -3434,7 +3435,8 @@ params.o : params.c $(CONFIG_H) $(SYSTEM
 	$(PARAMS_H) $(DIAGNOSTIC_CORE_H)
 pointer-set.o: pointer-set.c pointer-set.h $(CONFIG_H) $(SYSTEM_H)
 hooks.o: hooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(HOOKS_H)
-pretty-print.o: $(CONFIG_H) $(SYSTEM_H) coretypes.h intl.h $(PRETTY_PRINT_H)
+pretty-print.o: $(CONFIG_H) $(SYSTEM_H) coretypes.h intl.h $(PRETTY_PRINT_H) \
+	diagnostic-color.h
 errors.o : errors.c $(CONFIG_H) $(SYSTEM_H) errors.h
 dbgcnt.o: dbgcnt.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(DUMPFILE_H) \
 	$(DIAGNOSTIC_CORE_H) $(DBGCNT_H)
--- gcc/common.opt.jj	2013-04-04 15:03:29.285380160 +0200
+++ gcc/common.opt	2013-04-08 11:32:33.438159412 +0200
@@ -1028,6 +1028,30 @@
 fdiagnostics-show-caret
 Common Var(flag_diagnostics_show_caret) Init(1)
 Show the source line with a caret indicating the column

+fdiagnostics-color
+Common Alias(fdiagnostics-color=,always,never)
+;
+
+fdiagnostics-color=
+Common Joined RejectNegative Enum(diagnostic_color_rule)
+-fdiagnostics-color=[never|always|auto]	Colorize diagnostics
+
+; Required for these enum values.
+SourceInclude
+diagnostic-color.h
+
+Enum
+Name(diagnostic_color_rule) Type(int)
+
+EnumValue
+Enum(diagnostic_color_rule)
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/08/2013 09:19 AM, Robert Dewar wrote: On 4/8/2013 9:15 AM, Kenneth Zadeck wrote: I think this applies to Ada constant arithmetic as well. Ada constant arithmetic (at compile time) is always infinite precision (for float as well as for integer). What do you mean when you say constant arithmetic? Do you mean places where there is an explicit 8 * 6 in the source, or do you mean any arithmetic that a compiler, using the full power of interprocedural constant propagation, can discover?
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/08/2013 09:03 AM, Robert Dewar wrote: It may be interesting to look at what we have done in Ada with regard to overflow in intermediate expressions. Briefly we allow specification of three modes: all intermediate arithmetic is done in the base type, with overflow signalled if an intermediate value is outside this range; all intermediate arithmetic is done in the widest integer type, with overflow signalled if an intermediate value is outside this range; all intermediate arithmetic uses an infinite precision arithmetic package built for this purpose. In the second and third cases we do range analysis that allows smaller intermediate precision if we know it's safe. We also allow separate specification of the mode inside and outside assertions (e.g. preconditions and postconditions) since in the latter you often want to regard integers as mathematical, not subject to intermediate overflow. So then how does a language like Ada work in gcc? My assumption is that most of what you describe here is done in the front end, and by the time you get to the middle end of the compiler, you have chosen types in which you are comfortable having any remaining math done, along with explicit checks for overflow where the programmer asked for them. Otherwise, how could Ada have ever worked with gcc? kenny
Re: [RFA][PATCH] Improve VRP of COND_EXPR_CONDs
On 04/08/2013 03:45 AM, Richard Biener wrote:

@@ -8584,6 +8584,43 @@ simplify_cond_using_ranges (gimple stmt)
	 }
     }

+  /* If we have a comparison of an SSA_NAME boolean against
+     a constant (which obviously must be [0..1]), see if the
+     SSA_NAME was set by a type conversion where the source
+     of the conversion is another SSA_NAME with a range [0..1].
+
+     If so, we can replace the SSA_NAME in the comparison with
+     the RHS of the conversion.  This will often make the type
+     conversion dead code which DCE will clean up.  */
+  if (TREE_CODE (op0) == SSA_NAME
+      && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE

Use

  (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
   || (INTEGRAL_TYPE_P (TREE_TYPE (op0))
       && TYPE_PRECISION (TREE_TYPE (op0)) == 1))

to catch some more cases.

Good catch. Done.

+      && is_gimple_min_invariant (op1))

In this case it's simpler to test TREE_CODE (op1) == INTEGER_CST.

Agreed, fixed.

+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (op0);
+      tree innerop;
+
+      if (!is_gimple_assign (def_stmt)
+	  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+	return false;
+
+      innerop = gimple_assign_rhs1 (def_stmt);
+
+      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (innerop))

As Steven said, the abnormal check is not necessary, but for completeness you should check TREE_CODE (innerop) == SSA_NAME. Valid (but unfolded) GIMPLE can have (_Bool) 1, too.

Agreed, fixed.

Note that we already have code with similar functionality (see if a conversion would alter the value of X) as part of optimizing (T1)(T2)X to (T1)X in simplify_conversion_using_ranges. Maybe a part of it can be split out and used to simplify conditions for a bigger range of types than just compares against boolean 0/1.

That may be a follow-up -- there are still several of these things I'm looking at. I wanted to go ahead and start pushing out those which were clearly improvements rather than queue them while I looked at all the oddities I'm seeing in the dumps.

jeff
Re: [PATCH GCC]Relax the probability condition in CE pass when optimizing for code size
On 04/06/2013 09:15 PM, Bin Cheng wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Bin Cheng Sent: Tuesday, March 26, 2013 4:33 PM To: 'Joern Rennecke' Cc: gcc-patches@gcc.gnu.org; 'Jeff Law' Subject: RE: [PATCH GCC]Relax the probability condition in CE pass when optimizing for code size -Original Message- From: Joern Rennecke [mailto:joern.renne...@embecosm.com] Sent: Monday, March 25, 2013 8:53 PM To: Bin Cheng Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH GCC]Relax the probability condition in CE pass when optimizing for code size Quoting Bin Cheng bin.ch...@arm.com: During the work I observed passes before combine might interfere with the CE pass, so this patch is enabled for ce2/ce3 after the combination pass. It is tested on x86/thumb2 for both normal and Os. Is it ok for trunk? There are bound to be target- and application-specific variations on which scaling factors work best. 2013-03-25 Bin Cheng bin.ch...@arm.com * ifcvt.c (ifcvt_after_combine): New static variable. It would make more sense to pass in the scale factor as an argument to if_convert, and get the respective values from a set of gcc parameters, so they can be tweaked by ports and/or by a user/ML learning framework (e.g. Milepost) supplying the appropriate --param option. I agree it would be more flexible to pass the factor as a parameter, but I am not sure how useful it will be to users: firstly, it is already target-specific via the BRANCH_COST heuristic; secondly, for code size, the heuristic should be tuned to achieve overall good results, and I doubt to what extent it depends on a specific target/application. Hi Jeff, This is based on your heuristic tuning in ifcvt; would you help us on this issue with some suggestions? Not sure what you need from me. It seems to me that having the scaling factor be dependent on optimizing for size vs optimizing for speed makes sense.
The only question is whether or not it's important enough to be a knob the user can turn -- I've got no strong opinions on that. jeff
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 9:24 AM, Kenneth Zadeck wrote: So then how does a language like ada work in gcc? My assumption is that most of what you describe here is done in the front end and by the time you get to the middle end of the compiler, you have chosen types for which you are comfortable to have any remaining math done in along with explicit checks for overflow where the programmer asked for them. That's right, the front end does all the promotion of types. Otherwise, how could ada have ever worked with gcc? Sometimes we do have to make changes to gcc to accommodate Ada-specific requirements, but this was not one of those cases. Of course the back end would do a better job of the range analysis to remove some unnecessary use of infinite precision, but the front end in practice does a good enough job.
Re: C: Add new warning -Wunprototyped-calls
Richard Biener richard.guent...@gmail.com writes: On Mon, Apr 8, 2013 at 3:05 PM, Andreas Schwab sch...@linux-m68k.org wrote: Richard Biener richard.guent...@gmail.com writes: "when a real prototype was visible" How is that different from a prototype? It's different from the case where a K&R definition was seen and thus type information is present via that mechanism. We don't want to warn in that case. But that isn't a prototype. As I suggested, the warning should just print "without a prototype", but "prototype" here means that a definition before the call is enough to make us happy (as opposed to -Wstrict-prototypes, which warns about function definitions without a previous prototype; we want to warn about calls to functions without a definition or a prototype). How does a definition help here if it isn't a prototype?

Andreas.
--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 9:23 AM, Kenneth Zadeck wrote: On 04/08/2013 09:19 AM, Robert Dewar wrote: On 4/8/2013 9:15 AM, Kenneth Zadeck wrote: I think this applies to Ada constant arithmetic as well. Ada constant arithmetic (at compile time) is always infinite precision (for float as well as for integer). What do you mean when you say constant arithmetic? Do you mean places where there is an explicit 8 * 6 in the source or do you mean any arithmetic that a compiler, using the full power of interprocedural constant propagation, can discover? Somewhere between the two. Ada has a very well defined notion of what is and what is not a static expression; it definitely does not include everything the compiler can discover, but it goes beyond just explicit literal arithmetic, e.g. declared constants X : Integer := 75; are considered static. It is static expressions that must be computed with full precision at compile time. For expressions the compiler can tell are constant even though not officially static, it is fine to compute at compile time for integer, but NOT for float, since you want to use target precision for all non-static float operations.
[PING]RE: [patch] cilkplus: Array notation for C patch
Hello Joseph,
Did you get a chance to look at this patch?

Thanks,
Balaji V. Iyer.

-Original Message-
From: Iyer, Balaji V
Sent: Friday, March 29, 2013 5:58 PM
To: 'Joseph Myers'; 'Aldy Hernandez'
Cc: 'gcc-patches'
Subject: RE: [patch] cilkplus: Array notation for C patch

Hello Joseph, Aldy et al.,
I reworded a couple of comments (e.g. changed "builtin" to "built-in", etc.) and added a header comment to c-array-notation.c that explains the overall process. I am attaching a fixed patch.

Thanks,
Balaji V. Iyer.

Here are the ChangeLog entries again:

gcc/ChangeLog
+2013-03-28  Balaji V. Iyer  balaji.v.i...@intel.com
+
+	* doc/extend.texi (C Extensions): Added documentation about Cilk Plus
+	array notation built-in reduction functions.
+	* doc/passes.texi (Passes): Added documentation about changes done
+	for Cilk Plus.
+	* doc/invoke.texi (C Dialect Options): Added documentation about
+	the -fcilkplus flag.
+	* doc/generic.texi (Storage References): Added documentation for
+	ARRAY_NOTATION_REF storage.
+	* Makefile.in (C_COMMON_OBJS): Added c-family/array-notation-common.o.
+	* tree-pretty-print.c (dump_generic_node): Add case for
+	ARRAY_NOTATION_REF.
+	(BUILTINS_DEF): Depend on cilkplus.def.
+	* builtins.def: Include cilkplus.def.
+	Define DEF_CILKPLUS_BUILTIN.
+	* builtin-types.def: Define BT_FN_INT_PTR_PTR_PTR.
+	* cilkplus.def: New file.

gcc/c-family/ChangeLog
+2013-03-28  Balaji V. Iyer  balaji.v.i...@intel.com
+
+	* c-common.c (c_define_builtins): When cilkplus is enabled, the
+	function array_notation_init_builtins is called.
+	(c_common_init_ts): Added ARRAY_NOTATION_REF as typed.
+	* c-common.def (ARRAY_NOTATION_REF): New tree.
+	* c-common.h (build_array_notation_expr): New function declaration.
+	(build_array_notation_ref): Likewise.
+	(extract_sec_implicit_index_arg): New extern declaration.
+	(is_sec_implicit_index_fn): Likewise.
+	(ARRAY_NOTATION_CHECK): New define.
+	(ARRAY_NOTATION_ARRAY): Likewise.
+	(ARRAY_NOTATION_START): Likewise.
+	(ARRAY_NOTATION_LENGTH): Likewise.
+	(ARRAY_NOTATION_STRIDE): Likewise.
+	(ARRAY_NOTATION_TYPE): Likewise.
+	* c-pretty-print.c (pp_c_postfix_expression): Added a new case for
+	ARRAY_NOTATION_REF.
+	(pp_c_expression): Likewise.
+	* c.opt (flag_enable_cilkplus): New flag.
+	* array-notation-common.c: New file.

gcc/c/ChangeLog
+2013-03-28  Balaji V. Iyer  balaji.v.i...@intel.com
+
+	* c-typeck.c (build_array_ref): Added a check to see if array's
+	index is greater than one. If true, then emit an error.
+	(build_function_call_vec): Exclude error reporting and checking
+	for builtin array-notation functions.
+	(convert_arguments): Likewise.
+	(c_finish_return): Added a check for array notations as a return
+	expression. If true, then emit an error.
+	(c_finish_loop): Added a check for array notations in a loop
+	condition. If true then emit an error.
+	(lvalue_p): Added a ARRAY_NOTATION_REF case.
+	(build_binary_op): Added a check for array notation expr inside
+	op1 and op0. If present, we call another function to find correct
+	type.
+	* Make-lang.in (C_AND_OBJC_OBJS): Added c-array-notation.o.
+	* c-parser.c (c_parser_compound_statement): Check if array
+	notation code is used in tree, if so, then transform them into
+	appropriate C code.
+	(c_parser_expr_no_commas): Check if array notation is used in LHS
+	or RHS, if so, then build array notation expression instead of
+	regular modify.
+	(c_parser_postfix_expression_after_primary): Added a check for
+	colon(s) after square braces, if so then handle it like an array
+	notation. Also, break up array notations in unary op if found.
+	(c_parser_direct_declarator_inner): Added a check for array
+	notation.
+	(c_parser_compound_statement): Added a check for array notation in
+	a stmt. If one is present, then expand array notation expr.
+	(c_parser_if_statement): Likewise.
+	(c_parser_switch_statement): Added a check for array notations in
+	a switch statement's condition. If true, then output an error.
+	(c_parser_while_statement): Similarly, but for a while.
+	(c_parser_do_statement): Similarly, but for a do-while.
+	(c_parser_for_statement): Similarly, but for a for-loop.
+	(c_parser_unary_expression): Check if array notation is used in a
+	pre-increment or pre-decrement expression. If true, then expand
+	them.
+	(c_parser_array_notation): New function.
+	* c-array-notation.c: New file.
+	* c-tree.h (is_cilkplus_reduce_builtin): Protoize.

-Original Message-
From: Iyer, Balaji V
Sent: Thursday, March 28, 2013 1:07 PM
To: Joseph Myers;
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/08/2013 09:52 AM, Robert Dewar wrote: On 4/8/2013 9:23 AM, Kenneth Zadeck wrote: On 04/08/2013 09:19 AM, Robert Dewar wrote: On 4/8/2013 9:15 AM, Kenneth Zadeck wrote: I think this applies to Ada constant arithmetic as well. Ada constant arithmetic (at compile time) is always infinite precision (for float as well as for integer). What do you mean when you say constant arithmetic? Do you mean places where there is an explicit 8 * 6 in the source, or do you mean any arithmetic that a compiler, using the full power of interprocedural constant propagation, can discover? Somewhere between the two. Ada has a very well defined notion of what is and what is not a static expression; it definitely does not include everything the compiler can discover, but it goes beyond just explicit literal arithmetic, e.g. declared constants X : Integer := 75; I actually guessed that it was something like this, but I did not want to spend the time trying to figure this bit of Ada syntax out. are considered static. It is static expressions that must be computed with full precision at compile time. For expressions the compiler can tell are constant even though not officially static, it is fine to compute at compile time for integer, but NOT for float, since you want to use target precision for all non-static float operations. yes, but the relevant question for the not-officially-static integer constants is: in what precision are those operations to be performed? I assume that you choose gcc types for these operations and you expect the math to be done within that type, i.e. exactly the way you expect the machine to perform.
Re: [RFA][PATCH] Improve VRP of COND_EXPR_CONDs
On Mon, Apr 8, 2013 at 3:27 PM, Jeff Law l...@redhat.com wrote: On 04/08/2013 03:45 AM, Richard Biener wrote: @@ -8584,6 +8584,43 @@ simplify_cond_using_ranges (gimple stmt) } } + /* If we have a comparison of a SSA_NAME boolean against + a constant (which obviously must be [0..1]). See if the + SSA_NAME was set by a type conversion where the source + of the conversion is another SSA_NAME with a range [0..1]. + + If so, we can replace the SSA_NAME in the comparison with + the RHS of the conversion. This will often make the type + conversion dead code which DCE will clean up. */ + if (TREE_CODE (op0) == SSA_NAME + && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE Use (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE || (INTEGRAL_TYPE_P (TREE_TYPE (op0)) && TYPE_PRECISION (TREE_TYPE (op0)) == 1)) to catch some more cases. Good catch. Done. + && is_gimple_min_invariant (op1)) In this case it's simpler to test TREE_CODE (op1) == INTEGER_CST. Agreed, fixed. +{ + gimple def_stmt = SSA_NAME_DEF_STMT (op0); + tree innerop; + + if (!is_gimple_assign (def_stmt) + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) + return false; + + innerop = gimple_assign_rhs1 (def_stmt); + + if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (innerop)) As Steven said, the abnormal check is not necessary, but for completeness you should check TREE_CODE (innerop) == SSA_NAME. Valid (but unfolded) GIMPLE can have (_Bool) 1, too. Agreed, fixed. Note that we already have code with similar functionality (see if a conversion would alter the value of X) as part of optimizing (T1)(T2)X to (T1)X in simplify_conversion_using_ranges. Maybe a part of it can be split out and used to simplify conditions for a bigger range of types than just compares against boolean 0/1. That may be a follow-up -- there are still several of these things I'm looking at. I wanted to go ahead and start pushing out those which were clearly improvements rather than queue them while I looked at all the oddities I'm seeing in the dumps. Fine with me.
Richard. jeff
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 9:58 AM, Kenneth Zadeck wrote: yes, but the relevant question for the not-officially-static integer constants is: in what precision are those operations to be performed? I assume that you choose gcc types for these operations and you expect the math to be done within that type, i.e. exactly the way you expect the machine to perform. As I explained in an earlier message, *within* a single expression, we are free to use higher precision, and we provide modes that allow this up to and including the use of infinite precision. That applies not just to constant expressions but to all expressions.
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/08/2013 10:12 AM, Robert Dewar wrote: On 4/8/2013 9:58 AM, Kenneth Zadeck wrote: yes, but the relevant question for the not-officially-static integer constants is: in what precision are those operations to be performed? I assume that you choose gcc types for these operations and you expect the math to be done within that type, i.e. exactly the way you expect the machine to perform. As I explained in an earlier message, *within* a single expression, we are free to use higher precision, and we provide modes that allow this up to and including the use of infinite precision. That applies not just to constant expressions but to all expressions. My confusion is what you mean by we? Do you mean we the writer of the program, we the person invoking the compiler by the use of command line options, or we, your company's implementation of Ada? My interpretation of your first email was that it was possible for the programmer to do something equivalent to adding attributes surrounding a block in the program to control the precision and overflow detection of the expressions in the block. And if this is so, then by the time the expression is seen by the middle end of gcc, those attributes will have been converted into tree code that will evaluate the code in a well defined way by both the optimization passes and the target machine. Kenny
Re: RFC: color diagnostics markers
On 8 April 2013 15:23, Jakub Jelinek ja...@redhat.com wrote: On Fri, Apr 05, 2013 at 11:51:43PM +0200, Manuel López-Ibáñez wrote: In this patch the default is never, because for some reason auto triggers colorization during regression testing. I have not found a That reason is obvious, dejagnu (expect?) creates pseudo terminals, so isatty is true, we'd need to just use -fno-diagnostics-color by default for the testsuite (IMHO not a big deal). Fine for me. Anyway, I've kept the default as never for now, but am sending my review comments in form of a new diff, which fixes formatting, avoids memory leaks and changes it to introduce more color names (for caret, locus, quoted text), change default of note color (for some color compatibility with clang; bold green is there used for caret lines, for notes they use bold black apparently, but that doesn't work too well on white-on-black terminals). Right now the patch is unfinished, because there is no support for the new %[locus]%s:%d:%d%[] style diagnostics strings (where %[locus] and %[] stand for switching to locus color and resetting color back) in the -Wformat code (and gettext). I'm wondering if instead of the %[colorname] and %[] it wouldn't be better to just have some %r or whatever letter isn't taken yet which would consume a const char * colorname via va_arg, and some other letter with no argument that would do color reset. Ideas for best unused letters for that? Perhaps then -Wformat support for it would be easier. I.e. instead of: pp_printf ("%[locus]%s:%d:%d%[]", loc.file, loc.line, loc.column); one would write: pp_printf ("%r%s:%d:%d%R", locus, loc.file, loc.line, loc.column); Thanks for working on this, your improvements are quite nice. About %r versus %[colorname], I just don't see the use-case for dynamic color names. In fact, I would be fine with something like: pp_start_color() pp_stop_color() pp_wrap_in_color() It is a bit more verbose, but also clearer when reading the code.
And no need for %[colorname] or %r or -Wformat support. Cheers, Manuel.
Re: Comments on the suggestion to use infinite precision math for wide int.
On Mon, Apr 8, 2013 at 2:43 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 04/08/2013 06:46 AM, Richard Biener wrote: On Sun, Apr 7, 2013 at 7:16 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: Richard, You advocate that I should be using an infinite precision representation and I advocate a finite precision representation where the precision is taken from the context. I would like to make the case for my position here, in a separate thread, because the other thread is just getting too messy. At both the tree level and the rtl level you have a type (mode is just bad rep for types) and both of those explicitly have precisions. The semantics of the programming languages that we implement define, or at least recommend, that most operations be done in a precision that is implementation dependent (or like java a particular machine independent precision). Each hardware platform specifies exactly how every operation is done. I will admit that infinite precision is more esthetically pleasing than what i have done, but exact precision matches the needs of these clients. The problem is that the results from infinite precision arithmetic differ in many significant ways from finite precision math. And the number of places where you have to inject a precision to get the expected answer, ultimately makes the infinite precision representation unattractive. As I said on Thursday, whenever you do operations that do not satisfy the requirements of a mathematical ring (add sub and mul are in a ring, divide, shift and comparisons are not) you run the risk of getting a result that is not what would have been obtained with either a strict interpretation of the semantics or the machine. Intuitively any operation that looks at the bits above the precision does not qualify as an operation that works in a ring. The poster child for operations that do not belong to a ring is division. 
For my example, I am using 4 bit integers because it makes the examples easy, but similar examples exist for any fixed precision. Consider 8 * 10 / 4: in an infinite precision world the result is 20, but in a 4 bit precision world the answer is 0. Another example is to ask if -10 * 10 is less than 0? Again you get a different answer with infinite precision. I would argue that if I declare a variable of type uint32 and scale my examples, I have the right to expect the compiler to produce the same result as the machine would. While C and C++ may have enough wiggle room in their standards so that this is just an unexpected, but legal, result as opposed to being wrong, everyone will hate you (us) if we do this. Furthermore, Java explicitly does not allow this (not that anyone actually uses gcj). I do not know enough about go, ada and fortran to say how it would affect them. In looking at the double-int class, the only operation that does not fit in a ring that is done properly is shifting. There we explicitly pass in the precision. The reason that we rarely see this kind of problem even though double-int implements 128 bit infinite precision is that currently very little of the compiler actually uses infinite precision in a robust way. In a large number of places, the code looks like: if (TYPE_PRECISION (TREE_TYPE (...)) < HOST_BITS_PER_WIDE_INT) do something using inline operators. else either do not do something or use const-double. Such code clears out most of these issues before the two passes that embrace infinite precision get a chance to do much damage. However, my patch at the rtl level gets rid of most of this kind of code and replaces it with calls to wide-int that currently uses only operations within the precision. I assume that if I went down the infinite precision road at the tree level, all of this would come to the surface very quickly. I prefer to not change my rep and not have to deal with this later.
Add, subtract, multiply and the logicals are all safe. But divide, remainder, and all of the comparisons need explicit precisions. In addition, operations like clz, ctz and clrsb need precisions. In total about half of the functions would need a precision passed in. My point is that once you have to start passing the precision in for all of those operations, it seems to be cleaner to get the precision from the leaves of the tree as I currently do. Once you buy into the math-in-a-particular-precision world, a lot of the other issues that you raise are just settled. Asking how to extend a value beyond its precision is like asking what the universe was like before the big bang. It is just something you do not need to know. I understand that you would like to have functions like x + 1 work, and so do I. I just could not figure out how to make them have unsurprising semantics. In particular, g++ did not seem to be happy with me defining two plus operators, one for each of signed and unsigned HWIs.
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 10:26 AM, Kenneth Zadeck wrote: My confusion is what you mean by we? Do you mean we the writer of the program, we the person invoking the compiler by the use of command line options, or we, your company's implementation of Ada? Sorry, bad usage. The gcc implementation of Ada allows the user to specify by pragmas how intermediate overflow is handled. My interpretation of your first email was that it was possible for the programmer to do something equivalent to adding attributes surrounding a block in the program to control the precision and overflow detection of the expressions in the block. And if this is so, then by the time the expression is seen by the middle end of gcc, those attributes will have been converted into tree code that will evaluate the code in a well defined way by both the optimization passes and the target machine. Yes, that's a correct understanding. Kenny
Re: [Fortran, RFC patch] Document naming and argument passing convention
Dear all, attached is an updated version of the patch, which addresses the raised issues and some minor problems and omissions I found. OK for the trunk? Tobias 2013-04-08 Tobias Burnus bur...@net-b.de * gfortran.texi (KIND Type Parameters, Internal representation of LOGICAL variables): Add crossrefs. (Intrinsic Types): Mention issues with _Bool interop. (Naming and argument-passing conventions): New section. diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 4f9008d..46fdeb3 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -1166,7 +1166,8 @@ parameters of the @code{ISO_FORTRAN_ENV} module instead of the concrete values. The available kind parameters can be found in the constant arrays @code{CHARACTER_KINDS}, @code{INTEGER_KINDS}, @code{LOGICAL_KINDS} and @code{REAL_KINDS} in the @code{ISO_FORTRAN_ENV} module -(see @ref{ISO_FORTRAN_ENV}). +(see @ref{ISO_FORTRAN_ENV}). For C interoperability, the kind parameters of +the @code{ISO_C_BINDING} module should be used (see @ref{ISO_C_BINDING}). @node Internal representation of LOGICAL variables @@ -1184,16 +1185,7 @@ A @code{LOGICAL(KIND=N)} variable is represented as an values: @code{1} for @code{.TRUE.} and @code{0} for @code{.FALSE.}. Any other integer value results in undefined behavior. -Note that for mixed-language programming using the -@code{ISO_C_BINDING} feature, there is a @code{C_BOOL} kind that can -be used to create @code{LOGICAL(KIND=C_BOOL)} variables which are -interoperable with the C99 _Bool type. The C99 _Bool type has an -internal representation described in the C99 standard, which is -identical to the above description, i.e. with 1 for true and 0 for -false being the only permissible values. Thus the internal -representation of @code{LOGICAL} variables in GNU Fortran is identical -to C99 _Bool, except for a possible difference in storage size -depending on the kind. +See also @ref{Argument passing conventions} and @ref{Interoperability with C}.
@node Thread-safety of the runtime library @@ -2204,6 +2196,7 @@ common, but not the former. * Interoperability with C:: * GNU Fortran Compiler Directives:: * Non-Fortran Main Program:: +* Naming and argument-passing conventions:: @end menu This chapter is about mixed-language interoperability, but also applies @@ -2250,6 +2243,16 @@ in C and Fortran, the named constants shall be used which are defined in the for kind parameters and character named constants for the escape sequences in C. For a list of the constants, see @ref{ISO_C_BINDING}. +For logical types, please note that the Fortran standard only guarantees +interoperability between C99's @code{_Bool} and Fortran's @code{C_Bool}-kind +logicals and C99 defines that @code{true} has the value 1 and @code{false} +the value 0. Using any other integer value with GNU Fortran's @code{LOGICAL} +(with any kind parameter) gives an undefined result. (Passing integer +values other than 0 and 1 to GCC's @code{_Bool} is also undefined, unless the +integer is explicitly or implicitly cast to @code{_Bool}.) + + + @node Derived Types and struct @subsection Derived Types and struct @@ -2975,6 +2978,144 @@ int main (int argc, char *argv[]) @end table +@node Naming and argument-passing conventions +@section Naming and argument-passing conventions + +This section gives an overview about the naming convention of procedures +and global variables and about the argument passing conventions used by +GNU Fortran. If a C binding has been specified, the naming convention +and some of the argument-passing conventions change. If possible, +mixed-language and mixed-compiler projects should use the better defined +C binding for interoperability. See @pxref{Interoperability with C}.
+ +@menu +* Naming conventions:: +* Argument passing conventions:: +@end menu + + +@node Naming conventions +@subsection Naming conventions + +According to the Fortran standard, valid Fortran names consist of a letter +between @code{A} to @code{Z}, @code{a} to @code{z}, digits @code{0}, +@code{1} to @code{9} and underscores (@code{_}) with the restriction +that names may only start with a letter. As a vendor extension, the +dollar sign (@code{$}) is additionally permitted with the option +@option{-fdollar-ok}, but not as first character and only if the +target system supports it. + +By default, the procedure name is the lower-cased Fortran name with an +appended underscore (@code{_}); using @option{-fno-underscoring} no +underscore is appended while @code{-fsecond-underscore} appends two +underscores. Depending on the target system and the calling convention, +the procedure might be additionally dressed; for instance, on 32bit +Windows with @code{stdcall}, an at-sign @code{@@} followed by an integer +number is appended. For changing the calling convention, see +@pxref{GNU Fortran Compiler Directives}. + +For common blocks, the same convention is used, i.e. by default an +underscore is
Re: [C++ Patch] PR 56871
... I think that by the time we do the check, if old_decl is a FUNCTION_DECL we can safely assume that new_decl is also a FUNCTION_DECL, thus I can simplify the code. I'm finishing testing the below variant. Thanks Paolo. /// Index: cp/decl.c === --- cp/decl.c (revision 197572) +++ cp/decl.c (working copy) @@ -1196,12 +1196,21 @@ validate_constexpr_redeclaration (tree old_decl, t if (DECL_DECLARED_CONSTEXPR_P (old_decl) == DECL_DECLARED_CONSTEXPR_P (new_decl)) return true; - if (TREE_CODE (old_decl) == FUNCTION_DECL && DECL_BUILT_IN (old_decl)) + if (TREE_CODE (old_decl) == FUNCTION_DECL) { - /* Hide a built-in declaration. */ - DECL_DECLARED_CONSTEXPR_P (old_decl) - = DECL_DECLARED_CONSTEXPR_P (new_decl); - return true; + if (DECL_BUILT_IN (old_decl)) + { + /* Hide a built-in declaration. */ + DECL_DECLARED_CONSTEXPR_P (old_decl) + = DECL_DECLARED_CONSTEXPR_P (new_decl); + return true; + } + /* 7.1.5 [dcl.constexpr] +Note: An explicit specialization can differ from the template +declaration with respect to the constexpr specifier. */ + if (! DECL_TEMPLATE_SPECIALIZATION (old_decl) + && DECL_TEMPLATE_SPECIALIZATION (new_decl)) + return true; } error ("redeclaration %qD differs in %<constexpr%>", new_decl); error ("from previous declaration %q+D", old_decl); Index: testsuite/g++.dg/cpp0x/constexpr-specialization.C === --- testsuite/g++.dg/cpp0x/constexpr-specialization.C (revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-specialization.C (working copy) @@ -0,0 +1,12 @@ +// PR c++/56871 +// { dg-options "-std=c++11" } + +template<typename T> constexpr int foo(T); +template<> int foo(int); +template<> int foo(int);// { dg-error "previous" } +template<> constexpr int foo(int); // { dg-error "redeclaration" } + +template<typename T> int bar(T); +template<> constexpr int bar(int); +template<> constexpr int bar(int); // { dg-error "previous" } +template<> int bar(int);// { dg-error "redeclaration" }
Re: RFC: color diagnostics markers
On Mon, Apr 08, 2013 at 04:29:02PM +0200, Manuel López-Ibáñez wrote: In fact, I would be fine with something like: pp_start_color() pp_stop_color() pp_wrap_in_color() It is a bit more verbose, but also clearer when reading the code. And no need for %[colorname] or %r or -Wformat support. But you then need to break the code into multiple function calls, which decreases readability. pp_verbatim (context->printer, _("%s:%d:%d: [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"), xloc.file, xloc.line, xloc.column, skip); can be right now a single call, while you would need several. Also, if you eventually want to colorize something in say error_at, warning_at and similar format strings. For those you really don't have the printer at hand, and can't easily break it into multiple calls. The reason for %r/%R instead of the %[ in the patch is that I think it will be easier to teach -Wformat and gettext about it that way, rather than if the argument is embedded in between [ and ]. With %r/%R it would be: pp_verbatim (context->printer, _("%r%s:%d:%d:%R [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"), locus, xloc.file, xloc.line, xloc.column, skip); Jakub
Re: [cilkplus] misc cleanups for #pragma simd implementation
On 04/08/13 08:59, Iyer, Balaji V wrote: Hi Aldy, Here are the things I found with the patch. All my comments have BVI: in front of them. BTW, it would be nice if you could use standard mailer quotation when responding (>, etc.). - return; BVI: I am OK with removing this return, but the reason why I put it there is because it gets easier for me to set the break point there. This is not standard practice in GCC source code. It will have to be removed if we ever merge. case PRAGMA_CILK_GRAINSIZE: - if (context == pragma_external) - { - c_parser_error (parser, "pragma grainsize must be inside a function"); - return false; - } - if (flag_enable_cilk) - c_parser_cilk_grainsize (parser); - else - { - warning (0, "pragma grainsize ignored"); - c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL); - } + if (!c_parser_pragma_simd_ok_p (parser, context)) + return false; + cilkplus_local_simd_values.loc = loc; + c_parser_cilk_grainsize (parser); BVI: This is incorrect. #pragma grainsize is part of the Cilk keywords. It has no relation to pragma simd and it will work without vectorization support. Fixed, let me know if the current implementation is correct. +// FIXME: We should really rewrite all this psv* business to use vectors. +/* Given an index into the pragma simd list (PSV_INDEX), find its + entry and return it. */ BVI: I am in the process of doing so. I will send out that patch as soon as I get some free time. Ok, I have left all the FIXMEs so we don't miss any of them. for (ps_iter = psv_head; ps_iter->ptr_next != NULL; ps_iter = ps_iter->ptr_next) -{ - ; -} +; BVI: Are you sure the compiler let you get away with this? It gave me a warning once (in stage2 I believe). Sure, I've done it for years. BVI: I have fixed these scripts already: the correct notation that I have used is cilkplus_type_language_compile/execute/errors.exp Ok, I have removed them from my patch. - + BVI: Why did you replace a space with a tab? Whoops, removed the space (and the tab). How about this?
commit 1847c6c76ca2ed0da68cb7985fde4c0b4d634b65 Author: Aldy Hernandez al...@redhat.com Date: Mon Apr 8 09:59:38 2013 -0500 Minor cleanups for pragma simd implementation. diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index f00d28d..a48b011 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -117,26 +117,11 @@ c_parse_init (void) ridpointers [(int) c_common_reswords[i].rid] = id; } - /* Here we initialize the simd_values structure. We only need it - initialized the first time, after each consumptions, for-loop will - automatically consume the values and delete the information. */ - cilkplus_local_simd_values.index = 0; - cilkplus_local_simd_values.pragma_encountered = false; - cilkplus_local_simd_values.types = P_SIMD_NOASSERT; - cilkplus_local_simd_values.vectorlength = NULL_TREE; - cilkplus_local_simd_values.vec_length_list= NULL; - cilkplus_local_simd_values.vec_length_size= 0; - cilkplus_local_simd_values.private_vars = NULL_TREE; - cilkplus_local_simd_values.priv_var_list = NULL; - cilkplus_local_simd_values.priv_var_size = 0; - cilkplus_local_simd_values.linear_vars= NULL_TREE; - cilkplus_local_simd_values.linear_var_size= 0; - cilkplus_local_simd_values.linear_var_list= NULL; - cilkplus_local_simd_values.linear_steps = NULL_TREE; - cilkplus_local_simd_values.linear_steps_list = NULL; - cilkplus_local_simd_values.linear_steps_size = 0; - cilkplus_local_simd_values.reduction_vals = NULL; - cilkplus_local_simd_values.ptr_next = NULL; + /* Only initialize the first time. After each consumption, the + for-loop handling code (c_finish_loop) will automatically consume + the values and delete the information. 
*/ + memset (&cilkplus_local_simd_values, 0, + sizeof (cilkplus_local_simd_values)); clear_pragma_simd_list (); } @@ -1251,12 +1236,16 @@ static void c_parser_objc_at_synthesize_declaration (c_parser *); static void c_parser_objc_at_dynamic_declaration (c_parser *); static bool c_parser_objc_diagnose_bad_element_prefix (c_parser *, struct c_declspecs *); + +// FIXME: Re-work this so there are only prototypes for mutually +// recursive functions. +/* Cilk Plus supporting routines. */ static void c_parser_cilk_for_statement (c_parser *, tree); -void c_parser_simd_linear (c_parser *); -void c_parser_simd_private (c_parser *); -void c_parser_simd_assert (c_parser *, bool); -void c_parser_simd_vectorlength (c_parser *); -void c_parser_simd_reduction (c_parser *); +static void c_parser_simd_linear (c_parser *); +static void c_parser_simd_private (c_parser *); +static void c_parser_simd_assert (c_parser *, bool); +static void c_parser_simd_vectorlength (c_parser *); +static void
RFA: Fix tree-optimization/55524
This is basically the same patch as attached to the PR, except that I have changed the goto-loop into a do-while loop with a new comment; this caused the need for a lot of reformatting. Bootstrapped and regtested on i686-pc-linux-gnu. 2013-04-08 Joern Rennecke joern.renne...@embecosm.com * tree-ssa-math-opts.c (mult_to_fma_pass): New file static struct. (convert_mult_to_fma): In first pass, don't use an fms construct when we don't have an fms operation, but fnma. (execute_optimize_widening_mul): Add a second pass if convert_mult_to_fma requests it. Index: gcc/tree-ssa-math-opts.c === --- gcc/tree-ssa-math-opts.c(revision 197578) +++ gcc/tree-ssa-math-opts.c(working copy) @@ -2461,6 +2461,12 @@ convert_plusminus_to_widen (gimple_stmt_ return true; } +static struct +{ + bool second_pass; + bool retry_request; +} mult_to_fma_pass; + /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2 with uses in additions and subtractions to form fused multiply-add operations. Returns true if successful and MUL_STMT should be removed. */ @@ -2570,6 +2576,22 @@ convert_mult_to_fma (gimple mul_stmt, tr return false; } + /* If the subtrahend (gimple_assign_rhs2 (use_stmt)) is computed +by a MULT_EXPR that we'll visit later, we might be able to +get a more profitable match with fnma. +OTOH, if we don't, a negate / fma pair has likely lower latency +than a mult / subtract pair. */ + if (use_code == MINUS_EXPR && !negate_p + && gimple_assign_rhs1 (use_stmt) == result + && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing + && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing + && mult_to_fma_pass.second_pass == false) + { + /* ??? Could make setting of retry_request dependent on some +rtx_cost measure we evaluate beforehand. */ + mult_to_fma_pass.retry_request = true; + return false; + } /* We can't handle a * b + a * b.
*/ if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt)) return false; @@ -2657,76 +2679,89 @@ execute_optimize_widening_mul (void) memset (widen_mul_stats, 0, sizeof (widen_mul_stats)); - FOR_EACH_BB (bb) -{ - gimple_stmt_iterator gsi; - for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);) -{ - gimple stmt = gsi_stmt (gsi); - enum tree_code code; + /* We may run one or two passes. In the first pass, if we have fnma, + but not fms, we don't synthesize fms so that we can get the maximum + matches for fnma. If we have therefore skipped opportunities to + synthesize fms, we'll run a second pass where we use any such + opportunities that still remain. */ + mult_to_fma_pass.retry_request = false; + do +{ + mult_to_fma_pass.second_pass = mult_to_fma_pass.retry_request; + FOR_EACH_BB (bb) + { + gimple_stmt_iterator gsi; - if (is_gimple_assign (stmt)) + for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);) { - code = gimple_assign_rhs_code (stmt); - switch (code) + gimple stmt = gsi_stmt (gsi); + enum tree_code code; + + if (is_gimple_assign (stmt)) { - case MULT_EXPR: - if (!convert_mult_to_widen (stmt, &gsi) - && convert_mult_to_fma (stmt, - gimple_assign_rhs1 (stmt), - gimple_assign_rhs2 (stmt))) + code = gimple_assign_rhs_code (stmt); + switch (code) { - gsi_remove (&gsi, true); - release_defs (stmt); - continue; - } - break; - - case PLUS_EXPR: - case MINUS_EXPR: - convert_plusminus_to_widen (&gsi, stmt, code); - break; + case MULT_EXPR: + if (!convert_mult_to_widen (stmt, &gsi) + && convert_mult_to_fma (stmt, + gimple_assign_rhs1 (stmt), + gimple_assign_rhs2 (stmt))) + { + gsi_remove (&gsi, true); + release_defs (stmt); + continue; + } + break; + + case PLUS_EXPR: + case MINUS_EXPR: + convert_plusminus_to_widen (&gsi, stmt, code); + break; - default:; + default:; + } } - } - else if
[patch cygwin]: Replace use of TARGET_CYGWIN64 by TARGET_64BIT
Hi, this patch fixes an obvious typo in a recently applied patch. ChangeLog 2013-04-08 Kai Tietz kti...@redhat.com * config/i386/cygwin.h (EXTRA_OS_CPP_BUILTINS): Replaced TARGET_CYGWIN64 by TARGET_64BIT. Applied to trunk as an obvious fix, as revision 197593. Regards, Kai Index: cygwin.h === --- cygwin.h(Revision 197586) +++ cygwin.h(Arbeitskopie) @@ -22,7 +22,7 @@ along with GCC; see the file COPYING3. If not see do \ { \ builtin_define ("__CYGWIN__"); \ - if (!TARGET_CYGWIN64)\ + if (!TARGET_64BIT) \ builtin_define ("__CYGWIN32__");\ builtin_define ("__unix__"); \ builtin_define ("__unix"); \
Re: [PATCH] Avoid warning when unused attribute applied to C++ member variables (issue8212043)
Ping. Thanks, Teresa On Sun, Mar 31, 2013 at 9:39 AM, Teresa Johnson tejohn...@google.com wrote: On Sun, Mar 31, 2013 at 1:36 AM, Andrew Pinski pins...@gmail.com wrote: On Sun, Mar 31, 2013 at 12:10 AM, Teresa Johnson tejohn...@google.com wrote: This patch allows the unused attribute to be used without warning on C++ class members, which are of type FIELD_DECL. This is for compatibility with clang, which allows the attribute to be specified on class members and struct fields. It looks like more work would need to be done to implement the actual unused variable detection and warning on FIELD_DECLs, but this change will at least avoid the warning on the code that uses the unused attribute in these cases. The documentation at http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html also doesn't seem to preclude its use on C++ member variables. This also allows it on field in normal C case. As far as I understand they are fields and not variables in the normal programming sense which is why the document does not mention them. That's true that this change will also allow the unused attribute on normal C struct fields. I just verified that clang also allows this, and it could potentially be taken advantage of to warn on unused fields as well. Teresa Thanks, Andrew Pinski Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? 2013-03-30 Teresa Johnson tejohn...@google.com * c-family/c-common.c (handle_unused_attribute): Handle FIELD_DECL for C++ class members. 
Index: c-family/c-common.c === --- c-family/c-common.c (revision 197266) +++ c-family/c-common.c (working copy) @@ -6753,6 +6753,7 @@ handle_unused_attribute (tree *node, tree name, tr if (TREE_CODE (decl) == PARM_DECL || TREE_CODE (decl) == VAR_DECL + || TREE_CODE (decl) == FIELD_DECL || TREE_CODE (decl) == FUNCTION_DECL || TREE_CODE (decl) == LABEL_DECL || TREE_CODE (decl) == TYPE_DECL) -- This patch is available for review at http://codereview.appspot.com/8212043 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [C++ Patch] PR 56871
OK. Jason
Re: RFC: color diagnostics markers
On 8 April 2013 16:43, Jakub Jelinek ja...@redhat.com wrote: On Mon, Apr 08, 2013 at 04:29:02PM +0200, Manuel López-Ibáñez wrote: In fact, I would be fine with something like: pp_start_color() pp_stop_color() pp_wrap_in_color() It is a bit more verbose, but also clearer when reading the code. And no need for %[colorname] or %r or -Wformat support. But you then need to break the code into multiple function calls, which decreases readability.

  pp_verbatim (context->printer,
	       _("%s:%d:%d:   [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"),
	       xloc.file, xloc.line, xloc.column, skip);

I guess "decreases readability" depends on whether one knows what the extra codes mean or not. I still have to check many times what %K and %q#+T and other less common codes exactly do. I'd rather have fewer codes than more. And one could argue that the above call should be split, since the "%s:%d:%d:" part should not be translated. That said, I would prefer that instead of

  expanded_location xloc;
  xloc = expand_location (loc);
  if (context->show_column)
    pp_verbatim (context->printer,
		 _("%r%s:%d:%d:%R   [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"),
		 "locus", xloc.file, xloc.line, xloc.column, skip);
  else
    pp_verbatim (context->printer,
		 _("%r%s:%d:%R   [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"),
		 "locus", xloc.file, xloc.line, skip);

we had:

  pp_verbatim (context->printer,
	       _("%X   [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"),
	       expand_location (loc), skip);

and the pretty-printer takes care of applying color or not (or expanding column numbers or not, etc).
Or without the extra %X code:

  pp_print_locus (context->printer, loc);
  pp_verbatim (context->printer,
	       _("   [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"),
	       skip);

Or even if we don't want the pretty printer to know about ->show_column:

  diagnostic_print_locus (context, loc);
  pp_verbatim (context->printer,
	       _("   [ skipping %d instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]\n"),
	       skip);

and the internal diagnostics machinery takes care of applying color or not (or expanding column numbers or not, etc). can be right now a single call, while you would need several. Also, if you eventually want to colorize something in say error_at, warning_at and similar format strings. For those you really don't have the printer at Do we really want to allow that much flexibility? Then the color_dict needs to be dynamic or the caller is restricted to re-using existing color names. I was expecting the use of color to be rather limited to a very few well-defined concepts. I was hoping that higher-level diagnostic functions would be oblivious to the color stuff, to not make the diagnostics code much more complex. Maybe I am wrong here and FE maintainers do want that flexibility. Cheers, Manuel.
Re: RFC: color diagnostics markers
On Mon, Apr 08, 2013 at 07:54:18PM +0200, Manuel López-Ibáñez wrote: can be right now a single call, while you would need several. Also, if you eventually want to colorize something in say error_at, warning_at and similar format strings. For those you really don't have the printer at Do we really want to allow that much flexibility? Then the color_dict needs to be dynamic or the caller is restricted to re-using existing colornames. Yes, I think we want that flexibility, it certainly isn't that much difficult to support it (a few lines of code, will try to code the %r/%R variant tomorrow), and from time to time it can be useful. Perhaps that %L or whatever character isn't taken for the expanded location could be used too. I was expecting the use of color to be rather limited to a very very few well-defined concepts. I was hoping that higher-level diagnostic functions would be oblivious to the color stuff to not make the diagnostics code much more complex. I don't see why we would need dynamic color names, as the color names are to be overridable through GCC_COLORS, documented in invoke.text etc., the list better be static and not too long, but we can add new color names in the future when needed. Jakub
useless cast blocking some optimization in gcc 4.7.3
Hello, I have identified a big performance regression between 4.6 and 4.7 (I have enclosed a pathological test). After investigation, it is because of the += statement applied on two signed chars:
- it is now type-promoted to int when it is written result += foo() (since 4.7);
- it is type-promoted to unsigned char when it is written result = result + foo().
The char->int->char cast is blocking some optimizations in later phases. Anyway, this doesn't look wrong, so I extended fold optimization in order to catch this case (patch enclosed). The patch basically transforms:

  (TypeA) ( (TypeB) a1 + (TypeB) a2 )  /* with a1 and a2 of the signed type TypeA */

into:

  a1 + a2

I believe this is legal for any licit a1/a2 input values (no overflow on signed char). No new failure on the two tested targets: sh-superh-elf and x86_64-unknown-linux-gnu. Should I enter a bugzilla to track this? Is it ok for trunk?

2013-04-08  Laurent Alfonsi  laurent.alfo...@st.com

	* fold-const.c (fold_unary_loc): Suppress useless type promotion.
Thanks, Laurent

#include <cstdio>
typedef char int8_t;

const int iterations = 20;
const int SIZE = 200;
int8_t data8[SIZE];

/**/
template <typename T>
inline void check_result(T result) {
  if (result != T(200)) {
    printf("test failed %d!=%d\n", result, 200);
  }
}

/**/
template <typename T>
struct all_constants {
  static T get_one(T input) { return (T(1)); }
};

/**/
template <typename T, typename Input>
void test_constant(T* first, int count) {
  int i;
  for (i = 0; i < iterations; ++i) {
    T result = 0;
    for (int n = 0; n < count; ++n) {
      result += Input::get_one( first[n] );
    }
    check_result<T>(result);
  }
}

/**/
int main(int argc, char** argv) {
  test_constant<int8_t, all_constants<int8_t> >(data8, SIZE);
  return 0;
}

--- ./gcc.orig/gcc/fold-const.c	2013-04-08 14:09:32.0 +0200
+++ ./gcc/gcc/fold-const.c	2013-04-08 11:08:16.0 +0200
@@ -8055,6 +8055,26 @@
 	    }
 	}
 
+      /* Convert (T1) ((T2)X + (T2)Y) into X + Y,
+	 if X and Y already have type T1 (integral only), and T2 > T1.  */
+      if (INTEGRAL_TYPE_P (type)
+	  && TYPE_OVERFLOW_UNDEFINED (type)
+	  && (TREE_CODE (op0) == PLUS_EXPR || TREE_CODE (op0) == MINUS_EXPR
+	      || TREE_CODE (op0) == MULT_EXPR)
+	  && TREE_CODE (TREE_OPERAND (op0, 0)) == NOP_EXPR
+	  && TREE_CODE (TREE_OPERAND (op0, 1)) == NOP_EXPR
+	  && type == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (op0, 0), 0))
+	  && type == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (op0, 1), 0))
+	  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (op0)))
+	{
+	  tem = fold_build2_loc (loc, TREE_CODE (op0), type,
+				 fold_convert_loc (loc, type,
+						   TREE_OPERAND (op0, 0)),
+				 fold_convert_loc (loc, type,
						   TREE_OPERAND (op0, 1)));
+	  return fold_convert_loc (loc, type, tem);
+	}
+
       tem = fold_convert_const (code, type, op0);
       return tem ? tem : NULL_TREE;
[PATCH] Don't forwprop into clobbers in some cases (PR tree-optimization/56854)
Hi! lhs ={v} {CLOBBER}; stmts right now allow only VAR_DECL or MEM_REF lhs, but the forwprop code below on the attached testcase attempts to propagate an ARRAY_REF (of a MEM_REF) into it. Fixed by not propagating in that case; allowing arbitrary memory lhs is IMHO unnecessary and such lhs's wouldn't be very useful for DSE anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-04-08  Jakub Jelinek  ja...@redhat.com

	PR tree-optimization/56854
	* tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Don't
	forward into clobber stmts if it would change MEM_REF lhs into
	non-MEM_REF.

	* g++.dg/torture/pr56854.C: New test.

--- gcc/tree-ssa-forwprop.c.jj	2013-02-25 23:51:21.0 +0100
+++ gcc/tree-ssa-forwprop.c	2013-04-08 16:12:37.0 +0200
@@ -826,7 +826,11 @@ forward_propagate_addr_expr_1 (tree name
	  && integer_zerop (TREE_OPERAND (lhs, 1))
	  && useless_type_conversion_p
	       (TREE_TYPE (TREE_OPERAND (def_rhs, 0)),
-		TREE_TYPE (gimple_assign_rhs1 (use_stmt))))
+		TREE_TYPE (gimple_assign_rhs1 (use_stmt)))
+	  /* Don't forward anything into clobber stmts if it would result
+	     in the lhs no longer being a MEM_REF.  */
+	  && (!gimple_clobber_p (use_stmt)
+	      || TREE_CODE (TREE_OPERAND (def_rhs, 0)) == MEM_REF))
 	{
 	  tree *def_rhs_basep = &TREE_OPERAND (def_rhs, 0);
 	  tree new_offset, new_base, saved, new_lhs;
--- gcc/testsuite/g++.dg/torture/pr56854.C.jj	2013-04-08 18:03:37.978009666 +0200
+++ gcc/testsuite/g++.dg/torture/pr56854.C	2013-04-08 18:03:09.0 +0200
@@ -0,0 +1,24 @@
+// PR tree-optimization/56854
+// { dg-do compile }
+
+inline void *
+operator new (__SIZE_TYPE__, void *p) throw ()
+{
+  return p;
+}
+
+struct A
+{
+  int a;
+  A () : a (0) {}
+  ~A () {}
+  A &operator= (const A &v) { this->~A (); new (this) A (v); return *this; }
+};
+A b[4], c[4];
+
+void
+foo ()
+{
+  for (int i = 0; i < 4; ++i)
+    c[i] = b[i];
+}

	Jakub
[linaro/gcc-4_8-branch] Merge from upstream gcc-4_8-branch and backports from trunk
Hi, I have just merged upstream gcc-4_8-branch into linaro/gcc-4_8-branch, up to r197294. (The merge is r197598.) I have also backported the following trunk revisions into the linaro/gcc-4_8-branch: 196856, 196858, 196876, 197046, 197051, 197052, 197153, 197207, 197341, 197342, and 197346. (Backports are revisions 197599:197609). Thanks, Matt -- Matthew Gretton-Dann Toolchain Working Group, Linaro
[patch, fortran] Committed fix for PR 56782
Hello world, I committed the attached patch as obvious to fix the regression with array constructors on trunk, after regression-testing. Will commit to 4.8 next. Thomas

2013-04-08  Thomas Koenig  tkoe...@gcc.gnu.org

	PR fortran/56782
	* frontend-passes.c (callback_reduction): Don't do any
	simplification if there is only a single element
	which has an iterator.

2013-04-08  Thomas Koenig  tkoe...@gcc.gnu.org

	PR fortran/56782
	* gfortran.dg/array_constructor_44.f90: New test.

Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 197233)
+++ frontend-passes.c	(Arbeitskopie)
@@ -300,7 +300,12 @@ callback_reduction (gfc_expr **e, int *walk_subtre
 
   c = gfc_constructor_first (arg->value.constructor);
 
-  if (c == NULL)
+  /* Don't do any simplification if we have
+     - no element in the constructor or
+     - only have a single element in the array which contains an
+     iterator.  */
+
+  if (c == NULL || (c->iterator != NULL && gfc_constructor_next (c) == NULL))
     return 0;
 
   res = copy_walk_reduction_arg (c->expr, fn);

! { dg-do run }
! { dg-options "-ffrontend-optimize" }
! PR 56872 - wrong front-end optimization with a single constructor.
! Original bug report by Rich Townsend.
   integer :: k
   real :: s
   integer :: m
   s = 2.0
   m = 4
   res = SUM([(s**(REAL(k-1)/REAL(m-1)),k=1,m)])
   if (abs(res - 5.84732246) > 1e-6) call abort
   end
Re: useless cast blocking some optimization in gcc 4.7.3
Hello, On Mon, 8 Apr 2013, Laurent Alfonsi wrote: I have identified a big performance regression between 4.6 and 4.7. (I have enclosed a pathological test). After investigation, it is because of the += statement applied on 2 signed chars. - It is now type-promoted to int when it is written result += foo(). (since 4.7) - it is type promoted to unsigned char when it is written result = result + foo(). The char-int-char cast is blocking some optimizations in later phases. Which ones? Anyway, this doesn't look wrong, so I extended fold optimization in order to catch this case. (patch enclosed) The patch basically transforms : (TypeA) ( (TypeB) a1 + (TypeB) a2 )/* with a1 and a2 of the signed type TypeA */ into : a1 + a2 I believe this is legal for any licit a1/a2 input values (no overflow on signed char). I don't think this is ok, please refer to the discussion around the PR and patch that added this conversion, it was done on purpose. According to this (4th item) http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html char a=100; a+=a; is perfectly defined and a is -56 (assuming a signed 8 bit char and a strictly larger int). However, your transformation turns it into undefined behavior: an addition that overflows in a type with TYPE_OVERFLOW_UNDEFINED. -- Marc Glisse
Make lto-symtab to ignore conflicts in static functions
Hi, currently lto-symtab is trying to resolve all duplicated declarations, including static variables where such duplicates should not happen. This conflicts with the plan to solve PR54095 by postponning renaming to the partitioning. This patch adds lto_symtab_symbol_p that disable merging on statics and keeps duplicate entries for a given asm name. Boostrapped/regtested x86_64-linux, OK? Honza PR lto/54095 lto-symtab.c (lto_symtab_symbol_p): New function. (lto_symtab_resolve_can_prevail_p, lto_symtab_resolve_symbols, lto_symtab_resolve_symbols, lto_symtab_merge_decls_2, lto_symtab_merge_decls_1, lto_symtab_merge_cgraph_nodes_1): Skip static symbols. Index: lto-symtab.c === *** lto-symtab.c(revision 197551) --- lto-symtab.c(working copy) *** lto_symtab_resolve_replaceable_p (symtab *** 226,237 return false; } /* Return true if the symtab entry E can be the prevailing one. */ static bool lto_symtab_resolve_can_prevail_p (symtab_node e) { ! if (!symtab_real_symbol_p (e)) return false; /* The C++ frontend ends up neither setting TREE_STATIC nor --- 226,249 return false; } + /* Return true, if the symbol E should be resolved by lto-symtab. +Those are all real symbols that are not static (we handle renaming +of static later in partitioning). */ + + static bool + lto_symtab_symbol_p (symtab_node e) + { + if (!TREE_PUBLIC (e-symbol.decl)) + return false; + return symtab_real_symbol_p (e); + } + /* Return true if the symtab entry E can be the prevailing one. */ static bool lto_symtab_resolve_can_prevail_p (symtab_node e) { ! if (!lto_symtab_symbol_p (e)) return false; /* The C++ frontend ends up neither setting TREE_STATIC nor *** lto_symtab_resolve_symbols (symtab_node *** 261,267 /* Always set e-node so that edges are updated to reflect decl merging. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) ! 
if (symtab_real_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) --- 273,279 /* Always set e-node so that edges are updated to reflect decl merging. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) ! if (lto_symtab_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) *** lto_symtab_resolve_symbols (symtab_node *** 275,281 { /* Assert it's the only one. */ for (e = prevailing-symbol.next_sharing_asm_name; e; e = e-symbol.next_sharing_asm_name) ! if (symtab_real_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) --- 287,293 { /* Assert it's the only one. */ for (e = prevailing-symbol.next_sharing_asm_name; e; e = e-symbol.next_sharing_asm_name) ! if (lto_symtab_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) *** lto_symtab_resolve_symbols (symtab_node *** 310,317 /* Do a second round choosing one from the replaceable prevailing decls. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) { ! if (!lto_symtab_resolve_can_prevail_p (e) ! || !symtab_real_symbol_p (e)) continue; /* Choose the first function that can prevail as prevailing. */ --- 322,328 /* Do a second round choosing one from the replaceable prevailing decls. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) { ! if (!lto_symtab_resolve_can_prevail_p (e)) continue; /* Choose the first function that can prevail as prevailing. */ *** lto_symtab_merge_decls_2 (symtab_node fi *** 365,375 /* Try to merge each entry with the prevailing one. 
*/ for (e = prevailing-symbol.next_sharing_asm_name; e; e = e-symbol.next_sharing_asm_name) ! { ! if (!lto_symtab_merge (prevailing, e) ! !diagnosed_p) ! mismatches.safe_push (e-symbol.decl); ! } if (mismatches.is_empty ()) return; --- 376,387 /* Try to merge each entry with the prevailing one. */ for (e = prevailing-symbol.next_sharing_asm_name; e; e = e-symbol.next_sharing_asm_name) ! if (TREE_PUBLIC (e-symbol.decl)) ! { ! if (!lto_symtab_merge (prevailing, e) !
Re: [patch] update documentation for SEQUENCE
On Mon, Apr 8, 2013 at 11:30 AM, Richard Biener wrote: On Sun, Apr 7, 2013 at 12:04 AM, Steven Bosscher wrote: Hello, The existing documentation for SEQUENCE still states it is used for DEFINE_EXPAND sequences. I think I wasn't even hacking GCC when that practice was abandoned, and in the mean time some other uses of SEQUENCE have appeared in the compiler. So, a long-overdue documentation update. OK for trunk? Ok. Thanks, I'm committing this along with something else I noticed: NOTE_INSN_LOOP notes don't exist anymore, and NOTE_INSN_EH_REGION notes don't have NOTE_BLOCK_NUMBER anymore but do have so-far-undocumented NOTE_EH_HANDLER. @@ -3602,29 +3608,9 @@ of debugging information. @item NOTE_INSN_EH_REGION_BEG @itemx NOTE_INSN_EH_REGION_END These types of notes indicate the position of the beginning and end of a -level of scoping for exception handling. @code{NOTE_BLOCK_NUMBER} -identifies which @code{CODE_LABEL} or @code{note} of type -@code{NOTE_INSN_DELETED_LABEL} is associated with the given region. +level of scoping for exception handling. @code{NOTE_EH_HANDLER} +identifies which region is associated with these notes. -@findex NOTE_INSN_LOOP_BEG -@findex NOTE_INSN_LOOP_END -@item NOTE_INSN_LOOP_BEG -@itemx NOTE_INSN_LOOP_END -These types of notes indicate the position of the beginning and end -of a @code{while} or @code{for} loop. They enable the loop optimizer -to find loops quickly. - -@findex NOTE_INSN_LOOP_CONT -@item NOTE_INSN_LOOP_CONT -Appears at the place in a loop that @code{continue} statements jump to. - -@findex NOTE_INSN_LOOP_VTOP -@item NOTE_INSN_LOOP_VTOP -This note indicates the place in a loop where the exit test begins for -those loops in which the exit test has been duplicated. This position -becomes another virtual start of the loop when considering loop -invariants. - @findex NOTE_INSN_FUNCTION_BEG @item NOTE_INSN_FUNCTION_BEG Appears at the start of the function body, after the function
[patch] obvious: remove REG_EH_CONTEXT note
Remnants of the RTL inliner... Committed as obvious. * reg-notes.def (REG_EH_CONTEXT): Remove unused note. --- trunk/gcc/reg-notes.def 2013/04/08 19:36:43 197610 +++ trunk/gcc/reg-notes.def 2013/04/08 19:59:57 197611 @@ -172,11 +172,6 @@ the rest of the compiler as a CALL_INSN. */ REG_NOTE (CFA_FLUSH_QUEUE) -/* Indicates that REG holds the exception context for the function. - This context is shared by inline functions, so the code to acquire - the real exception context is delayed until after inlining. */ -REG_NOTE (EH_CONTEXT) - /* Indicates what exception region an INSN belongs in. This is used to indicate what region to which a call may throw. REGION 0 indicates that a call cannot throw at all. REGION -1 indicates
Track symbol names that are unique in a DSO
Hi, this patch adds a new symbol flag, UNIQUE_NAME. Its purpose is to disable renaming at LTO time when the symbol is already known to be unique in the whole resulting DSO. This happens for symbols that were previously global where we know from LTO plugin resolution data that they are not bound by the non-LTO world. I also made clone names unique. This needs more care: 1) when cloning at compilation time, one can produce two clones of the same name (like foo.sra.1) for static functions; 2) we make the assumption here that the namespace .clonetype.num is private to GCC. This is how things have worked since the introduction of WHOPR, but it is not documented. We may need to add those ugly __GLOBAL_XYZ manglings. I would like to handle 2) incrementally after some discussion with the plugin folks. The flag is currently write-only; I am going to use it in a later patch. Bootstrapped/regtested x86_64-linux, will commit it after we settle on the other changes that need the flag. Honza

	PR lto/54095
	* cgraph.c (cgraph_make_node_local_1): Set unique_name.
	* cgraph.h (symtab_node_base): Add unique_name.
	* lto-cgraph.c (lto_output_node, lto_output_varpool_node,
	input_overwrite_node, input_varpool_node): Stream unique_name.
	* cgraphclones.c (cgraph_create_virtual_clone,
	cgraph_function_versioning): Set unique_name.
	* ipa.c (function_and_variable_visibility): Set unique_name.
Index: cgraph.c === *** cgraph.c(revision 197551) --- cgraph.c(working copy) *** cgraph_make_node_local_1 (struct cgraph_ *** 1798,1803 --- 1800,1807 node-symbol.externally_visible = false; node-local.local = true; + node-symbol.unique_name = (node-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY + || node-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP); node-symbol.resolution = LDPR_PREVAILING_DEF_IRONLY; gcc_assert (cgraph_function_body_availability (node) == AVAIL_LOCAL); } Index: cgraph.h === *** cgraph.h(revision 197551) --- cgraph.h(working copy) *** struct GTY(()) symtab_node_base *** 62,67 --- 62,69 /* Needed variables might become dead by optimization. This flag forces the variable to be output even if it appears dead otherwise. */ unsigned force_output : 1; + /* True when the name is known to be unique and thus it does not need mangling. */ + unsigned unique_name : 1; /* Ordering of all symtab entries. */ int order; Index: lto-cgraph.c === *** lto-cgraph.c(revision 197551) --- lto-cgraph.c(working copy) *** lto_output_node (struct lto_simple_outpu *** 468,473 --- 468,474 bp_pack_value (bp, node-local.can_change_signature, 1); bp_pack_value (bp, node-local.redefined_extern_inline, 1); bp_pack_value (bp, node-symbol.force_output, 1); + bp_pack_value (bp, node-symbol.unique_name, 1); bp_pack_value (bp, node-symbol.address_taken, 1); bp_pack_value (bp, node-abstract_and_needed, 1); bp_pack_value (bp, tag == LTO_symtab_analyzed_node *** lto_output_varpool_node (struct lto_simp *** 533,538 --- 534,540 bp = bitpack_create (ob-main_stream); bp_pack_value (bp, node-symbol.externally_visible, 1); bp_pack_value (bp, node-symbol.force_output, 1); + bp_pack_value (bp, node-symbol.unique_name, 1); bp_pack_value (bp, node-finalized, 1); bp_pack_value (bp, node-alias, 1); bp_pack_value (bp, node-alias_of != NULL, 1); *** input_overwrite_node (struct lto_file_de *** 886,891 --- 888,894 node-local.can_change_signature = bp_unpack_value (bp, 1); 
node-local.redefined_extern_inline = bp_unpack_value (bp, 1); node-symbol.force_output = bp_unpack_value (bp, 1); + node-symbol.unique_name = bp_unpack_value (bp, 1); node-symbol.address_taken = bp_unpack_value (bp, 1); node-abstract_and_needed = bp_unpack_value (bp, 1); node-symbol.used_from_other_partition = bp_unpack_value (bp, 1); *** input_varpool_node (struct lto_file_decl *** 1040,1045 --- 1043,1049 bp = streamer_read_bitpack (ib); node-symbol.externally_visible = bp_unpack_value (bp, 1); node-symbol.force_output = bp_unpack_value (bp, 1); + node-symbol.unique_name = bp_unpack_value (bp, 1); node-finalized = bp_unpack_value (bp, 1); node-alias = bp_unpack_value (bp, 1); non_null_aliasof = bp_unpack_value (bp, 1); Index: cgraphclones.c === *** cgraphclones.c (revision 197551) --- cgraphclones.c (working copy) *** cgraph_create_virtual_clone (struct cgra *** 324,329 --- 324,337
Make change_decl_assembler_name functional with inline clones
Hi, this patch makes change_decl_assembler_name to do the right thing with inline clones. My original plan was to remove inline clones from assembler_name_hash, but it hits the problem that we currently need to make them unique for purposes of LTO sreaming. It is not hard to walk the clone tree and update it. Later we can reorg streaming to not rely on uniqueness of symbol names of function bodies not associated with a real symbol and perhaps simplify this somewhat. Bootstrapped/regtested x86_64-linux, will commit it shortly. PR lto/54095 * symtab.c (insert_to_assembler_name_hash): Handle clones. (unlink_from_assembler_name_hash): Likewise. (symtab_prevail_in_asm_name_hash, symtab_register_node, symtab_unregister_node, symtab_initialize_asm_name_hash, change_decl_assembler_name): Update. Index: symtab.c === *** symtab.c(revision 197551) --- symtab.c(working copy) *** eq_assembler_name (const void *p1, const *** 102,108 /* Insert NODE to assembler name hash. */ static void ! insert_to_assembler_name_hash (symtab_node node) { if (is_a varpool_node (node) DECL_HARD_REGISTER (node-symbol.decl)) return; --- 102,108 /* Insert NODE to assembler name hash. */ static void ! insert_to_assembler_name_hash (symtab_node node, bool with_clones) { if (is_a varpool_node (node) DECL_HARD_REGISTER (node-symbol.decl)) return; *** insert_to_assembler_name_hash (symtab_no *** 111,116 --- 111,119 if (assembler_name_hash) { void **aslot; + struct cgraph_node *cnode; + tree decl = node-symbol.decl; + tree name = DECL_ASSEMBLER_NAME (node-symbol.decl); aslot = htab_find_slot_with_hash (assembler_name_hash, name, *** insert_to_assembler_name_hash (symtab_no *** 121,126 --- 124,136 if (*aslot != NULL) ((symtab_node)*aslot)-symbol.previous_sharing_asm_name = node; *aslot = node; + + /* Update also possible inline clones sharing a decl. 
*/ + cnode = dyn_cast cgraph_node (node); + if (cnode cnode-clones with_clones) + for (cnode = cnode-clones; cnode; cnode = cnode-next_sibling_clone) + if (cnode-symbol.decl == decl) + insert_to_assembler_name_hash ((symtab_node) cnode, true); } } *** insert_to_assembler_name_hash (symtab_no *** 128,137 /* Remove NODE from assembler name hash. */ static void ! unlink_from_assembler_name_hash (symtab_node node) { if (assembler_name_hash) { if (node-symbol.next_sharing_asm_name) node-symbol.next_sharing_asm_name-symbol.previous_sharing_asm_name = node-symbol.previous_sharing_asm_name; --- 138,150 /* Remove NODE from assembler name hash. */ static void ! unlink_from_assembler_name_hash (symtab_node node, bool with_clones) { if (assembler_name_hash) { + struct cgraph_node *cnode; + tree decl = node-symbol.decl; + if (node-symbol.next_sharing_asm_name) node-symbol.next_sharing_asm_name-symbol.previous_sharing_asm_name = node-symbol.previous_sharing_asm_name; *** unlink_from_assembler_name_hash (symtab_ *** 155,160 --- 168,180 } node-symbol.next_sharing_asm_name = NULL; node-symbol.previous_sharing_asm_name = NULL; + + /* Update also possible inline clones sharing a decl. */ + cnode = dyn_cast cgraph_node (node); + if (cnode cnode-clones with_clones) + for (cnode = cnode-clones; cnode; cnode = cnode-next_sibling_clone) + if (cnode-symbol.decl == decl) + unlink_from_assembler_name_hash ((symtab_node) cnode, true); } } *** unlink_from_assembler_name_hash (symtab_ *** 163,170 void symtab_prevail_in_asm_name_hash (symtab_node node) { ! unlink_from_assembler_name_hash (node); ! insert_to_assembler_name_hash (node); } --- 183,190 void symtab_prevail_in_asm_name_hash (symtab_node node) { ! unlink_from_assembler_name_hash (node, false); ! insert_to_assembler_name_hash (node, false); } *** symtab_register_node (symtab_node node) *** 196,202 /* Be sure to do this last; C++ FE might create new nodes via DECL_ASSEMBLER_NAME langhook! */ ! 
insert_to_assembler_name_hash (node); } /* Make NODE to be the one symtab hash is pointing to. Used when reshaping tree --- 216,222 /* Be sure to do this last; C++ FE might create new nodes via DECL_ASSEMBLER_NAME langhook! */ ! insert_to_assembler_name_hash (node, false); } /* Make NODE to be the one symtab hash is pointing to. Used when reshaping tree *** symtab_unregister_node (symtab_node node *** 259,265 else *slot =
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/13, Kenneth Zadeck zad...@naturalbridge.com wrote: The other problem, which I invite you to use the full power of your C++ sorcery on, is the one where defining an operator so that wide-int + unsigned hwi is either rejected or properly zero extended. If you can do this, I will go along with your suggestion that the internal rep should be sign extended. Saying that constants are always sign extended seems ok, but there are a huge number of places where we convert unsigned hwis as the second operand and I do not want that to be a trap. I went thru a round of this, where I did not post the patch because I could not make this work. And the number of places where you want to use an hwi as the second operand dwarfs the number of places where you want to use a small integer constant.

You can use overloading, as in the following, which actually ignores handling the sign in the representation.

class number {
  unsigned int rep1;
  int representation;
public:
  number(int arg) : representation(arg) {}
  number(unsigned int arg) : representation(arg) {}
  friend number operator+(number, int);
  friend number operator+(number, unsigned int);
  friend number operator+(int, number);
  friend number operator+(unsigned int, number);
};
number operator+(number n, int si) { return n.representation + si; }
number operator+(number n, unsigned int ui) { return n.representation + ui; }
number operator+(int si, number n) { return n.representation + si; }
number operator+(unsigned int ui, number n) { return n.representation + ui; }

If the argument type is of a template type parameter, then you can test the template type via

  if (std::is_signed<T>::value)
    // sign extend
  else
    // zero extend

See http://www.cplusplus.com/reference/type_traits/is_signed/. If you want to handle non-builtin types that are signed or unsigned, then you need to add a specialization for is_signed. -- Lawrence Crowl
Re: [i386] Replace builtins with vector extensions
On Sun, 7 Apr 2013, Marc Glisse wrote:

extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_slli_epi16 (__m128i __A, int __B)
{
-  return (__m128i)__builtin_ia32_psllwi128 ((__v8hi)__A, __B);
+  return (__m128i) ((__v8hi)__A << __B);
}

Actually, I believe I have to keep using the builtins for shifts, because the intrinsics have well-defined behavior for large __B whereas << and >> don't. -- Marc Glisse
Re: RFC: add some static probes to libstdc++
Jonathan == Jonathan Wakely jwakely@gmail.com writes: Jonathan On 2 April 2013 16:39, Marc Glisse wrote: On Tue, 2 Apr 2013, Jonathan Wakely wrote: Should we update the prerequisites documentation to say that if Systemtap is installed it needs to be at least version X? I thought you were going to suggest enhancing the configure test so it fails on old systemtap (detects it as absent). Jonathan Ah yes, that's a much better idea! Sorry about the delay on this. I've been away. I will try to write a fix this week. Tom
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/13, Richard Biener richard.guent...@gmail.com wrote: I advocate the infinite precision signed representation as one solution to avoid the issues that come up with your implementation (as I currently have access to), which has a representation with N bits of precision encoded with M <= N bits and no sign information. That obviously leaves operations on numbers of that representation with differing N undefined. You define it by having coded the operations, which as far as I can see simply assume N is equal for any two operands and that the effective sign for extending the M-bit encoding to the common N-bit precision is available. A thorough specification of both the encoding scheme and the operation semantics is missing. I can side-step both of these issues nicely by simply using an infinite precision signed representation and requiring the client to explicitly truncate / extend to a specific precision when required. I also leave open the possibility to have the _encoding_ be always the same as an infinite precision signed representation but to always require an explicitly specified target precision for each operation (which rules out the use of operator overloading).

For efficiency, the machine representation of an infinite precision number should allow for a compact one-word representation.

class infinite {
  int length;
  union representation {
    int inside_word;
    int *outside_words;
  } field;
public:
  int mod_one_word() {
    if (length == 1)
      return field.inside_word;
    else
      return field.outside_words[0];
  }
};

Also for efficiency, you want to know the modulus at the time you do the last normal operation on it, not as a subsequent operation. Citing your example:

  8 * 10 / 4

and transforming it slightly into a commonly used pattern:

  (byte-size * 8 + bit-size) / 8

then I argue that what people want here is this carried out in _infinite_ precision! But what people want isn't really relevant; what is relevant is what the language and/or compatibility requires.
Ideally, gcc should accurately represent languages with both finite size and infinite size. Even if byte-size happens to come from a sizetype TREE_INT_CST with 64bit precision. So either choice - having a fixed-precision representation or an infinite-precision representation - can and will lead to errors done by the programmer. And as you can easily build a finite precision wrapper around an infinite precision implementation but not the other way around it's obvious to me what the implementation should provide. IIUC, the issue here is not the logical chain of implementation, but the interface that is most helpful to the programmers in getting to performant, correct code. I expect we need the infinite precision forms, but also that having more concise coding for fixed-precision would be helpful. For mixed operations, all the languages that I know of promote smaller operands to larger operands, so I think a reasonable definition is possible here. -- Lawrence Crowl
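The point that a finite-precision wrapper can be built around an infinite-precision implementation (but not the other way around) can be sketched in a few lines. The following Python fragment is illustrative only (the names are not from the wide-int patches; Python's native ints are already arbitrary precision, which stands in for the infinite-precision layer):

```python
# Sketch: fixed-precision two's-complement operations built on top of
# an arbitrary-precision integer type (Python ints). Illustrative
# names only; not taken from the wide-int patches.

def truncate(value, bits):
    """Reduce an infinite-precision value to an N-bit signed result."""
    mask = (1 << bits) - 1
    value &= mask                # keep the low N bits
    if value >> (bits - 1):      # sign bit set -> negative in two's complement
        value -= 1 << bits
    return value

def fixed_add(a, b, bits):
    """N-bit wrapping addition, expressed via infinite precision."""
    return truncate(a + b, bits)

def fixed_mul(a, b, bits):
    """N-bit wrapping multiplication."""
    return truncate(a * b, bits)

# 8-bit examples: 127 + 1 wraps to -128, 16 * 16 wraps to 0.
print(fixed_add(127, 1, 8))   # -128
print(fixed_mul(16, 16, 8))   # 0
```

The converse direction — recovering an infinite-precision result from operations that have already wrapped — is impossible in general, which is the asymmetry the argument rests on.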
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/13, Robert Dewar de...@adacore.com wrote: On 4/8/2013 10:26 AM, Kenneth Zadeck wrote: On 04/08/2013 10:12 AM, Robert Dewar wrote: On 4/8/2013 9:58 AM, Kenneth Zadeck wrote: yes but the relevant question for the not officially static integer constants is in what precision are those operations to be performed in? I assume that you choose gcc types for these operations and you expect the math to be done within that type, i.e. exactly the way you expect the machine to perform. As I explained in an earlier message, *within* a single expression, we are free to use higher precision, and we provide modes that allow this up to and including the use of infinite precision. That applies not just to constant expressions but to all expressions. My confusion is what you mean by we? Do you mean we the writer of the program, we the person invoking the compiler by the use of command line options, or we, your company's implementation of Ada? Sorry, bad usage. The gcc implementation of Ada allows the user to specify by pragmas how intermediate overflow is handled. Correct me if I'm wrong, but the Ada standard doesn't require any particular maximum evaluation precision, but only that you get an exception if the values exceed the chosen maximum. My interpretation of your first email was that it was possible for the programmer to do something equivalent to adding attributes surrounding a block in the program to control the precision and overflow detection of the expressions in the block. And if this is so, then by the time the expression is seen by the middle end of gcc, those attributes will have been converted into tree code that will evaluate the code in a well defined way by both the optimization passes and the target machine. Yes, that's a correct understanding. In essence, you have moved some of the optimization from the back end to the front end. Correct? -- Lawrence Crowl
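The practical difference between the intermediate-overflow strategies being discussed can be shown with a small sketch (hypothetical helper names; this is not GNAT's implementation): evaluating (a + b) / 2 with every intermediate wrapped to the 8-bit operand type gives a different answer than keeping intermediates in unbounded precision and truncating only at the end.

```python
def wrap8(v):
    """Truncate an intermediate result to 8-bit signed two's complement."""
    v &= 0xff
    return v - 0x100 if v & 0x80 else v

a, b = 100, 100

# Evaluate (a + b) / 2 with every intermediate wrapped to 8 bits...
wrapped = wrap8(wrap8(a + b) // 2)

# ...and with intermediates kept in unbounded precision, truncating
# only the final result.
infinite = wrap8((a + b) // 2)

print(wrapped)   # -28: the sum wrapped to -56 before the division
print(infinite)  # 100: the mathematically expected answer
```

Both outcomes fit in the 8-bit result type, so the choice of intermediate precision is observable to the programmer, which is exactly why the pragma-controlled modes exist.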
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 5:12 PM, Lawrence Crowl wrote: (BTW, you *really* don't need to quote entire messages, I find it rather redundant for the entire thread to be in every message, we all have thread following mail readers!) Correct me if I'm wrong, but the Ada standard doesn't require any particular maximum evaluation precision, but only that you get an exception if the values exceed the chosen maximum. Right, that's at run-time, at compile-time for static expressions, infinite precision is required. But at run-time, all three of the modes we provide are standard conforming. In essence, you have moved some of the optimization from the back end to the front end. Correct? Sorry, I don't quite understand that. If you are saying that the back end could handle this widening for intermediate values, sure it could, this is the kind of thing that can be done at various different places.
Re: [patch] Hash table changes from cxx-conversion branch
Ping? On 3/31/13, Lawrence Crowl cr...@googlers.com wrote: On 3/28/13, Richard Biener richard.guent...@gmail.com wrote: On Mar 27, 2013 Lawrence Crowl cr...@googlers.com wrote: On 3/27/13, Richard Biener richard.guent...@gmail.com wrote: On Mar 23, 2013 Lawrence Crowl cr...@googlers.com wrote: This patch is a consolidation of the hash_table patches to the cxx-conversion branch. Update various hash tables from htab_t to hash_table. Modify types and calls to match. Ugh. Can you split it up somewhat ... like split target bits away at least? Targets may prefer to keep the old hashes for ease of branch maintenance. I will do that. * tree-ssa-live.c'var_map_base_init::tree_to_index New struct tree_int_map_hasher. I think this wants to be generalized - we have the common tree_map/tree_decl_map and tree_int_map maps in tree.h - those (and its users) should be tackled in a separate patch by providing common hashtable traits implementations. I will investigate for a separate patch. Remove unused: htab_t scop::original_pddrs SCOP_ORIGINAL_PDDRS Remove unused: insert_loop_close_phis insert_guard_phis debug_ivtype_map ivtype_map_elt_info new_ivtype_map_elt Unused function/type removal are obvious changes. Remove unused: dse.c bitmap clear_alias_sets dse.c bitmap disqualified_clear_alias_sets dse.c alloc_pool clear_alias_mode_pool dse.c dse_step2_spill dse.c dse_step5_spill graphds.h htab_t graph::indices See above. It wasn't obvious that the functions could be removed. :-) Are you saying you don't want these notations in the description? No, I was saying that removal of unused functions / types should be committed separately and do not need approval as they are obvious. If they are not obvious (I didn't look at that patch part), then posting separately still helps ;) I've split out the removals to separate patches. The remaining work is in two independent pieces. The changes within the config directory and the changes outside that directory. 
The descriptions and patch are attached compressed due to mailer size issues. Okay for trunk? -- Lawrence Crowl
Re: Comments on the suggestion to use infinite precision math for wide int.
In some sense you have to think in terms of three worlds: 1) what you call compile-time static expressions is one world which in gcc is almost always done by the front ends. 2) the second world is what the optimizers can do. This is not compile-time static expressions because that is what the front end has already done. 3) there is run time. My view on this is that optimization is just doing what is normally done at run time but doing it early. From that point of view, we are if not required, morally obligated to do things in the same way that the hardware would have done them. This is why I am so against richi on wanting to do infinite precision. By the time the middle or the back end sees the representation, all of the things that are allowed to be done in infinite precision have already been done. What we are left with is a (mostly) strongly typed language that pretty much says exactly what must be done. Anything that we do in the middle end or back ends in infinite precision will only surprise the programmer and make them want to use llvm. Kenny On 04/08/2013 05:36 PM, Robert Dewar wrote: On 4/8/2013 5:12 PM, Lawrence Crowl wrote: (BTW, you *really* don't need to quote entire messages, I find it rather redundant for the entire thread to be in every message, we all have thread following mail readers!) Correct me if I'm wrong, but the Ada standard doesn't require any particular maximum evaluation precision, but only that you get an exception if the values exceed the chosen maximum. Right, that's at run-time, at compile-time for static expressions, infinite precision is required. But at run-time, all three of the modes we provide are standard conforming. In essence, you have moved some of the optimization from the back end to the front end. Correct? Sorry, I don't quite understand that. If you are saying that the back end could handle this widening for intermediate values, sure it could, this is the kind of thing that can be done at various different places.
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 5:46 PM, Kenneth Zadeck wrote: In some sense you have to think in terms of three worlds: 1) what you call compile-time static expressions is one world which in gcc is almost always done by the front ends. 2) the second world is what the optimizers can do. This is not compile-time static expressions because that is what the front end has already done. 3) there is run time. My view on this is that optimization is just doing what is normally done at run time but doing it early. From that point of view, we are if not required, morally obligated to do things in the same way that the hardware would have done them. This is why I am so against richi on wanting to do infinite precision. By the time the middle or the back end sees the representation, all of the things that are allowed to be done in infinite precision have already been done. What we are left with is a (mostly) strongly typed language that pretty much says exactly what must be done. Anything that we do in the middle end or back ends in infinite precision will only surprise the programmer and make them want to use llvm. That may be so in C, in Ada it would be perfectly reasonable to use infinite precision for intermediate results in some cases, since the language standard specifically encourages this approach.
[google gcc-4_7] offline profile merge tool (issue8508048)
Hi, This is an offline profile merge program. Usage: profile_merge.py [options] arg1 arg2 ... Options: -h, --help show this help message and exit -w MULTIPLIERS, --multipliers=MULTIPLIERS Comma separated list of multipliers to be applied for each corresponding profile. -o OUTPUT_PROFILE, --output=OUTPUT_PROFILE Output directory or zip file to dump the merged profile. Default output is profile-merged.zip. Arguments: Comma separated list of input directories or zip files that contain profile data to merge. Histogram is recomputed (i.e. precise). Module grouping information in LIPO is an approximation. Thanks, -Rong 2013-04-08 Rong Xu x...@google.com * contrib/profile_merge.py: An offline profile merge tool. Index: contrib/profile_merge.py === --- contrib/profile_merge.py(revision 0) +++ contrib/profile_merge.py(revision 0) @@ -0,0 +1,1301 @@ +#!/usr/bin/python2.7 +# +# Copyright 2013 Google Inc. All Rights Reserved. + +"""Merge two or more gcda profiles.""" + + +__author__ = 'Seongbae Park, Rong Xu' +__author_email__ = 'sp...@google.com, x...@google.com' + +import array +from optparse import OptionGroup +from optparse import OptionParser +import os +import struct +import zipfile + +new_histogram = None + + +class Error(Exception): + """Exception class for profile module.""" + + +def ReadAllAndClose(path): + """Return the entire byte content of the specified file. + + Args: +path: The path to the file to be opened and read. + + Returns: +The byte sequence of the content of the file. + """ + data_file = open(path, 'rb') + data = data_file.read() + data_file.close() + return data + + +def MergeCounters(objs, index, multipliers): + """Accumulate the counter at index from all counter objs.""" + val = 0 + for j in xrange(len(objs)): +val += multipliers[j] * objs[j].counters[index] + return val + + +class DataObject(object): + """Base class for various datum in GCDA/GCNO file.""" + + def __init__(self, tag): +self.tag = tag + + +class Function(DataObject): + """Function and its counters. 
+ + Attributes: +length: Length of the data on the disk +ident: Ident field +line_checksum: Checksum of the line number +cfg_checksum: Checksum of the control flow graph +counters: All counters associated with the function +file: The name of the file the function is defined in. Optional. +line: The line number the function is defined at. Optional. + + Function object contains other counter objects and block/arc/line objects. + """ + + def __init__(self, reader, tag, n_words): +"""Read function record information from a gcda/gcno file. + +Args: + reader: gcda/gcno file. + tag: function tag. + n_words: length of function record in units of 4 bytes. +""" +DataObject.__init__(self, tag) +self.length = n_words +self.counters = [] + +if reader: + pos = reader.pos + self.ident = reader.ReadWord() + self.line_checksum = reader.ReadWord() + self.cfg_checksum = reader.ReadWord() + + # Function name string is in gcno files, but not + # in gcda files. Here we make string reading optional. + if (reader.pos - pos) < n_words: +reader.ReadStr() + + if (reader.pos - pos) < n_words: +self.file = reader.ReadStr() +self.line_number = reader.ReadWord() + else: +self.file = '' +self.line_number = 0 +else: + self.ident = 0 + self.line_checksum = 0 + self.cfg_checksum = 0 + self.file = None + self.line_number = 0 + + def Write(self, writer): +"""Write out the function.""" +writer.WriteWord(self.tag) +writer.WriteWord(self.length) +writer.WriteWord(self.ident) +writer.WriteWord(self.line_checksum) +writer.WriteWord(self.cfg_checksum) +for c in self.counters: + c.Write(writer) + + def EntryCount(self): +"""Return the number of times the function is called.""" +return self.ArcCounters().counters[0] + + def Merge(self, others, multipliers): +"""Merge all functions in others into self. + +Args: + others: A sequence of Function objects + multipliers: A sequence of integers to be multiplied during merging. 
+""" +for o in others: + assert self.ident == o.ident + assert self.line_checksum == o.line_checksum + assert self.cfg_checksum == o.cfg_checksum + +for i in xrange(len(self.counters)): + self.counters[i].Merge([o.counters[i] for o in others], multipliers) + + def Print(self): +"""Print all the attributes in full detail.""" +print 'function: ident %d length %d line_chksum %x cfg_chksum %x' % ( +self.ident, self.length, +self.line_checksum, self.cfg_checksum) +if self.file: + print 'file: %s' % self.file + print 'line_number: %d' % self.line_number +for c in
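The weighted counter merge at the heart of the tool can be sketched standalone. The function and data names below are illustrative (not taken from profile_merge.py); the computation mirrors the MergeCounters loop in the patch, applying a per-profile multiplier to each counter before summing:

```python
def merge_counters(counter_lists, multipliers):
    """Weighted elementwise merge of per-function counter arrays.

    Each profile contributes multiplier * counter for every counter
    slot, matching the MergeCounters accumulation in the patch.
    """
    assert all(len(c) == len(counter_lists[0]) for c in counter_lists)
    merged = []
    for index in range(len(counter_lists[0])):
        merged.append(sum(m * c[index]
                          for m, c in zip(multipliers, counter_lists)))
    return merged

# Two profiles of the same function; the second is weighted 3x, e.g. to
# approximate three runs of the second workload.
run_a = [10, 4, 0]
run_b = [2, 5, 1]
print(merge_counters([run_a, run_b], [1, 3]))  # [16, 19, 3]
```

This is why the -w/--multipliers option takes one value per input profile: the weights scale each profile's contribution before the counters are added.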
Re: [google gcc-4_7] offline profile merge tool (issue8508048)
The copyright header is wrong. Please use the standard one for GCC. David On Mon, Apr 8, 2013 at 2:57 PM, Rong Xu x...@google.com wrote: Hi, This is an offline profile merge program. Usage: profile_merge.py [options] arg1 arg2 ... Options: -h, --help show this help message and exit -w MULTIPLIERS, --multipliers=MULTIPLIERS Comma separated list of multipliers to be applied for each corresponding profile. -o OUTPUT_PROFILE, --output=OUTPUT_PROFILE Output directory or zip file to dump the merged profile. Default output is profile-merged.zip. Arguments: Comma separated list of input directories or zip files that contain profile data to merge. Histogram is recomputed (i.e. precise). Module grouping information in LIPO is an approximation. Thanks, -Rong 2013-04-08 Rong Xu x...@google.com * contrib/profile_merge.py: An offline profile merge tool.
Re: Comments on the suggestion to use infinite precision math for wide int.
On Apr 8, 2013, at 2:48 PM, Robert Dewar de...@adacore.com wrote: That may be so in C, in Ada it would be perfectly reasonable to use infinite precision for intermediate results in some cases, since the language standard specifically encourages this approach. gcc lacks an infinite precision plus operator?! :-)
[google gcc-4_7] offline profile merge (patchset 2) (issue8508048)
Revised copyright info. -Rong 2013-04-08 Rong Xu x...@google.com * contrib/profile_merge.py: An offline profile merge tool. Index: contrib/profile_merge.py === --- contrib/profile_merge.py(revision 0) +++ contrib/profile_merge.py(revision 0) @@ -0,0 +1,1320 @@ +#!/usr/bin/python2.7 +# +#Copyright (C) 2013 +#Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# + + +"""Merge two or more gcda profiles.""" + + +__author__ = 'Seongbae Park, Rong Xu' +__author_email__ = 'sp...@google.com, x...@google.com' + +import array +from optparse import OptionGroup +from optparse import OptionParser +import os +import struct +import zipfile + +new_histogram = None + + +class Error(Exception): + """Exception class for profile module.""" + + +def ReadAllAndClose(path): + """Return the entire byte content of the specified file. + + Args: +path: The path to the file to be opened and read. + + Returns: +The byte sequence of the content of the file. + """ + data_file = open(path, 'rb') + data = data_file.read() + data_file.close() + return data + + +def MergeCounters(objs, index, multipliers): + """Accumulate the counter at index from all counter objs.""" + val = 0 + for j in xrange(len(objs)): +val += multipliers[j] * objs[j].counters[index] + return val + + +class DataObject(object): + """Base class for various datum in GCDA/GCNO file.""" 
+ + def __init__(self, tag): +self.tag = tag + + +class Function(DataObject): + """Function and its counters. + + Attributes: +length: Length of the data on the disk +ident: Ident field +line_checksum: Checksum of the line number +cfg_checksum: Checksum of the control flow graph +counters: All counters associated with the function +file: The name of the file the function is defined in. Optional. +line: The line number the function is defined at. Optional. + + Function object contains other counter objects and block/arc/line objects. + """ + + def __init__(self, reader, tag, n_words): +"""Read function record information from a gcda/gcno file. + +Args: + reader: gcda/gcno file. + tag: function tag. + n_words: length of function record in units of 4 bytes. +""" +DataObject.__init__(self, tag) +self.length = n_words +self.counters = [] + +if reader: + pos = reader.pos + self.ident = reader.ReadWord() + self.line_checksum = reader.ReadWord() + self.cfg_checksum = reader.ReadWord() + + # Function name string is in gcno files, but not + # in gcda files. Here we make string reading optional. + if (reader.pos - pos) < n_words: +reader.ReadStr() + + if (reader.pos - pos) < n_words: +self.file = reader.ReadStr() +self.line_number = reader.ReadWord() + else: +self.file = '' +self.line_number = 0 +else: + self.ident = 0 + self.line_checksum = 0 + self.cfg_checksum = 0 + self.file = None + self.line_number = 0 + + def Write(self, writer): +"""Write out the function.""" +writer.WriteWord(self.tag) +writer.WriteWord(self.length) +writer.WriteWord(self.ident) +writer.WriteWord(self.line_checksum) +writer.WriteWord(self.cfg_checksum) +for c in self.counters: + c.Write(writer) + + def EntryCount(self): +"""Return the number of times the function is called.""" +return self.ArcCounters().counters[0] + + def Merge(self, others, multipliers): +"""Merge all functions in others into self. + +Args: + others: A sequence of Function objects + multipliers: A sequence of integers to be multiplied during merging. 
+""" +for o in others: + assert self.ident == o.ident + assert self.line_checksum == o.line_checksum + assert self.cfg_checksum == o.cfg_checksum + +for i in xrange(len(self.counters)): + self.counters[i].Merge([o.counters[i] for o in others], multipliers) + + def Print(self): +"""Print all the attributes in full detail.""" +print 'function: ident %d length %d line_chksum %x cfg_chksum %x' % ( +self.ident, self.length, +self.line_checksum, self.cfg_checksum) +if self.file: + print 'file: %s' % self.file + print 'line_number: %d' % self.line_number +for c in self.counters: + c.Print() + + def
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 6:34 PM, Mike Stump wrote: On Apr 8, 2013, at 2:48 PM, Robert Dewar de...@adacore.com wrote: That may be so in C, in Ada it would be perfectly reasonable to use infinite precision for intermediate results in some cases, since the language standard specifically encourages this approach. gcc lacks an infinite precision plus operator?! :-) Right, that's why we do everything in the front end in the case of Ada. But it would be perfectly reasonable for the back end to do this substitution.
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/08/2013 06:45 PM, Robert Dewar wrote: On 4/8/2013 6:34 PM, Mike Stump wrote: On Apr 8, 2013, at 2:48 PM, Robert Dewar de...@adacore.com wrote: That may be so in C, in Ada it would be perfectly reasonable to use infinite precision for intermediate results in some cases, since the language standard specifically encourages this approach. gcc lacks an infinite precision plus operator?! :-) Right, that's why we do everything in the front end in the case of Ada. But it would be perfectly reasonable for the back end to do this substitution. but there is no way in the current tree language to convey which ones you can and which ones you cannot.
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/8/2013 7:46 PM, Kenneth Zadeck wrote: On 04/08/2013 06:45 PM, Robert Dewar wrote: On 4/8/2013 6:34 PM, Mike Stump wrote: On Apr 8, 2013, at 2:48 PM, Robert Dewar de...@adacore.com wrote: That may be so in C, in Ada it would be perfectly reasonable to use infinite precision for intermediate results in some cases, since the language standard specifically encourages this approach. gcc lacks an infinite precision plus operator?! :-) Right, that's why we do everything in the front end in the case of Ada. But it would be perfectly reasonable for the back end to do this substitution. but there is no way in the current tree language to convey which ones you can and which ones you cannot. Well the back end has all the information to figure this out I think! But anyway, for Ada, the current situation is just fine, and has the advantage that the -gnatG expanded code listing clearly shows in Ada source form, what is going on.
Re: [PATCH, updated] Vtable pointer verification, C++ front end changes (patch 1 of 3)
Hi, sorry it has taken me so long to get back to this. Hopefully we can wrap it up quickly now that we're back in stage 1. On 02/25/2013 02:24 PM, Caroline Tice wrote: -CXX_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/xg++ -B$$r/$(HOST_SUBDIR)/gcc/ -nostdinc++ `if test -f $$r/$(TARGET_SUBDIR) /libstdc++-v3/scripts/testsuite_flags; then $(SHELL) $$r/$(TARGET_SUBDIR)/libstdc++-v3/scripts/testsuite_flags --build- includes; else echo -funconfigured-libstdc++-v3 ; fi` -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src -L$$r/$(TARGET_SUBDIR)/li bstdc++-v3/src/.libs' +CXX_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/xg++ -B$$r/$(HOST_SUBDIR)/gcc/ -nostdinc++ `if test -f $$r/$(TARGET_SUBDIR) /libstdc++-v3/scripts/testsuite_flags; then $(SHELL) $$r/$(TARGET_SUBDIR)/libstdc++-v3/scripts/testsuite_flags --build- includes; else echo -funconfigured-libstdc++-v3 ; fi` -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src -L$$r/$(TARGET_SUBDIR)/li bstdc++-v3/src/.libs -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs' You shouldn't need this, since libstdc++ includes libsupc++. And if you did need to do it, it would need to be in configure.ac or it will be discarded by the next autoconf. + information aboui which vtable will actually be emitted. */ about +vtv_finish_verification_constructor_init_function (tree function_body) +{ + tree fn; + + finish_compound_stmt (function_body); + fn = finish_function (0); + DECL_STATIC_CONSTRUCTOR (fn) = 1; + decl_init_priority_insert (fn, MAX_RESERVED_INIT_PRIORITY - 1); Why did you stop using finish_objects? If it was to be able to return the function, you can get that from current_function_decl before calling finish_objects. Index: gcc/cp/g++spec.c Changes to g++spec.c only affect the g++ driver, not the gcc driver. Are you sure this is what you want? Can't you handle this stuff directly in the specs like address sanitizer does? I haven't seen a response to this comment. 
+ vtv_rts.cc \ + vtv_malloc.cc \ + vtv_utils.cc It seems to me that this code belongs in a separate library like libsanitizer, not in libstdc++. Or this one. - switch_to_section (sect); + if (sect->named.name + && (strcmp (sect->named.name, ".vtable_map_vars") == 0)) + { +#if defined (OBJECT_FORMAT_ELF) + targetm.asm_out.named_section (sect->named.name, +sect->named.common.flags +| SECTION_LINKONCE, +DECL_NAME (decl)); + in_section = sect; +#else + switch_to_section (sect); +#endif +} + else +switch_to_section (sect); + if (strcmp (name, ".vtable_map_vars") == 0) + flags |= SECTION_LINKONCE; These changes should not be necessary. Just set DECL_ONE_ONLY on the vtable map variables. I believe this change was necessary so that each vtable map variable would have its own comdat name and be in its own comdat group...but I will revisit this and see if we still need it. What did you find? Perhaps you need to make sure that the map variables are getting passed to comdat_linkage at some point, such as here in vtable_find_or_create_map_decl: + DECL_SECTION_NAME (var_decl) = build_string (strlen (sect_name), + sect_name); + DECL_HAS_IMPLICIT_SECTION_NAME_P (var_decl) = true; + DECL_COMDAT_GROUP (var_decl) = get_identifier (var_name); Here comdat_linkage (var_decl) could replace these three lines and I believe make the above varasm change unnecessary. +/* This function adds classes we are interested in to a list of + classes that is saved during pre-compiled header generation. ... +/* This function goes through the list of classes we saved before the + pre-compiled header generation and calls vtv_save_base_class_info + on each one, to build up our class hierarchy data structure. */ These functions apply to non-PCH compiles as well; I find the mention of PCH here confusing. 
+ tree void_ptr_type = build_pointer_type (void_type_node); + tree const_char_ptr_type = build_pointer_type + (build_qualified_type (char_type_node, + TYPE_QUAL_CONST)); These are already built, as ptr_type_node and const_string_type_node. + arg_types = chainon (arg_types, build_tree_list (NULL_TREE, void_type_node)); And you can use void_list_node instead of building a new void list. + arg_types = build_tree_list (NULL_TREE, build_pointer_type (void_ptr_type)); + arg_types = chainon (arg_types, build_tree_list (NULL_TREE, + const_ptr_type_node)); + arg_types = chainon (arg_types, build_tree_list (NULL_TREE, + size_type_node)); + arg_types = chainon (arg_types, build_tree_list (NULL_TREE, void_type_node)); + +
[RFA][PATCH] Improve VRP of COND_EXPR_CONDs -- v2
This incorporates the concrete suggestions from Steven & Richi -- it doesn't do any refactoring of the VRP code. There's still stuff I'm looking at that might directly lead to some refactoring. In the mean time I'm submitting the obvious small improvements. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for trunk? Jeff commit d6d1e36561b9022bbcdf157a886895f5bb0ef2ae Author: Jeff Law l...@redhat.com Date: Sat Apr 6 06:46:58 2013 -0600 * tree-vrp.c (simplify_cond_using_ranges): Simplify test of boolean when the boolean was created by converting a wider object which had a boolean range. * gcc.dg/tree-ssa/vrp87.c: New test diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6ee7d9c..110f61e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -16,8 +16,14 @@ 2013-04-08 Jeff Law l...@redhat.com + * tree-vrp.c (simplify_cond_using_ranges): Simplify test of boolean + when the boolean was created by converting a wider object which + had a boolean range. + +2013-04-08 Jeff Law l...@redhat.com + * gimple.c (canonicalize_cond_expr_cond): Rewrite x ^ y into x != y. 
- + 2013-04-08 Richard Biener rguent...@suse.de * gimple-pretty-print.c (debug_gimple_stmt): Do not print diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c new file mode 100644 index 000..7feff81 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c @@ -0,0 +1,81 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp2-details -fdump-tree-cddce2-details" } */ + +struct bitmap_head_def; +typedef struct bitmap_head_def *bitmap; +typedef const struct bitmap_head_def *const_bitmap; + + +typedef unsigned long BITMAP_WORD; +typedef struct bitmap_element_def +{ + struct bitmap_element_def *next; + unsigned int indx; + BITMAP_WORD bits[((128 + (8 * 8 * 1u) - 1) / (8 * 8 * 1u))]; +} bitmap_element; + + + + + + +typedef struct bitmap_head_def +{ + bitmap_element *first; + +} bitmap_head; + + + +static __inline__ unsigned char +bitmap_elt_ior (bitmap dst, bitmap_element * dst_elt, + bitmap_element * dst_prev, const bitmap_element * a_elt, + const bitmap_element * b_elt, unsigned char changed) +{ + + if (a_elt) +{ + + if (!changed && dst_elt) + { + changed = 1; + } +} + else +{ + changed = 1; +} + return changed; +} + +unsigned char +bitmap_ior_into (bitmap a, const_bitmap b) +{ + bitmap_element *a_elt = a->first; + const bitmap_element *b_elt = b->first; + bitmap_element *a_prev = ((void *) 0); + unsigned char changed = 0; + + while (b_elt) +{ + + if (!a_elt || a_elt->indx == b_elt->indx) + changed = bitmap_elt_ior (a, a_elt, a_prev, a_elt, b_elt, changed); + else if (a_elt->indx < b_elt->indx) + changed = 1; + b_elt = b_elt->next; + + +} + + return changed; +} + +/* Verify that VRP simplified an if statement. */ +/* { dg-final { scan-tree-dump "Folded into: if.*" "vrp2"} } */ +/* Verify that DCE after VRP2 eliminates a dead conversion + to a (_Bool). 
*/ +/* { dg-final { scan-tree-dump "Deleting.*_Bool.*;" "cddce2" } } */ +/* { dg-final { cleanup-tree-dump "vrp2" } } */ +/* { dg-final { cleanup-tree-dump "cddce2" } } */ + diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 250a506..4520c89 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -8584,6 +8584,45 @@ simplify_cond_using_ranges (gimple stmt) } } + /* If we have a comparison of a SSA_NAME boolean against + a constant (which obviously must be [0..1]), see if the + SSA_NAME was set by a type conversion where the source + of the conversion is another SSA_NAME with a range [0..1]. + + If so, we can replace the SSA_NAME in the comparison with + the RHS of the conversion. This will often make the type + conversion dead code which DCE will clean up. */ + if (TREE_CODE (op0) == SSA_NAME + && (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE + || (INTEGRAL_TYPE_P (TREE_TYPE (op)) + && TYPE_PRECISION (TREE_TYPE (op0)) == 1)) + && TREE_CODE (op1) == INTEGER_CST) +{ + gimple def_stmt = SSA_NAME_DEF_STMT (op0); + tree innerop; + + if (!is_gimple_assign (def_stmt) + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) + return false; + + innerop = gimple_assign_rhs1 (def_stmt); + + if (TREE_CODE (innerop) == SSA_NAME) + { + value_range_t *vr = get_value_range (innerop); + + if (range_int_cst_p (vr) + && operand_equal_p (vr->min, integer_zero_node, 0) + && operand_equal_p (vr->max, integer_one_node, 0)) + { + tree newconst = fold_convert (TREE_TYPE (innerop), op1); + gimple_cond_set_lhs (stmt, innerop); + gimple_cond_set_rhs (stmt, newconst); + return true; + } + } +} + return false; }
[PATCH] Improve cstore code generation on 64-bit sparc.
One major suboptimal area of the sparc back end is cstore generation on 64-bit. Due to the way arguments and return values of functions must be promoted, the ideal mode for cstore's result would be DImode. But this hasn't been done because of a fundamental limitation of the cstore patterns. They require that a fixed mode be used for the boolean result value. I've decided to work around this by building a target hook which specifies the type to use for conditional store results, and then I use a special predicate for operand 0 in the cstore expanders so that they still match even when we use DImode. The default version of the target hook just does what it does now, so no other target should be impacted by this at all. Regstrapped on 32-bit sparc-linux-gnu and I've run the testsuite with -m64 to validate the 64-bit side. Any major objections? gcc/ * target.def (cstore_mode): New hook. * target.h: Include insn-codes.h. * targhooks.c: Likewise. (default_cstore_mode): New function. * targhooks.h: Declare it. * doc/tm.texi.in: New hook slot for TARGET_CSTORE_MODE. * doc/tm.texi: Rebuild. * expmed.c (emit_cstore): Obtain cstore boolean result mode using target hook, rather than inspecting the insn_data. * config/sparc/sparc.c (sparc_cstore_mode): New function. (TARGET_CSTORE_MODE): Redefine. (emit_scc_insn): When TARGET_ARCH64, emit new 64-bit boolean result patterns. * config/sparc/predicates.md (cstore_result_operand): New special predicate. * config/sparc/sparc.md (cstoresi4, cstoredi4, cstore<F:mode>4): Use it for operand 0. (*seqsi_special): Rewrite using 'P' mode iterator on operand 0. (*snesi_special): Likewise. (*snesi_zero): Likewise. (*seqsi_zero): Likewise. (*sltu_insn): Likewise. (*sgeu_insn): Likewise. (*seqdi_special): Make operand 0 and comparison operation be of DImode. (*snedi_special): Likewise. (*snedi_special_vis3): Likewise. (*neg_snesi_zero): Rename to *neg_snesisi_zero. (*neg_snesi_sign_extend): Rename to *neg_snesidi_zero. 
(*snesi_zero_extend): Delete, covered by 'P' mode iterator. (*neg_seqsi_zero): Rename to *neg_seqsisi_zero. (*neg_seqsi_sign_extend): Rename to *neg_seqsidi_zero. (*seqsi_zero_extend): Delete, covered by 'P' mode iterator. (*sltu_extend_sp64): Likewise. (*neg_sltu_insn): Rename to *neg_sltusi_insn. (*neg_sltu_extend_sp64): Rename to *neg_sltudi_insn. (*sgeu_extend_sp64): Delete, covered by 'P' mode iterator. (*neg_sgeu_insn): Rename to *neg_sgeusi_insn. (*neg_sgeu_extend_sp64): Rename to *neg_sgeudi_insn. gcc/testsuite/ * gcc.target/sparc/setcc-4.c: New test. * gcc.target/sparc/setcc-5.c: New test. --- gcc/config/sparc/predicates.md | 5 ++ gcc/config/sparc/sparc.c | 23 +- gcc/config/sparc/sparc.md| 137 ++- gcc/doc/tm.texi | 4 + gcc/doc/tm.texi.in | 2 + gcc/expmed.c | 2 +- gcc/target.def | 10 +++ gcc/target.h | 1 + gcc/targhooks.c | 9 ++ gcc/targhooks.h | 1 + gcc/testsuite/gcc.target/sparc/setcc-4.c | 44 ++ gcc/testsuite/gcc.target/sparc/setcc-5.c | 42 ++ 12 files changed, 183 insertions(+), 97 deletions(-) create mode 100644 gcc/testsuite/gcc.target/sparc/setcc-4.c create mode 100644 gcc/testsuite/gcc.target/sparc/setcc-5.c diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md index b8524e5..073bce2 100644 --- a/gcc/config/sparc/predicates.md +++ b/gcc/config/sparc/predicates.md @@ -265,6 +265,11 @@ (ior (match_test "register_operand (op, SImode)") (match_test "TARGET_ARCH64 && register_operand (op, DImode)"))) +;; Return true if OP is an integer register of the appropriate mode +;; for a cstore result. +(define_special_predicate cstore_result_operand + (match_test "register_operand (op, TARGET_ARCH64 ? DImode : SImode)")) + ;; Return true if OP is a floating point condition code register. 
(define_predicate "fcc_register_operand" (match_code "reg") diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 3e98325..4a73c73 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -597,6 +597,7 @@ static void sparc_print_operand_address (FILE *, rtx); static reg_class_t sparc_secondary_reload (bool, rtx, reg_class_t, enum machine_mode, secondary_reload_info *); +static enum machine_mode sparc_cstore_mode (enum insn_code icode); #ifdef SUBTARGET_ATTRIBUTE_TABLE /* Table of valid machine attributes. */
Re: [RFA][PATCH] Improve VRP of COND_EXPR_CONDs -- v2
On 04/08/2013 07:54 PM, Jeff Law wrote: This incorporates the concrete suggestions from Steven & Richi -- it doesn't do any refactoring of the VRP code. There's still stuff I'm looking at that might directly lead to some refactoring. In the meantime I'm submitting the obvious small improvements. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for trunk? Just a note, there's a typo in that patch: op should be op0; not sure why git gave me the old version since that's something I thought I'd patched and squashed out... Clearly a git user workflow error of some kind. Jeff
Re: [PATCH v3]IPA: fixing inline fail report caused by overwritable functions.
On Mon, Apr 8, 2013 at 5:48 PM, Richard Biener richard.guent...@gmail.com wrote: Can you trigger this message to show up with -Winline before/after the patch? Can you please add a testcase then? Thanks Richard for reviewing. From my reading of gcc and my testing, -Winline only works on callees that are declared inline, but if the callee is declared inline, it will be AVAIL_AVAILABLE in function can_inline_edge_p, thus outside the scope of my patch. So I only added a testcase checking the tree dump; is there anything more I can do? Regtested/bootstrapped on x86_64-linux. ChangeLog: 2013-04-08 Zhouyi Zhou yizhouz...@ict.ac.cn * cif-code.def (OVERWRITABLE): Correct the comment for overwritable functions. * ipa-inline.c (can_inline_edge_p): Let the dump mechanism report the inline failure caused by overwritable functions. * gcc.dg/tree-ssa/inline-11.c: New test. Index: gcc/cif-code.def === --- gcc/cif-code.def(revision 197549) +++ gcc/cif-code.def(working copy) @@ -48,7 +48,7 @@ DEFCIFCODE(REDEFINED_EXTERN_INLINE, /* Function is not inlinable. */ DEFCIFCODE(FUNCTION_NOT_INLINABLE, N_("function not inlinable")) -/* Function is not overwritable. */ +/* Function is overwritable. */ DEFCIFCODE(OVERWRITABLE, N_("function body can be overwritten at link time")) /* Function is not an inlining candidate. 
*/ Index: gcc/testsuite/gcc.dg/tree-ssa/inline-11.c === --- gcc/testsuite/gcc.dg/tree-ssa/inline-11.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/inline-11.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-einline" } */ +int w; +int bar (void) __attribute__ ((weak)); +int bar (){ + w++; +} +void foo() +{ + bar(); +} +/* { dg-final { scan-tree-dump-times "function body can be overwritten at link time" 1 "einline" } } */ +/* { dg-final { cleanup-tree-dump "einline" } } */ Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c(revision 197549) +++ gcc/ipa-inline.c(working copy) @@ -266,7 +266,7 @@ can_inline_edge_p (struct cgraph_edge *e else if (avail <= AVAIL_OVERWRITABLE) { e->inline_failed = CIF_OVERWRITABLE; - return false; + inlinable = false; } else if (e->call_stmt_cannot_inline_p) {
Re: [patch libgcc]: Adjust cygming-crtbegin code to use weak
On 22/03/2013 08:44, Kai Tietz wrote: 2013-03-22 Kai Tietz kti...@redhat.com * config/i386/cygming-crtbegin.c (__register_frame_info): Make weak. (__deregister_frame_info): Likewise. Hi Kai, I read your explanation of the problem relating to x86-64 memory models over on the Cygwin dev list, and that explained your motivation for making this change; I see why it's not easy to get an *ABS* 0 reference there. So, providing dummy versions of the functions makes perfect sense to me, and certainly won't cause problems for i686. (I did a lot of testing, and the only problem I found is that a weak definition has to be provided on the linker command line *after* the file that contains the weak-with-zero-default definition if it is to override that; in the case here however we're going to be overriding the weak-with-default by a strong function declaration, so that issue does not arise.) I still have a comment or two about the patch itself: Index: libgcc/config/i386/cygming-crtbegin.c === --- libgcc/config/i386/cygming-crtbegin.c (revision 196898) +++ libgcc/config/i386/cygming-crtbegin.c (working copy) @@ -46,15 +46,33 @@ see the files COPYING3 and COPYING.RUNTIME respect #define LIBGCJ_SONAME "libgcj_s.dll" #endif - +#if DWARF2_UNWIND_INFO /* Make the declarations weak. This is critical for _Jv_RegisterClasses because it lives in libgcj.a */ -extern void __register_frame_info (const void *, struct object *) +extern void __register_frame_info (__attribute__((unused)) const void *, +__attribute__((unused)) struct object *) TARGET_ATTRIBUTE_WEAK; -extern void *__deregister_frame_info (const void *) +extern void *__deregister_frame_info (__attribute__((unused)) const void *) TARGET_ATTRIBUTE_WEAK; -extern void _Jv_RegisterClasses (const void *) TARGET_ATTRIBUTE_WEAK; +TARGET_ATTRIBUTE_WEAK void +__register_frame_info (__attribute__((unused)) const void *p, +__attribute__((unused)) struct object *o) +{} Braces should go on separate lines I think. 
+TARGET_ATTRIBUTE_WEAK void * +__deregister_frame_info (__attribute__((unused)) const void *p) +{ return (void*) 0; } Certainly here. +#endif /* DWARF2_UNWIND_INFO */ + +#if TARGET_USE_JCR_SECTION +extern void _Jv_RegisterClasses (__attribute__((unused)) const void *) + TARGET_ATTRIBUTE_WEAK; + +TARGET_ATTRIBUTE_WEAK void +_Jv_RegisterClasses (__attribute__((unused)) const void *p) +{} +#endif /* TARGET_USE_JCR_SECTION */ + #if defined(HAVE_LD_RO_RW_SECTION_MIXING) # define EH_FRAME_SECTION_CONST const #else Also, now that you've provided a default weak definition of the functions in the file itself, it's no longer possible for the function pointer variables (register_frame_fn, register_class_fn, deregister_frame_fn) to be zero, so you should remove the if () tests on them and just call them unconditionally. cheers, DaveK