Re: [ping] couple of fixes
On 19/10/2012 19:01, Eric Botcazou wrote: PR bootstrap/54820 (stage #1 bootstrap failure) http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01093.html This one is okay, thanks. Paolo
Re: [PATCH GCC]Fix test case failure reported in PR54989
On Mon, Oct 22, 2012 at 11:00:08AM +0800, Bin Cheng wrote: The test case gcc/testsuite/gcc.dg/hoist-register-pressure.c fails on x86_64-apple-darwin because it uses more registers than x86_64-linux. This can be fixed by simplifying the case to use fewer registers. Tested on x86_64-apple-darwin/x86_64-linux, is it OK? I'd say it is better to do the scan-rtl-dump only on nonpic targets; that way it won't be done on darwin or for testing with --target_board=unix/-fpic, where it would fail too. You can add the test with smaller register pressure as a new test (hoist-register-pressure2.c). Jakub
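For reference, Jakub's suggestion maps to a target selector on the final scan directive. A sketch of what the restricted test could look like -- the regexp and the -fdump option below are illustrative placeholders, not copied from the real hoist-register-pressure.c:

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-rtl-hoist" } */

/* Only scan the dump on nonpic targets: with -fPIC (and on darwin,
   where PIC is the default) the PIC register raises register
   pressure and the expected hoisting does not happen.  */
/* { dg-final { scan-rtl-dump "hoisting" "hoist" { target nonpic } } } */
```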
Re: Ping: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver
On Sun, 21 Oct 2012, Hans-Peter Nilsson wrote: CC:ing middle-end maintainers this time. I was a bit surprised when Eric Botcazou wrote in his review, quoted below, that he's not one of you. Maybe approve that too? If Eric is fine with the patch it is ok. Yes, he is not a middle-end maintainer but an RTL optimizer reviewer. Thanks, Richard. On Mon, 15 Oct 2012, Hans-Peter Nilsson wrote: On Fri, 12 Oct 2012, Eric Botcazou wrote: (insn 168 49 51 3 (set (reg/f:DI 253 $253) (plus:DI (reg/f:DI 253 $253) (const_int 24 [0x18]))) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1 (nil)) (insn 51 168 52 3 (clobber (reg/f:DI 253 $253)) ... Note that insn 168 is deleted, which seems a logical optimization. The bug is to emit the clobber, not that the restoring insn is removed. Had that worked in the past for MMIX? Yes, for svn revision 106027 (20051030), 4.1.0-era (!) http://gcc.gnu.org/ml/gcc-testresults/2005-10/msg01340.html where the test must have passed, as gcc.c-torture/execute/built-in-setjmp.c is at least four years older than that. If so, what changed recently? By these days I didn't mean recent, just not eons ago. :) I see in a gcc-test-results posting from Mike Stein (whom I'd like to thank for test-results postings over the years) matching FAILs for svn revision 126095 (20070628), 4.3.0-era: http://gcc.gnu.org/ml/gcc-testresults/2007-06/msg01287.html. Sorry, I have nothing in between those reports, my bad. Though I see no point narrowing down the failing revision further here IMO; as mentioned, the bug is not that the restoring insn is removed. Agreed. However, I'd suggest rescuing the comment for the ELIMINABLE_REGS block from expand_nl_goto_receiver, as it still sounds valid to me. Oops, my bad; I see I removed all the good comments. Fixed. * stmt.c (expand_nl_goto_receiver): Remove almost-copy of expand_builtin_setjmp_receiver. (expand_label): Adjust, call expand_builtin_setjmp_receiver with NULL for the label parameter.
* builtins.c (expand_builtin_setjmp_receiver): Don't clobber the frame-pointer. Adjust comments. [HAVE_builtin_setjmp_receiver]: Emit builtin_setjmp_receiver only if LABEL is non-NULL. I cannot formally approve, but this looks good to me modulo: + If RECEIVER_LABEL is NULL, instead the port-specific parts of a + nonlocal goto handler are emitted. */ The port-specific parts wording is a bit confusing I think. I'd just write: If RECEIVER_LABEL is NULL, instead construct a nonlocal goto handler. Sure. Thanks for the review. Updated patch below. As nothing was changed from the previous post but comments as per the review (mostly moving / reviving, fixing one grammo), already covered by the changelog quoted above, the previous testing is still valid. Ok for trunk, approvers? Index: gcc/builtins.c === --- gcc/builtins.c (revision 192353) +++ gcc/builtins.c (working copy) @@ -885,14 +885,15 @@ expand_builtin_setjmp_setup (rtx buf_add } /* Construct the trailing part of a __builtin_setjmp call. This is - also called directly by the SJLJ exception handling code. */ + also called directly by the SJLJ exception handling code. + If RECEIVER_LABEL is NULL, instead construct a nonlocal goto handler. */ void expand_builtin_setjmp_receiver (rtx receiver_label ATTRIBUTE_UNUSED) { rtx chain; - /* Clobber the FP when we get here, so we have to make sure it's + /* Mark the FP as used when we get here, so we have to make sure it's marked as used by this function. */ emit_use (hard_frame_pointer_rtx); @@ -907,17 +908,28 @@ expand_builtin_setjmp_receiver (rtx rece #ifdef HAVE_nonlocal_goto if (! HAVE_nonlocal_goto) #endif -{ - emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx); - /* This might change the hard frame pointer in ways that aren't -apparent to early optimization passes, so force a clobber. */ - emit_clobber (hard_frame_pointer_rtx); -} +/* First adjust our frame pointer to its actual value.
It was + previously set to the start of the virtual area corresponding to + the stacked variables when we branched here and now needs to be + adjusted to the actual hardware fp value. + + Assignments to virtual registers are converted by + instantiate_virtual_regs into the corresponding assignment + to the underlying register (fp in this case) that makes + the original assignment true. + So the following insn will actually be decrementing fp by + STARTING_FRAME_OFFSET. */ +emit_move_insn (virtual_stack_vars_rtx,
Re: Minimize downward code motion during reassociation
On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassociation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs.
*/ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe->op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i < len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized. + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitions.
*/ + if (TREE_CODE (oe->op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just after that statement. Instead, find the first non-label gimple statement in BB and insert before that. */ + if (gimple_code (insert_stmt) == GIMPLE_PHI) +{ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (&gsistmt, &gsi_insert); +} + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (lookup_stmt_eh_lp (insert_stmt) > 0 that's already performed by stmt_can_throw_internal + && stmt_can_throw_internal (insert_stmt)) But all this should be a non-issue as re-assoc should never assign an ops vector entry for such stmts (but it could have leafs defined by such stmts). If you only ever move definitions
Re: [PATCH] Fix dumps for IPA passes
On Sat, Oct 20, 2012 at 3:24 AM, Sharad Singhai sing...@google.com wrote: As suggested in http://gcc.gnu.org/ml/gcc/2012-10/msg00285.html, I have updated the attached patch to rename 'dump_enabled_phase' to 'dump_enabled_phase_p'. The 'dump_enabled_p ()' doesn't take any argument and can be used as a predicate for the dump calls. Once this patch gets in, the plan is to update the existing calls (in vectorizer passes) of the form if (dump_kind_p (flags)) dump_printf(flags, ...) to if (dump_enabled_p ()) dump_printf(flags, ...) Bootstrapped and tested on x86_64 and didn't observe any new test failures. Okay for trunk? Ok. Thanks, Richard. Thanks, Sharad 2012-10-19 Sharad Singhai sing...@google.com * dumpfile.c (dump_phase_enabled_p): Renamed dump_enabled_p. Update all callers. (dump_enabled_p): A new function to check if any of the dump files is available. (dump_kind_p): Remove check for current_function_decl. Add check for dumpfile and alt_dump_file. * dumpfile.h: Add declaration of dump_enabled_p. Index: dumpfile.c === --- dumpfile.c (revision 192623) +++ dumpfile.c (working copy) @@ -35,7 +35,7 @@ static int alt_flags;/* current op static FILE *alt_dump_file = NULL; static void dump_loc (int, FILE *, source_location); -static int dump_enabled_p (int); +static int dump_phase_enabled_p (int); static FILE *dump_open_alternate_stream (struct dump_file_info *); /* Table of tree dump switches. 
This must be consistent with the @@ -380,7 +380,7 @@ dump_start (int phase, int *flag_ptr) char *name; struct dump_file_info *dfi; FILE *stream; - if (phase == TDI_none || !dump_enabled_p (phase)) + if (phase == TDI_none || !dump_phase_enabled_p (phase)) return 0; dfi = get_dump_file_info (phase); @@ -461,7 +461,7 @@ dump_begin (int phase, int *flag_ptr) struct dump_file_info *dfi; FILE *stream; - if (phase == TDI_none || !dump_enabled_p (phase)) + if (phase == TDI_none || !dump_phase_enabled_p (phase)) return NULL; name = get_dump_file_name (phase); @@ -493,8 +493,8 @@ dump_begin (int phase, int *flag_ptr) If PHASE is TDI_tree_all, return nonzero if any dump is enabled for any phase. */ -int -dump_enabled_p (int phase) +static int +dump_phase_enabled_p (int phase) { if (phase == TDI_tree_all) { @@ -514,6 +514,14 @@ dump_begin (int phase, int *flag_ptr) } } +/* Return true if any of the dumps are enabled, false otherwise. */ + +inline bool +dump_enabled_p (void) +{ + return (dump_file || alt_dump_file); +} + /* Returns nonzero if tree dump PHASE has been initialized. */ int @@ -834,9 +842,8 @@ opt_info_switch_p (const char *arg) bool dump_kind_p (int msg_type) { - if (!current_function_decl) -return 0; - return ((msg_type & pflags) || (msg_type & alt_flags)); + return (dump_file && (msg_type & pflags)) +|| (alt_dump_file && (msg_type & alt_flags)); } /* Print basic block on the dump streams. */ Index: dumpfile.h === --- dumpfile.h (revision 192623) +++ dumpfile.h (working copy) @@ -121,6 +121,7 @@ extern int dump_switch_p (const char *); extern int opt_info_switch_p (const char *); extern const char *dump_flag_name (int); extern bool dump_kind_p (int); +extern inline bool dump_enabled_p (void); extern void dump_printf (int, const char *, ...) ATTRIBUTE_PRINTF_2; extern void dump_printf_loc (int, source_location, const char *, ...) ATTRIBUTE_PRINTF_3;
Re: Fix array bound niter estimate (PR middle-end/54937)
On Fri, 19 Oct 2012, Jan Hubicka wrote: On Fri, 19 Oct 2012, Jan Hubicka wrote: Hi, this patch fixes an off-by-one error in the testcase attached. The problem is that the dominance based test used by record_estimate to check whether the given statement must be executed at the last iteration of the loop is wrong, ignoring the side effects of other statements that may terminate the program. It also does not work for multiple exits as exercised by the cunroll-2.c testcase. This patch takes the simple approach of computing the set of all statements that must be executed in the last iteration, the first time record_estimate is executed this way. The set is computed conservatively, walking the header BB and its single successors (possibly diving into nested loops), stopping on the first BB with multiple exits. A better result can be computed by 1) estimating what loops are known to be finite, 2) inserting fake edges for all infinite loops and all statements with side effects that may terminate the execution, 3) using the post dominance info. would using post-dom info even work? That only says that _if_ the dominated stmt executed then it came through the dominator. It doesn't deal with functions that may not return. With fake edges inserted it will. We do have code for that used in profiling that also needs this stronger definition of CFG. Huh, but then we will need to split blocks. I don't think that's viable. What about the conservative variant of simply else delta = double_int_one; I think it would be a bad idea: it makes us completely unroll one iteration too many, which bloats code for no benefit. No optimization cancels the path in the CFG because of the undefined effect and thus the code will be output (unless someone smarter, like VRP, cleans up later, but it is more an exception than the rule.) ? I don't like all the code you add, nor the use of ->aux. Neither do I, really, but what are the alternatives?
See above ;) My first implementation simply checked that the stmt is in the loop header and walked up to the beginning of basic blocks looking for side effects. Then I became worried about the possibility of gigantic basic blocks with many array stores within the loop, so I decided to record the reachable statements instead of repeating the walk. Loop count estimation is recursive (i.e. it dives into inner loops), thus I ended up using AUX. I can for sure put this separately or add an extra reference argument passed over the whole call stack, but there are quite many functions that can lead to record_estimate. (I have nothing against that alternative, however, if AUX looks ugly) I am worried about passes trying to use AUX. We should at least document that it is for internal use only. i_bound += delta; Another alternative would be to not use i_bound for the strong upper bound but only the estimate (thus conservatively use i_bound + 1 for the upper bound if !is_exit). We can not derive a realistic estimate based on this: the loop may exit much earlier. We can only lower the estimate if it is already there and greater than this bound. This can probably happen with profile feedback and I can implement it later, I do not think it is terribly important though. Honza -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH, ARM] Subregs of VFP registers in big-endian mode
On 20/10/12 12:38, Julian Brown wrote: Hi, Quite a few tests fail for big-endian multilibs which use VFP instructions at present. One reason for many of these is glaringly obvious once you notice it: for D registers interpreted as two S registers, the lower-numbered register is always the less-significant part of the value, and the higher-numbered register the more-significant -- regardless of the endianness the processor is running in. However, for big-endian mode, when DFmode values are represented in memory (or indeed core registers), the opposite is true. So, a subreg expression such as the following will work fine on core registers (or e.g. pseudos assigned to stack slots): (subreg:SI (reg:DF) 0) but, when applied to a VFP register Dn, it should be resolved to the hard register S(n*2+1). At present though, it resolves to S(n*2) -- i.e. the wrong half of the value (for WORDS_BIG_ENDIAN, such a subreg should be the most-significant part of the value). For the relatively few cases where DFmode values are interpreted as a pair of (integer) words, this means that wrong code is generated. My feeling is that implementing a proper solution to this problem is probably impractical -- the closest existing macros to control behaviour aren't sufficient for this case: * FLOAT_WORDS_BIG_ENDIAN only refers to memory layout, which is correct as is it. * REG_WORDS_BIG_ENDIAN controls whether values are stored in big-endian order in registers, but refers to *all* registers. We only want to change the behaviour for the VFP registers. Defining a new macro FLOAT_REG_WORDS_BIG_ENDIAN wouldn't do, because the behaviour would differ depending on the hard register under observation: that seems like too much to ask of generic machinery in the middle-end. So, the attached patch just avoids the problem, by pretending that greater-than-word-size values in VFP registers, in big-endian mode, are opaque and cannot be subreg'ed. 
In practice, for at least the test case I looked at, this isn't as much of a pessimisation as you might expect -- the value in question might already be stored in core registers (e.g. for function arguments with -mfloat-abi=softfp), so can be retrieved directly from those rather than via memory. This is the testsuite delta for current FSF mainline, with multilibs adjusted to build for little/big-endian, and using options -mbig-endian -mfloat-abi=softfp -mfpu=vfpv3 for testing: FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O1 execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -fomit-frame-pointer execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -g execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -Os execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/copysign1.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/mzero6.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr35456.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O1 FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c
execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -fomit-frame-pointer FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -g FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Og -g FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Os FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O1 execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
Re: Fix array bound niter estimate (PR middle-end/54937)
On Fri, 19 Oct 2012, Jan Hubicka wrote: What about the conservative variant of simply else delta = double_int_one; I think it would be a bad idea: it makes us completely unroll one iteration too many, which bloats code for no benefit. No optimization cancels the path in the CFG because of the undefined effect and thus the code will be output (unless someone smarter, like VRP, cleans up later, but it is more an exception than the rule.) OK, on deeper thought I guess I can add double_int_one always at that spot, and once we are done with everything I can walk nb_iter_bound for all statements known to not be executed on the last iteration and record them to a pointer set. Finally I can walk from the header in DFS, stopping on loop exits, side effects and those statements. If I visit no loop exit or side effect, I know I can lower the iteration count by 1 (in estimate_numbers_of_iterations_loop). This will give an accurate answer and requires just a little extra bookkeeping. I will give this a try. Here is the updated patch. It solves the testcase and gives better estimates than before. Here are obvious improvements: record_estimate can put all statements into the list, not only those that dominate the loop latch, and maybe_lower_iteration_bound can track the lowest estimate it finds on its walk. This will need a bit more work and I am thus sending the bugfix separately, because I think it should go to 4.7, too. Honza * tree-ssa-loop-niter.c (record_estimate): Remove confused dominators check. (maybe_lower_iteration_bound): New function. (estimate_numbers_of_iterations_loop): Use it.
Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192537) +++ tree-ssa-loop-niter.c (working copy) @@ -2535,7 +2541,6 @@ record_estimate (struct loop *loop, tree gimple at_stmt, bool is_exit, bool realistic, bool upper) { double_int delta; - edge exit; if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2570,14 +2577,10 @@ record_estimate (struct loop *loop, tree } /* Update the number of iteration estimates according to the bound. - If at_stmt is an exit or dominates the single exit from the loop, - then the loop latch is executed at most BOUND times, otherwise - it can be executed BOUND + 1 times. */ - exit = single_exit (loop); - if (is_exit - || (exit != NULL - && dominated_by_p (CDI_DOMINATORS, - exit->src, gimple_bb (at_stmt + If at_stmt is an exit then the loop latch is executed at most BOUND times, + otherwise it can be executed BOUND + 1 times. We will lower the estimate + later if such statement must be executed on the last iteration. */ + if (is_exit) delta = double_int_zero; else delta = double_int_one; @@ -2953,6 +2956,87 @@ gcov_type_to_double_int (gcov_type val) return ret; } +/* See if every path crossing the loop goes through a statement that is known + to not execute at the last iteration. In that case we can decrease the iteration + count by 1. */ + +static void +maybe_lower_iteration_bound (struct loop *loop) +{ + pointer_set_t *not_executed_last_iteration = pointer_set_create (); + pointer_set_t *visited; + struct nb_iter_bound *elt; + bool found = false; + VEC (basic_block, heap) *queue = NULL; + + for (elt = loop->bounds; elt; elt = elt->next) +{ + if (!elt->is_exit + && elt->bound.ult (loop->nb_iterations_upper_bound)) + { + found = true; + pointer_set_insert (not_executed_last_iteration, elt->stmt); + } +} So you are looking for all stmts a bound was derived from. + if (!found) +{ + pointer_set_destroy (not_executed_last_iteration); create this on-demand in the above loop?
+ return; +} + visited = pointer_set_create (); + VEC_safe_push (basic_block, heap, queue, loop->header); + pointer_set_insert (visited, loop->header); pointer-set for BB visited? In most other places we use a [s]bitmap with block numbers. + found = false; + + while (VEC_length (basic_block, queue) && !found) looks like a do-while loop should be possible with a !VEC_empty () guard at the end. +{ + basic_block bb = VEC_pop (basic_block, queue); + gimple_stmt_iterator gsi; + bool stmt_found = false; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + if (pointer_set_contains (not_executed_last_iteration, stmt)) + { + stmt_found = true; we found one. + break; + } + if (gimple_has_side_effects (stmt)) + { + found = true; we found sth else? + break; + } + } + if (!stmt_found && !found) + { if we found
[Ada] Fix ICE on loop with modular iteration variable
This is a regression at -O present on mainline and 4.7 branch. The compiler inadvertently uses a non-base type for the base type of a modular iteration variable on 32-bit architectures. Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gcc-interface/trans.c (Loop_Statement_to_gnu): Use gnat_type_for_size directly to obtain an unsigned version of the base type. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gnat.dg/modular4.adb: New test. * gnat.dg/modular4_pkg.ads: New helper. -- Eric Botcazou Index: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 192648) +++ gcc-interface/trans.c (working copy) @@ -2431,7 +2431,8 @@ Loop_Statement_to_gnu (Node_Id gnat_node { if (TYPE_PRECISION (gnu_base_type) < TYPE_PRECISION (size_type_node)) - gnu_base_type = gnat_unsigned_type (gnu_base_type); + gnu_base_type + = gnat_type_for_size (TYPE_PRECISION (gnu_base_type), 1); else gnu_base_type = size_type_node; -- { dg-do compile } -- { dg-options -O } with Modular4_Pkg; use Modular4_Pkg; procedure Modular4 is begin for I in Zero .. F mod 8 loop raise Program_Error; end loop; end; package Modular4_Pkg is type Word is mod 2**48; Zero : constant Word := 0; function F return Word; end Modular4_Pkg;
Re: [Ada] Do not generate special PARM_DECL in LTO mode
On Mon, Oct 22, 2012 at 10:04 AM, Eric Botcazou ebotca...@adacore.com wrote: We generate a special PARM_DECL for Out parameters passed by copy at -O0, but it doesn't play nice with LTO so this patch removes it when LTO is enabled. Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch. Shouldn't it be simply the abstract origin for the VAR_DECL? Or be not 'lowered' here but be a 'proper' PARM_DECL with DECL_VALUE_EXPR? That said, how is debug info emitted in the optimize case? No objection to the patch as-is, but guarding sth with flag_generate_lto always makes me suspicious ;) Richard. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gcc-interface/decl.c (gnat_to_gnu_entity) E_Out_Parameter: Do not generate the special PARM_DECL for an Out parameter in LTO mode. -- Eric Botcazou
[Ada] Fix ICE on new limited_with use in Ada 2012
Ada 2012 has extended the use of limited_with, and incomplete types coming from a limited context may now appear in parameter and result profiles. This of course introduces more circularities, especially in -gnatct mode. Tested on x86_64-suse-linux, applied on the mainline. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gcc-interface/decl.c (gnat_to_gnu_entity) E_Subprogram_Type: In type annotation mode, break circularities introduced by AI05-0151. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gnat.dg/specs/limited_with4.ads: New test. * gnat.dg/specs/limited_with4_pkg.ads: New helper. -- Eric Botcazou Index: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 192667) +++ gcc-interface/decl.c (working copy) @@ -4142,7 +4142,18 @@ gnat_to_gnu_entity (Entity_Id gnat_entit gnu_return_type = void_type_node; else { - gnu_return_type = gnat_to_gnu_type (gnat_return_type); + /* Ada 2012 (AI05-0151): Incomplete types coming from a limited + context may now appear in parameter and result profiles. If + we are only annotating types, break circularities here. */ + if (type_annotate_only + && IN (Ekind (gnat_return_type), Incomplete_Kind) + && From_With_Type (gnat_return_type) + && In_Extended_Main_Code_Unit + (Non_Limited_View (gnat_return_type)) + && !present_gnu_tree (Non_Limited_View (gnat_return_type))) + gnu_return_type = ptr_void_type_node; + else + gnu_return_type = gnat_to_gnu_type (gnat_return_type); /* If this function returns by reference, make the actual return type the pointer type and make a note of that.
*/ @@ -4238,11 +4249,30 @@ gnat_to_gnu_entity (Entity_Id gnat_entit Present (gnat_param); gnat_param = Next_Formal_With_Extras (gnat_param), parmnum++) { + Entity_Id gnat_param_type = Etype (gnat_param); tree gnu_param_name = get_entity_name (gnat_param); - tree gnu_param_type = gnat_to_gnu_type (Etype (gnat_param)); - tree gnu_param, gnu_field; - bool copy_in_copy_out = false; + tree gnu_param_type, gnu_param, gnu_field; Mechanism_Type mech = Mechanism (gnat_param); + bool copy_in_copy_out = false, fake_param_type; + + /* Ada 2012 (AI05-0151): Incomplete types coming from a limited + context may now appear in parameter and result profiles. If + we are only annotating types, break circularities here. */ + if (type_annotate_only + && IN (Ekind (gnat_param_type), Incomplete_Kind) + && From_With_Type (Etype (gnat_param_type)) + && In_Extended_Main_Code_Unit + (Non_Limited_View (gnat_param_type)) + && !present_gnu_tree (Non_Limited_View (gnat_param_type))) + { + gnu_param_type = ptr_void_type_node; + fake_param_type = true; + } + else + { + gnu_param_type = gnat_to_gnu_type (gnat_param_type); + fake_param_type = false; + } /* Builtins are expanded inline and there is no real call sequence involved. So the type expected by the underlying expander is @@ -4280,10 +4310,28 @@ gnat_to_gnu_entity (Entity_Id gnat_entit mech = Default; } - gnu_param - = gnat_to_gnu_param (gnat_param, mech, gnat_entity, - Has_Foreign_Convention (gnat_entity), - copy_in_copy_out); + /* Do not call gnat_to_gnu_param for a fake parameter type since + it will try to use the real type again. */ + if (fake_param_type) + { + if (Ekind (gnat_param) == E_Out_Parameter) + gnu_param = NULL_TREE; + else + { + gnu_param + = create_param_decl (gnu_param_name, gnu_param_type, + false); + Set_Mechanism (gnat_param, + mech == Default ?
By_Copy : mech); + if (Ekind (gnat_param) == E_In_Out_Parameter) + copy_in_copy_out = true; + } + } + else + gnu_param + = gnat_to_gnu_param (gnat_param, mech, gnat_entity, + Has_Foreign_Convention (gnat_entity), + copy_in_copy_out); /* We are returned either a PARM_DECL or a type if no parameter needs to be passed; in either case, adjust the type. */ -- { dg-do compile } -- { dg-options -gnat12 -gnatct } with Ada.Containers.Vectors; with Limited_With4_Pkg; package Limited_With4 is type Object is tagged private; type Object_Ref is access all Object; type Class_Ref is access all Object'Class; package Vec is new Ada.Containers.Vectors (Positive, Limited_With4_Pkg.Object_Ref, Limited_With4_Pkg."="); subtype Vector is Vec.Vector; private type Object is tagged record V : Vector; end record; end Limited_With4; -- { dg-do compile } -- { dg-options -gnat12 -gnatct } limited with Limited_With4; package Limited_With4_Pkg is type Object is tagged null record; type Object_Ref is access all Object; type Class_Ref is access all Object'Class; function Func return Limited_With4.Class_Ref;
[Ada] Adjust rest_of_record_type_compilation to sizetype change
The function does a bit of pattern matching to emit the special encoding for variable-sized record types in the debug info and it needs to be adjusted to the sizetype change. Tested on x86_64-suse-linux, applied on the mainline.

2012-10-22  Eric Botcazou  ebotca...@adacore.com

        * gcc-interface/utils.c (rest_of_record_type_compilation): Simplify
        and robustify pattern matching code for masking operations.

-- Eric Botcazou

Index: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 192648)
+++ gcc-interface/utils.c	(working copy)
@@ -1731,19 +1731,23 @@ rest_of_record_type_compilation (tree re
 	      tree offset = TREE_OPERAND (curpos, 0);
 	      align = tree_low_cst (TREE_OPERAND (curpos, 1), 1);
 
-	      /* An offset which is a bitwise AND with a negative power of 2
-		 means an alignment corresponding to this power of 2.  Note
-		 that, as sizetype is sign-extended but nonetheless unsigned,
-		 we don't directly use tree_int_cst_sgn.  */
+	      /* An offset which is a bitwise AND with a mask increases the
+		 alignment according to the number of trailing zeros.  */
 	      offset = remove_conversions (offset, true);
 	      if (TREE_CODE (offset) == BIT_AND_EXPR
-		  && host_integerp (TREE_OPERAND (offset, 1), 0)
-		  && TREE_INT_CST_HIGH (TREE_OPERAND (offset, 1)) < 0)
+		  && TREE_CODE (TREE_OPERAND (offset, 1)) == INTEGER_CST)
 		{
-		  unsigned int pow
-		    = - tree_low_cst (TREE_OPERAND (offset, 1), 0);
-		  if (exact_log2 (pow) > 0)
-		    align *= pow;
+		  unsigned HOST_WIDE_INT mask
+		    = TREE_INT_CST_LOW (TREE_OPERAND (offset, 1));
+		  unsigned int i;
+
+		  for (i = 0; i < HOST_BITS_PER_WIDE_INT; i++)
+		    {
+		      if (mask & 1)
+			break;
+		      mask >>= 1;
+		      align *= 2;
+		    }
 		}
 
 	      pos = compute_related_constant (curpos,
[Ada] Plug small hole in handling of volatile components
This pertains only to small arrays, for which we fail to take into account a pragma Volatile on the component type or a pragma Volatile_Component. Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch.

2012-10-22  Eric Botcazou  ebotca...@adacore.com

        * gcc-interface/decl.c (gnat_to_gnu_entity) <E_Array_Type>: Force
        BLKmode on the type if it is passed by reference.
        <E_Array_Subtype>: Likewise.
        <E_Record_Type>: Guard the call to Is_By_Reference_Type predicate.
        <E_Record_Subtype>: Likewise.

-- Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 192671)
+++ gcc-interface/decl.c	(working copy)
@@ -2248,6 +2248,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	    TYPE_MULTI_ARRAY_P (tem) = (index > 0);
 	    if (array_type_has_nonaliased_component (tem, gnat_entity))
 	      TYPE_NONALIASED_COMPONENT (tem) = 1;
+
+	    /* If it is passed by reference, force BLKmode to ensure that
+	       objects of this type will always be put in memory.  */
+	    if (TYPE_MODE (tem) != BLKmode
+		&& Is_By_Reference_Type (gnat_entity))
+	      SET_TYPE_MODE (tem, BLKmode);
 	  }
 
 	/* If an alignment is specified, use it if valid.  But ignore it
@@ -2588,6 +2594,11 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	    TYPE_MULTI_ARRAY_P (gnu_type) = (index > 0);
 	    if (array_type_has_nonaliased_component (gnu_type, gnat_entity))
 	      TYPE_NONALIASED_COMPONENT (gnu_type) = 1;
+
+	    /* See the E_Array_Type case for the rationale.  */
+	    if (TYPE_MODE (gnu_type) != BLKmode
+		&& Is_By_Reference_Type (gnat_entity))
+	      SET_TYPE_MODE (gnu_type, BLKmode);
 	  }
 
 	/* Attach the TYPE_STUB_DECL in case we have a parallel type.  */
@@ -3161,7 +3172,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	/* If it is passed by reference, force BLKmode to ensure that
 	   objects of this type will always be put in memory.  */
-	if (Is_By_Reference_Type (gnat_entity))
+	if (TYPE_MODE (gnu_type) != BLKmode
+	    && Is_By_Reference_Type (gnat_entity))
 	  SET_TYPE_MODE (gnu_type, BLKmode);
 
 	/* We used to remove the associations of the discriminants and _Parent
@@ -3527,12 +3539,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	   modify it below.  */
 	finish_record_type (gnu_type, nreverse (gnu_field_list), 2, false);
+	compute_record_mode (gnu_type);
 
 	/* See the E_Record_Type case for the rationale.  */
-	if (Is_By_Reference_Type (gnat_entity))
+	if (TYPE_MODE (gnu_type) != BLKmode
+	    && Is_By_Reference_Type (gnat_entity))
 	  SET_TYPE_MODE (gnu_type, BLKmode);
-	else
-	  compute_record_mode (gnu_type);
 
 	TYPE_VOLATILE (gnu_type) = Treat_As_Volatile (gnat_entity);
Re: [Ada] Do not generate special PARM_DECL in LTO mode
Shouldn't it be simply the abstract origin for the VAR_DECL? Or be not 'lowered' here but be a 'proper' PARM_DECL with DECL_VALUE_EXPR? That said, how is debug info emitted in the optimize case? This is a PARM_DECL with DECL_VALUE_EXPR set to the VAR_DECL emitted in the outermost function scope. It doesn't survive with optimization enabled so we don't bother generating it in this case. -- Eric Botcazou
[AARCH64-4.7] Merge from upstream gcc-4_7-branch r192597
Hi, I have just merged upstream gcc-4_7-branch on the aarch64-4.7-branch up to r192597. Thanks Sofiane
[PATCH,ARM] Fix PR55019 Incorrectly use live argument register to save high register in thumb1 prologue
Hi, Attached patch intends to fix bug 55019 which is exposed on 4.7 branch. Although this bug can't be reproduced on trunk, I think this fix is still useful to make trunk more robust. Tested with trunk regression test on cortex-m0 and cortex-m3, no regression found. Also tested with various benchmark like Dhrystone/coremark/eembc_v1 on cortex-m0, no regression on performance and code size. Is it ok to go upstream and 4.7 branch? BR, Terry gcc/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * config/arm/arm.c (thumb1_expand_prologue): Don't push high regs with live argument regs. gcc/testsuite/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * gcc.target/arm/pr55019.c: New. thumb1-argument-register-issue.patch Description: Binary data
[AARCH64] Merge from upstream trunk r192598
Hi, I have merged upstream trunk into ARM/aarch64-branch, up to r192598. Thanks Sofiane
Re: [PATCH] PowerPC VLE port
On 10/19/2012 02:52 PM, David Edelsohn wrote: How do you want to move forward with the VLE patch? Can you localize more of the changes? David, I have been distracted by other tasks. I expect to revisit VLE this week. However, I won't be able to invest much more time on VLE. I'll look at what else I can do. -- Jim Lemke Mentor Graphics / CodeSourcery Orillia Ontario, +1-613-963-1073
Re: Fix array bound niter estimate (PR middle-end/54937)
+static void
+maybe_lower_iteration_bound (struct loop *loop)
+{
+  pointer_set_t *not_executed_last_iteration = pointer_set_create ();
+  pointer_set_t *visited;
+  struct nb_iter_bound *elt;
+  bool found = false;
+  VEC (basic_block, heap) *queue = NULL;
+
+  for (elt = loop->bounds; elt; elt = elt->next)
+    {
+      if (!elt->is_exit
+	  && elt->bound.ult (loop->nb_iterations_upper_bound))
+	{
+	  found = true;
+	  pointer_set_insert (not_executed_last_iteration, elt->stmt);
+	}
+    }

So you are looking for all stmts a bound was derived from.

Yes, with bound smaller than the current estimate.

+  if (!found)
+    {
+      pointer_set_destroy (not_executed_last_iteration);

create this on-demand in the above loop?

Will do.

+      return;
+    }
+  visited = pointer_set_create ();
+  VEC_safe_push (basic_block, heap, queue, loop->header);
+  pointer_set_insert (visited, loop->header);

pointer-set for BB visited? In most other places we use a [s]bitmap with block numbers.

Will switch to bitmap, though I think it is mostly because this tradition was invented before pointer-set. bitmap has a linear walk in it, pointer-set should scale better.

if we didn't find an exit we reduce count. double_int_one looks magic here, but with the assertion that each queued 'stmt_found' upper bound was less than loop->nb_iterations_upper_bound, subtracting one is certainly conservative. But why not use the maximum estimate from all stmts_found?

Because it is always nb_iterations_upper_bound-1, see the logic in record_estimate. I plan to change this - basically we can change record_estimate to record all statements, not only those dominating the exit, and do Dijkstra's algorithm in this walk looking for the largest upper bound we can reach the loopback with. But as I wrote in the email, I would like to do this incrementally - fix the bug first (possibly for 4.7, too - the bug is there, I am not sure if it can lead to wrong code) and change this next.
It means some further changes throughout niter.c, but little challenge to implement Dijkstra with a double-int based queue. Thus, please add some comments, use a bitmap for visited and rename variables to be more descriptive. Will do, and need to analyze the bounds fortran failures :) Thanks, Honza
Ping [Patch] Fix PR52945
Could someone commit the patch at http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00758.html ? TIA Dominique
[PATCH] Fix PR55011
This fixes PR55011. It seems nothing checks for invalid lattice transitions in VRP, so the following adds that; since we now can produce a lot more UNDEFINED than before, not doing so triggers issues. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-22  Richard Biener  rguent...@suse.de

        PR tree-optimization/55011
        * tree-vrp.c (update_value_range): For invalid lattice transitions
        drop to VARYING.

        * gcc.dg/torture/pr55011.c: New testcase.

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c	(revision 192671)
--- gcc/tree-vrp.c	(working copy)
*** update_value_range (const_tree var, valu
*** 819,826 ****
 	     || !vrp_bitmap_equal_p (old_vr->equiv, new_vr->equiv);
 
   if (is_new)
!     set_value_range (old_vr, new_vr->type, new_vr->min, new_vr->max,
!		      new_vr->equiv);
 
   BITMAP_FREE (new_vr->equiv);
--- 819,837 ----
 	     || !vrp_bitmap_equal_p (old_vr->equiv, new_vr->equiv);
 
   if (is_new)
!     {
!       /* Do not allow transitions up the lattice.  The following
!	   is slightly more awkward than just new_vr->type < old_vr->type
!	   because VR_RANGE and VR_ANTI_RANGE need to be considered
!	   the same.  We may not have is_new when transitioning to
!	   UNDEFINED or from VARYING.  */
!       if (new_vr->type == VR_UNDEFINED
!	    || old_vr->type == VR_VARYING)
!	  set_value_range_to_varying (old_vr);
!       else
!	  set_value_range (old_vr, new_vr->type, new_vr->min, new_vr->max,
!			   new_vr->equiv);
!     }
 
   BITMAP_FREE (new_vr->equiv);

Index: gcc/testsuite/gcc.dg/torture/pr55011.c
===
*** gcc/testsuite/gcc.dg/torture/pr55011.c	(revision 0)
--- gcc/testsuite/gcc.dg/torture/pr55011.c	(working copy)
*** 0 ****
--- 1,22 ----
+ /* { dg-do compile } */
+ 
+ char a;
+ 
+ void f(void)
+ {
+   char b = 2;
+ 
+   for(;;)
+   {
+     unsigned short s = 1, *p = &s, *i;
+ 
+     for(*i = 0; *i < 4; ++*i)
+       if(a | (*p /= (b += !!a)) >= 63739)
+         return;
+ 
+     if(!s)
+       a = 0;
+ 
+     for(;;);
+   }
+ }
Re: [PATCH,ARM] Fix PR55019 Incorrectly use live argument register to save high register in thumb1 prologue
On 22/10/12 12:50, Terry Guo wrote: Hi, Attached patch intends to fix bug 55019 which is exposed on 4.7 branch. Although this bug can't be reproduced on trunk, I think this fix is still useful to make trunk more robust. Tested with trunk regression test on cortex-m0 and cortex-m3, no regression found. Also tested with various benchmark like Dhrystone/coremark/eembc_v1 on cortex-m0, no regression on performance and code size. Is it ok to go upstream and 4.7 branch? BR, Terry gcc/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * config/arm/arm.c (thumb1_expand_prologue): Don't push high regs with live argument regs. gcc/testsuite/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * gcc.target/arm/pr55019.c: New. The test isn't thumb1 specific. In fact, it isn't even ARM specific. So I think it should be moved to gcc.dg. Otherwise OK for trunk and 4.7. R.
[PATCH, i386]: Fix length attribute calculation for LEA and addr32 addresses, some improvements
Hello!

We don't need to check for REG_P on base and index, we are sure that non-null RTXes are registers only. Also, we should determine the mode of RTXes in addr32 calculation from original RTXes.

2012-10-22  Uros Bizjak  ubiz...@gmail.com

        * config/i386/i386.c (memory_address_length): Assert that non-null
        base or index RTXes are registers.  Do not check for REG RTXes.
        Determine addr32 prefix from original base and index RTXes.
        Simplify code.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 192664)
+++ config/i386/i386.c	(working copy)
@@ -23764,7 +23764,7 @@ memory_address_length (rtx addr, bool lea)
 {
   struct ix86_address parts;
   rtx base, index, disp;
-  int len = 0;
+  int len;
   int ok;
 
   if (GET_CODE (addr) == PRE_DEC
@@ -23776,15 +23776,26 @@ memory_address_length (rtx addr, bool lea)
   ok = ix86_decompose_address (addr, parts);
   gcc_assert (ok);
 
-  if (parts.base && GET_CODE (parts.base) == SUBREG)
-    parts.base = SUBREG_REG (parts.base);
-  if (parts.index && GET_CODE (parts.index) == SUBREG)
-    parts.index = SUBREG_REG (parts.index);
+  len = (parts.seg == SEG_DEFAULT) ? 0 : 1;
+
+  /* If this is not LEA instruction, add the length of addr32 prefix.  */
+  if (TARGET_64BIT && !lea
+      && ((parts.base && GET_MODE (parts.base) == SImode)
+	  || (parts.index && GET_MODE (parts.index) == SImode)))
+    len++;
+
   base = parts.base;
   index = parts.index;
   disp = parts.disp;
 
+  if (base && GET_CODE (base) == SUBREG)
+    base = SUBREG_REG (base);
+  if (index && GET_CODE (index) == SUBREG)
+    index = SUBREG_REG (index);
+
+  gcc_assert (base == NULL_RTX || REG_P (base));
+  gcc_assert (index == NULL_RTX || REG_P (index));
+
   /* Rule of thumb:
        - esp as the base always wants an index,
        - ebp as the base always wants a displacement,
@@ -23797,14 +23808,13 @@ memory_address_length (rtx addr, bool lea)
       /* esp (for its index) and ebp (for its displacement) need
	  the two-byte modrm form.  Similarly for r12 and r13 in 64-bit
	  code.  */
-      if (REG_P (base)
-	  && (base == arg_pointer_rtx
-	      || base == frame_pointer_rtx
-	      || REGNO (base) == SP_REG
-	      || REGNO (base) == BP_REG
-	      || REGNO (base) == R12_REG
-	      || REGNO (base) == R13_REG))
-	len = 1;
+      if (base == arg_pointer_rtx
+	  || base == frame_pointer_rtx
+	  || REGNO (base) == SP_REG
+	  || REGNO (base) == BP_REG
+	  || REGNO (base) == R12_REG
+	  || REGNO (base) == R13_REG)
+	len++;
     }
 
   /* Direct Addressing.  In 64-bit mode mod 00 r/m 5
@@ -23814,7 +23824,7 @@ memory_address_length (rtx addr, bool lea)
      by UNSPEC.  */
   else if (disp && !base && !index)
     {
-      len = 4;
+      len += 4;
      if (TARGET_64BIT)
	{
	  rtx symbol = disp;
@@ -23832,7 +23842,7 @@ memory_address_length (rtx addr, bool lea)
	      || (XINT (symbol, 1) != UNSPEC_GOTPCREL
		  && XINT (symbol, 1) != UNSPEC_PCREL
		  && XINT (symbol, 1) != UNSPEC_GOTNTPOFF)))
-	    len += 1;
+	    len++;
	}
     }
   else
     {
       /* Find the length of the displacement constant.  */
       if (disp)
	{
	  if (base && satisfies_constraint_K (disp))
-	    len = 1;
+	    len += 1;
	  else
-	    len = 4;
+	    len += 4;
	}
       /* ebp always wants a displacement.  Similarly r13.  */
-      else if (base && REG_P (base)
-	       && (REGNO (base) == BP_REG || REGNO (base) == R13_REG))
-	len = 1;
+      else if (base && (REGNO (base) == BP_REG || REGNO (base) == R13_REG))
+	len++;
 
       /* An index requires the two-byte modrm form */
       if (index
	  /* ...like esp (or r12), which always wants an index.  */
	  || base == arg_pointer_rtx
	  || base == frame_pointer_rtx
-	  || (base && REG_P (base)
-	      && (REGNO (base) == SP_REG || REGNO (base) == R12_REG)))
-	len += 1;
+	  || (base && (REGNO (base) == SP_REG || REGNO (base) == R12_REG)))
+	len++;
     }
 
-  switch (parts.seg)
-    {
-    case SEG_FS:
-    case SEG_GS:
-      len += 1;
-      break;
-    default:
-      break;
-    }
-
-  /* If this is not LEA instruction, add the length of addr32 prefix.  */
-  if (TARGET_64BIT && !lea
-      && ((base && GET_MODE (base) == SImode)
-	  || (index && GET_MODE (index) == SImode)))
-    len += 1;
-
   return len;
 }
[PATCH] Fix PR55021
Somehow bogus truncations slipped through in my LTO overflowed INTEGER_CST streaming patch. Oops. Committed as obvious. Richard. 2012-10-22 Richard Biener rguent...@suse.de PR lto/55021 * tree-streamer-in.c (unpack_ts_int_cst_value_fields): Remove bogus truncations. Index: gcc/tree-streamer-in.c === --- gcc/tree-streamer-in.c (revision 192688) +++ gcc/tree-streamer-in.c (working copy) @@ -146,8 +146,8 @@ unpack_ts_base_value_fields (struct bitp static void unpack_ts_int_cst_value_fields (struct bitpack_d *bp, tree expr) { - TREE_INT_CST_LOW (expr) = (unsigned) bp_unpack_var_len_unsigned (bp); - TREE_INT_CST_HIGH (expr) = (unsigned) bp_unpack_var_len_int (bp); + TREE_INT_CST_LOW (expr) = bp_unpack_var_len_unsigned (bp); + TREE_INT_CST_HIGH (expr) = bp_unpack_var_len_int (bp); }
Remove def operands cache, try 2
Hi, On Tue, 11 Sep 2012, Michael Matz wrote: the operands cache is ugly. This patch removes it at least for the def operands, saving three pointers for roughly each normal statement (the pointer in gsbase, and two pointers from def_optype_d). This is relatively easy to do, because all statements except ASMs have at most one def (and one vdef), which themself aren't pointed to by something else, unlike the use operands which have more structure for the SSA web. Performance wise the patch is a slight improvement (1% for some C++ testcases, but relatively noisy, but at least not slower), bootstrap time is unaffected. As the iterator is a bit larger code size increases by 1 promille. The patch is regstrapped on x86_64-linux. If it's approved I'll adjust the WORD count markers in gimple.h, I left it out in this submission as it's just verbose noise in comments. So, 2nd try after some internal feedback. This version changes the operand order of asms to also have the defs at the beginning, which makes the iterators slightly nicer, and joins some more fields of the iterator, though not all that we could merge. Again, if approved I'll adjust the word count markers. Regstrapping on x86_64-linux in progress, speed similar as before. Okay for trunk? Ciao, Michael. -- * tree-ssa-operands.h (struct def_optype_d, def_optype_p): Remove. (ssa_operands.free_defs): Remove. (DEF_OP_PTR, DEF_OP): Remove. (struct ssa_operand_iterator_d): Remove 'defs', add 'flags' members, rename 'phi_stmt' to 'stmt', 'phi_i' to 'i' and 'num_phi' to 'numops'. * gimple.h (gimple_statement_with_ops.def_ops): Remove. (gimple_def_ops, gimple_set_def_ops): Remove. (gimple_vdef_op): Don't take const gimple, adjust. (gimple_asm_input_op, gimple_asm_input_op_ptr, gimple_asm_set_input_op, gimple_asm_output_op, gimple_asm_output_op_ptr, gimple_asm_set_output_op): Adjust asserts, and rewrite to move def operands to front. 
(gimple_asm_clobber_op, gimple_asm_set_clobber_op, gimple_asm_label_op, gimple_asm_set_label_op): Correct asserts. * tree-ssa-operands.c (build_defs): Remove. (init_ssa_operands): Don't initialize it. (fini_ssa_operands): Don't free it. (cleanup_build_arrays): Don't truncate it. (finalize_ssa_stmt_operands): Don't assert on it. (alloc_def, add_def_op, append_def): Remove. (finalize_ssa_defs): Remove building of def_ops list. (finalize_ssa_uses): Don't mark for SSA renaming here, ... (add_stmt_operand): ... but here, don't call append_def. (get_indirect_ref_operands): Remove recurse_on_base argument. (get_expr_operands): Adjust call to get_indirect_ref_operands. (verify_ssa_operands): Don't check def operands. (free_stmt_operands): Don't free def operands. * gimple.c (gimple_copy): Don't clear def operands. * tree-flow-inline.h (op_iter_next_use): Adjust to explicitely handle def operand. (op_iter_next_tree, op_iter_next_def): Ditto. (clear_and_done_ssa_iter): Clear new fields. (op_iter_init): Adjust to setup new iterator structure. (op_iter_init_phiuse): Adjust. Index: tree-ssa-operands.h === --- tree-ssa-operands.h.orig2012-09-24 15:24:52.0 +0200 +++ tree-ssa-operands.h 2012-10-22 15:12:30.0 +0200 @@ -34,14 +34,6 @@ typedef ssa_use_operand_t *use_operand_p #define NULL_USE_OPERAND_P ((use_operand_p)NULL) #define NULL_DEF_OPERAND_P ((def_operand_p)NULL) -/* This represents the DEF operands of a stmt. */ -struct def_optype_d -{ - struct def_optype_d *next; - tree *def_ptr; -}; -typedef struct def_optype_d *def_optype_p; - /* This represents the USE operands of a stmt. 
*/
struct use_optype_d
{
@@ -68,7 +60,6 @@ struct GTY(()) ssa_operands {
   bool ops_active;
 
-  struct def_optype_d * GTY ((skip ())) free_defs;
   struct use_optype_d * GTY ((skip ())) free_uses;
 };
 
@@ -82,9 +73,6 @@ struct GTY(()) ssa_operands {
 #define USE_OP_PTR(OP)		(&((OP)->use_ptr))
 #define USE_OP(OP)		(USE_FROM_PTR (USE_OP_PTR (OP)))
 
-#define DEF_OP_PTR(OP)		((OP)->def_ptr)
-#define DEF_OP(OP)		(DEF_FROM_PTR (DEF_OP_PTR (OP)))
-
 #define PHI_RESULT_PTR(PHI)	gimple_phi_result_ptr (PHI)
 #define PHI_RESULT(PHI)		DEF_FROM_PTR (PHI_RESULT_PTR (PHI))
 #define SET_PHI_RESULT(PHI, V)	SET_DEF (PHI_RESULT_PTR (PHI), (V))
@@ -133,13 +121,13 @@ enum ssa_op_iter_type {
 typedef struct ssa_operand_iterator_d
 {
-  bool done;
   enum ssa_op_iter_type iter_type;
-  def_optype_p defs;
+  bool done;
+  int flags;
+  unsigned i;
+  unsigned numops;
   use_optype_p uses;
-  int phi_i;
-  int num_phi;
-  gimple phi_stmt;
+  gimple stmt;
 } ssa_op_iter;

/* These flags are used to
[PATCH, ARM] arm_return_in_msb needs to handle TImode.
Hi, I observed the following failure on arm big-endian: FAIL: tmpdir-g++.dg-struct-layout-1/t024 cp_compat_x_tst.o compile, (internal compiler error) The compiler is configured as: armeb-montavista-linux-gnueabi-gcc -v Using built-in specs. COLLECT_GCC=./armeb-tools/bin/armeb-montavista-linux-gnueabi-gcc COLLECT_LTO_WRAPPER=/home/manjunath/NCDtools/mips/toolchain/armeb-tools/bin/../libexec/gcc/armeb-montavista-linux-gnueabi/4.7.0/lto-wrapper Target: armeb-montavista-linux-gnueabi Configured with: /home/manjunath/NCDtools/mips/toolchain/scripts/../src/configure --disable-fixed-point --without-ppl --without-python --disable-werror --enable-checking --with-sysroot --with-local-prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools/armeb-montavista-linux-gnueabi/sys-root --disable-sim --enable-symvers=gnu --enable-__cxa_atexit --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-tune=cortex-a9 --target=armeb-montavista-linux-gnueabi --enable-languages=c,c++ --prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools Thread model: posix gcc version 4.7.0 () Debugging shows that ITmode is not handled by arm_return_in_msb debug snip ... {{{ void test2001() void test2002() void test2003() void test2004() void test2005() Breakpoint 1, shift_return_value (mode=TImode, left_p=0 '\000', value=0x7033b380) at /home/manjunath/NCDtools/mips/toolchain/scripts/../src/gcc/calls.c:2127 2127 gcc_assert (REG_P (value) HARD_REGISTER_P (value)); (gdb) p mode $1 = TImode (gdb) p left_p $2 = 0 '\000' (gdb) p debug_rtx(value) (parallel:TI [ (expr_list:REG_DEP_TRUE (reg:DI 63 s0) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (reg:DI 65 s2) (const_int 8 [0x8])) ]) $3 = void }}} I have attached the patch which fixes the above problem, kindly review the patch and accept it for mainline. Regards, Manjunath S Matti. TImode_fix.patch Description: TImode_fix.patch
Re: [PATCH] Fix PR55011
Hi, On Mon, 22 Oct 2012, Richard Biener wrote: This fixes PR55011, it seems nothing checks for invalid lattice transitions in VRP, That makes sense, because the individual parts of VRP that produce new ranges are supposed to not generate invalid transitions. So if anything such checking should be an assert and the causes be fixed. so the following adds that It's a work around ... since we now can produce a lot more UNDEFINED than before ... for this. We should never produce UNDEFINED when the input wasn't UNDEFINED already. not doing so triggers issues. Hmm? Ciao, Michael.
Re: [PATCH, ARM] arm_return_in_msb needs to handle TImode.
On 22/10/12 15:14, Matti, Manjunath wrote: Hi, I observed the following failure on arm big-endian: FAIL: tmpdir-g++.dg-struct-layout-1/t024 cp_compat_x_tst.o compile, (internal compiler error) The compiler is configured as: armeb-montavista-linux-gnueabi-gcc -v Using built-in specs. COLLECT_GCC=./armeb-tools/bin/armeb-montavista-linux-gnueabi-gcc COLLECT_LTO_WRAPPER=/home/manjunath/NCDtools/mips/toolchain/armeb-tools/bin/../libexec/gcc/armeb-montavista-linux-gnueabi/4.7.0/lto-wrapper Target: armeb-montavista-linux-gnueabi Configured with: /home/manjunath/NCDtools/mips/toolchain/scripts/../src/configure --disable-fixed-point --without-ppl --without-python --disable-werror --enable-checking --with-sysroot --with-local-prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools/armeb-montavista-linux-gnueabi/sys-root --disable-sim --enable-symvers=gnu --enable-__cxa_atexit --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-tune=cortex-a9 --target=armeb-montavista-linux-gnueabi --enable-languages=c,c++ --prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools Thread model: posix gcc version 4.7.0 () Debugging shows that ITmode is not handled by arm_return_in_msb debug snip ... {{{ void test2001() void test2002() void test2003() void test2004() void test2005() Breakpoint 1, shift_return_value (mode=TImode, left_p=0 '\000', value=0x7033b380) at /home/manjunath/NCDtools/mips/toolchain/scripts/../src/gcc/calls.c:2127 2127 gcc_assert (REG_P (value) HARD_REGISTER_P (value)); (gdb) p mode $1 = TImode (gdb) p left_p $2 = 0 '\000' (gdb) p debug_rtx(value) (parallel:TI [ (expr_list:REG_DEP_TRUE (reg:DI 63 s0) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (reg:DI 65 s2) (const_int 8 [0x8])) ]) $3 = void }}} I have attached the patch which fixes the above problem, kindly review the patch and accept it for mainline. Regards, Manjunath S Matti. That doesn't look right. The test is far too specific.
Even if this is the right place for the fix (and I'm yet to be convinced that it is), you should be testing that the size of mode is less than some limit, not that it's not a specific mode. R.
Re: [PATCH] Fix PR55011
On Mon, 22 Oct 2012, Michael Matz wrote: Hi, On Mon, 22 Oct 2012, Richard Biener wrote: This fixes PR55011, it seems nothing checks for invalid lattice transitions in VRP, That makes sense, because the individual parts of VRP that produce new ranges are supposed to not generate invalid transitions. So if anything such checking should be an assert and the causes be fixed. No, the checking should be done in update_value_range which copies the new VR over to the lattice. The job of that function is also to detect lattice changes. so the following adds that It's a work around ... No. since we now can produce a lot more UNDEFINED than before ... for this. We should never produce UNDEFINED when the input wasn't UNDEFINED already. Why? We shouldn't update the lattice this way, yes, but that is what the patch ensures. The workers only compute a new value-range for a stmt based on input value ranges. not doing so triggers issues. Hmm? It oscillates and thus never finishes. Richard.
Re: [Patch, Fortran] PR 54997: -Wunused-function gives false warnings for procedures passed as actual argument
Minor update to the patch: It now also sets TREE_USED for entry masters in order to avoid bogus warnings for procedures with ENTRY (cf. comment 6 in the PR, which like comment 0 is a 4.8 regression). Still regtests cleanly. Ok? Cheers, Janus 2012/10/21 Janus Weil ja...@gcc.gnu.org: Hi all, here is another patch to silence some more of the bogus warnings about unused functions that gfortran is currently throwing (cf. also the previous patch for PR 54224). It fixes the usage of the 'referenced' attribute, which should only be given to procedures which are actually 'used' (called/referenced). Then TREE_USED is set according to this attribute, which in turn silences the warning in the middle-end. The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2012-10-21 Janus Weil ja...@gcc.gnu.org PR fortran/54997 * decl.c (match_procedure_decl): Don't set 'referenced' attribute for PROCEDURE declarations. * parse.c (gfc_fixup_sibling_symbols,parse_contained): Don't set 'referenced' attribute for all contained procedures. * trans-decl.c (gfc_get_symbol_decl): Allow for unreferenced procedures. (build_function_decl): Set TREE_USED for referenced procedures. 2012-10-21 Janus Weil ja...@gcc.gnu.org PR fortran/54997 * gfortran.dg/warn_unused_function_2.f90: New. warn_unused_function_2.f90 Description: Binary data pr54997_v2.diff Description: Binary data
Re: [PATCH] Fix PR55011
Hi, On Mon, 22 Oct 2012, Richard Biener wrote: On Mon, 22 Oct 2012, Michael Matz wrote: Hi, On Mon, 22 Oct 2012, Richard Biener wrote: This fixes PR55011, it seems nothing checks for invalid lattice transitions in VRP, That makes sense, because the individual parts of VRP that produce new ranges are supposed to not generate invalid transitions. So if anything such checking should be an assert and the causes be fixed. No, the checking should be done in update_value_range Exactly. And that's the routine you're changing, but you aren't adding checking, you silently fix invalid transitions. What I tried to say is that the one calling update_value_range with new_vr being UNDEFINED is wrong, and update_value_range shouldn't fix it, but assert, so that this wrong caller may be fixed. which copies the new VR over to the lattice. The job of that function is also to detect lattice changes. Sure, but not to fix invalid input. so the following adds that It's a work around ... No. since we now can produce a lot more UNDEFINED than before ... for this. We should never produce UNDEFINED when the input wasn't UNDEFINED already. Why? Because doing so _always_ means an invalid lattice transition. UNDEFINED is TOP, anything not UNDEFINED is not TOP. So going from something to UNDEFINED is always going upward the lattice and hence in the wrong direction. We shouldn't update the lattice this way, yes, but that is what the patch ensures. An assert ensures. A work around works around a problem. I say that the problem is in those routines that produced the new UNDEFINED range in the first place, and it's not update_value_range's job to fix that after the fact. The workers only compute a new value-range for a stmt based on input value ranges. And if they produce UNDEFINED when the input wasn't so, then _that's_ where the bug is. not doing so triggers issues. Hmm? It oscillates and thus never finishes. I'm not sure I understand. 
You claim that the workers have to produce UNDEFINED from non-UNDEFINED in some cases, otherwise we oscillate? That sounds strange. Or do you mean that we oscillate without your patch to update_value_range? That I believe, it's the natural result of going a lattice the wrong way, but I say that update_value_range is not the place to silently fix invalid transitions. Ciao, Michael.
Re: Fix array bound niter estimate (PR middle-end/54937)
Hi, here is updated patch with the comments. The fortran failures turned out to be funny interaction in between this patch and my other change that hoped that loop closed SSA is closed on VOPs, but it is not. Regtested x86_64-linux, bootstrap in progress, OK? Honza * tree-ssa-loop-niter.c (record_estimate): Do not try to lower the bound of non-is_exit statements. (maybe_lower_iteration_bound): Do it here. (estimate_numbers_of_iterations_loop): Call it. * gcc.c-torture/execute/pr54937.c: New testcase. * gcc.dg/tree-ssa/cunroll-2.c: Update. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192632) +++ tree-ssa-loop-niter.c (working copy) @@ -2535,7 +2541,6 @@ record_estimate (struct loop *loop, tree gimple at_stmt, bool is_exit, bool realistic, bool upper) { double_int delta; - edge exit; if (dump_file (dump_flags TDF_DETAILS)) { @@ -2570,14 +2577,10 @@ record_estimate (struct loop *loop, tree } /* Update the number of iteration estimates according to the bound. - If at_stmt is an exit or dominates the single exit from the loop, - then the loop latch is executed at most BOUND times, otherwise - it can be executed BOUND + 1 times. */ - exit = single_exit (loop); - if (is_exit - || (exit != NULL - dominated_by_p (CDI_DOMINATORS, -exit-src, gimple_bb (at_stmt + If at_stmt is an exit then the loop latch is executed at most BOUND times, + otherwise it can be executed BOUND + 1 times. We will lower the estimate + later if such statement must be executed on last iteration */ + if (is_exit) delta = double_int_zero; else delta = double_int_one; @@ -2953,6 +2956,110 @@ gcov_type_to_double_int (gcov_type val) return ret; } +/* See if every path cross the loop goes through a statement that is known + to not execute at the last iteration. In that case we can decrese iteration + count by 1. 
*/ + +static void +maybe_lower_iteration_bound (struct loop *loop) +{ + struct pointer_set_t *not_executed_last_iteration = NULL; + struct nb_iter_bound *elt; + bool found_exit = false; + VEC (basic_block, heap) *queue = NULL; + bitmap visited; + + /* Collect all statements with interesting (i.e. lower than + nb_iterations_upper_bound) bound on them. + + TODO: Due to the way record_estimate chooses estimates to store, the bounds + will be always nb_iterations_upper_bound-1. We can change this to record + also statements not dominating the loop latch and update the walk below + to the shortest path algorithm. */ + for (elt = loop->bounds; elt; elt = elt->next) +{ + if (!elt->is_exit + && elt->bound.ult (loop->nb_iterations_upper_bound)) + { + if (!not_executed_last_iteration) + not_executed_last_iteration = pointer_set_create (); + pointer_set_insert (not_executed_last_iteration, elt->stmt); + } +} + if (!not_executed_last_iteration) +return; + + /* Start DFS walk in the loop header and see if we can reach the + loop latch or any of the exits (including statements with side + effects that may terminate the loop otherwise) without visiting + any of the statements known to have undefined effect on the last + iteration. */ + VEC_safe_push (basic_block, heap, queue, loop->header); + visited = BITMAP_ALLOC (NULL); + bitmap_set_bit (visited, loop->header->index); + found_exit = false; + + do +{ + basic_block bb = VEC_pop (basic_block, queue); + gimple_stmt_iterator gsi; + bool stmt_found = false; + + /* Loop for possible exits and statements bounding the execution. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + if (pointer_set_contains (not_executed_last_iteration, stmt)) + { + stmt_found = true; + break; + } + if (gimple_has_side_effects (stmt)) + { + found_exit = true; + break; + } + } + if (found_exit) + break; + + /* If no bounding statement is found, continue the walk.
*/ + if (!stmt_found) + { + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, bb->succs) + { + if (loop_exit_edge_p (loop, e) + || e == loop_latch_edge (loop)) + { + found_exit = true; + break; + } + if (bitmap_set_bit (visited, e->dest->index)) + VEC_safe_push (basic_block, heap, queue, e->dest); + } + } +} + while (VEC_length (basic_block, queue) && !found_exit); + + /* If every path through the loop reaches a bounding statement before exit, + then we know
Minor record_upper_bound tweak
Hi, with profile feedback we may mis-update the profile and start to believe that loops iterate more times than they do. This patch makes at least nb_iterations_estimate no greater than nb_iterations_upper_bound. This makes the unrolling/peeling/unswitching heuristics behave more consistently. Bootstrapped/regtested x86_64-linux, OK? Honza * tree-ssa-loop-niter.c (record_niter_bound): Be sure that realistic estimate is not bigger than upper bound. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192632) +++ tree-ssa-loop-niter.c (working copy) @@ -2506,13 +2506,20 @@ record_niter_bound (struct loop *loop, d { loop->any_upper_bound = true; loop->nb_iterations_upper_bound = i_bound; + if (loop->any_estimate + && i_bound.ult (loop->nb_iterations_estimate)) +loop->nb_iterations_estimate = i_bound; } if (realistic && (!loop->any_estimate || i_bound.ult (loop->nb_iterations_estimate))) { loop->any_estimate = true; - loop->nb_iterations_estimate = i_bound; + if (loop->nb_iterations_upper_bound.ult (i_bound) + && loop->any_upper_bound) +loop->nb_iterations_estimate = loop->nb_iterations_upper_bound; + else +loop->nb_iterations_estimate = i_bound; } /* If an upper bound is smaller than the realistic estimate of the
Loop closed SSA loop update
Hi, this patch updates tree_unroll_loops_completely to update loop closed SSA. When unlooping the loop some basic blocks may move out of the other loops, and that makes it necessary to check their uses and add PHIs. Fortunately update_loop_close_ssa already supports local updates and thus this can be done quite cheaply by recording the blocks in fix_bb_placements and passing it along. I tried the patch with TODO_update_ssa_no_phi but that causes a weird bug in 3 fortran testcases because VOPs seem to not be in the loop closed form. We can track this incrementally I suppose. Bootstrapped/regtested x86_64-linux, OK? Honza PR middle-end/54967 * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Take loop_closed_ssa_invalidated parameter; pass it along. (canonicalize_loop_induction_variables): Update loop closed SSA. (tree_unroll_loops_completely): Likewise. * cfgloop.h (unloop): Update prototype. * cfgloopmanip.c (fix_bb_placements): Record BBs updated into optional bitmap. (unloop): Update to pass along loop_closed_ssa_invalidated. * gfortran.dg/pr54967.f90: New testcase. Index: tree-ssa-loop-ivcanon.c === --- tree-ssa-loop-ivcanon.c (revision 192632) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -390,13 +390,16 @@ loop_edge_to_cancel (struct loop *loop) EXIT is the exit of the loop that should be eliminated. IRRED_INVALIDATED is used to bookkeep if information about irreducible regions may become invalid as a result - of the transformation. */ + of the transformation. + LOOP_CLOSED_SSA_INVALIDATED is used to bookkeep the case + when we need to go into loop closed SSA form. */ static bool try_unroll_loop_completely (struct loop *loop, edge exit, tree niter, enum unroll_level ul, - bool *irred_invalidated) + bool *irred_invalidated, + bitmap loop_closed_ssa_invalidated) { unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns; gimple cond; @@ -562,7 +565,7 @@ try_unroll_loop_completely (struct loop locus = latch_edge->goto_locus; /* Unloop destroys the latch edge.
*/ - unloop (loop, irred_invalidated); + unloop (loop, irred_invalidated, loop_closed_ssa_invalidated); /* Create new basic block for the latch edge destination and wire it in. */ @@ -615,7 +618,8 @@ static bool canonicalize_loop_induction_variables (struct loop *loop, bool create_iv, enum unroll_level ul, bool try_eval, - bool *irred_invalidated) + bool *irred_invalidated, + bitmap loop_closed_ssa_invalidated) { edge exit = NULL; tree niter; @@ -663,7 +667,8 @@ canonicalize_loop_induction_variables (s (int)max_loop_iterations_int (loop)); } - if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated)) + if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated, + loop_closed_ssa_invalidated)) return true; if (create_iv @@ -683,13 +688,15 @@ canonicalize_induction_variables (void) struct loop *loop; bool changed = false; bool irred_invalidated = false; + bitmap loop_closed_ssa_invalidated = BITMAP_ALLOC (NULL); FOR_EACH_LOOP (li, loop, 0) { changed |= canonicalize_loop_induction_variables (loop, true, UL_SINGLE_ITER, true, - &irred_invalidated); + &irred_invalidated, + loop_closed_ssa_invalidated); } gcc_assert (!need_ssa_update_p (cfun)); @@ -701,6 +708,13 @@ canonicalize_induction_variables (void) evaluation could reveal new information. */ scev_reset (); + if (!bitmap_empty_p (loop_closed_ssa_invalidated)) +{ + gcc_checking_assert (loops_state_satisfies_p (LOOP_CLOSED_SSA)); + rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa); +} + BITMAP_FREE (loop_closed_ssa_invalidated); + if (changed) return TODO_cleanup_cfg; return 0; @@ -794,11 +808,15 @@ tree_unroll_loops_completely (bool may_i bool changed; enum unroll_level ul; int iteration = 0; + bool irred_invalidated = false; do { - bool irred_invalidated = false; changed = false; + bitmap loop_closed_ssa_invalidated = NULL; + + if (loops_state_satisfies_p (LOOP_CLOSED_SSA)) + loop_closed_ssa_invalidated = BITMAP_ALLOC (NULL);
Patch ping
Hi! http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01538.html - PR54844 with lots of dups, C++ FE ICE with sizeof in template http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01700.html - PR54970 small DW_OP_GNU_implicit_pointer improvements - the dwarf2out.c and tree-sra.c bits of the patch already acked, but cfgexpand.c and var-tracking.c bits are not Jakub
Re: [PATCH] Intrinsics for fxsave[,64], xsave[,64], xsaveopt[,64]
On Mon, Oct 22, 2012 at 5:25 PM, Alexander Ivchenko aivch...@gmail.com wrote: Please take a look at the updated patch. There is, thanks to Uros, a changed expander and asm patterns. Considering H.J.'s comments: 1) Yes, I added the new option -mxsaveopt 2) No. The FXSAVE and FXRSTOR instructions are not considered part of the SSE instruction group. 3) Done. 4) Fixed. 5) I'm not sure, there was already BIT_FXSAVE in cpuid.h, that had been used in /libgcc/config/i386/crtfastmath.c. I didn't change that. Maybe it would be enough to change the option name from -mfxsave to -mfxsr? 6) Not sure about the list of all processors that support those features. I added it to those I know support them. 7) Done. Restore-type insns do not store to memory, but read memory, so they should be defined like: [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")] UNSPECV_FXRSTOR)] Where save-type insn should look like: [(set (match_operand:BLK 0 "memory_operand" "=m") (unspec_volatile:BLK [(const_int 0)] UNSPECV_FXSAVE))] When they also read additional registers: [(unspec_volatile:BLK [(match_operand:BLK 0 "memory_operand" "m") (match_operand:SI 1 "register_operand" "a") (match_operand:SI 2 "register_operand" "d")] UNSPECV_XRSTOR)] and [(set (match_operand:BLK 0 "memory_operand" "=m") (unspec_volatile:BLK [(match_operand:SI 1 "register_operand" "a") (match_operand:SI 2 "register_operand" "d")] UNSPECV_XSAVE))] (And in a similar way the 32-bit patterns with a DImode operand.) I missed this detail in my previous review. BTW: BLKmode is a bit unusual, so I hope these patterns work as expected. Also, please do not use "mem" and "mask" in the headers; use "__P" and "__M" for pointer and mask, as is the case in other headers. Uros.
Re: Minimize downward code motion during reassociation
On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassociation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs.
*/ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe->op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb -> linearize_expr_tree -> linearize_expr -> gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i < len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree.
Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized. + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitions. */ + if (TREE_CODE (oe->op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just
Re: Fix bugs introduced by switch-case profile propagation
Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m && profile_status != PROFILE_ABSENT && insn && JUMP_P (insn) - && any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + && any_condjump_p (insn) + && !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb->succs) prob_sum += e->probability; return prob_sum; @@ 
-1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list->code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge->probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge->probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge->probability = default_prob; + if (default_edge) +default_edge->probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
[PATCH] Fix CSE RTL sharing ICE (PR rtl-optimization/55010)
Hi! On the following testcase we have IF_THEN_ELSE in insn notes, and when folding it, folded_arg1 is a subreg from earlier CC setter, as the other argument has equiv constant, simplify_relational_operation is called on it to simplify it and we end up with invalid RTL sharing of the subreg in between the CC setter insn and the insn with the REG_EQ* note. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-22 Jakub Jelinek ja...@redhat.com PR rtl-optimization/55010 * cse.c (fold_rtx): Call copy_rtx on folded_arg{0,1} before passing it to simplify_relational_operation. * gcc.dg/pr55010.c: New test. --- gcc/cse.c.jj 2012-10-16 13:15:45.0 +0200 +++ gcc/cse.c 2012-10-22 10:44:34.100033945 +0200 @@ -3461,8 +3461,8 @@ fold_rtx (rtx x, rtx insn) } { - rtx op0 = const_arg0 ? const_arg0 : folded_arg0; - rtx op1 = const_arg1 ? const_arg1 : folded_arg1; + rtx op0 = const_arg0 ? const_arg0 : copy_rtx (folded_arg0); + rtx op1 = const_arg1 ? const_arg1 : copy_rtx (folded_arg1); new_rtx = simplify_relational_operation (code, mode, mode_arg0, op0, op1); } break; --- gcc/testsuite/gcc.dg/pr55010.c.jj 2012-10-22 10:47:47.289857369 +0200 +++ gcc/testsuite/gcc.dg/pr55010.c 2012-10-22 10:47:33.0 +0200 @@ -0,0 +1,13 @@ +/* PR rtl-optimization/55010 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-march=i686" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */ + +long long int a; +unsigned long long int b; + +void +foo (void) +{ + a = (a 0) / ((a -= b) ? b = ((b = a) || 0) : 0); +} Jakub
[C++ PATCH] Fix cplus_decl_attributes (PR c++/54988)
Hi! cplus_decl_attributes assumes that if attributes is NULL, there is nothing to do in decl_attributes, unfortunately that call can add implicit attributes based on currently active pragmas, at least for FUNCTION_DECLs. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-22 Jakub Jelinek ja...@redhat.com PR c++/54988 * decl2.c (cplus_decl_attributes): Don't return early if attributes is NULL. * c-c++-common/pr54988.c: New test. --- gcc/cp/decl2.c.jj 2012-10-08 21:37:27.0 +0200 +++ gcc/cp/decl2.c 2012-10-22 12:43:04.994700609 +0200 @@ -1309,8 +1309,7 @@ void cplus_decl_attributes (tree *decl, tree attributes, int flags) { if (*decl == NULL_TREE || *decl == void_type_node - || *decl == error_mark_node - || attributes == NULL_TREE) + || *decl == error_mark_node) return; if (processing_template_decl) @@ -1319,8 +1318,6 @@ cplus_decl_attributes (tree *decl, tree return; save_template_attributes (attributes, decl); - if (attributes == NULL_TREE) - return; } cp_check_const_attributes (attributes); --- gcc/testsuite/c-c++-common/pr54988.c.jj 2012-10-22 12:50:56.332853880 +0200 +++ gcc/testsuite/c-c++-common/pr54988.c 2012-10-22 12:50:04.0 +0200 @@ -0,0 +1,20 @@ +/* PR c++/54988 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */ + +#if defined(__i386__) || defined(__x86_64__) +#pragma GCC target ("fpmath=sse") +#endif + +static inline __attribute__ ((always_inline)) int +foo (int x) +{ + return x; +} + +int +bar (int x) +{ + return foo (x); +} Jakub
[PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
Hi! On the following testcase we have two endless loops before cddce2: Sender_signal (int Connect) { int State; unsigned int occurrence; <bb 2>: if (Connect_6(D) != 0) goto <bb 8>; else goto <bb 7>; <bb 3>: # occurrence_8 = PHI <0(7), occurrence_12(4)> occurrence_12 = occurrence_8 + 1; __builtin_printf ("Sender_Signal occurrence %u\n", occurrence_12); <bb 4>: goto <bb 3>; <bb 5>: <bb 6>: goto <bb 5>; <bb 7>: goto <bb 3>; <bb 8>: goto <bb 5>; } The problem is the two empty bbs on the path from the conditional at the end of bb2 and the endless loops (i.e. bb7 and bb8). In presence of infinite loops dominance.c adds fake edges to exit pretty arbitrarily (it uses FOR_EACH_BB_REVERSE and for unconnected bbs computes post-dominance and adds fake edges to exit), so with the above testcase both bb7 and bb8 have the exit block as immediate post-dominator, so find_control_dependence stops at those bb's when starting from the 2->7 resp. 2->8 edges. bb7/bb8 don't have a control stmt at the end, so mark_last_stmt_necessary doesn't mark any stmt as necessary in them and thus the if (Connect_6(D) != 0) GIMPLE_COND is never marked as necessary and the whole endless loop with printfs in it is removed. The following patch fixes it by detecting such problematic blocks and recursing on them in mark_control_dependence_edges_necessary. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-22 Jakub Jelinek ja...@redhat.com PR tree-optimization/55018 * tree-ssa-dce.c (mark_last_stmt_necessary): Return bool whether mark_stmt_necessary was called. (mark_control_dependence_edges_necessary): Recurse on cd_bb if mark_last_stmt_necessary hasn't marked a control stmt, cd_bb has exit block as immediate dominator and a single succ edge. * gcc.dg/torture/pr55018.c: New test. --- gcc/tree-ssa-dce.c.jj 2012-08-15 10:55:33.0 +0200 +++ gcc/tree-ssa-dce.c 2012-10-22 16:50:03.011497546 +0200 @@ -381,7 +381,7 @@ mark_stmt_if_obviously_necessary (gimple /* Mark the last statement of BB as necessary.
*/ -static void +static bool mark_last_stmt_necessary (basic_block bb) { gimple stmt = last_stmt (bb); @@ -391,7 +391,11 @@ mark_last_stmt_necessary (basic_block bb /* We actually mark the statement only if it is a control statement. */ if (stmt && is_ctrl_stmt (stmt)) -mark_stmt_necessary (stmt, true); +{ + mark_stmt_necessary (stmt, true); + return true; +} + return false; } @@ -423,8 +427,18 @@ mark_control_dependent_edges_necessary ( continue; } - if (!TEST_BIT (last_stmt_necessary, cd_bb->index)) - mark_last_stmt_necessary (cd_bb); + if (!TEST_BIT (last_stmt_necessary, cd_bb->index) + && !mark_last_stmt_necessary (cd_bb)) + { + /* In presence of infinite loops, some bbs on a path +to an infinite loop might not end with a control stmt, +but due to a fake edge to exit stop find_control_dependence. +Recurse for those. */ + if (get_immediate_dominator (CDI_POST_DOMINATORS, cd_bb) + == EXIT_BLOCK_PTR + && single_succ_p (cd_bb)) + mark_control_dependent_edges_necessary (cd_bb, el, false); + } } if (!skipped) --- gcc/testsuite/gcc.dg/torture/pr55018.c.jj 2012-10-22 16:53:56.623083723 +0200 +++ gcc/testsuite/gcc.dg/torture/pr55018.c 2012-10-22 16:54:21.278934668 +0200 @@ -0,0 +1,22 @@ +/* PR tree-optimization/55018 */ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +void +foo (int x) +{ + unsigned int a = 0; + int b = 3; + if (x) +b = 0; +lab: + if (x) +goto lab; + a++; + if (b != 2) +__builtin_printf ("%u", a); + goto lab; +} + +/* { dg-final { scan-tree-dump "printf" "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ Jakub
Re: Fix PR 53701
On 16.10.2012 11:50, Andrey Belevantsev wrote: The below is the port of this patch to 4.7, took longer than expected but still. Will commit after retesting on x86-64 (testing on ia64 is already fine) and with the fix for PR 53975. Now the same patch is also committed to 4.6 after more wait and testing. Andrey
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 9:35 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! On the following testcase we have two endless loops before cddce2: Sender_signal (int Connect) { int State; unsigned int occurrence; <bb 2>: if (Connect_6(D) != 0) goto <bb 8>; else goto <bb 7>; <bb 3>: # occurrence_8 = PHI <0(7), occurrence_12(4)> occurrence_12 = occurrence_8 + 1; __builtin_printf ("Sender_Signal occurrence %u\n", occurrence_12); <bb 4>: goto <bb 3>; <bb 5>: <bb 6>: goto <bb 5>; <bb 7>: goto <bb 3>; <bb 8>: goto <bb 5>; } The problem is the two empty bbs on the path from the conditional at the end of bb2 and the endless loops (i.e. bb7 and bb8). In presence of infinite loops dominance.c adds fake edges to exit pretty arbitrarily (it uses FOR_EACH_BB_REVERSE and for unconnected bbs computes post-dominance and adds fake edges to exit), so with the above testcase both bb7 and bb8 have the exit block as immediate post-dominator, so find_control_dependence stops at those bb's when starting from the 2->7 resp. 2->8 edges. bb7/bb8 don't have a control stmt at the end, so mark_last_stmt_necessary doesn't mark any stmt as necessary in them and thus the if (Connect_6(D) != 0) GIMPLE_COND is never marked as necessary and the whole endless loop with printfs in it is removed. I'm not sure I'm following this alright, but AFAICT bb7 and bb8 are control-dependent on the if in bb2. To preserve the infinite-loop semantics the control parent of the infinite loop must be inherently preserved (because empty infinite loops can't mark any feeding statements). So shouldn't the code in find_obviously_necessary_stmts that handles infinite loops mark the last statement of control parents necessary? Ciao! Steven
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 09:48:16PM +0200, Steven Bosscher wrote: On Mon, Oct 22, 2012 at 9:35 PM, Jakub Jelinek ja...@redhat.com wrote: On the following testcase we have two endless loops before cddce2: Sender_signal (int Connect) { int State; unsigned int occurrence; <bb 2>: if (Connect_6(D) != 0) goto <bb 8>; else goto <bb 7>; <bb 3>: # occurrence_8 = PHI <0(7), occurrence_12(4)> occurrence_12 = occurrence_8 + 1; __builtin_printf ("Sender_Signal occurrence %u\n", occurrence_12); <bb 4>: goto <bb 3>; <bb 5>: <bb 6>: goto <bb 5>; <bb 7>: goto <bb 3>; <bb 8>: goto <bb 5>; } The problem is the two empty bbs on the path from the conditional at the end of bb2 and the endless loops (i.e. bb7 and bb8). In presence of infinite loops dominance.c adds fake edges to exit pretty arbitrarily (it uses FOR_EACH_BB_REVERSE and for unconnected bbs computes post-dominance and adds fake edges to exit), so with the above testcase both bb7 and bb8 have the exit block as immediate post-dominator, so find_control_dependence stops at those bb's when starting from the 2->7 resp. 2->8 edges. bb7/bb8 don't have a control stmt at the end, so mark_last_stmt_necessary doesn't mark any stmt as necessary in them and thus the if (Connect_6(D) != 0) GIMPLE_COND is never marked as necessary and the whole endless loop with printfs in it is removed. I'm not sure I'm following this alright, but AFAICT bb7 and bb8 are control-dependent on the if in bb2. To preserve the infinite-loop semantics the control parent of the infinite loop must be inherently preserved (because empty infinite loops can't mark any feeding statements). So shouldn't the code in find_obviously_necessary_stmts that handles infinite loops mark the last statement of control parents necessary?
If bb7 and bb8 aren't there and bb2 branches directly to bb3 and bb5, then things work correctly, find_control_dependence then says that the 2->3 edge is control parent of bb3 and bb4 (bb3's immediate post-dominator is bb4, bb4 is immediately post-dominated through a fake edge by exit) and similarly the 2->5 edge is control parent of bb5 and bb6. Then find_obviously_necessary_stmts does: FOR_EACH_LOOP (li, loop, 0) if (!finite_loop_p (loop)) { if (dump_file) fprintf (dump_file, "can not prove finiteness of loop %i\n", loop->num); mark_control_dependent_edges_necessary (loop->latch, el, false); } and that marks the control stmt in bb2 as necessary, because edge 2->3 is in the bb3 and bb4 bitmap and edge 2->5 is in the bb5 and bb6 control dependence bitmap. The problem with bb7/bb8 is that because they have fake edges to exit too, find_control_dependence stops at them, thus 2->7 is considered control parent of bb7 and 2->8 control parent of bb8, and 7->3 is considered control parent of bb3 and bb4 and 8->5 of bb5 and bb6. Thus, mark_control_dependent_edges_necessary called on say the bb4 latch calls mark_last_stmt_necessary on bb7, but, there is no last stmt in that bb, nothing to mark necessary and it silently stops there. What my patch does is change it so that in that case it doesn't stop there, but recurses. Jakub
Re: unordered set design modification
Attached patch applied. 2012-10-22 François Dumont fdum...@gcc.gnu.org * include/bits/unordered_set.h (unordered_set): Prefer aggregation to inheritance with _Hashtable. (unordered_multiset): Likewise. * include/debug/unordered_set (operator==): Adapt. * include/profile/unordered_set (operator==): Adapt. I will now take care of unordered_map and unordered_multimap. François On 10/22/2012 12:21 AM, Jonathan Wakely wrote: On 21 October 2012 20:43, François Dumont wrote: On 10/21/2012 06:21 PM, Jonathan Wakely wrote: On 20 October 2012 22:07, François Dumont wrote: Hi Following remarks in PR 53067 regarding design of unordered containers Which remarks specifically? My understanding was that Paolo's suggestion to redesign things was to avoid public inheritance, which we now do anyway. here is a patch to prefer aggregation to inheritance with _Hashtable. I hope it is what you had in mind Jonathan. If so I will do the same for unordered_[multi]map. Are you referring to my comments in the hashtable local iterator thread last December? Because IIRC my concern was about deriving from the user-supplied Hash and Pred types and this new patch doesn't alter that. What is the advantage of this new patch? (Apologies if I'm forgetting some other suggestion of mine.) I think my concerns about deriving from user-supplied types are addressed by using the EBO helper (which prevents deriving from types with virtual functions, as the vptr makes the class non-empty) and by using private inheritance. This patch is coming from this remark: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52942#c4 You should be careful when you do remarks, they can have a strong impact ;-) Ah yes, that comment. As hinted at there, I was concerned about inheriting virtual functions, but that's avoided by the EBO helper. And I still think that using std::tuple would have avoided all the issues with inheritance and kept the advantages of the EBO. That would be too big a redesign now though. 
I fully agree with this remark just because for me encapsulation is a very important concept and aggregation offers better encapsulation than inheritance. This way unordered containers will expose only Standard methods. It doesn't fix any known issue at the moment even if this clean design would have avoid the 53067 issue. It doesn't expose any non-standard members now that we use private inheritance. I do think composition is better than inheritance, but I'm concerned about more churn to that code, it would be nice if it settled down soon! But since we still need to exploit the EBO for the node allocator, I guess the code still needs to change anyway, so I'm ok with your patch. Using the EBO for empty allocators reduces sizeof(unordered_set<int>) from 64 to 56, although it obviously changes the layout of the class in an incompatible way. It's unfortunate the allocator is the first member. Some comments on the comments: + * @param __n Minimal initial number of bucket. Should be buckets + * @param __x An %unordere_set of identical element and allocator unordered_set + * The newly-created %unordered_set contains the exact contents of @a x. Should be __x not x. This comment won't always be true once we add C++11 allocator support, but we can fix the comment when that happens. + * All the elements of @a __x are copied, but unlike the copy + * constructor, the allocator object is not copied. This might not be true either, depending on the allocator. + * This function fills a %unordered_set with copies of the elements in an not a + /// Returns the allocator object with which the %unordered_set was + /// constructed. In C++11 allocators can be replaced after construction. + * Insertion requires atmortized constant time. amortized (in several places) + * This function only makes sense for unordered_multisets; for + * unordered_set the result will either be 0 (not present) or 1 + * (present).
I don't like these "only makes sense" comments, but I realise they're just copied from std::set so nevermind. + * @brief Returns the number of element in a given bucket. elements The same issues occur in the unordered_multiset comments. Unless Paolo has any other comments about the patch then it's OK with the comment fixes. Thanks! Index: include/bits/unordered_set.h === --- include/bits/unordered_set.h (revision 192694) +++ include/bits/unordered_set.h (working copy) @@ -91,41 +91,624 @@ class _Pred = std::equal_to<_Value>, class _Alloc = std::allocator<_Value> class unordered_set -: public __uset_hashtable<_Value, _Hash, _Pred, _Alloc> { - typedef __uset_hashtable<_Value, _Hash, _Pred, _Alloc> _Base; + typedef __uset_hashtable<_Value, _Hash, _Pred, _Alloc> _Hashtable; + _Hashtable _M_h; public: - typedef
Re: [MIPS] Implement static stack checking
Eric Botcazou ebotca...@adacore.com writes: This implements static stack checking for MIPS, i.e. checking of the static part of the frame in the prologue when -fstack-check is specified. This is very similar to the PowerPC and SPARC implementations and makes it possible to pass the full ACATS testsuite with -fstack-check. Tested on mips64el-linux-gnu (n32/32/64), OK for the mainline? The Ada bits I'll leave to you. :-) The config/mips stuff looks good, but a couple of nits: +(define_insn "probe_stack_range<P:mode>" + [(set (match_operand:P 0 "register_operand" "=r") + (unspec_volatile:P [(match_operand:P 1 "register_operand" "0") + (match_operand:P 2 "register_operand" "r")] + UNSPEC_PROBE_STACK_RANGE))] + "" + "* return mips_output_probe_stack_range (operands[0], operands[2]);" + [(set_attr "type" "unknown") + (set_attr "can_delay" "no") + (set_attr "mode" "<MODE>")]) Please use "d" rather than "r" in these constraints. Please use: { return mips_output_probe_stack_range (operands[0], operands[2]); } for the output line. +/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE, + inclusive. These are offsets from the current stack pointer. */ + +static void +mips_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size) +{ This function doesn't work with MIPS16 mode. Maybe just: if (TARGET_MIPS16) sorry ("MIPS16 stack probes"); (We can't test TARGET_MIPS16 in something like STACK_CHECK_STATIC_BUILTIN because MIPS16ness is a per-function property.) + /* See if we have a constant small number of probes to generate. If so, + that's the easy case. */ + if (first + size <= 32768) +{ + HOST_WIDE_INT i; + + /* Probe at FIRST + N * PROBE_INTERVAL for values of N from 1 until + it exceeds SIZE. If only one probe is needed, this will not + generate any code. Then probe at FIRST + SIZE. 
*/ + for (i = PROBE_INTERVAL; i < size; i += PROBE_INTERVAL) +emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx, + -(first + i))); + + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx, +-(first + size))); +} + + /* Otherwise, do the same as above, but in a loop. Note that we must be + extra careful with variables wrapping around because we might be at + the very top (or the very bottom) of the address space and we have + to be able to handle this case properly; in particular, we use an + equality test for the loop condition. */ + else +{ + HOST_WIDE_INT rounded_size; + rtx r3 = gen_rtx_REG (Pmode, GP_REG_FIRST + 3); + rtx r12 = gen_rtx_REG (Pmode, GP_REG_FIRST + 12); Please use MIPS_PROLOGUE_TEMP for r3 (and probably rename r3). I suppose GP_REG_FIRST + 12 should be MIPS_PROLOGUE_TEMP2, probably as: #define MIPS_PROLOGUE_TEMP2_REGNUM \ (TARGET_MIPS16 ? gcc_unreachable () \ : cfun->machine->interrupt_handler_p ? K1_REG_NUM : GP_REG_FIRST + 12) #define MIPS_PROLOGUE_TEMP2(MODE) \ gen_rtx_REG (MODE, MIPS_PROLOGUE_TEMP2_REGNUM) and update the block comment above the MIPS_PROLOGUE_TEMP_REGNUM definition. + /* Sanity check for the addressing mode we're going to use. */ + gcc_assert (first <= 32768); + + + /* Step 1: round SIZE to the previous multiple of the interval. */ + + rounded_size = size & -PROBE_INTERVAL; + + + /* Step 2: compute initial and final value of the loop counter. */ + + /* TEST_ADDR = SP + FIRST. */ + emit_insn (gen_rtx_SET (VOIDmode, r3, + plus_constant (Pmode, stack_pointer_rtx, + -first))); + + /* LAST_ADDR = SP + FIRST + ROUNDED_SIZE. 
*/ + if (rounded_size > 32768) + { + emit_move_insn (r12, GEN_INT (rounded_size)); + emit_insn (gen_rtx_SET (VOIDmode, r12, + gen_rtx_MINUS (Pmode, r3, r12))); + } + else + emit_insn (gen_rtx_SET (VOIDmode, r12, + plus_constant (Pmode, r3, -rounded_size))); + + + /* Step 3: the loop + + while (TEST_ADDR != LAST_ADDR) + { + TEST_ADDR = TEST_ADDR + PROBE_INTERVAL + probe at TEST_ADDR + } + + probes at FIRST + N * PROBE_INTERVAL for values of N from 1 + until it is equal to ROUNDED_SIZE. */ + + if (TARGET_64BIT && TARGET_LONG64) + emit_insn (gen_probe_stack_rangedi (r3, r3, r12)); + else + emit_insn (gen_probe_stack_rangesi (r3, r3, r12)); + + + /* Step 4: probe at FIRST + SIZE if we cannot assert at compile-time + that SIZE is equal to ROUNDED_SIZE. */ + + if (size != rounded_size) + emit_stack_probe (plus_constant (Pmode, r12, rounded_size - size)); I Might Be Wrong, but it looks like this won't probe at FIRST + SIZE in the case where SIZE ==
Re: Constant-fold vector comparisons
On Mon, 15 Oct 2012, Richard Biener wrote: On Fri, Oct 12, 2012 at 4:07 PM, Marc Glisse marc.gli...@inria.fr wrote: On Sat, 29 Sep 2012, Marc Glisse wrote: 1) it handles constant folding of vector comparisons, 2) it fixes another place where vectors are not expected Here is a new version of this patch. In a first try, I got bitten by the operator priorities in a<b?c:d, which g++ doesn't warn about. 2012-10-12 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-forwprop.c (forward_propagate_into_cond): Handle vectors. * fold-const.c (fold_relational_const): Handle VECTOR_CST. gcc/testsuite/ * gcc.dg/tree-ssa/foldconst-6.c: New testcase. Here is a new version, with the same ChangeLog plus * doc/generic.texi (VEC_COND_EXPR): Document current policy. Which means I'd prefer if you simply condition the existing ~ and ^ handling on COND_EXPR. Done. - if (integer_onep (tmp)) + if ((gimple_assign_rhs_code (stmt) == VEC_COND_EXPR) + ? integer_all_onesp (tmp) : integer_onep (tmp)) and cache gimple_assign_rhs_code as a 'code' variable at the beginning of the function. Done. + if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST) +{ + int count = VECTOR_CST_NELTS (op0); + tree *elts = XALLOCAVEC (tree, count); + gcc_assert (TREE_CODE (type) == VECTOR_TYPE); A better check would be that VECTOR_CST_NELTS of type is the same as that of op0. I wasn't sure which check you meant, so I added both possibilities. I am fine with removing either or both, actually. Ok with these changes. A few too many changes, I prefer to re-post, in case. On Tue, 16 Oct 2012, Richard Biener wrote: I liked your idea of the signed boolean vector, as a way to express that we know some vector can only have values 0 and -1, but I am not sure how to use it. 
Ah no, I didn't mean to suggest that ;) Maybe you didn't, but I still took the idea from your words ;-) Thus, as we defined true to -1 and false to 0 we cannot, unless relaxing what VEC_COND_EXPR treats as true or false, optimize any of ~ or ^ -1 away. It seems to me that what prevents us from optimizing is if we want to keep the door open for a future relaxation of what VEC_COND_EXPR accepts as its first argument. Which means: produce only -1 and 0, but don't assume we are only reading -1 and 0 (unless we have a reason to know it, for instance because it is the result of a comparison), and don't assume any specific interpretation on those other values. Not sure how much that limits possible optimizations. I'm not sure either - I'd rather leave the possibility open until we see a compelling reason to go either way (read: a testcase where it matters in practice). Ok, I implemented the safe way. My current opinion is that we should go with a VEC_COND_EXPR that only accepts 0 and -1 (it is easy to pass a LT_EXPR or NE_EXPR as first argument if that is what one wants), but it can wait. 
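The 0/-1 truth-value convention under discussion can be observed directly with GCC's generic vector extension (a stand-alone illustration, not part of the patch; it assumes a compiler recent enough to support vector comparisons):

```c
#include <assert.h>

/* GCC generic vectors: two longs.  */
typedef long vec __attribute__ ((vector_size (2 * sizeof (long))));

/* A lane-wise comparison yields -1 (all bits set) for true lanes and 0
   for false lanes.  That is why ~mask can safely be folded when mask is
   known to come from a comparison, but not for an arbitrary vector.  */
vec less_mask (vec a, vec b)
{
  return a < b;
}
```

With `a = {-2, 666}` and `b = {3, 2}` (the values from the foldconst-6.c testcase), the first lane compares true and the second false.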
-- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c === --- gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-ccp1" } */ + +typedef long vec __attribute__ ((vector_size (2 * sizeof(long)))); + +vec f () +{ + vec a = { -2, 666 }; + vec b = { 3, 2 }; + return a < b; +} + +/* { dg-final { scan-tree-dump-not "666" "ccp1" } } */ +/* { dg-final { cleanup-tree-dump "ccp1" } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c ___ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 192695) +++ gcc/fold-const.c (working copy) @@ -16123,20 +16123,45 @@ fold_relational_const (enum tree_code co TREE_IMAGPART (op0), TREE_IMAGPART (op1)); if (code == EQ_EXPR) return fold_build2 (TRUTH_ANDIF_EXPR, type, rcond, icond); else if (code == NE_EXPR) return fold_build2 (TRUTH_ORIF_EXPR, type, rcond, icond); else return NULL_TREE; } + if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST) +{ + unsigned count = VECTOR_CST_NELTS (op0); + tree *elts = XALLOCAVEC (tree, count); + gcc_assert (VECTOR_CST_NELTS (op1) == count); + gcc_assert (TYPE_VECTOR_SUBPARTS (type) == count); + + for (unsigned i = 0; i < count; i++) + { + tree elem_type = TREE_TYPE (type); + tree elem0 = VECTOR_CST_ELT (op0, i); + tree elem1 = VECTOR_CST_ELT (op1, i); + + tree tem = fold_relational_const (code, elem_type, + elem0,
Re: [PATCH][RFC] Re-organize how we stream trees in LTO
On 10/16/12, Diego Novillo dnovi...@google.com wrote: On 2012-10-16 10:43 , Richard Biener wrote: Diego - is PTH still live? Thus, do I need to bother about inventing things in a way that can be hook-ized? We will eventually revive PPH. But not in the short term. I think it will come back when/if we start implementing C++ modules. Jason, Lawrence, is that something that you see coming for the next standard? There are some people working on it, though not very publically. Many folks would like to see modules in the next full standard, probably circa 2017. It is likely that the design point for standard modules will differ from PPH, and so I don't think that the current PPH implementation should serve as a constraint on other work. I suspect that the front end will need to distance itself from 'tree' and have its own streamable IL. So, the hooks may not be something we need to keep long term. Emitting the trees in SCC groups should not affect the C++ streamer too much. It already is doing its own strategy of emitting tree headers so it can do declaration and type merging. As long as the trees can be fully materialized from the SCC groups, it should be fine. -- Lawrence Crowl
Re: unordered set design modification
On 22 October 2012 20:59, François Dumont wrote: Attached patch applied. 2012-10-22 François Dumont fdum...@gcc.gnu.org * include/bits/unordered_set.h (unordered_set): Prefer aggregation to inheritance with _Hashtable. (unordered_multiset): Likewise. * include/debug/unordered_set (operator==): Adapt. * include/profile/unordered_set (operator==): Adapt. + //@{ Do these comments work correctly? I think it needs to be ///@{ for Doxygen to recognise it. I will now take care of unordered_map and unordered_multimap. Thanks. It occurs to me now that the copy and move operations could be defaulted, since all they do is forward to the member, which is both copyable and movable.
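The aggregation shape being applied, and why the copy/move operations could now be defaulted, can be sketched as follows (an illustration of the pattern only — `my_set` and its member are stand-ins, not the actual libstdc++ `_Hashtable` code):

```cpp
#include <cstddef>
#include <unordered_set>

template<typename _Value>
class my_set
{
  // The table is a private data member rather than a base class, so the
  // wrapper exposes only the standard interface.
  std::unordered_set<_Value> _M_h;   // stand-in for __uset_hashtable<...>

public:
  my_set() = default;
  // Every special member just forwards to the member, which is itself
  // copyable and movable, so all of them can be defaulted.
  my_set(const my_set&) = default;
  my_set(my_set&&) = default;
  my_set& operator=(const my_set&) = default;
  my_set& operator=(my_set&&) = default;

  bool insert(const _Value& __v) { return _M_h.insert(__v).second; }
  std::size_t size() const { return _M_h.size(); }

  // operator== has to be adapted to reach through to the member, which
  // is what the debug/profile mode changes in the patch do.
  friend bool operator==(const my_set& __a, const my_set& __b)
  { return __a._M_h == __b._M_h; }
};
```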
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 10:27:52PM +0200, Steven Bosscher wrote: I understand what your patch does, but I don't understand why it is correct. Why are there fake edges from bb7 and bb8 to exit when both are reverse-reachable from exit via the infinite loops? The infinite loops should be connected to exit, and bb7 and bb8 should be found by the DFS from the really dead ends in the cfg. See what dominance.c does: if (saw_unconnected) { FOR_EACH_BB_REVERSE (b) { if (di->dfs_order[b->index]) continue; bitmap_set_bit (di->fake_exit_edge, b->index); di->dfs_order[b->index] = di->dfsnum; di->dfs_to_bb[di->dfsnum] = b; di->dfs_parent[di->dfsnum] = di->dfs_order[last_basic_block]; di->dfsnum++; calc_dfs_tree_nonrec (di, b, reverse); } } bb7/bb8 (i.e. all bbs that are always in the end followed by infinite loops) as well as all the bbs on the infinite loops are processed the above way, they have no real path to exit, so aren't processed on the first iteration, they aren't processed even after adding fake edges from zero successor bbs. calc_dfs_tree then picks pretty much random bbs (one with highest index), adds fake edge to it, walks it, then goes on with other bbs that are still unconnected. dominance.c doesn't use cfgloop.h (can it? Isn't it used before loops are computed, perhaps after loops destroyed, etc.), so there is no guarantee that loop->latch of endless loop will have the fake edge added and no other bb before it. As 7 and 8 are bigger than 4 or 6, the above loop starts with bb 8, finds that its predecessor has already been searched and stops there, similarly for 7, then goes on with 6 with another fake edge to exit. Jakub
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 10:39 PM, Jakub Jelinek ja...@redhat.com wrote: dominance.c doesn't use cfgloop.h (can it? Isn't it used before loops are computed, perhaps after loops destroyed, etc.), so there is no guarantee that loop->latch of endless loop will have the fake edge added and no other bb before it. As 7 and 8 are bigger than 4 or 6, the above loop starts with bb 8, finds that its predecessor has already been searched and stops there, similarly for 7, then goes on with 6 with another fake edge to exit. At least it looks like some of the cfganal DFS code could be used in dominance.c. I will have a look. A hack like the following should result in no fake edges for bb7 and bb8. Ciao! Steven Index: dominance.c === --- dominance.c (revision 192517) +++ dominance.c (working copy) @@ -353,12 +353,15 @@ pretend that there is an edge to the exit block. In the second case, we wind up with a forest. We need to process all noreturn blocks before we know if we've got any infinite loops. */ - + int *revcfg_postorder = XNEWVEC (int, n_basic_blocks); + int n = inverted_post_order_compute (revcfg_postorder); + unsigned int i = (unsigned) n; basic_block b; bool saw_unconnected = false; - FOR_EACH_BB_REVERSE (b) + while (i) { + basic_block b = revcfg_postorder[--i]; if (EDGE_COUNT (b->succs) > 0) { if (di->dfs_order[b->index] == 0) @@ -375,8 +378,10 @@ if (saw_unconnected) { - FOR_EACH_BB_REVERSE (b) + i = n; + while (i) { + basic_block b = revcfg_postorder[--i]; if (di->dfs_order[b->index]) continue; bitmap_set_bit (di->fake_exit_edge, b->index);
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 10:51:43PM +0200, Steven Bosscher wrote: On Mon, Oct 22, 2012 at 10:39 PM, Jakub Jelinek ja...@redhat.com wrote: dominance.c doesn't use cfgloop.h (can it? Isn't it used before loops are computed, perhaps after loops destroyed, etc.), so there is no guarantee that loop-latch of endless loop will have the fake edge added and no other bb before it. As 7 and 8 are bigger than 4 or 6, the above loop starts with bb 8, finds that its predecessor has already been searched and stops there, similarly for 7, then goes on with 6 with another fake edge to exit. At least it looks like some of the cfganal DFS code could be used in dominance.c. I will have a look. A hack like the following should result in no fake edges for bb7 and bb8. Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c and call it in calc_dfs_tree on each unconnected bb? I.e. (untested with the exception of the testcase): 2012-10-22 Jakub Jelinek ja...@redhat.com PR tree-optimization/55018 * cfganal.c (dfs_find_deadend): No longer static. * basic-block.h (dfs_find_deadend): New prototype. * dominance.c (calc_dfs_tree): If saw_unconnected, traverse from dfs_find_deadend of unconnected b instead of b directly. * gcc.dg/torture/pr55018.c: New test. --- gcc/cfganal.c.jj2012-08-14 08:45:00.0 +0200 +++ gcc/cfganal.c 2012-10-22 23:04:29.620117666 +0200 @@ -593,7 +593,7 @@ post_order_compute (int *post_order, boo that all blocks in the region are reachable by starting an inverted traversal from the returned block. 
*/ -static basic_block +basic_block dfs_find_deadend (basic_block bb) { sbitmap visited = sbitmap_alloc (last_basic_block); --- gcc/basic-block.h.jj 2012-10-17 17:18:21.0 +0200 +++ gcc/basic-block.h 2012-10-17 17:18:21.0 +0200 @@ -787,6 +787,7 @@ extern void remove_fake_exit_edges (void extern void add_noreturn_fake_exit_edges (void); extern void connect_infinite_loops_to_exit (void); extern int post_order_compute (int *, bool, bool); +extern basic_block dfs_find_deadend (basic_block); extern int inverted_post_order_compute (int *); extern int pre_and_rev_post_order_compute (int *, int *, bool); extern int dfs_enumerate_from (basic_block, int, --- gcc/dominance.c.jj 2012-08-15 10:55:26.0 +0200 +++ gcc/dominance.c 2012-10-22 23:07:00.941220792 +0200 @@ -377,14 +377,18 @@ calc_dfs_tree (struct dom_info *di, bool { FOR_EACH_BB_REVERSE (b) { + basic_block b2; if (di->dfs_order[b->index]) continue; - bitmap_set_bit (di->fake_exit_edge, b->index); - di->dfs_order[b->index] = di->dfsnum; - di->dfs_to_bb[di->dfsnum] = b; + b2 = dfs_find_deadend (b); + gcc_checking_assert (di->dfs_order[b2->index] == 0); + bitmap_set_bit (di->fake_exit_edge, b2->index); + di->dfs_order[b2->index] = di->dfsnum; + di->dfs_to_bb[di->dfsnum] = b2; di->dfs_parent[di->dfsnum] = di->dfs_order[last_basic_block]; di->dfsnum++; - calc_dfs_tree_nonrec (di, b, reverse); + calc_dfs_tree_nonrec (di, b2, reverse); + gcc_checking_assert (di->dfs_order[b->index]); } } } --- gcc/testsuite/gcc.dg/torture/pr55018.c.jj 2012-10-22 16:53:56.623083723 +0200 +++ gcc/testsuite/gcc.dg/torture/pr55018.c 2012-10-22 16:54:21.278934668 +0200 @@ -0,0 +1,22 @@ +/* PR tree-optimization/55018 */ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +void +foo (int x) +{ + unsigned int a = 0; + int b = 3; + if (x) +b = 0; +lab: + if (x) +goto lab; + a++; + if (b != 2) +__builtin_printf ("%u", a); + goto lab; +} + +/* { dg-final { scan-tree-dump "printf" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Jakub
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 11:09 PM, Jakub Jelinek ja...@redhat.com wrote: Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c and call it in calc_dfs_tree on each unconnected bb? I.e. (untested with the exception of the testcase): Better yet, I have a patch in testing now to use cfganal's machinery to compute the DFS forest. Hold on a bit, I'll post it ASAP (probably Wednesday) if that's early enough for you. (Oh, and feel free to assign the PR to me ;-) Ciao! Steven
Re: [MIPS] Implement static stack checking
This function doesn't work with MIPS16 mode. Maybe just: if (TARGET_MIPS16) sorry (MIPS16 stack probes); (We can't test TARGET_MIPS16 in something like STACK_CHECK_STATIC_BUILTIN because MIPS16ness is a per-function property.) I put if (TARGET_MIPS16) sorry (-fstack-check=specific not implemented for MIPS16); Please use MIPS_PROLOGUE_TEMP for r3 (and probably rename r3). I suppose GP_REG_FIRST + 12 should be MIPS_PROLOGUE_TEMP2, probably as: #define MIPS_PROLOGUE_TEMP2_REGNUM \ (TARGET_MIPS16 ? gcc_unreachable () \ cfun-machine-interrupt_handler_p ? K1_REG_NUM : GP_REG_FIRST + 12) #define MIPS_PROLOGUE_TEMP2(MODE) \ gen_rtx_REG (MODE, MIPS_PROLOGUE_TEMP2_REGNUM) and update the block comment above the MIPS_PROLOGUE_TEMP_REGNUM definition. Done. I Might Be Wrong, but it looks like this won't probe at FIRST + SIZE in the case where SIZE == ROUNDED_SIZE, because the loop exits on that value without probing it. Should the last line be unconditional, or does the loop need to be a do-while instead? (I suppose the latter, so that there isn't a hole bigger than PROBE_INTERVAL in the SIZE != ROUNDED_SIZE case?) The loop probes at FIRST + N * PROBE_INTERVAL for values of N from 1 until it is equal to ROUNDED_SIZE, inclusive, so FIRST + SIZE is always probed. This only works in noreorder mode. If there's an asm in the function, or something else that forces reorder mode (e.g. a -mfix-* option), the addition won't be put in the delay slot. %(%beq\t%0,%1, and daddiu\t%0,%0,%1%%) should work. (Note that our MIPS asm output doesn't have a space before the delay slot; there's a blank line after it instead. That's all handled by output_asm_insn though.) Thanks for the incantation! OK with those changes, thanks. I'll retest with the changes tomorrow. Thanks for the review. -- Eric Botcazou
Re: wide int patch #6: Replacement of hwi extraction from int-csts.
On 10/19/12, Richard Biener richard.guent...@gmail.com wrote: The existing tree_low_cst function performs checking, so tree_to_hwi should as well. I don't think mismatch of signedness of the variable assigned to with the sign we use for hwi extraction is any good. C++ isn't type-safe here for the return value but if we'd use a reference as return slot we could make it so ... (in exchange for quite some ugliness IMNSHO): void tree_to_shwi (const_tree tree, HOST_WIDE_INT &hwi); vs. void tree_to_uhwi (const_tree tree, unsigned HOST_WIDE_INT &hwi); maybe more natural would be void hwi_from_tree (HOST_WIDE_INT &hwi, const_tree tree); void hwi_from_tree (unsigned HOST_WIDE_INT &hwi, const_tree tree); let the C++ bikeshedding begin! (the point is to do appropriate checking for a conversion of (INTEGER_CST) tree to HOST_WIDE_INT vs. unsigned HOST_WIDE_INT) We could add conversion operators to achieve the effect. However, we probably don't want to do so until we can make them explicit. Unfortunately, explicit conversion operators are not available until C++11. No, I don't want you to do the above transform with this patch ;) -- Lawrence Crowl
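The "reference as return slot" idea can be illustrated outside of GCC (hypothetical helper `hwi_from_value` standing in for the proposed `hwi_from_tree`; plain `long` stands in for HOST_WIDE_INT):

```cpp
#include <cassert>

// Overload resolution picks the extraction matching the destination's
// signedness, so the checking can differ per overload -- something a
// single HOST_WIDE_INT-returning function cannot express type-safely.
static void hwi_from_value (long &out, long v)
{
  out = v;
}

static void hwi_from_value (unsigned long &out, long v)
{
  assert (v >= 0);   // "appropriate checking" chosen by the destination type
  out = (unsigned long) v;
}
```

The caller just writes `hwi_from_value (dest, v)` and the signedness of `dest` selects the right check, which is the type-safety Richard is after.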
Re: [MIPS] Implement static stack checking
Doh! But in that case, rather than:

1:	beq	r1,r2,2f
	addiu	r1,r1,interval
	b	1b
	sw	$0,0(r1)
2:

why not just:

1:	addiu	r1,r1,interval
	bne	r1,r2,1b
	sw	$0,0(r1)

? The latter will always probe once, the former won't, if ROUNDED_SIZE == 0. -- Eric Botcazou
Re: [MIPS] Implement static stack checking
Eric Botcazou ebotca...@adacore.com writes: Doh! But in that case, rather than:

1:	beq	r1,r2,2f
	addiu	r1,r1,interval
	b	1b
	sw	$0,0(r1)
2:

why not just:

1:	addiu	r1,r1,interval
	bne	r1,r2,1b
	sw	$0,0(r1)

? The latter will always probe once, the former won't, if ROUNDED_SIZE == 0. But why do we want the loop at all if the rounded size is zero? It's a compile-time constant after all. Richard
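The behavioural difference between the two loop shapes can be simulated in plain C (probe counts only; `interval` models PROBE_INTERVAL and the register arithmetic is elided):

```c
/* "beq at the top" form: test before probing.  With rounded_size == 0
   this probes zero times.  */
static int probes_test_first (long rounded_size, long interval)
{
  int n = 0;
  for (long addr = 0; addr != rounded_size; )
    {
      addr += interval;
      n++;                  /* sw in the delay slot: one probe per pass */
    }
  return n;
}

/* "bne at the bottom" form: the body always runs once, so it always
   probes at least once -- Eric's point.  (With rounded_size == 0 this
   shape must simply not be emitted, as Richard notes, since the counter
   would walk past the end address.)  */
static int probes_probe_first (long rounded_size, long interval)
{
  int n = 0;
  long addr = 0;
  do
    {
      addr += interval;
      n++;
    }
  while (addr != rounded_size);
  return n;
}
```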
[PATCH, committed] Fix PR55008
In straight-line strength reduction, a candidate expression of the form (type1)x + (type2)x, where type1 and type2 are compatible, results in two interpretations of the candidate with different result types. Because the types are compatible, the first interpretation can appear to be a legal basis for the second, resulting in an invalid replacement. The obvious solution is to keep a statement from serving as its own basis. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions, committed as obvious. Thanks, Bill -- Bill Schmidt, Ph.D. IBM Advance Toolchain for PowerLinux IBM Linux Technology Center wschm...@linux.vnet.ibm.com wschm...@us.ibm.com gcc: 2012-10-22 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/55008 * gimple-ssa-strength-reduction.c (find_basis_for_candidate): Don't allow a candidate to be a basis for itself under another interpretation. gcc/testsuite: 2012-10-22 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/55008 * gcc.dg/tree-ssa/pr55008.c: New test. Index: gcc/testsuite/gcc.dg/tree-ssa/pr55008.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr55008.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr55008.c (revision 0) @@ -0,0 +1,17 @@ +/* This used to fail to compile; see PR55008. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -w" } */ + +typedef unsigned long long T; + +void f(void) +{ +int a, *p; + +T b = 6309343725; + +if(*p ? (b = 1) : 0) +if(b - (a = b /= 0) ? : (a + b)) +while(1); +} + Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 192691) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -366,6 +366,7 @@ find_basis_for_candidate (slsr_cand_t c) slsr_cand_t one_basis = chain->cand; if (one_basis->kind != c->kind + || one_basis->cand_stmt == c->cand_stmt || !operand_equal_p (one_basis->stride, c->stride, 0) || !types_compatible_p (one_basis->cand_type, c->cand_type) || !dominated_by_p (CDI_DOMINATORS,
Re: [MIPS] Implement static stack checking
Sorry, one more thing (obviously a bad night) Eric Botcazou ebotca...@adacore.com writes: + if (TARGET_64BIT && TARGET_LONG64) + emit_insn (gen_probe_stack_rangedi (r3, r3, r12)); + else + emit_insn (gen_probe_stack_rangesi (r3, r3, r12)); Please use: emit_insn (PMODE_INSN (gen_probe_stack_range, (r3, r3, r12))); for this. The patterns will need to be _<P:mode> rather than just <P:mode>. Richard
Re: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver
This patch (r192676) is probably causing FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memset-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Os on i?86 (see http://gcc.gnu.org/ml/gcc-testresults/2012-10/msg02350.html ). TIA Dominique
[C++ Patch] PR 54922
Hi, today I spent quite a bit of time on this reject-legal issue filed by Daniel, having to do with constexpr constructors and anonymous union members: I didn't want to make the loop much more complex but we have to handle correctly multiple anonymous unions too and of course produce correct diagnostics in all cases (eg, together with multiple members initialization diagnostics too). I figured out the below. Tested x86_64-linux, as usual. Thanks, Paolo. /cp 2012-10-22 Paolo Carlini paolo.carl...@oracle.com PR c++/54922 * semantics.c (cx_check_missing_mem_inits): Handle anonymous union members. /testsuite 2012-10-22 Paolo Carlini paolo.carl...@oracle.com PR c++/54922 * g++.dg/cpp0x/constexpr-union4.C: New. Index: testsuite/g++.dg/cpp0x/constexpr-union4.C === --- testsuite/g++.dg/cpp0x/constexpr-union4.C (revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-union4.C (working copy) @@ -0,0 +1,13 @@ +// PR c++/54922 +// { dg-do compile { target c++11 } } + +class nullable_int +{ + bool init_; + union { +unsigned char for_value_init; +int value_; + }; +public: + constexpr nullable_int() : init_(false), for_value_init() {} +}; Index: cp/semantics.c === --- cp/semantics.c (revision 192692) +++ cp/semantics.c (working copy) @@ -6139,17 +6139,23 @@ cx_check_missing_mem_inits (tree fun, tree body, b for (i = 0; i <= nelts; ++i) { tree index; + tree anon_union_init_type = NULL_TREE; if (i == nelts) index = NULL_TREE; else { index = CONSTRUCTOR_ELT (body, i)->index; + /* Handle anonymous union members. */ + if (TREE_CODE (index) == COMPONENT_REF + && ANON_UNION_TYPE_P (TREE_TYPE (TREE_OPERAND (index, 0)))) + anon_union_init_type = TREE_TYPE (TREE_OPERAND (index, 0)); /* Skip base and vtable inits. 
*/ - if (TREE_CODE (index) != FIELD_DECL - || DECL_ARTIFICIAL (index)) + else if (TREE_CODE (index) != FIELD_DECL + || DECL_ARTIFICIAL (index)) continue; } - for (; field != index; field = DECL_CHAIN (field)) + for (; field != index && TREE_TYPE (field) != anon_union_init_type; + field = DECL_CHAIN (field)) { tree ftype; if (TREE_CODE (field) != FIELD_DECL
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 11:09 PM, Jakub Jelinek ja...@redhat.com wrote: Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c and call it in calc_dfs_tree on each unconnected bb? I.e. (untested with the exception of the testcase): FWIW, dfs_find_deadend looks broken to me for this usage case. It could return a self-loop block with more than one successor. For a pre-order search like dominance.c needs, you'd have to look as deep as possible, something like this: Index: cfganal.c === --- cfganal.c (revision 192696) +++ cfganal.c (working copy) @@ -598,18 +598,26 @@ dfs_find_deadend (basic_block bb) { sbitmap visited = sbitmap_alloc (last_basic_block); sbitmap_zero (visited); + basic_block next_bb = NULL; + edge_iterator ei; + edge e; for (;;) { SET_BIT (visited, bb->index); - if (EDGE_COUNT (bb->succs) == 0 - || TEST_BIT (visited, EDGE_SUCC (bb, 0)->dest->index)) + /* Look for any not yet visited successors. +If all successors have been visited then +this is the dead end we're looking for. */ + FOR_EACH_EDGE (e, ei, bb->succs) + if (! TEST_BIT (visited, e->dest->index)) + break; + if (e == NULL) { sbitmap_free (visited); return bb; } - bb = EDGE_SUCC (bb, 0)->dest; + bb = e->dest; } gcc_unreachable (); (And the (EDGE_COUNT(bb->succs) == 0) is unnecessary for inverted_post_order_compute because it already puts all such blocks on the initial work list :-) Ciao! Steven
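Steven's point, that the search must descend through *any* unvisited successor rather than always taking successor 0, can be reproduced on a tiny CFG (a stand-alone sketch, not the cfganal.c data structures):

```c
#include <stdbool.h>

#define N 4

/* succ[b] lists the successors of block b, terminated by -1.  Block 1
   lists its self-loop first, which is exactly the case that trips up a
   successor-0-only search.  */
static const int succ[N][3] = {
  {1, -1, -1},    /* 0 -> 1 */
  {1, 2, -1},     /* 1 -> 1 (self loop first!), 1 -> 2 */
  {3, -1, -1},    /* 2 -> 3 */
  {-1, -1, -1},   /* 3: the real dead end */
};

/* Follow any not-yet-visited successor until none is left.  Checking
   only successor 0 would stop at the self-looping block 1 even though
   block 1 still has an unvisited way out.  */
static int find_deadend (int bb)
{
  bool visited[N] = { false };
  for (;;)
    {
      visited[bb] = true;
      int next = -1;
      for (int i = 0; succ[bb][i] >= 0; i++)
        if (!visited[succ[bb][i]])
          {
            next = succ[bb][i];
            break;
          }
      if (next < 0)
        return bb;      /* all successors visited: a dead end */
      bb = next;
    }
}
```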
Re: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver
On Tue, 23 Oct 2012, Dominique Dhumieres wrote: This patch (r192676) is probably causing FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memset-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Os on i?86 (see http://gcc.gnu.org/ml/gcc-testresults/2012-10/msg02350.html ). Confirmed, now PR55030. I'll revert that patch pending further investigation. Feel free to open PR's whenever something like this happens. brgds, H-P
[lra] patch to fix several testsuite failures
The following patch fixes several new testsuite failures. Committed as rev. 192657. 2012-10-22 Vladimir Makarov vmaka...@redhat.com * lra-constraints.c (inherit_reload_reg): Print bb numbers too. (need_for_split_p): Don't split eliminable registers. (fix_bb_live_info): Don't use EXECUTE_IF_AND_IN_BITMAP. Index: lra-constraints.c === --- lra-constraints.c (revision 192689) +++ lra-constraints.c (working copy) @@ -3939,8 +3939,8 @@ inherit_reload_reg (bool def_p, int orig /* We now have a new usage insn for original regno. */ setup_next_usage_insn (original_regno, new_insns, reloads_num, false); if (lra_dump_file != NULL) -fprintf (lra_dump_file, "Original reg change %d->%d:\n", -original_regno, REGNO (new_reg)); +fprintf (lra_dump_file, "Original reg change %d->%d (bb%d):\n", +original_regno, REGNO (new_reg), BLOCK_FOR_INSN (insn)->index); lra_reg_info[REGNO (new_reg)].restore_regno = original_regno; bitmap_set_bit (check_only_regs, REGNO (new_reg)); bitmap_set_bit (check_only_regs, original_regno); @@ -3969,8 +3969,10 @@ inherit_reload_reg (bool def_p, int orig lra_update_insn_regno_info (usage_insn); if (lra_dump_file != NULL) { - fprintf (lra_dump_file, "Inheritance reuse change %d->%d:\n", - original_regno, REGNO (new_reg)); + fprintf (lra_dump_file, + "Inheritance reuse change %d->%d (bb%d):\n", + original_regno, REGNO (new_reg), + BLOCK_FOR_INSN (usage_insn)->index); debug_rtl_slim (lra_dump_file, usage_insn, usage_insn, -1, 0); } @@ -4015,6 +4017,13 @@ need_for_split_p (HARD_REG_SET potential lra_assert (hard_regno >= 0); return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno) + /* Don't split eliminable hard registers, otherwise we can + split hard registers like hard frame pointer, which + lives on BB start/end according to DF-infrastructure, + when there is a pseudo assigned to the register and + living in the same BB. */ + && (regno >= FIRST_PSEUDO_REGISTER + || ! TEST_HARD_REG_BIT (eliminable_regset, hard_regno)) && ! 
TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno) /* We need at least 2 reloads to make pseudo splitting profitable. We should provide hard regno splitting in @@ -4284,7 +4293,7 @@ update_ebb_live_info (rtx head, rtx tail edge e; edge_iterator ei; - last_bb = BLOCK_FOR_INSN (tail); + last_bb = BLOCK_FOR_INSN (tail); prev_bb = NULL; for (curr_insn = tail; curr_insn != PREV_INSN (head); @@ -4492,7 +4501,7 @@ inherit_in_ebb (rtx head, rtx tail) after_p = (! JUMP_P (last_insn) (! CALL_P (last_insn) || (find_reg_note (last_insn, - REG_NORETURN, NULL) == NULL_RTX + REG_NORETURN, NULL_RTX) == NULL_RTX ! SIBLING_CALL_P (last_insn; REG_SET_TO_HARD_REG_SET (live_hard_regs, df_get_live_out (curr_bb)); IOR_HARD_REG_SET (live_hard_regs, eliminable_regset); @@ -4800,7 +4809,6 @@ lra_inheritance (void) edge e; timevar_push (TV_LRA_INHERITANCE); - lra_inheritance_iter++; if (lra_dump_file != NULL) fprintf (lra_dump_file, \n** Inheritance #%d: **\n\n, @@ -4867,11 +4875,9 @@ fix_bb_live_info (bitmap live, bitmap re unsigned int regno; bitmap_iterator bi; - EXECUTE_IF_AND_IN_BITMAP (removed_pseudos, live, 0, regno, bi) -{ - bitmap_clear_bit (live, regno); + EXECUTE_IF_SET_IN_BITMAP (removed_pseudos, 0, regno, bi) +if (bitmap_clear_bit (live, regno)) bitmap_set_bit (live, lra_reg_info[regno].restore_regno); -} } /* Return regno of the (subreg of) REG. Otherwise, return a negative
[PATCH v3] Add support for sparc compare-and-branch
Differences from v2:

1) If another control transfer comes right after a cbcond, we take an enormous performance penalty, some 20 cycles or more.  The documentation specifically warns about this, so emit a nop when we encounter this scenario.

2) Add a heuristic to avoid using cbcond if we know at RTL emit time that we're going to compare against a constant that does not fit in the tiny 5-bit signed immediate field.

3) Use cbcond for unconditional jumps too.

Regstrapped on sparc-unknown-linux-gnu w/--with-cpu=niagara4.

Eric and Rainer, I think that functionally this patch is fully ready to go into the tree, except for the Solaris aspects, which I do not have the means to work on.  Have either of you made any progress in this area?

Thanks!

gcc/

2012-10-12  David S. Miller  da...@davemloft.net

	* configure.ac: Add check for assembler SPARC4 instruction support.
	* configure: Rebuild.
	* config.in: Add HAVE_AS_SPARC4 section.
	* config/sparc/sparc.opt (mcbcond): New option.
	* doc/invoke.texi: Document it.
	* config/sparc/constraints.md: New constraint 'A' for 5-bit
	signed immediates.
	* doc/md.texi: Document it.
	* config/sparc/predicates.md (arith5_operand): New predicate.
	* config/sparc/sparc.c (dump_target_flag_bits): Handle MASK_CBCOND.
	(sparc_option_override): Likewise.
	(emit_cbcond_insn): New function.
	(emit_conditional_branch_insn): Call it.
	(emit_cbcond_nop): New function.
	(output_ubranch): Use cbcond, remove label arg.
	(output_cbcond): New function.
	* config/sparc/sparc-protos.h (output_ubranch): Update.
	(output_cbcond): Declare it.
	(emit_cbcond_nop): Likewise.
	* config/sparc/sparc.md (type attribute): New types 'cbcond'
	and 'uncond_cbcond'.
	(emit_cbcond_nop): New attribute.
	(length attribute): Handle cbcond and uncond_cbcond.
	(in_call_delay attribute): Reject cbcond and uncond_cbcond.
	(in_branch_delay attribute): Likewise.
	(in_uncond_branch_delay attribute): Likewise.
	(in_annul_branch_delay attribute): Likewise.
	(*cbcond_sp32, *cbcond_sp64): New insn patterns.
	(jump): Rewrite into an expander.
	(*jump_ubranch, *jump_cbcond): New patterns.
	* config/sparc/niagara4.md: Match 'cbcond' and 'uncond_cbcond'
	in 'n4_cti'.
	* config/sparc/sparc.h (AS_NIAGARA4_FLAG): New macro, use it
	when target default is niagara4.
	(SPARC_SIMM5_P): Define.
	* config/sparc/sol2.h (AS_SPARC64_FLAG): Adjust.
	(AS_SPARC32_FLAG): Define.
	(ASM_CPU32_DEFAULT_SPEC, ASM_CPU64_DEFAULT_SPEC): Use
	AS_NIAGARA4_FLAG as needed.

diff --git a/gcc/config.in b/gcc/config.in
index b13805d..791d14a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -266,6 +266,12 @@
 #endif

+/* Define if your assembler supports SPARC4 instructions.  */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SPARC4
+#endif
+
+
 /* Define if your assembler supports fprnd.  */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_FPRND
diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 472490f..8862ea1 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -18,7 +18,7 @@
 ;; http://www.gnu.org/licenses/.
 ;;; Unused letters:
-;;;    AB
+;;;     B
 ;;;    ajklq  tuv xyz
@@ -62,6 +62,11 @@
 ;; Integer constant constraints

+(define_constraint "A"
+ "Signed 5-bit integer constant"
+ (and (match_code "const_int")
+      (match_test "SPARC_SIMM5_P (ival)")))
+
 (define_constraint "H"
  "Valid operand of double arithmetic operation"
  (and (match_code "const_double")
diff --git a/gcc/config/sparc/niagara4.md b/gcc/config/sparc/niagara4.md
index 272c8ff..61ca801 100644
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@@ -56,7 +56,7 @@
 (define_insn_reservation "n4_cti" 2
   (and (eq_attr "cpu" "niagara4")
-       (eq_attr "type" "branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
+       (eq_attr "type" "cbcond,uncond_cbcond,branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
   "n4_slot1, nothing")

 (define_insn_reservation "n4_fp" 11
diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 326524b..b64e109 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -391,6 +391,14 @@
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "uns_small_int_operand")))

+;; Return true if OP is a register, or is a CONST_INT that can fit in a
+;; signed 5-bit immediate field.  This is an acceptable second operand for
+;; the cbcond instructions.
+(define_predicate "arith5_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "const_int")
+            (match_test "SPARC_SIMM5_P (INTVAL (op))"))))
+
 ;; Predicates for miscellaneous instructions.

diff --git a/gcc/config/sparc/sol2.h
gcc 4.7 libgo patch committed: Set libgo version number
PR 54918 points out that libgo is not using version numbers as it should.  At present none of libgo in 4.6, 4.7 and mainline are compatible with each other.  This patch to the 4.7 branch sets the version number for libgo there.  Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to 4.7 branch.

Ian

Index: configure.ac
===================================================================
--- configure.ac	(revision 191576)
+++ configure.ac	(working copy)
@@ -11,7 +11,7 @@ AC_INIT(package-unused, version-unused,,
 AC_CONFIG_SRCDIR(Makefile.am)
 AC_CONFIG_HEADER(config.h)

-libtool_VERSION=1:0:0
+libtool_VERSION=2:1:0
 AC_SUBST(libtool_VERSION)

 AM_ENABLE_MULTILIB(, ..)
Index: Makefile.am
===================================================================
--- Makefile.am	(revision 192024)
+++ Makefile.am	(working copy)
@@ -1753,7 +1753,8 @@ libgo_go_objs = \

 libgo_la_SOURCES = $(runtime_files)

-libgo_la_LDFLAGS = $(PTHREAD_CFLAGS) $(AM_LDFLAGS)
+libgo_la_LDFLAGS = \
+	-version-info $(libtool_VERSION) $(PTHREAD_CFLAGS) $(AM_LDFLAGS)

 libgo_la_LIBADD = \
	$(libgo_go_objs) $(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS)
Rebased gccgo branch on trunk
For the last few months the gccgo branch has been based on the 4.7 branch. I just rebased it to be on trunk. I did this by removing the branch (revision 192707) and creating a new copy of it based on trunk (committed as revision 192708). Ian