Re: [5/8] Add narrow_bit_field_mem
Eric Botcazou ebotca...@adacore.com writes:

  This patch splits out a fairly common operation: that of narrowing a
  MEM to a particular mode and adjusting the bit number accordingly.

  I've kept with bit_field rather than bitfield for consistency with the
  callers, although we do have bitfield in adjust_bitfield_address.

My bad.  Feel free to rename it.

TBH, I prefer bitfield, so I'll leave it be :-)

gcc/
	* expmed.c (narrow_bit_field_mem): New function.
	(store_bit_field_using_insv, store_bit_field_1, store_fixed_bit_field)
	(extract_bit_field_1): Use it.

This is a good cleanup but...

Index: gcc/expmed.c
===
--- gcc/expmed.c	2012-10-30 19:25:44.797368678 +
+++ gcc/expmed.c	2012-10-30 19:25:47.730368671 +
@@ -387,6 +387,23 @@ mode_for_extraction (enum extraction_pat
   return data->operand[opno].mode;
 }

+/* Adjust bitfield memory MEM so that it points to the first unit of
+   mode MODE that contains the bitfield at bit position BITNUM.
+   Set NEW_BITNUM to the bit position of the field within the
+   new memory.  */
+
+static rtx
+narrow_bit_field_mem (rtx mem, enum machine_mode mode,
+		      unsigned HOST_WIDE_INT bitnum,
+		      unsigned HOST_WIDE_INT &new_bitnum)
+{
+  unsigned int unit = GET_MODE_BITSIZE (mode);
+  unsigned HOST_WIDE_INT offset = bitnum / unit * GET_MODE_SIZE (mode);
+  mem = adjust_bitfield_address (mem, mode, offset);
+  new_bitnum = bitnum % unit;
+  return mem;
+}

Ugh. :-)  I cannot see any advantages in using references here except for...

+  /* Get a reference to the first byte of the field.  */
+  xop0 = narrow_bit_field_mem (xop0, byte_mode, bitnum, bitnum);

+  op0 = narrow_bit_field_mem (op0, byte_mode, bitnum, bitnum);

... mightily confusing the reader here.  Would it be OK with a pointer,
but keeping the interface the same?

That's certainly fine by me.

That's one of the things I'm not sure about after the C++ conversion:
I've noticed some references creep in, but when should we use references
and when pointers?
I think Richard B made a comment about using references for things that
can't be null.

Richard
[ping] Fix unwind/debug info on x86-64/Windows
Original message at:
  http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00013.html

Thanks in advance.

-- 
Eric Botcazou
Re: [5/8] Add narrow_bit_field_mem
  Would it be OK with a pointer, but keeping the interface the same?

  That's certainly fine by me.

Yes, a pointer would make things more legible here.

  That's one of the things I'm not sure about after the C++ conversion:
  I've noticed some references creep in, but when should we use
  references and when pointers?  I think Richard B made a comment about
  using references for things that can't be null.

I'm personally dubious about using references, especially in the
middle-end of a file where all other functions use pointers; consistency
should come first.

-- 
Eric Botcazou
Re: [PATCH] Inter-bb range test optimization (PRs tree-optimization/19105, tree-optimization/21643, tree-optimization/46309)
On Tue, Oct 30, 2012 at 03:56:08PM +0100, Richard Biener wrote:

  Ok, but the code could really really have some more comments -
  functions not fitting in my 80x24 terminal without seeing any comment
  what happens here are a maintenance nightmare.

Here is the updated patch I'm about to commit:

2012-10-31  Jakub Jelinek  ja...@redhat.com

	PR tree-optimization/19105
	PR tree-optimization/21643
	PR tree-optimization/46309
	* tree-ssa-reassoc.c (init_range_entry): Add STMT argument
	and use it if EXP is NULL.
	(update_range_test): Handle OPCODE equal to ERROR_MARK
	and oe->op NULL.
	(optimize_range_tests): Likewise.
	(final_range_test_p, suitable_cond_bb, no_side_effect_bb, get_ops,
	maybe_optimize_range_tests): New functions.
	(reassociate_bb): Call maybe_optimize_range_tests if last
	stmt of bb is GIMPLE_COND that hasn't been visited yet.

	* gcc.dg/pr19105.c: New test.
	* gcc.dg/pr21643.c: New test.
	* gcc.dg/pr46309-2.c: New test.
	* gcc.c-torture/execute/pr46309.c: New test.

--- gcc/tree-ssa-reassoc.c.jj	2012-10-30 18:45:07.581366165 +0100
+++ gcc/tree-ssa-reassoc.c	2012-10-31 00:13:17.173319482 +0100
@@ -1,5 +1,5 @@
 /* Reassociation for trees.
-   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011
+   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011, 2012
    Free Software Foundation, Inc.
    Contributed by Daniel Berlin d...@dberlin.org
@@ -1713,10 +1713,12 @@ struct range_entry
 };
 
 /* This is similar to make_range in fold-const.c, but on top of
-   GIMPLE instead of trees.  */
+   GIMPLE instead of trees.  If EXP is non-NULL, it should be
+   an SSA_NAME and STMT argument is ignored, otherwise STMT
+   argument should be a GIMPLE_COND.  */
 
 static void
-init_range_entry (struct range_entry *r, tree exp)
+init_range_entry (struct range_entry *r, tree exp, gimple stmt)
 {
   int in_p;
   tree low, high;
@@ -1727,7 +1729,8 @@ init_range_entry (struct range_entry *r,
   r->strict_overflow_p = false;
   r->low = NULL_TREE;
   r->high = NULL_TREE;
-  if (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+  if (exp != NULL_TREE
+      && (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp))))
     return;
 
   /* Start with simply saying "EXP != 0" and then look at the code of EXP
@@ -1735,12 +1738,14 @@ init_range_entry (struct range_entry *r,
      happen, but it doesn't seem worth worrying about this.  We "continue"
      the outer loop when we've changed something; otherwise we "break"
      the switch, which will "break" the while.  */
 
-  low = build_int_cst (TREE_TYPE (exp), 0);
+  low = exp ? build_int_cst (TREE_TYPE (exp), 0) : boolean_false_node;
   high = low;
   in_p = 0;
   strict_overflow_p = false;
   is_bool = false;
-  if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
+  if (exp == NULL_TREE)
+    is_bool = true;
+  else if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
     {
       if (TYPE_UNSIGNED (TREE_TYPE (exp)))
	is_bool = true;
@@ -1752,25 +1757,35 @@ init_range_entry (struct range_entry *r,
 
   while (1)
     {
-      gimple stmt;
       enum tree_code code;
       tree arg0, arg1, exp_type;
       tree nexp;
       location_t loc;
 
-      if (TREE_CODE (exp) != SSA_NAME)
-	break;
+      if (exp != NULL_TREE)
+	{
+	  if (TREE_CODE (exp) != SSA_NAME)
+	    break;
 
-      stmt = SSA_NAME_DEF_STMT (exp);
-      if (!is_gimple_assign (stmt))
-	break;
+	  stmt = SSA_NAME_DEF_STMT (exp);
+	  if (!is_gimple_assign (stmt))
+	    break;
+
+	  code = gimple_assign_rhs_code (stmt);
+	  arg0 = gimple_assign_rhs1 (stmt);
+	  arg1 = gimple_assign_rhs2 (stmt);
+	  exp_type = TREE_TYPE (exp);
+	}
+      else
+	{
+	  code = gimple_cond_code (stmt);
+	  arg0 = gimple_cond_lhs (stmt);
+	  arg1 = gimple_cond_rhs (stmt);
+	  exp_type = boolean_type_node;
+	}
 
-      code = gimple_assign_rhs_code (stmt);
-      arg0 = gimple_assign_rhs1 (stmt);
       if (TREE_CODE (arg0) != SSA_NAME)
	break;
-      arg1 = gimple_assign_rhs2 (stmt);
-      exp_type = TREE_TYPE (exp);
       loc = gimple_location (stmt);
       switch (code)
	{
@@ -1916,7 +1931,11 @@ range_entry_cmp (const void *a, const vo
    [EXP, IN_P, LOW, HIGH, STRICT_OVERFLOW_P] is a merged range for
    RANGE and OTHERRANGE through OTHERRANGE + COUNT - 1 ranges,
    OPCODE and OPS are arguments of optimize_range_tests.  Return
-   true if the range merge has been successful.  */
+   true if the range merge has been successful.
+   If OPCODE is ERROR_MARK, this is called from within
+   maybe_optimize_range_tests and is performing inter-bb range
+   optimization.  Changes should be then performed right away, and
+   whether an op is BIT_AND_EXPR or BIT_IOR_EXPR is found in oe->rank.  */
 
 static bool
 update_range_test (struct range_entry *range, struct range_entry *otherrange,
@@
Re: Adapt one fold-const optimization for vectors
On Tue, Oct 30, 2012 at 01:14:40PM +0100, Marc Glisse wrote:

  On Tue, 30 Oct 2012, Marek Polacek wrote:

    On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:

      Hello, one more optimization that needed help for vectors, it
      crashed on (xy)0.  Because of PR 55001, testcases are awkward to
      add (I could do a x86-only one if needed).  bootstrap+testsuite.

      2012-10-30  Marc Glisse  marc.gli...@inria.fr

		* fold-const.c (fold_binary_op_with_conditional_arg):
		Handle vectors.
		(fold_binary_loc): call it for VEC_COND_EXPR.

    Capital C instead of lowercase.  Patch missing?

  Indeed, thanks for the notice.

This regressed
FAIL: g++.dg/other/vector-compare.C (internal compiler error)
FAIL: g++.dg/other/vector-compare.C (test for excess errors)
on i686-linux.  The problem is that tree-vect-generic.c doesn't expand
BLKmode VEC_COND_EXPR piecewise.  So, either fold-const on this
optimization needs to test whether expand_vec_cond_expr_p is true for
the chosen pair of comparison type and operand type, or
tree-vect-generic.c needs to be taught to expand VEC_COND_EXPR piecewise
if expand_vec_cond_expr_p is not true for the particular VEC_COND_EXPR.

	Jakub
RE: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On Behalf Of Jakub Jelinek
Sent: Tuesday, October 30, 2012 1:57 AM
To: g...@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org
Subject: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

  Status
  ======

  I'd like to close the stage 1 phase of GCC 4.8 development on Monday,
  November 5th.  If you have still patches for new features you'd like
  to see in GCC 4.8, please post them for review soon.  Patches posted
  before the freeze, but reviewed shortly after the freeze, may still go
  in, further changes should be just bugfixes and documentation fixes.

  Quality Data
  ------------

  Priority	#	Change from Last Report
  --------	---	-----------------------
  P1		 23	+ 23
  P2		 77	+  8
  P3		 85	+ 84
  --------	---	-----------------------
  Total		185	+115

  Previous Report
  ===============

  http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

  The next report will be sent by me again, announcing end of stage 1.

Hi,

I am working on the register pressure directed hoist pass and have
committed the main patch on trunk.  I still have two patches improving
it in this area.  I will send these two patches soon and hope they can
be included in 4.8 if OK.

Thanks.
Re: [C++] Omit overflow check for new char[n]
On Mon, Oct 08, 2012 at 05:50:34PM +0200, Florian Weimer wrote:

  gcc/:
  2012-10-08  Florian Weimer  fwei...@redhat.com

	* init.c (build_new_1): Do not check for arithmetic overflow
	if inner array size is 1.

  gcc/testsuite/:
  2012-10-08  Florian Weimer  fwei...@redhat.com

	* g++.dg/init/new40.C: New.

@@ -2450,7 +2450,13 @@
       if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
	size = size_binop (PLUS_EXPR, size, cookie_size);
       else
-	cookie_size = NULL_TREE;
+	{
+	  cookie_size = NULL_TREE;
+	  /* No size arithmetic necessary, so the size check is
+	     not needed.  */
+	  if (outer_nelts_check != NULL && inner_size == double_int_one)
+	    outer_nelts_check = NULL_TREE;
+	}

I don't see any == double_int_one (or zero) comparisons in grep, so I'd
say inner_size.is_one () should be used instead (which is used pretty
frequently).  Ditto in the second spot.  Otherwise the patch looks good
to me, but I'd like Jason to chime in too.

       /* Perform the overflow check.  */
       if (outer_nelts_check != NULL_TREE)
	size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
@@ -2486,7 +2492,13 @@
       /* Use a global operator new.  */
       /* See if a cookie might be required.  */
       if (!(array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type)))
-	cookie_size = NULL_TREE;
+	{
+	  cookie_size = NULL_TREE;
+	  /* No size arithmetic necessary, so the size check is
+	     not needed.  */
+	  if (outer_nelts_check != NULL && inner_size == double_int_one)
+	    outer_nelts_check = NULL_TREE;
+	}

       alloc_call = build_operator_new_call (fnname, placement, size,
					     cookie_size,

	Jakub
Re: Non-dominating loop bounds in tree-ssa-loop-niter 2/4
On Tue, 30 Oct 2012, Jan Hubicka wrote:

Hi,
this patch implements the second part of the planned change - to
determine loop bounds by shortest path discovery.  This allows us to
bound the number of iterations of loops whose bounds appear in
statements that do not dominate the latch.

I originally planned to implement this as part of
maybe_lower_iteration_bound, but I found doing it separately is more
readable.  While both perform a walk of the loop body, they do so for
different reasons: discover_iteration_bound_by_body_walk walks from
header to latch, while maybe_lower_iteration_bound walks from the header
to the first statement with side effects, looking for an exit on the
way.  Both walks can be skipped for many loops, but each for different
reasons.

Bootstrapped/regtested x86_64-linux, OK?

	* tree-ssa-loop-niter.c (double_int_cmp, bound_index,
	discover_iteration_bound_by_body_walk): New functions.
	(estimate_numbers_of_iterations_loop): Use it.

	* gcc.dg/tree-ssa/loop-38.c: New testcase.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c	(revision 192989)
+++ tree-ssa-loop-niter.c	(working copy)
@@ -2955,6 +2955,234 @@ gcov_type_to_double_int (gcov_type val)
   return ret;
 }
 
+/* Compare double ints, callback for qsort.  */
+
+int
+double_int_cmp (const void *p1, const void *p2)
+{
+  const double_int *d1 = (const double_int *)p1;
+  const double_int *d2 = (const double_int *)p2;
+  if (*d1 == *d2)
+    return 0;
+  if (d1->ult (*d2))
+    return -1;
+  return 1;
+}
+
+/* Return index of BOUND in BOUNDS array sorted in increasing order.
+   Lookup by binary search.  */
+
+int
+bound_index (VEC (double_int, heap) *bounds, double_int bound)
+{
+  unsigned int end = VEC_length (double_int, bounds);
+  unsigned int begin = 0;
+
+  /* Find a matching index by means of a binary search.  */
+  while (begin != end)
+    {
+      unsigned int middle = (begin + end) / 2;
+      double_int index = VEC_index (double_int, bounds, middle);
+
+      if (index == bound)
+	return middle;
+      else if (index.ult (bound))
+	begin = middle + 1;
+      else
+	end = middle;
+    }
+  gcc_unreachable ();
+}
+
+/* Used to hold vector of queues of basic blocks below.  */
+typedef VEC (basic_block, heap) *bb_queue;
+DEF_VEC_P(bb_queue);
+DEF_VEC_ALLOC_P(bb_queue,heap);
+
+/* We recorded loop bounds only for statements dominating loop latch
+   (and thus executed each loop iteration).  If there are any bounds on
+   statements not dominating the loop latch we can improve the estimate
+   by walking the loop body and seeing if every path from loop header to
+   loop latch contains some bounded statement.  */
+
+static void
+discover_iteration_bound_by_body_walk (struct loop *loop)
+{
+  pointer_map_t *bb_bounds;
+  struct nb_iter_bound *elt;
+  VEC (double_int, heap) *bounds = NULL;
+  VEC (bb_queue, heap) *queues = NULL;
+  bb_queue queue = NULL;
+  ptrdiff_t queue_index;
+  ptrdiff_t latch_index = 0;
+  pointer_map_t *visited;
+
+  /* Discover what bounds may interest us.  */
+  for (elt = loop->bounds; elt; elt = elt->next)
+    {
+      double_int bound = elt->bound;
+
+      /* Exit terminates loop at given iteration, while non-exits
+	 produce undefined effect on the next iteration.  */
+      if (!elt->is_exit)
+	{
+	  bound += double_int_one;
+	  /* Overflow, give up on this bound.  */
+	  if (bound == double_int_zero)
+	    continue;
+	}
+
+      if (!loop->any_upper_bound
+	  || bound.ult (loop->nb_iterations_upper_bound))
+	VEC_safe_push (double_int, heap, bounds, bound);
+    }
+
+  /* Exit early if there is nothing to do.  */
+  if (!bounds)
+    return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Trying to walk loop body to reduce the bound.\n");
+
+  /* Sort the bounds in decreasing order.  */
+  qsort (VEC_address (double_int, bounds), VEC_length (double_int, bounds),
+	 sizeof (double_int), double_int_cmp);
+
+  /* For every basic block record the lowest bound that is guaranteed to
+     terminate the loop.  */
+
+  bb_bounds = pointer_map_create ();
+  for (elt = loop->bounds; elt; elt = elt->next)
+    {
+      double_int bound = elt->bound;
+      if (!elt->is_exit)
+	{
+	  bound += double_int_one;
+	  /* Overflow, give up on this bound.  */
+	  if (bound == double_int_zero)
+	    continue;
+	}
+
+      if (!loop->any_upper_bound
+	  || bound.ult (loop->nb_iterations_upper_bound))
+	{
+	  ptrdiff_t index = bound_index (bounds, bound);
+	  void **entry = pointer_map_contains (bb_bounds,
+					       gimple_bb
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Tue, Oct 30, 2012 at 9:23 PM, Peter Bergner berg...@vnet.ibm.com wrote:

  On Tue, 2012-10-30 at 20:37 +0100, Jakub Jelinek wrote:

    On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote:

      Ok, then I'll bootstrap and regtest your suggested change while we
      wait for richi to comment.  I'm fine with whatever you and richi
      decide is best.  The ObjC guys should probably test it though too.
      I assume you think we should change the current trunk code as well?

    Well, I haven't looked at the ObjC failures (guess they are darwin
    only anyway), so have no idea whether those set section name or not.
    So, if my proposed test instead of the trunk one doesn't work for
    darwin, perhaps it could be DECL_PRESERVED_P (decl) ||
    (DECL_SECTION_NAME (decl) !...).  I think the DECL_USER_ALIGN test
    is undesirable for that though, it is just fine to increase the
    alignment of anything, after all, it is still aligned properly then.

  I can confirm it bootstraps and regtests without any errors... and it
  fixes my problem.

I'm fine with that, but please give it some time on trunk before
backporting it.

Thanks,
Richard.

  Peter
Re: [1/8] Remove redundant BLKmode test
On Tue, Oct 30, 2012 at 11:00 PM, Eric Botcazou ebotca...@adacore.com wrote:

  gcc/
	* expmed.c (store_bit_field_1): Remove test for BLKmode values.

  This looks fine to me.

Btw, I consider Eric the best person to approve changes in this area.
Thus if there is any doubt, all patches in this series are ok if Eric
thinks they are ;)

Thanks,
Richard.

  -- 
  Eric Botcazou
Re: [PATCH] Update source location for PRE inserted stmt
On Wed, Oct 31, 2012 at 12:57 AM, Xinliang David Li davi...@google.com wrote:

  It will make the location info for the newly synthesized stmt more
  deterministic, I think.

Maybe, but it will increase the jumpiness in the debugger without
actually being accurate, no?  For example, if the partially redundant
expression is i + j, then when computed at the insertion point the
values of i and j do not necessarily reflect the computed value!
Instead we may compute the result of i + j using completely different
components / operations.

Thus I think inserted expressions should not have any debug information
at all, because they do not correspond to a source line.

Richard.

  David

  On Tue, Oct 30, 2012 at 4:38 PM, Steven Bosscher
  stevenb@gmail.com wrote:

    On Wed, Oct 31, 2012 at 12:00 AM, Dehao Chen wrote:

      This patch aims to improve debugging of optimized code.  It
      ensures that PRE inserted statements have the same source location
      as the statement at the insertion point, instead of
      UNKNOWN_LOCATION.

    Wrong patch attached.  However, is it really better to have the
    location of the insertion point than to have UNKNOWN_LOCATION?  It's
    not where the value is computed in the source program...

    Ciao!
    Steven
Re: [5/8] Add narrow_bit_field_mem
On Wed, Oct 31, 2012 at 9:16 AM, Eric Botcazou ebotca...@adacore.com wrote:

    Would it be OK with a pointer, but keeping the interface the same?

    That's certainly fine by me.

  Yes, a pointer would make things more legible here.

    That's one of the things I'm not sure about after the C++
    conversion: I've noticed some references creep in, but when should
    we use references and when pointers?  I think Richard B made a
    comment about using references for things that can't be null.

  I'm personally dubious about using references, especially in the
  middle-end of a file where all other functions use pointers;
  consistency should come first.

I agree.  My comment was for isolated code parts that are being
rewritten (I think it was the wide-int class).  Consistency comes first.

Thanks,
Richard.

  -- 
  Eric Botcazou
Re: Adapt one fold-const optimization for vectors
On Wed, Oct 31, 2012 at 9:24 AM, Jakub Jelinek ja...@redhat.com wrote:

  On Tue, Oct 30, 2012 at 01:14:40PM +0100, Marc Glisse wrote:

    On Tue, 30 Oct 2012, Marek Polacek wrote:

      On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:

        Hello, one more optimization that needed help for vectors, it
        crashed on (xy)0.  Because of PR 55001, testcases are awkward to
        add (I could do a x86-only one if needed).  bootstrap+testsuite.

        2012-10-30  Marc Glisse  marc.gli...@inria.fr

		* fold-const.c (fold_binary_op_with_conditional_arg):
		Handle vectors.
		(fold_binary_loc): call it for VEC_COND_EXPR.

      Capital C instead of lowercase.  Patch missing?

    Indeed, thanks for the notice.

  This regressed
  FAIL: g++.dg/other/vector-compare.C (internal compiler error)
  FAIL: g++.dg/other/vector-compare.C (test for excess errors)
  on i686-linux.  The problem is that tree-vect-generic.c doesn't expand
  BLKmode VEC_COND_EXPR piecewise.  So, either fold-const on this
  optimization needs to test whether expand_vec_cond_expr_p is true for
  the chosen pair of comparison type and operand type, or
  tree-vect-generic.c needs to be taught to expand VEC_COND_EXPR
  piecewise if expand_vec_cond_expr_p is not true for the particular
  VEC_COND_EXPR.

The latter (tree-vect-generic.c needs to expand it).

Richard.

	Jakub
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

  jakub, i am hoping to get the rest of my wide integer conversion
  posted by nov 5.  I am under some adverse conditions here: hurricane
  sandy hit here pretty badly.  my house is hooked up to a small
  generator, and no one has any power for miles around.

  So far richi has promised to review them.  he has sent some comments,
  but so far no reviews.  Some time after i get the first round of them
  posted, i will do a second round that incorporates everyones comments.

  But i would like a little slack here if possible.  While this work is
  a show stopper for my private port, the patches address serious
  problems for many of the public ports, especially ones that have very
  flexible vector units.  I believe that there is a significant set of
  latent problems currently with the existing ports that use TImode that
  these patches will fix.

  However, i will do everything in my power to get the first round of
  the patches posted by the nov 5 deadline.

I suppose you are not going to merge your private port for 4.8 and thus
the wide-int changes are not a show-stopper for you.

That said, I considered the main conversion to be appropriate to be
deferred for the next stage1.  There is no advantage in disrupting the
tree more at this stage.

Thanks,
Richard.

  kenny

  On 10/29/2012 01:56 PM, Jakub Jelinek wrote:

    Status
    ======

    I'd like to close the stage 1 phase of GCC 4.8 development on
    Monday, November 5th.  If you have still patches for new features
    you'd like to see in GCC 4.8, please post them for review soon.
    Patches posted before the freeze, but reviewed shortly after the
    freeze, may still go in, further changes should be just bugfixes and
    documentation fixes.

    Quality Data
    ------------

    Priority	#	Change from Last Report
    --------	---	-----------------------
    P1		 23	+ 23
    P2		 77	+  8
    P3		 85	+ 84
    --------	---	-----------------------
    Total	185	+115

    Previous Report
    ===============

    http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

    The next report will be sent by me again, announcing end of stage 1.
Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_lenth attribute to support the ARC port)
On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke
joern.renne...@embecosm.com wrote:

  Quoting Richard Biener richard.guent...@gmail.com:

    Apart from the iteration_threshold the hookization would be
    straight-forward.  Now I cannot decipher from the patch what
    functional change it introduces ;)

  The only change occurs if we reach an iteration count of MAX_INT
  iterations - which should already be indicative of a problem.  At the
  MAX_INTth iteration, we will apply the length locking logic to
  instructions inside a delay slot sequence as well.

  If we disregard this exceptional case, there should be no functional
  changes unless someone defines TARGET_ADJUST_INSN_LENGTH.

  uid_lock_length gets allocated with XCNEWVEC, so, in particular, the
  elements corresponding to instructions inside delay slot sequences are
  initialized to zero.  As the default hook sets *iter_threshold to
  MAX_INT when inside a delay sequence, and doesn't change the length,
  the max operation with uid_lock_length is a no-op, and *niter <
  iter_threshold is true, hence a length change results in updating the
  length immediately, without changing uid_lock_length.

  In the case that we are not inside a delay slot sequence, the default
  hook leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.
  Thus, when niter is 0 or larger, as is the case for the ordinary
  looping operation, we always find *niter >= iter_threshold, and thus
  apply the length locking mechanism.  If we are in the preliminary
  pass, or doing the single !increasing iteration, niter is set to -1,
  so *niter < iter_threshold is always true, so again we update the
  length immediately.

Ok, I see.  The patch is ok then.

    There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
    warrant introducing a hook without removing the macro IMHO.

  Ok, I can prepare a patch for that, even though it makes it even a bit
  less obvious that there's no functional change.

That would be nice (you can do that as followup).
  What about the preparatory patch
  http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html ?
  Can I check that in now?

Richard S. had some comments there so I defer to him.

Richard.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Richard Biener richard.guent...@gmail.com writes:

  On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck
  zad...@naturalbridge.com wrote:

    jakub, i am hoping to get the rest of my wide integer conversion
    posted by nov 5.  I am under some adverse conditions here: hurricane
    sandy hit here pretty badly.  my house is hooked up to a small
    generator, and no one has any power for miles around.  So far richi
    has promised to review them.  he has sent some comments, but so far
    no reviews.  Some time after i get the first round of them posted, i
    will do a second round that incorporates everyones comments.  But i
    would like a little slack here if possible.  While this work is a
    show stopper for my private port, the patches address serious
    problems for many of the public ports, especially ones that have
    very flexible vector units.  I believe that there is a significant
    set of latent problems currently with the existing ports that use
    TImode that these patches will fix.  However, i will do everything
    in my power to get the first round of the patches posted by the nov
    5 deadline.

  I suppose you are not going to merge your private port for 4.8 and
  thus the wide-int changes are not a show-stopper for you.

  That said, I considered the main conversion to be appropriate to be
  deferred for the next stage1.  There is no advantage in disrupting the
  tree more at this stage.

I would like the wide_int class and rtl stuff to go in 4.8 though.
IMO it's a significant improvement in its own right, and Kenny submitted
it well before the deadline.

Richard
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 6:27 PM, Dehao Chen de...@google.com wrote:

  gcc/ChangeLog:
  2012-10-25  Dehao Chen  de...@google.com

	* tree-eh.c (do_return_redirection): Set location for jump
	statement.
	(do_goto_redirection): Likewise.
	(frob_into_branch_around): Likewise.
	(lower_try_finally_nofallthru): Likewise.
	(lower_try_finally_copy): Likewise.
	(lower_try_finally_switch): Likewise.
	* expr.c (store_expr): Use current insn location instead of expr
	location.
	(expand_expr_real): Likewise.
	(expand_expr_real_1): Likewise.

  gcc/testsuite/ChangeLog:
  2012-10-25  Dehao Chen  de...@google.com

	* g++.dg/debug/dwarf2/block.C: New testcase.

  This patch bootstrapped and passed gcc regression tests.  Okay for
  trunk?

It looks sensible to me.  If nobody objects within 24h the patch is ok.

Thanks,
Richard.

  Thanks,
  Dehao
Re: [PATCH] Fix debug info for expr and jump stmt
On Wed, Oct 31, 2012 at 11:00:26AM +0100, Richard Biener wrote:

  On Tue, Oct 30, 2012 at 6:27 PM, Dehao Chen de...@google.com wrote:

    [ChangeLog as quoted upthread snipped]

    This patch bootstrapped and passed gcc regression tests.  Okay for
    trunk?

  It looks sensible to me.  If nobody objects within 24h the patch is ok.

Yeah.  But please also check the gdb testsuite for this kind of patches.

	Jakub
Re: Non-dominating loop bounds in tree-ssa-loop-niter 2/4
  visited is a poor name for a map ...

Hmm, visited_with_priority?

Thanks,
Honza

  Otherwise looks ok.

  Thanks,
  Richard.

+
+  /* Perform shortest path discovery loop->header ... loop->latch.
+
+     The distance is given by the smallest loop bound of basic block
+     present in the path and we look for path with largest smallest
+     bound on it.
+
+     To avoid the need for Fibonacci heap on double ints we simply
+     compress double ints into indexes to BOUNDS array and then
+     represent the queue as arrays of queues for every index.
+     Index of VEC_length (BOUNDS) means that the execution of given BB
+     has no bounds determined.
+
+     VISITED is a pointer map translating basic block into smallest
+     index it was inserted into the priority queue with.  */
+  latch_index = -1;
+
+  /* Start walk in loop header with index set to infinite bound.  */
+  queue_index = VEC_length (double_int, bounds);
+  VEC_safe_grow_cleared (bb_queue, heap, queues, queue_index + 1);
+  VEC_safe_push (basic_block, heap, queue, loop->header);
+  VEC_replace (bb_queue, queues, queue_index, queue);
+  *pointer_map_insert (visited, loop->header) = (void *)queue_index;
+
+  for (; queue_index >= 0; queue_index--)
+    {
+      if (latch_index < queue_index)
+	{
+	  while (VEC_length (basic_block,
+			     VEC_index (bb_queue, queues, queue_index)))
+	    {
+	      basic_block bb;
+	      ptrdiff_t bound_index = queue_index;
+	      void **entry;
+	      edge e;
+	      edge_iterator ei;
+
+	      queue = VEC_index (bb_queue, queues, queue_index);
+	      bb = VEC_pop (basic_block, queue);
+
+	      /* OK, we later increased BB priority, skip it.  */
+	      if ((ptrdiff_t)*pointer_map_contains (visited, bb) > queue_index)
+		continue;
+
+	      /* See if we can improve the bound.  */
+	      entry = pointer_map_contains (bb_bounds, bb);
+	      if (entry && (ptrdiff_t)*entry < bound_index)
+		bound_index = (ptrdiff_t)*entry;
+
+	      /* Insert successors into the queue, watch for latch edge
+		 and record greatest index we saw.  */
+	      FOR_EACH_EDGE (e, ei, bb->succs)
+		{
+		  bool insert = false;
+		  void **entry;
+
+		  if (loop_exit_edge_p (loop, e))
+		    continue;
+
+		  if (e == loop_latch_edge (loop)
+		      && latch_index < bound_index)
+		    latch_index = bound_index;
+		  else if (!(entry = pointer_map_contains (visited, e->dest)))
+		    {
+		      insert = true;
+		      *pointer_map_insert (visited, e->dest)
+			= (void *)bound_index;
+		    }
+		  else if ((ptrdiff_t)*entry < bound_index)
+		    {
+		      insert = true;
+		      *entry = (void *)bound_index;
+		    }
+
+		  if (insert)
+		    {
+		      bb_queue queue2 = VEC_index (bb_queue, queues,
+						   bound_index);
+		      VEC_safe_push (basic_block, heap, queue2, e->dest);
+		      VEC_replace (bb_queue, queues, bound_index, queue2);
+		    }
+		}
+	    }
+	}
+      else
+	VEC_free (basic_block, heap,
+		  VEC_index (bb_queue, queues, queue_index));
+    }
+
+  gcc_assert (latch_index >= 0);
+  if (latch_index < (ptrdiff_t) VEC_length (double_int, bounds))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Found better loop bound ");
+	  dump_double_int (dump_file,
+			   VEC_index (double_int, bounds, latch_index), true);
+	  fprintf (dump_file, "\n");
+	}
+      record_niter_bound (loop, VEC_index (double_int, bounds, latch_index),
+			  false, true);
+    }
+
+  VEC_free (bb_queue, heap, queues);
+  pointer_map_destroy (bb_bounds);
+  pointer_map_destroy (visited);
+}
+
 /* See if every path crossing the loop goes through a statement that is
    known to not execute at the last iteration.  In that case we can
    decrease iteration count by 1.  */
@@ -3102,6 +3330,8 @@ estimate_numbers_of_iterations_loop (str
 
   infer_loop_bounds_from_undefined (loop);
 
+  discover_iteration_bound_by_body_walk (loop);
+
   maybe_lower_iteration_bound (loop);
 
   /* If we have a measured profile, use it to estimate the number of
Index: testsuite/gcc.dg/tree-ssa/loop-38.c
===
--- testsuite/gcc.dg/tree-ssa/loop-38.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/loop-38.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+int a[10];
+int b[11];
+t(int n)
+{
+  int i;
+  int sum = 0;
+  for
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Wed, Oct 31, 2012 at 10:59 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: jakub, i am hoping to get the rest of my wide integer conversion posted by nov 5. I am under some adverse conditions here: hurricane sandy hit here pretty badly. my house is hooked up to a small generator, and no one has any power for miles around. So far richi has promised to review them. he has sent some comments, but so far no reviews. Some time after i get the first round of them posted, i will do a second round that incorporates everyone's comments. But i would like a little slack here if possible. While this work is a show stopper for my private port, the patches address serious problems for many of the public ports, especially ones that have very flexible vector units. I believe that there is a significant set of latent problems currently with the existing ports that use TImode that these patches will fix. However, i will do everything in my power to get the first round of the patches posted by the nov 5 deadline. I suppose you are not going to merge your private port for 4.8 and thus the wide-int changes are not a show-stopper for you. That said, I considered the main conversion to be appropriate to be deferred for the next stage1. There is no advantage in disrupting the tree more at this stage. I would like the wide_int class and rtl stuff to go in 4.8 though. IMO it's a significant improvement in its own right, and Kenny submitted it well before the deadline. If it gets in as-is then we'll have to live with the IMHO broken API (yet another one besides the existing double-int). So _please_ shrink the API down aggressively in favor of using non-member helper functions with more descriptive names for things that lump together multiple operations.
Look at double-int and use the same API ideas, as people are familiar with them (like the unsigned flag stuff) - consistency always trumps. I'm going to be on vacation for the next three weeks so somebody else has to pick up the review work. But I really think that the tree has to recover from too many changes already. Richard.
Re: Non-dominating loop bounds in tree-ssa-loop-niter 2/4
On Wed, 31 Oct 2012, Jan Hubicka wrote: visited is a poor name for a map ... Hmm, visited_with_priority? Just block_priority? Richard. Thanks, Honza Otherwise looks ok. Thanks, Richard.

+
+  /* Perform shortest path discovery loop->header ... loop->latch.
+
+     The distance is given by the smallest loop bound of basic block
+     present in the path and we look for path with largest smallest bound
+     on it.
+
+     To avoid the need for Fibonacci heap on double_ints we simply compress
+     double_ints into indexes to BOUNDS array and then represent the queue
+     as arrays of queues for every index.
+     Index of VEC_length (BOUNDS) means that the execution of given BB has
+     no bounds determined.
+
+     VISITED is a pointer map translating basic block into smallest index
+     it was inserted into the priority queue with.  */
+  latch_index = -1;
+
+  /* Start walk in loop header with index set to infinite bound.  */
+  queue_index = VEC_length (double_int, bounds);
+  VEC_safe_grow_cleared (bb_queue, heap, queues, queue_index + 1);
+  VEC_safe_push (basic_block, heap, queue, loop->header);
+  VEC_replace (bb_queue, queues, queue_index, queue);
+  *pointer_map_insert (visited, loop->header) = (void *)queue_index;
+
+  for (; queue_index >= 0; queue_index--)
+    {
+      if (latch_index < queue_index)
+        {
+          while (VEC_length (basic_block,
+                             VEC_index (bb_queue, queues, queue_index)))
+            {
+              basic_block bb;
+              ptrdiff_t bound_index = queue_index;
+              void **entry;
+              edge e;
+              edge_iterator ei;
+
+              queue = VEC_index (bb_queue, queues, queue_index);
+              bb = VEC_pop (basic_block, queue);
+
+              /* OK, we later increased BB priority, skip it.  */
+              if ((ptrdiff_t)*pointer_map_contains (visited, bb) > queue_index)
+                continue;
+
+              /* See if we can improve the bound.  */
+              entry = pointer_map_contains (bb_bounds, bb);
+              if (entry && (ptrdiff_t)*entry < bound_index)
+                bound_index = (ptrdiff_t)*entry;
+
+              /* Insert successors into the queue, watch for latch edge
+                 and record greatest index we saw.
+                 */
+              FOR_EACH_EDGE (e, ei, bb->succs)
+                {
+                  bool insert = false;
+                  void **entry;
+
+                  if (loop_exit_edge_p (loop, e))
+                    continue;
+
+                  if (e == loop_latch_edge (loop)
+                      && latch_index < bound_index)
+                    latch_index = bound_index;
+                  else if (!(entry = pointer_map_contains (visited, e->dest)))
+                    {
+                      insert = true;
+                      *pointer_map_insert (visited, e->dest)
+                          = (void *)bound_index;
+                    }
+                  else if ((ptrdiff_t)*entry < bound_index)
+                    {
+                      insert = true;
+                      *entry = (void *)bound_index;
+                    }
+
+                  if (insert)
+                    {
+                      bb_queue queue2 = VEC_index (bb_queue, queues,
+                                                   bound_index);
+                      VEC_safe_push (basic_block, heap, queue2, e->dest);
+                      VEC_replace (bb_queue, queues, bound_index, queue2);
+                    }
+                }
+            }
+        }
+      else
+        VEC_free (basic_block, heap,
+                  VEC_index (bb_queue, queues, queue_index));
+    }
+
+  gcc_assert (latch_index >= 0);
+  if (latch_index < VEC_length (double_int, bounds))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Found better loop bound ");
+          dump_double_int (dump_file,
+                           VEC_index (double_int, bounds, latch_index), true);
+          fprintf (dump_file, "\n");
+        }
+      record_niter_bound (loop, VEC_index (double_int, bounds, latch_index),
+                          false, true);
+    }
+
+  VEC_free (bb_queue, heap, queues);
+  pointer_map_destroy (bb_bounds);
+  pointer_map_destroy (visited);
+}
+
 /* See if every path crossing the loop goes through a statement that is known
    to not execute at the last iteration.  In that case we can decrease
    iteration count by 1.  */
@@ -3102,6 +3330,8 @@ estimate_numbers_of_iterations_loop (str
   infer_loop_bounds_from_undefined (loop);
 
+  discover_iteration_bound_by_body_walk (loop);
+
   maybe_lower_iteration_bound (loop);

 /* If we have a measured profile, use it to estimate the number of

Index: testsuite/gcc.dg/tree-ssa/loop-38.c
===
--- testsuite/gcc.dg/tree-ssa/loop-38.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/loop-38.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options -O2
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, Oct 29, 2012 at 6:56 PM, Jakub Jelinek ja...@redhat.com wrote: Status == I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Reminds me of the stable plugin API for introspection. David, Micha - what's the status here? Adding this is certainly ok during stage3 and I think that we should have something in 4.8 to kick off further development here. Richard.
Re: RFA: hookize ADJUST_INSN_LENGTH
Richard Biener richard.guent...@gmail.com writes: On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Biener richard.guent...@gmail.com: Apart from the iteration_threshold the hookization would be straight-forward. Now I cannot decipher from the patch what functional change it introduces ;) The only change occurs if we reach an iteration count of MAX_INT iterations - which should already be indicative of a problem. At the MAX_INTth iteration, we will apply the length locking logic to instructions inside a delay slot sequence as well. If we disregard this exceptional case, there should be no functional changes unless someone defines TARGET_ADJUST_INSN_LENGTH. uid_lock_length gets zero-initialized, being allocated with XCNEWVEC. So, in particular, the elements corresponding to instructions inside delay slot sequences are initialized to zero. As the default hook sets *iter_threshold to MAX_INT when inside a delay sequence, and doesn't change the length, the max operation with uid_lock_length is a no-op, and *niter < iter_threshold is true, hence a length change results in updating the length immediately, without changing uid_lock_length. In the case that we are not inside a delay slot sequence, the default hook leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH. Thus, when niter is 0 or larger, as is the case for the ordinary looping operation, we always find *niter >= iter_threshold, and thus apply the length locking mechanism. If we are in the preliminary pass, or doing the single !increasing iteration, niter is set to -1, so *niter < iter_threshold is always true, so again we update the length immediately. Ok, I see. The patch is ok then. I should probably have piped up earlier, but I'm really not sure about it. ADJUST_INSN_LENGTH as defined now is telling us a property of the target. The patch instead ties the new hook directly into the shorten_branches algorithm, which I think is a bad idea.
IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as a fused process: every value returned by a length attribute (whether min, default or current) should be filtered through ADJUST_INSN_LENGTH before being used. Whether ADJUST_INSN_LENGTH decreases or increases the original length doesn't matter, because only the output of ADJUST_INSN_LENGTH should be used. Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns, which I agree is a good thing in itself. It's the all-important iteration parameter that feels wrong. TBH, I still don't understand what property of the target we're trying to describe here. I gather it has something to do with alignment. But I'd really like a description of that target property, independent of the shorten_branches implementation. Richard
Re: [PATCH] Fix PR53501
On Fri, Jun 1, 2012 at 3:29 AM, Eric Botcazou ebotca...@adacore.com wrote: Well, it would rather be

  TYPE_UNSIGNED (type) == TYPE_UNSIGNED (TREE_TYPE (arg0))
  && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (TREE_TYPE (arg1))

but only in the !FLOAT_TYPE_P path. That works in all cases I think, see existing cases in the folder. We could even compare TYPE_OVERFLOW_UNDEFINED I think. Or even just make sure that when TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) also TYPE_OVERFLOW_UNDEFINED (type), thus

  !TYPE_OVERFLOW_UNDEFINED (type)
  || ((TREE_CODE (arg0) != MULT_EXPR
       || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)))
      && (TREE_CODE (arg1) != MULT_EXPR
          || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1))))

That is, the newly created multiplication in type TYPE should either not have undefined overflow or the inner multiplications all should already have it. Best done with a comment in fold_plusminus_mult_expr. I'm a little lost here. :-) I don't really care about the mainline at this point, but the fix on the branches should be the minimal working one. Your change caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55142 -- H.J.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On 10/30/2012 01:56, Jakub Jelinek wrote: Status == I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes. Somebody with commit rights please push [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h. Kai has already approved, but is off for the week. signature.asc Description: OpenPGP digital signature
Re: RFA: hookize ADJUST_INSN_LENGTH
On Wed, Oct 31, 2012 at 11:22 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Biener richard.guent...@gmail.com: Apart from the iteration_threshold the hookization would be straight-forward. Now I cannot decipher from the patch what functional change it introduces ;) The only change occurs if we reach an iteration count of MAX_INT iterations - which should already be indicative of a problem. At the MAX_INTth iteration, we will apply the length locking logic to instructions inside a delay slot sequence as well. If we disregard this exceptional case, there should be no functional changes unless someone defines TARGET_ADJUST_INSN_LENGTH. uid_lock_length gets zero-initialized, being allocated with XCNEWVEC. So, in particular, the elements corresponding to instructions inside delay slot sequences are initialized to zero. As the default hook sets *iter_threshold to MAX_INT when inside a delay sequence, and doesn't change the length, the max operation with uid_lock_length is a no-op, and *niter < iter_threshold is true, hence a length change results in updating the length immediately, without changing uid_lock_length. In the case that we are not inside a delay slot sequence, the default hook leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH. Thus, when niter is 0 or larger, as is the case for the ordinary looping operation, we always find *niter >= iter_threshold, and thus apply the length locking mechanism. If we are in the preliminary pass, or doing the single !increasing iteration, niter is set to -1, so *niter < iter_threshold is always true, so again we update the length immediately. Ok, I see. The patch is ok then. I should probably have piped up earlier, but I'm really not sure about it. ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
The patch instead ties the new hook directly into the shorten_branches algorithm, which I think is a bad idea. IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as a fused process: every value returned by a length attribute (whether min, default or current) should be filtered through ADJUST_INSN_LENGTH before being used. Whether ADJUST_INSN_LENGTH decreases or increases the original length doesn't matter, because only the output of ADJUST_INSN_LENGTH should be used. Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns, which I agree is a good thing in itself. It's the all-important iteration parameter that feels wrong. TBH, I still don't understand what property of the target we're trying to describe here. I gather it has something to do with alignment. But I'd really like a description of that target property, independent of the shorten_branches implementation. Yeah, the iteration parameter doesn't feel right, I agree. But ADJUST_INSN_LENGTH is only used in this code. Which is why I originally said if the patch would not add the iteration parameter it would be obvious ... So, consider my approval revoked. Richard. Richard
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Wed, Oct 31, 2012 at 06:25:45PM +0800, JonY wrote: On 10/30/2012 01:56, Jakub Jelinek wrote: I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes. Somebody with commit rights please push [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h. Kai has already approved, but is off for the week. That looks like a bugfix (or even regression bugfix). Bugfixes are fine through stage 3, regression bugfixes are fine even in stage 4. Jakub
Non-dominating loop bounds in tree-ssa-loop-niter 3/4
Hi, this patch implements the logic to remove statements that are known to be undefined and thus expected to not be executed after unrolling. It also removes redundant exits. I originally tried to do both at once, but it does not fly, since the peeling confuses number_of_iterations_exit and it no longer understands the ivs. So now we 1) always remove exits that are known to be redundant by the bounds found, 2) try to peel/unroll, 3) on success, remove statements from the last iteration. This silences the array-bounds warnings in my testcase and many cases of -O3 bootstrap (I am running it now). Still, if one constructs a testcase touching out of bounds in more than one iteration, we will produce a false warning; I will handle that incrementally by similar logic in loop copying. Bootstrapped/regtested x86_64-linux, OK? Honza

	* tree-ssa-loop-niter.c (free_loop_bounds): Break out from ...
	(free_numbers_of_iterations_estimates_loop): ... here.
	* tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): New
	function.
	(remove_redundant_iv_test): New function.
	(try_unroll_loop_completely): Pass in MAXITER; use
	remove_exits_and_undefined_stmts.
	(canonicalize_loop_induction_variables): Compute MAXITER; use
	remove_redundant_iv_test.
	* cfgloop.h (free_loop_bounds): New function.

	* gcc.dg/tree-ssa/cunroll-10.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-9.c: New testcase.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c	(revision 192989)
@@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s
   return true;
 }
 
-/* Frees the information on upper bounds on numbers of iterations of LOOP.  */
-
 void
-free_numbers_of_iterations_estimates_loop (struct loop *loop)
+free_loop_bounds (struct loop *loop)
 {
   struct nb_iter_bound *bound, *next;
 
-  loop->nb_iterations = NULL;
-  loop->estimate_state = EST_NOT_COMPUTED;
   for (bound = loop->bounds; bound; bound = next)
     {
       next = bound->next;
@@ -3523,6 +3751,16 @@ free_numbers_of_iterations_estimates_loo
   loop->bounds = NULL;
 }
 
+/* Frees the information on upper bounds on numbers of iterations of LOOP.  */
+
+void
+free_numbers_of_iterations_estimates_loop (struct loop *loop)
+{
+  loop->nb_iterations = NULL;
+  loop->estimate_state = EST_NOT_COMPUTED;
+  free_loop_bounds (loop);
+}
+
 /* Frees the information on upper bounds on numbers of iterations of loops.  */
 void

Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c	(revision 192989)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -389,6 +389,116 @@ loop_edge_to_cancel (struct loop *loop)
   return NULL;
 }
 
+/* Remove all tests for exits that are known to be taken after LOOP was
+   peeled NPEELED times.  Put gcc_unreachable before every statement
+   known to not be executed.  */
+
+static bool
+remove_exits_and_undefined_stmts (struct loop *loop, unsigned int npeeled)
+{
+  struct nb_iter_bound *elt;
+  bool changed = false;
+
+  for (elt = loop->bounds; elt; elt = elt->next)
+    {
+      /* If statement is known to be undefined after peeling, turn it
+         into unreachable (or trap when debugging experience is supposed
+         to be good).  */
+      if (!elt->is_exit
+          && elt->bound.ule (double_int::from_uhwi (npeeled)))
+        {
+          gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt);
+          gimple stmt = gimple_build_call
+              (builtin_decl_implicit (optimize_debug
+                                      ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE),
+               0);
+
+          gimple_set_location (stmt, gimple_location (elt->stmt));
+          gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+          changed = true;
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            {
+              fprintf (dump_file, "Forced statement unreachable: ");
+              print_gimple_stmt (dump_file, elt->stmt, 0, 0);
+            }
+        }
+      /* If we know the exit will be taken after peeling, update.  */
+      else if (elt->is_exit
+               && elt->bound.ule (double_int::from_uhwi (npeeled)))
+        {
+          basic_block bb = gimple_bb (elt->stmt);
+          edge exit_edge = EDGE_SUCC (bb, 0);
+
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            {
+              fprintf (dump_file, "Forced exit to be taken: ");
+              print_gimple_stmt (dump_file, elt->stmt, 0, 0);
+            }
+          if (!loop_exit_edge_p (loop, exit_edge))
+            exit_edge = EDGE_SUCC (bb, 1);
+          if (exit_edge->flags & EDGE_TRUE_VALUE)
+            gimple_cond_make_true (elt->stmt);
+          else
+            gimple_cond_make_false (elt->stmt);
+          update_stmt (elt->stmt);
+          changed = true;
+        }
+    }
+  return changed;
+}
+
+/* Remove all exits that are known to be never taken because of the loop bound
+   discovered.  */
Re: patch to fix constant math - 4th patch - the wide-int class.
Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers.
What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine. almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than i do it and if you do who cares.
the number of calls does not affect the performance in any negative way and in fact is more efficient since common things that require more than one operation in double int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour of separate and() and not() operations, but why not provide a fused one if there are clients who'll make use of it? I think Kenny's API is just taking that to its logical conclusion. There doesn't seem to be anything sacrosanct about the current choice of what's fused and what isn't. The speed problem we had using trees
[AARCH64-4.7] Merge from upstream gcc-4_7-branch r192902
Hi, I have just merged upstream gcc-4_7-branch into ARM/aarch64-4.7-branch, up to r192902. Thanks Sofiane
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On 31 October 2012 10:25, JonY wrote: On 10/30/2012 01:56, Jakub Jelinek wrote: Status == I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes. Somebody with commit rights please push [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h. Kai has already approved, but is off for the week. I could have done that, if it had been sent to the right lists. All libstdc++ patches go to both gcc-patches and libstd...@gcc.gnu.org please. Let's move this to the libstdc++ list, I have some questions about the patch.
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
On Wed, 31 Oct 2012, Jan Hubicka wrote: Hi, this patch implements the logic to remove statements that are known to be undefined and thus expected to not be executed after unrolling. It also removes redundant exits. I originally tried to do both at once, but it does not fly, since the peeling confuses number_of_iterations_exit and it no longer understands the ivs. So now we 1) always remove exits that are known to be redundant by the bounds found, 2) try to peel/unroll, 3) on success, remove statements from the last iteration. This silences the array-bounds warnings in my testcase and many cases of -O3 bootstrap (I am running it now). Still, if one constructs a testcase touching out of bounds in more than one iteration, we will produce a false warning; I will handle that incrementally by similar logic in loop copying. Bootstrapped/regtested x86_64-linux, OK? Honza

	* tree-ssa-loop-niter.c (free_loop_bounds): Break out from ...
	(free_numbers_of_iterations_estimates_loop): ... here.
	* tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): New
	function.
	(remove_redundant_iv_test): New function.
	(try_unroll_loop_completely): Pass in MAXITER; use
	remove_exits_and_undefined_stmts.
	(canonicalize_loop_induction_variables): Compute MAXITER; use
	remove_redundant_iv_test.
	* cfgloop.h (free_loop_bounds): New function.

	* gcc.dg/tree-ssa/cunroll-10.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-9.c: New testcase.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c	(revision 192989)
@@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s
   return true;
 }
 
-/* Frees the information on upper bounds on numbers of iterations of LOOP.  */
-

Needs a comment.

 void
-free_numbers_of_iterations_estimates_loop (struct loop *loop)
+free_loop_bounds (struct loop *loop)
 {
   struct nb_iter_bound *bound, *next;
 
-  loop->nb_iterations = NULL;
-  loop->estimate_state = EST_NOT_COMPUTED;
   for (bound = loop->bounds; bound; bound = next)
     {
       next = bound->next;
@@ -3523,6 +3751,16 @@ free_numbers_of_iterations_estimates_loo
   loop->bounds = NULL;
 }
 
+/* Frees the information on upper bounds on numbers of iterations of LOOP.  */
+
+void
+free_numbers_of_iterations_estimates_loop (struct loop *loop)
+{
+  loop->nb_iterations = NULL;
+  loop->estimate_state = EST_NOT_COMPUTED;
+  free_loop_bounds (loop);
+}
+
 /* Frees the information on upper bounds on numbers of iterations of loops.  */
 void

Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c	(revision 192989)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -389,6 +389,116 @@ loop_edge_to_cancel (struct loop *loop)
   return NULL;
 }
 
+/* Remove all tests for exits that are known to be taken after LOOP was
+   peeled NPEELED times.  Put gcc_unreachable before every statement
+   known to not be executed.  */
+
+static bool
+remove_exits_and_undefined_stmts (struct loop *loop, unsigned int npeeled)
+{
+  struct nb_iter_bound *elt;
+  bool changed = false;
+
+  for (elt = loop->bounds; elt; elt = elt->next)
+    {
+      /* If statement is known to be undefined after peeling, turn it
+         into unreachable (or trap when debugging experience is supposed
+         to be good).  */
+      if (!elt->is_exit
+          && elt->bound.ule (double_int::from_uhwi (npeeled)))
+        {
+          gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt);
+          gimple stmt = gimple_build_call
+              (builtin_decl_implicit (optimize_debug
+                                      ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE),

I'm not sure we should do different things for -Og here. In fact there is no unrolling done for -Og at all, so just use unreachable.

+               0);
+
+          gimple_set_location (stmt, gimple_location (elt->stmt));
+          gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+          changed = true;
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            {
+              fprintf (dump_file, "Forced statement unreachable: ");
+              print_gimple_stmt (dump_file, elt->stmt, 0, 0);
+            }
+        }
+      /* If we know the exit will be taken after peeling, update.  */
+      else if (elt->is_exit
+               && elt->bound.ule (double_int::from_uhwi (npeeled)))
+        {
+          basic_block bb = gimple_bb (elt->stmt);
+          edge exit_edge = EDGE_SUCC (bb, 0);
+
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            {
+              fprintf (dump_file, "Forced exit to be taken: ");
+              print_gimple_stmt (dump_file, elt->stmt, 0, 0);
+            }
+          if (!loop_exit_edge_p (loop, exit_edge))
+            exit_edge = EDGE_SUCC (bb, 1);
+          if (exit_edge->flags & EDGE_TRUE_VALUE)

I think we can have abnormal/eh exit edges. So I'm not sure you
[Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)
On 31 October 2012 11:01, Jonathan Wakely wrote: On 31 October 2012 10:25, JonY wrote: On 10/30/2012 01:56, Jakub Jelinek wrote: Status == I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes. Somebody with commit rights please push [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h. Kai has already approved, but is off for the week. I could have done that, if it had been sent to the right lists. All libstdc++ patches go to both gcc-patches and libstd...@gcc.gnu.org please. Let's move this to the libstdc++ list, I have some questions about the patch. It looks like the workaround is in mingw not in GCC, so is it a problem that it won't be possible to use GCC 4.8 with existing mingw versions, or are users required to use a brand new mingw to use a new GCC? Should that be documented in http://gcc.gnu.org/gcc-4.8/changes.html ? Why is the define commented out by the patch, not simply removed? If it's not needed then it's not needed. We have subversion to track change history, we don't need to leave dead code lying around with comments explaining why it's dead.
Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)
On 10/31/2012 19:12, Jonathan Wakely wrote: On 31 October 2012 11:01, Jonathan Wakely wrote: On 31 October 2012 10:25, JonY wrote: On 10/30/2012 01:56, Jakub Jelinek wrote: Status == I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes. Somebody with commit rights please push [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h. Kai has already approved, but is off for the week. I could have done that, if it had been sent to the right lists. All libstdc++ patches go to both gcc-patches and libstd...@gcc.gnu.org please. Let's move this to the libstdc++ list, I have some questions about the patch. It looks like the workaround is in mingw not in GCC, so is it a problem that it won't be possible to use GCC 4.8 with existing mingw versions, or are users required to use a brand new mingw to use a new GCC? Should that be documented in http://gcc.gnu.org/gcc-4.8/changes.html ? They are required to use the latest mingw-w64; the problem was that the vswprintf that libstdc++ expects isn't the same as the one MS provides, so I wrote a redirector to use vsnwprintf, more precisely the mingw C99-compliant __mingw_vsnwprintf. std::to_wstring and std::to_string work according to some simple tests. I guess the current comment about requiring mingw-w64 trunk at least r5437 is OK for the changes page. It should probably note that this change is mingw-w64 specific, with w64 as the vendor key. Why is the define commented out by the patch, not simply removed? If it's not needed then it's not needed. We have subversion to track change history, we don't need to leave dead code lying around with comments explaining why it's dead. OK, I will remove it.
Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)
On 10/31/2012 19:23, JonY wrote: Why is the define commented out by the patch, not simply removed? If it's not needed then it's not needed. We have subversion to track change history, we don't need to leave dead code lying around with comments explaining why it's dead. OK, I will remove it. Sorry, forgot to attach the patch. ChangeLog 2012-10-31 Jonathan Yong jo...@users.sourceforge.net * config/os/mingw32-w64/os_defines.h: Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF as no longer required. Index: libstdc++-v3/config/os/mingw32-w64/os_defines.h === --- libstdc++-v3/config/os/mingw32-w64/os_defines.h (revision 192802) +++ libstdc++-v3/config/os/mingw32-w64/os_defines.h (working copy) @@ -63,9 +63,6 @@ // See libstdc++/20806. #define _GLIBCXX_HAVE_DOS_BASED_FILESYSTEM 1 -// See libstdc++/37522. -#define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1 - // See libstdc++/43738 // On native windows targets there is no ioctl function. And the existing // ioctlsocket function doesn't work for normal file-descriptors.
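For readers following along, a minimal sketch (wrapper names here are illustrative, not part of the patch) of the conversions that start working once the broken-vswprintf macro is gone and a conforming vswprintf redirector is in place:

```cpp
#include <cassert>
#include <string>

// With _GLIBCXX_HAVE_BROKEN_VSWPRINTF no longer defined, libstdc++
// enables the wide-character numeric conversions that depend on a
// conforming vswprintf (supplied on mingw-w64 by the redirector to
// __mingw_vsnwprintf).  Trivial wrappers for illustration only.
std::wstring widen(long v)  { return std::to_wstring(v); }
std::string  narrow(long v) { return std::to_string(v); }
```

On a hosted toolchain this is exactly the std::to_wstring/std::to_string behaviour the simple tests above exercise.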
Re: Minimize downward code motion during reassociation
On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote: On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassociation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs.
*/ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe->op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb -> linearize_expr_tree -> linearize_expr -> gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. Thanks. +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed.
*/ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i < len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of an expression tree. It would be a lot easier to understand this function if you call it once after rewrite_expr_tree finished. I think you meant before rewrite_expr_tree. One thing to note here is this function is called only when we first see a statement whose operand changes after reassociation. That's the reason the moving of statements happens inside rewrite_expr_tree. I meant after because rewrite_expr_tree ends up
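A toy model of the UID discipline agreed on above - sequential uids plus conservative reuse of a neighbour's uid when a statement is moved, in the style of gsi_move_before - written as self-contained C++ rather than GCC's gimple API (all names here are invented for illustration):

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Each "statement" carries a uid.  Renumbering assigns 0, 1, 2, ...;
// a moved statement reuses the uid of the statement it is inserted
// before, so uids stay non-decreasing along the block and
// "uid(def) <= uid(use)" remains a safe availability test without a
// full renumber after every move.
struct Stmt { int id; int uid; };

bool uids_monotonic_after_move() {
  std::list<Stmt> bb;
  for (int i = 0; i < 4; ++i)
    bb.push_back({i, i});              // renumber_gimple_stmt_uids analogue

  // Move the last statement before statement 1, conservatively
  // giving it the uid of its new successor.
  Stmt moved = bb.back(); bb.pop_back();
  std::list<Stmt>::iterator pos = std::next(bb.begin());  // stmt with id 1
  moved.uid = pos->uid;
  bb.insert(pos, moved);

  int prev = -1;
  for (const Stmt &s : bb) {
    if (s.uid < prev) return false;    // ordering by uid was broken
    prev = s.uid;
  }
  return true;
}
```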
Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)
Applied. Thanks, Paolo.
Re: RFA: hookize ADJUST_INSN_LENGTH
Quoting Richard Sandiford rdsandif...@googlemail.com: I should probably have piped up earlier, but I'm really not sure about it. ADJUST_INSN_LENGTH as defined now is telling us a property of the target. The patch instead ties the new hook directly into the shorten_branches algorithm, which I think is a bad idea. IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as a fused process: every value returned by a length attribute (whether min, default or current) should be filtered through ADJUST_INSN_LENGTH before being used. Whether ADJUST_INSN_LENGTH decreases or increases the original length doesn't matter, because only the output of ADJUST_INSN_LENGTH should be used. Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns, which I agree is a good thing in itself. It's the all-important iteration parameter that feels wrong. TBH, I still don't understand what property of the target we're trying to describe here. I gather it has something to do with alignment. But I'd really like a description of that target property, independent of the shorten_branches implementation. It's about describing complex interactions of length adjustments that derive from branch shortening and length added for (un)alignment for scheduling purposes. Expressed naively, these can lead to cycles. I could manage with the combination of length / lock_length attribute, but there was a fine line between tuning for optimal scheduling and risking cycles. So, as Richard Biener has pointed out, that design was not optimal. What is really needed is a way to express priorities between instructions when to inhibit length shrinkage from one iteration to another. Ports that have no issues with negative feedback in the length calculations don't need to worry about priorities; the length locking mechanism will be a no-op anyways, so the question when it is being applied is moot. 
In the case you have negative feedback, the default, to apply length locking immediately to all instructions, gives the fastest convergence, although it will likely give a sub-optimal shortening solution. The length variations for the ARC are not all alike: there are branches that are subject to branch shortening in the usual way, but they might shrink when other things shrink. When we are iterating starting at minimal length and increasing lengths, these branches should be stopped from shrinking, as they will likely grow back again in the next iteration. OTOH there are potentially short instructions that are lengthened to achieve scheduling-dictated alignment or misalignment. These should be allowed to change freely back and forth, unless we have a rare cycle that only depends on alignments. So, to get this right without having the port writer worry too much when a length adjustment causes a cycle, we want all varying-length instructions subject to length locking to break cycles, but not with the same priority. Specifying this as an iteration count is an easy way to implement this, without having to make shorten_branches identify every cycle and the participating instructions that are oscillating in length. The latter would actually run up against computability problems in distinguishing a single complex cycle from multiple independent cycles. If you like, we can describe the iter_threshold parameter in a different way, just saying that it indicates a priority, 0 being the first to get length locking applied, and the larger the number, the later.
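A self-contained sketch of the iteration scheme being described, with invented names: each insn recomputes its length from the current layout, and once the iteration count reaches that insn's lock threshold its length may still grow but may no longer shrink, which breaks oscillation cycles and guarantees termination:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy model of shorten_branches with per-insn length locking.
// compute() returns the length an insn would like given the current
// lengths of all insns (standing in for branch distances/alignment).
struct Insn {
  int lock_iter;  // iteration from which shrinking is forbidden
  std::function<int(const std::vector<int>&)> compute;
};

std::vector<int> shorten(const std::vector<Insn>& insns) {
  std::vector<int> len(insns.size(), 0);
  for (int iter = 0;; ++iter) {
    bool changed = false;
    for (std::size_t i = 0; i < insns.size(); ++i) {
      int want = insns[i].compute(len);
      if (want < len[i] && iter >= insns[i].lock_iter)
        want = len[i];              // locked: may grow, never shrink
      if (want != len[i]) { len[i] = want; changed = true; }
    }
    if (!changed) return len;       // fixed point reached
  }
}
```

With lock_iter = 0 an insn whose length oscillates (wants 4 when short, 2 when long) is pinned at its larger value after one step; a larger lock_iter lets it search longer before being pinned, which is exactly the priority reading of iter_threshold.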
Re: [PATCH] pass filtering for -fopt-info
On Wed, Oct 31, 2012 at 8:20 AM, Sharad Singhai sing...@google.com wrote: I am attaching an updated patch with comments inline. On Tue, Oct 30, 2012 at 9:04 AM, Xinliang David Li davi...@google.com wrote: On Tue, Oct 30, 2012 at 8:28 AM, Richard Biener richard.guent...@gmail.com wrote: What I'd expect from that would be both vec.miss and vec.opt being populated ... (thus go away from the duality of dump files to primary dump file plus a set of alternate dump files). Yes, that would be an ideal behavior. Each pass currently has access to only two dump files, its primary file and alternate file. If we extend that to include a set, it would be cleaner and avoid some of the issues like the following: During testing, I found that even having any two different files for -fopt-info is problematic. For example, consider gcc ... -fopt-info-vec-optimized=vec.opt -fopt-info-loop-missed=loop.missed Now 'loop' and 'vec' include overlapping passes and since we have only one alternate file per pass, it can either be vec.opt or loop.missed, and some information would be lost. If we have a set of files, we can avoid this problem. However, considering the stage 1 deadline, I don't think it is feasible for this version. Instead, a set of alternate files can be considered as a future extension. For now, having a default of 'stderr' works for the common use case. To further prevent confusion, the current patch allows only one (non-stderr) file for -fopt-info and a warning is printed as in the example below gcc ... -fopt-info-vec-optimized=vec.opt -fopt-info-loop-missed=loop.missed cc1: warning: ignoring possibly conflicting option '-fopt-info-loop-missed=loop.missed' [enabled by default] Multiple dumps can be achieved using 'stderr' (default) instead. gcc ... -fopt-info-vec-optimized -fopt-info-loop-missed I have updated the documentation to include -fopt-info examples, and added some details about -fopt-info command line conflicts. 
I like it overall, not sure if we want to pre-populate the OPTGROUP set too much at this point. Like what is 'tree' or 'rtl' to users? nothing I think. 'ipa' yes. 'lto'? sounds redundant with 'ipa' to me. 'omp'? we don't have any optimizations here. OMP is a high level transformation, and it seems to be a good candidate group, but this part does not need to be designed now. For instance, there are a bunch of FDO-related transformations (indirect call promotion, memcpy transformation, etc.), and coverage mismatch notes, etc., that are good candidates to be filtered. Yes, perhaps later, we can consider something like an 'fdo' group for this purpose. Thus please drop TREE, RTL, LTO and OMP for now. Okay, I have removed 'tree', 'rtl', 'omp', 'lto' groups. Otherwise I'm leaving it for comments from other folks. Thanks, Richard. I have updated the documentation and testing is ongoing. @@ -503,7 +509,7 @@ register_one_dump_file (struct opt_pass *pass) name = name ? name + 1 : pass->name; dot_name = concat (".", name, num, NULL); if (pass->type == SIMPLE_IPA_PASS || pass->type == IPA_PASS) -prefix = "ipa-", flags = TDF_IPA; +prefix = "ipa-", flags = TDF_IPA, optgroup_flags |= OPTGROUP_IPA; please do not use compound statements. Otherwise ok if testing is ok. Thanks, Richard. Thanks, Sharad
Re: patch to fix constant math - 4th patch - the wide-int class.
On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want.
Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int is wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int, is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine. almost every call has a check to see if the operation can be done natively.
I seriously doubt that you are going to do TI mode math much faster than i do it and if you do who cares. the number of calls does not affect the performance in any negative way and in fact is more efficient since common things that require more than one operation in double int are typically done in a single operation. Simple double-int operations like inline double_int double_int::and_not (double_int b) const { double_int result; result.low = low & ~b.low; result.high = high & ~b.high; return result; } are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour of separate and() and not() operations, but why not provide a fused one if there are clients who'll make use of it? I was more concerned about fused operations that use precision or bitsize as input. That is for example + bool
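To make the fused-operation point concrete, here is a toy two-word integer (a hypothetical stand-in, not GCC's double_int) where the fused and_not computes a & ~b in one pass over the words instead of materializing the intermediate ~b:

```cpp
#include <cassert>
#include <cstdint>

// Minimal two-word unsigned integer.  and_not fuses the complement
// and the conjunction, matching the double_int member quoted above;
// the separate operators show the unfused equivalent.
struct dword {
  uint64_t low, high;
  dword operator~() const { return {~low, ~high}; }
  dword operator&(const dword &b) const { return {low & b.low, high & b.high}; }
  dword and_not(const dword &b) const { return {low & ~b.low, high & ~b.high}; }
};
```

The fused form saves a temporary and a second pass; the debate above is whether that is worth a wider interface.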
Re: RFA: hookize ADJUST_INSN_LENGTH
Joern Rennecke joern.renne...@embecosm.com writes: Quoting Richard Sandiford rdsandif...@googlemail.com: I should probably have piped up earlier, but I'm really not sure about it. ADJUST_INSN_LENGTH as defined now is telling us a property of the target. The patch instead ties the new hook directly into the shorten_branches algorithm, which I think is a bad idea. IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as a fused process: every value returned by a length attribute (whether min, default or current) should be filtered through ADJUST_INSN_LENGTH before being used. Whether ADJUST_INSN_LENGTH decreases or increases the original length doesn't matter, because only the output of ADJUST_INSN_LENGTH should be used. Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns, which I agree is a good thing in itself. It's the all-important iteration parameter that feels wrong. TBH, I still don't understand what property of the target we're trying to describe here. I gather it has something to do with alignment. But I'd really like a description of that target property, independent of the shorten_branches implementation. It's about describing complex interactions of length adjustments that derive from branch shortening and length added for (un)alignment for scheduling purposes. Expressed naively, these can lead to cycles. But shorten_branches should be written to avoid cycles, and I thought your patch did that by making sure that the address of each instruction only increases. I could manage with the combination of length / lock_length attribute, but there was a fine line between tuning for optimal scheduling and risking cycles. So, as Richard Biener has pointed out, that design was not optimal. What is really needed is a way to express priorities between instructions when to inhibit length shrinkage from one iteration to another. 
Ports that have no issues with negative feedback in the length calculations don't need to worry about priorities; the length locking mechanism will be a no-op anyways, so the question when it is being applied is moot. In the case you have negative feedback, the default, to apply length locking immediately to all instructions, gives the fastest convergence, although it will likely give a sub-optimal shortening solution. The length variations for the ARC are not all alike: there are branches that are subject to branch shortening in the usual way, but they might shrink when other things shrink. When we are iterating starting at minimal length and increasing lengths, these branches should be stopped from shrinking, as they will likely grow back again in the next iteration. OTOH there are potentially short instructions that are lengthened to achieve scheduling-dictated alignment or misalignment. These should be allowed to change freely back and forth, unless we have a rare cycle that only depends on alignments. Hmm, but this is still talking in terms of shorten_branches, rather than the target property that we're trying to model. It sounds like the property is something along the lines of: some instructions have different encodings (or work-alikes with different encodings), and we want to make all the encodings available to the optimisers. Is that right? If so, that sounds like a useful thing to model, but I think it should be modelled directly, rather than as a modification to the shorten_branches algorithm. Is the alignment/misalignment explicitly modelled in the rtl? If so, how? Using labels, or something else? Richard
Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)
On 31 October 2012 11:23, JonY wrote: On 10/31/2012 19:12, Jonathan Wakely wrote: It looks like the workaround is in mingw not in GCC, so is it a problem that it won't be possible to use GCC 4.8 with existing mingw versions, or are users required to use a brand new mingw to use a new GCC? Should that be documented in http://gcc.gnu.org/gcc-4.8/changes.html ? They are required to use the latest mingw-w64; the problem was that the vswprintf that libstdc++ expects isn't the same as the one MS provides, so I wrote a redirector to use vsnwprintf, more precisely the mingw C99-compliant __mingw_vsnwprintf. std::to_wstring and std::to_string work according to some simple tests. Excellent, the testsuite should automatically start running the relevant tests and we should be able to close http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52015 I guess the current comment about requiring mingw-w64 trunk at least r5437 is OK for the changes page. It should probably note that this change is mingw-w64 specific, with w64 as the vendor key. i.e. *-w64-mingw* ? Or is *-w64-mingw32 more accurate? (I don't really know how the mingw target triplets work, sorry!) I'll put something in the changes page later. Thanks for fixing this.
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192989) @@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s return true; } -/* Frees the information on upper bounds on numbers of iterations of LOOP. */ - Needs a comment. Yep, also the reason I export it is because I use it in another patch. (I think we should not hook the bounds into the loop structure and keep them dead, so I want the max_iter* friends to free them unless they are asked to preserve, and explicitly free them in the two users of them.) +static bool +remove_exits_and_undefined_stmts (struct loop *loop, unsigned int npeeled) +{ + struct nb_iter_bound *elt; + bool changed = false; + + for (elt = loop->bounds; elt; elt = elt->next) +{ + /* If statement is known to be undefined after peeling, turn it +into unreachable (or trap when debugging experience is supposed +to be good). */ + if (!elt->is_exit + && elt->bound.ule (double_int::from_uhwi (npeeled))) + { + gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt); + gimple stmt = gimple_build_call + (builtin_decl_implicit (optimize_debug + ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE), I'm not sure we should do different things for -Og here. In fact there is no unrolling done for -Og at all, so just use unreachable. OK, I was thinking about BUILT_IN_TRAP for -O1 too, but I am not sure either. I guess we will gather some experience on how much users are confused. Why do we not inline at -Og? + /* If we know the exit will be taken after peeling, update. */ + else if (elt->is_exit + && elt->bound.ule (double_int::from_uhwi (npeeled))) + { + basic_block bb = gimple_bb (elt->stmt); + edge exit_edge = EDGE_SUCC (bb, 0); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Forced exit to be taken: "); + print_gimple_stmt (dump_file, elt->stmt, 0, 0); + } + if (!loop_exit_edge_p (loop, exit_edge)) + exit_edge = EDGE_SUCC (bb, 1); + if (exit_edge->flags & EDGE_TRUE_VALUE) I think we can have abnormal/eh exit edges.
So I'm not sure you can, without checking, assume the block ends in a GIMPLE_COND. We can't - those are only edges that are found by the IV analysis, and they always test an IV against some bound (it is all done by number_of_iterations_exit). See above. Otherwise the overall idea sounds fine. Similarly here, simple exits are always conditionals. Thanks a lot! Honza
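A source-level sketch (plain C++, not GIMPLE, with an invented function) of the exit-known-taken case discussed above: after the loop is peeled up to the recorded iteration bound, the stale loop test can only go one way, and the impossible path is marked with __builtin_unreachable () (or __builtin_trap () when a friendlier debugging experience is wanted):

```cpp
#include <cassert>

// The loop "for (i = 0; i < 4; ++i) s += a[i];" after complete
// peeling: four copies of the body, and the analysis knows i == 4
// afterwards, so the old back-edge test i < 4 is provably false.
// Marking that arm unreachable lets later passes delete it.
// (__builtin_unreachable is a GCC/Clang builtin.)
int sum4(const int *a) {
  int s = 0, i = 0;
  s += a[i++];                 // peeled iteration 0
  s += a[i++];                 // peeled iteration 1
  s += a[i++];                 // peeled iteration 2
  s += a[i++];                 // peeled iteration 3
  if (i < 4)
    __builtin_unreachable();   // bound says the exit must be taken now
  return s;
}
```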
Re: Ping / update: RFA: replace #ifdef with if/#if for HAVE_ATTR_*
Quoting Richard Sandiford rdsandif...@googlemail.com: I can't approve the whole thing of course, but I like the idea. However... Joern Rennecke joern.renne...@embecosm.com writes: +@deftypevr {Target Hook} bool TARGET_HAVE_CC0 +@deftypevrx {Target Hook} {bool} TARGET_AUTO_INC_DEC +@deftypevrx {Target Hook} {bool} TARGET_STACK_REGS +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_ENABLED +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH +These flags are automatically generated; you should not override them in @file{tm.c}. +@end deftypevr Unless this goes against something already discussed, I think it'd be better to leave these out until there's a concerted attempt to use them somewhere. On its own this isn't even a partial transition. :-) Well, I was tempted to do a full transition, but then realized that that'd never get reviewed, so I thought a good start to get out of the hole we got ourselves into is to stop digging. Well, maybe we can't even do that before 4.8 ... whatever. I have removed that bit from the patch. + /* We make an exception here to provide stub definitions for + insn_*_length* / get_attr_enabled functions. */ + puts ("#if !HAVE_ATTR_length\n" + "extern int hook_int_rtx_0 (rtx);\n" + "#define insn_default_length hook_int_rtx_0\n" + "#define insn_min_length hook_int_rtx_0\n" + "#define insn_variable_length_p hook_int_rtx_0\n" + "#define insn_current_length hook_int_rtx_0\n" + "#include \"insn-addr.h\"\n" + "#endif\n" I'd prefer defaults that call gcc_unreachable, rather than silently return an arbitrary value. I'm using a new hook hook_int_rtx_unreachable for this purpose. tested with config-list.mk on x86_64-unknown-linux-gnu. 2012-10-30 Joern Rennecke joern.renne...@embecosm.com * doc/md.texi (Defining Attributes): Document that we are defining HAVE_ATTR_name macros as 1 for defined attributes, and as 0 for undefined special attributes. * final.c (asm_insn_count, align_fuzz): Always define. (insn_current_reference_address): Likewise.
(init_insn_lengths): Use if (HAVE_ATTR_length) instead of #ifdef HAVE_ATTR_length. (get_attr_length_1, shorten_branches, final): Likewise. (final_scan_insn, output_asm_name): Likewise. * genattr.c (gen_attr): Define HAVE_ATTR_name macros for defined attributes as 1. Remove ancient get_attr_alternative compatibility code. For special purpose attributes not provided, define HAVE_ATTR_name as 0. In case no length attribute is given, provide stub definitions for insn_*_length* functions, and also include insn-addr.h. In case no enabled attribute is given, provide stub definition. * genattrtab.c (write_length_unit_log): Always write a definition. * hooks.c (hook_int_rtx_1, hook_int_rtx_unreachable): New functions. * hooks.h (hook_int_rtx_1, hook_int_rtx_unreachable): Declare. * lra-int.h (struct lra_insn_recog_data): Make member alternative_enabled_p unconditional. * lra.c (free_insn_recog_data): Use if (HAVE_ATTR_length) instead of #ifdef HAVE_ATTR_length. (lra_set_insn_recog_data): Likewise. Make initialization of alternative_enabled_p unconditional. (lra_update_insn_recog_data): Use #if instead of #ifdef for HAVE_ATTR_enabled. * recog.c [!HAVE_ATTR_enabled] (get_attr_enabled): Don't define. (extract_insn): Check HAVE_ATTR_enabled. (gate_handle_split_before_regstack): Use #if instead of #if defined for HAVE_ATTR_length. Index: gcc/doc/md.texi === --- gcc/doc/md.texi (revision 193015) +++ gcc/doc/md.texi (working copy) @@ -7566,7 +7566,7 @@ (define_attr type branch,fp,load,stor the following lines will be written to the file @file{insn-attr.h}. @smallexample -#define HAVE_ATTR_type +#define HAVE_ATTR_type 1 enum attr_type @{TYPE_BRANCH, TYPE_FP, TYPE_LOAD, TYPE_STORE, TYPE_ARITH@}; extern enum attr_type get_attr_type (); @@ -7591,6 +7591,10 @@ extern enum attr_type get_attr_type (); generation. @xref{Disable Insn Alternatives}. 
@end table +For each of these special attributes, the corresponding +@samp{HAVE_ATTR_@var{name}} @samp{#define} is also written when the +attribute is not defined; in that case, it is defined as @samp{0}. + @findex define_enum_attr @anchor{define_enum_attr} Another way of defining an attribute is to use: Index: gcc/hooks.c === --- gcc/hooks.c (revision 193015) +++ gcc/hooks.c (working copy) @@ -203,6 +203,18 @@ hook_int_rtx_0 (rtx a ATTRIBUTE_UNUSED) } int +hook_int_rtx_1 (rtx) +{ + return 1; +} + +int +hook_int_rtx_unreachable (rtx) +{ + gcc_unreachable (); +} + +int hook_int_rtx_bool_0 (rtx a ATTRIBUTE_UNUSED, bool b ATTRIBUTE_UNUSED) {
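A small illustration, with invented function names, of the pattern this series moves to: because the generated header now always defines HAVE_ATTR_length as 0 or 1, code can use a plain if and rely on the compiler deleting the dead arm, while a stub keeps the program linking when the attribute is absent (in the real patch the stub is a hook that calls gcc_unreachable):

```cpp
#include <cassert>

// Stand-in for the generated insn-attr.h: the macro is always
// defined, here as 0 to model a target without a length attribute.
#define HAVE_ATTR_length 0

// Hypothetical real implementation, only meaningful when the
// attribute exists.
static int insn_default_length_impl(int insn) { return insn * 2; }

// Stub bound to the name when the attribute is absent (the patch
// maps such names to hooks; gcc_unreachable in the final version).
static int stub_length(int) { return 0; }

static int get_length(int insn) {
  if (HAVE_ATTR_length)                 // constant, folded at compile time
    return insn_default_length_impl(insn);
  return stub_length(insn);
}
```

Unlike #ifdef, both arms are always parsed and type-checked, which is the point of the conversion.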
Re: patch to fix constant math - 4th patch - the wide-int class.
Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want.
Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int is wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int, is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine. almost every call has a check to see if the operation can be done natively.
I seriously doubt that you are going to do TI mode math much faster than I do it and if you do who cares. the number of calls does not affect the performance in any negative way and in fact is more efficient since common things that require more than one operation in double_int are typically done in a single operation. Simple double-int operations like inline double_int double_int::and_not (double_int b) const { double_int result; result.low = low & ~b.low; result.high = high & ~b.high; return result; } are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour of separate and() and not() operations, but why not provide a fused one if there are clients who'll make use of it? I was more concerned about fused operations that use
[Patch] Update libquadmath from GLIBC
Hi all, libquadmath's math functions are based on (but not identical to) GLIBC's sysdeps/ieee754/ldbl-128 functions. In the attached patch, I have ported the bug fixes from GLIBC over to libquadmath. Hopefully, the port is complete and correct. I intend to commit the patch soon, unless there are other comments or suggestions. (It was built and tested on x86-64-gnu-linux. Unfortunately, we do not have a test suite for it, except for two minor tests in gfortran.dg.) Tobias 2012-10-31 Tobias Burnus bur...@net-b.de Joseph Myers jos...@codesourcery.com David S. Miller da...@davemloft.net Ulrich Drepper drep...@redhat.com Marek Polacek pola...@redhat.com Petr Baudis pa...@suse.cz * math/expm1q.c (expm1q): Changes from GLIBC. Use expq for large parameters. Fix errno for boundary conditions. * math/finiteq.c (finiteq): Add comment. * math/fmaq.c (fmaq): Changes from GLIBC. Fix missing underflows and bad results for some subnormal results. Fix sign of inexact zero return. Fix sign of exact zero return. Ensure additions are not scheduled after fetestexcept. * math/jnq.c (jnq): Changes from GLIBC. Set up errno properly for ynq. Fix jnq precision. * math/nearbyintq.c (nearbyintq): Changes from GLIBC. Do not manipulate bits before adding and subtracting TWO112[sx]. * math/rintq.c (rintq): Ditto. * math/scalbnq.c (scalbnq): Changes from GLIBC. Fix integer overflow. diff --git a/libquadmath/math/expm1q.c b/libquadmath/math/expm1q.c index 510c98f..cd254b5 100644 --- a/libquadmath/math/expm1q.c +++ b/libquadmath/math/expm1q.c @@ -53,6 +53,7 @@ +#include <errno.h> #include "quadmath-imp.h" /* exp(x) - 1 = x + 0.5 x^2 + x^3 P(x)/Q(x) @@ -100,6 +101,11 @@ expm1q (__float128 x) ix = u.words32.w0; sign = ix & 0x80000000; ix &= 0x7fffffff; + if (!sign && ix >= 0x40060000) +{ + /* If num is positive and exp >= 6 use plain exp. */ + return expq (x); +} if (ix >= 0x7fff0000) { /* Infinity. */ @@ -120,7 +126,10 @@ expm1q (__float128 x) /* Overflow.
*/ if (x > maxlog) -return (HUGE_VALQ * HUGE_VALQ); +{ + __set_errno (ERANGE); + return (HUGE_VALQ * HUGE_VALQ); +} /* Minimum value. */ if (x < minarg) diff --git a/libquadmath/math/finiteq.c b/libquadmath/math/finiteq.c index f22e9d7..663c928 100644 --- a/libquadmath/math/finiteq.c +++ b/libquadmath/math/finiteq.c @@ -15,6 +15,11 @@ #include "quadmath-imp.h" +/* + * finiteq(x) returns 1 is x is finite, else 0; + * no branching! + */ + int finiteq (const __float128 x) { diff --git a/libquadmath/math/fmaq.c b/libquadmath/math/fmaq.c index 126b0a2..cd1b232 100644 --- a/libquadmath/math/fmaq.c +++ b/libquadmath/math/fmaq.c @@ -1,5 +1,5 @@ /* Compute x * y + z as ternary operation. - Copyright (C) 2010 Free Software Foundation, Inc. + Copyright (C) 2010-2012 Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Jakub Jelinek ja...@redhat.com, 2010. @@ -57,6 +57,11 @@ fmaq (__float128 x, __float128 y, __float128 z) u.ieee.exponent != 0x7fff && v.ieee.exponent != 0x7fff) return (z + x) + y; + /* If z is zero and x are y are nonzero, compute the result + as x * y to avoid the wrong sign of a zero result if x * y + underflows to 0. */ + if (z == 0 && x != 0 && y != 0) + return x * y; /* If x or y or z is Inf/NaN, or if fma will certainly overflow, or if x * y is less than half of FLT128_DENORM_MIN, compute as x * y + z. */ @@ -136,6 +141,11 @@ fmaq (__float128 x, __float128 y, __float128 z) y = v.value; z = w.value; } + + /* Ensure correct sign of exact 0 + 0. */ + if (__builtin_expect ((x == 0 || y == 0) && z == 0, 0)) +return x * y + z; + /* Multiplication m1 + m2 = x * y using Dekker's algorithm. */ #define C ((1LL << (FLT128_MANT_DIG + 1) / 2) + 1) __float128 x1 = x * C; @@ -191,7 +201,7 @@ #endif v.value = a1 + u.value; /* Ensure the addition is not scheduled after fetestexcept call.
*/ - asm volatile ("" : : "m" (v)); + asm volatile ("" : : "m" (v.value)); #ifdef USE_FENV_H int j = fetestexcept (FE_INEXACT) != 0; feupdateenv (&env); @@ -220,20 +230,14 @@ { /* v.ieee.mant_low & 2 is LSB bit of the result before rounding, v.ieee.mant_low & 1 is the round bit and j is our sticky - bit. In round-to-nearest 001 rounds down like 00, - 011 rounds up, even though 01 rounds down (thus we need - to adjust), 101 rounds down like 10 and 111 rounds up - like 11. */ - if ((v.ieee.mant_low & 3) == 1) - { - v.value *= 0x1p-226Q; - if (v.ieee.negative) - return v.value - 0x1p-16494Q /* __FLT128_DENORM_MIN__ */; - else - return v.value + 0x1p-16494Q /* __FLT128_DENORM_MIN__ */; - } - else - return v.value * 0x1p-226Q; + bit. */ + w.value = 0.0Q; +
Re: patch to fix constant math - 4th patch - the wide-int class.
On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not powers of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template, templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want.
Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int is that, since it does math to the precision's size, it almost never uses synthetic operations, since the sizes for almost every instance can be done using the native math on the machine. almost every call has a check to see if the operation can be done natively.
I seriously doubt that you are going to do TI mode math much faster than I do it and if you do who cares. the number of calls does not affect the performance in any negative way and in fact is more efficient since common things that require more than one operation in double_int are typically done in a single operation. Simple double-int operations like inline double_int double_int::and_not (double_int b) const { double_int result; result.low = low & ~b.low; result.high = high & ~b.high; return result; } are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour of separate and() and not() operations, but why not provide a fused one if
Re: Ping / update: RFA: replace #ifdef with if/#if for HAVE_ATTR_*
Joern Rennecke joern.renne...@embecosm.com writes: Quoting Richard Sandiford rdsandif...@googlemail.com: I can't approve the whole thing of course, but I like the idea. However... Joern Rennecke joern.renne...@embecosm.com writes: +@deftypevr {Target Hook} bool TARGET_HAVE_CC0 +@deftypevrx {Target Hook} {bool} TARGET_AUTO_INC_DEC +@deftypevrx {Target Hook} {bool} TARGET_STACK_REGS +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_ENABLED +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH +These flags are automatically generated; you should not override them in @file{tm.c}. +@end deftypevr Unless this goes against something already discussed, I think it'd be better to leave these out until there's a concerted attempt to use them somewhere. On its own this isn't even a partial transition. :-) Well, I was tempted to do a full transition, but then realized that that'd never get reviewed, so I thought a good start to get out of the hole we got ourselves into is to stop digging. Well, maybe we can't even do that before 4.8 ... whatever. I have removed that bit from the patch. Thanks. Sorry, a couple of minor nits that I should probably have picked up on before: @@ -2853,12 +2837,13 @@ final_scan_insn (rtx insn, FILE *file, i if (new_rtx == insn && PATTERN (new_rtx) == body) fatal_insn ("could not split insn", insn); -#ifdef HAVE_ATTR_length - /* This instruction should have been split in shorten_branches, -to ensure that we would have valid length info for the -splitees. */ - gcc_unreachable (); -#endif + if (HAVE_ATTR_length) + { + /* This instruction should have been split in shorten_branches, +to ensure that we would have valid length info for the +splitees.
*/ + gcc_unreachable (); + } probably gcc_assert (!HAVE_ATTR_length); @@ -3816,7 +3809,7 @@ struct rtl_opt_pass pass_split_after_rel static bool gate_handle_split_before_regstack (void) { -#if defined (HAVE_ATTR_length) && defined (STACK_REGS) +#if (HAVE_ATTR_length) && defined (STACK_REGS) /* If flow2 creates new instructions which need splitting and scheduling after reload is not done, they might not be split until final which doesn't allow splitting @@ -3900,7 +3893,7 @@ struct rtl_opt_pass pass_split_before_sc static bool gate_do_final_split (void) { -#if defined (HAVE_ATTR_length) && !defined (STACK_REGS) +#if (HAVE_ATTR_length) && !defined (STACK_REGS) return 1; #else return 0; Redundant brackets around HAVE_ATTR_length. OK with those changes for the rtl bits. Can't approve the generator stuff though. Richard
Re: [Patch] Update libquadmath from GLIBC
On Wed, Oct 31, 2012 at 1:06 PM, Tobias Burnus bur...@net-b.de wrote: Hi all, libquadmath's math functions are based on (but not identical to) GLIBC's sysdeps/ieee754/ldbl-128 functions. Heh, last time I copied things from sysdeps/ieee754 into GCC rms objected and I had to revert ... (libgccmath). Richard. In the attached patch, I have ported the bug fixes from GLIBC over to libquadmath. Hopefully, the port is complete and correct. I intend to commit the patch soon, unless there are other comments or suggestions. (It was built and tested on x86-64-gnu-linux. Unfortunately, we do not have a test suite for it, except for two minor tests in gfortran.dg.) Tobias
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
On Wed, 31 Oct 2012, Jan Hubicka wrote: Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192989) @@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s return true; } -/* Frees the information on upper bounds on numbers of iterations of LOOP. */ - Needs a comment. Yep, also the reason I export it is because I use it in the other patch. (I think we should not hook the bounds into the loop structure and keep them dead, so I want the max_iter* friends to free them unless they are asked to preserve, and explicitly free in the two users of them). +static bool +remove_exits_and_undefined_stmts (struct loop *loop, unsigned int npeeled) +{ + struct nb_iter_bound *elt; + bool changed = false; + + for (elt = loop->bounds; elt; elt = elt->next) +{ + /* If statement is known to be undefined after peeling, turn it + into unreachable (or trap when debugging experience is supposed + to be good). */ + if (!elt->is_exit + && elt->bound.ule (double_int::from_uhwi (npeeled))) + { + gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt); + gimple stmt = gimple_build_call + (builtin_decl_implicit (optimize_debug + ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE), I'm not sure we should do different things for -Og here. In fact there is no unrolling done for -Og at all, so just use unreachable. OK, I was thinking about BUILT_IN_TRAP for -O1 too, but I am not sure either. I guess we will gather some experience on how much users are confused. Why do we not inline at -Og? unroll you mean. Because unrolling mutates the CFG too much. Well - it was just a starting point, populating -Og with as little as possible and 100% profitable transforms (in both debug and speed metric). In late opts we only do (early opt queue is shared): NEXT_PASS (pass_all_optimizations_g); { struct opt_pass **p = &pass_all_optimizations_g.pass.sub; NEXT_PASS (pass_remove_cgraph_callee_edges); NEXT_PASS (pass_strip_predict_hints); /* Lower remaining pieces of GIMPLE.
*/ NEXT_PASS (pass_lower_complex); NEXT_PASS (pass_lower_vector_ssa); /* Perform simple scalar cleanup which is constant/copy propagation. */ NEXT_PASS (pass_ccp); NEXT_PASS (pass_object_sizes); /* Copy propagation also copy-propagates constants, this is necessary to forward object-size results properly. */ NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_dce); /* Fold remaining builtins. */ NEXT_PASS (pass_fold_builtins); /* ??? We do want some kind of loop invariant motion, but we possibly need to adjust LIM to be more friendly towards preserving accurate debug information here. */ NEXT_PASS (pass_late_warn_uninitialized); NEXT_PASS (pass_uncprop); NEXT_PASS (pass_local_pure_const); } + /* If we know the exit will be taken after peeling, update. */ + else if (elt->is_exit + && elt->bound.ule (double_int::from_uhwi (npeeled))) + { + basic_block bb = gimple_bb (elt->stmt); + edge exit_edge = EDGE_SUCC (bb, 0); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Forced exit to be taken: "); + print_gimple_stmt (dump_file, elt->stmt, 0, 0); + } + if (!loop_exit_edge_p (loop, exit_edge)) + exit_edge = EDGE_SUCC (bb, 1); + if (exit_edge->flags & EDGE_TRUE_VALUE) I think we can have abnormal/eh exit edges. So I'm not sure you can, without checking, assume the block ends in a GIMPLE_COND. We can't - those are only edges that are found by the IV analysis and they always test IV with some bound. (it is all done by number_of_iterations_exit) See above. Otherwise the overall idea sounds fine. Similarly here, simple exits are always conditionals. I see. Then the patch is ok (with the comment added). Thanks, Richard.
Re: patch to fix constant math - 4th patch - the wide-int class.
Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not powers of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template, templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it.
Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int is that, since it does math to the precision's size, it almost never uses synthetic operations, since the sizes for almost every instance can be done using the native math on the machine.
almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than I do it and if you do who cares. the number of calls does not affect the performance in any negative way and in fact is more efficient since common things that require more than one operation in double_int are typically done in a single operation. Simple double-int operations like inline double_int double_int::and_not (double_int b) const { double_int result; result.low = low & ~b.low; result.high = high & ~b.high; return result; } are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour of separate and()
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
unroll you mean. Because unrolling mutates the CFG too much. Well - it was just a starting point, populating -Og with as little as possible and 100% profitable transforms (in both debug and speed metric). In late opts we only do (early opt queue is shared): Well, and what about early cunrolli? NEXT_PASS (pass_all_optimizations_g); { struct opt_pass **p = &pass_all_optimizations_g.pass.sub; NEXT_PASS (pass_remove_cgraph_callee_edges); NEXT_PASS (pass_strip_predict_hints); /* Lower remaining pieces of GIMPLE. */ NEXT_PASS (pass_lower_complex); NEXT_PASS (pass_lower_vector_ssa); /* Perform simple scalar cleanup which is constant/copy propagation. */ NEXT_PASS (pass_ccp); NEXT_PASS (pass_object_sizes); /* Copy propagation also copy-propagates constants, this is necessary to forward object-size results properly. */ NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_dce); /* Fold remaining builtins. */ NEXT_PASS (pass_fold_builtins); /* ??? We do want some kind of loop invariant motion, but we possibly need to adjust LIM to be more friendly towards preserving accurate debug information here. */ NEXT_PASS (pass_late_warn_uninitialized); NEXT_PASS (pass_uncprop); NEXT_PASS (pass_local_pure_const); } + /* If we know the exit will be taken after peeling, update. */ + else if (elt->is_exit + && elt->bound.ule (double_int::from_uhwi (npeeled))) + { + basic_block bb = gimple_bb (elt->stmt); + edge exit_edge = EDGE_SUCC (bb, 0); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Forced exit to be taken: "); + print_gimple_stmt (dump_file, elt->stmt, 0, 0); + } + if (!loop_exit_edge_p (loop, exit_edge)) + exit_edge = EDGE_SUCC (bb, 1); + if (exit_edge->flags & EDGE_TRUE_VALUE) I think we can have abnormal/eh exit edges. So I'm not sure you can, without checking, assume the block ends in a GIMPLE_COND. We can't - those are only edges that are found by the IV analysis and they always test IV with some bound.
(it is all done by number_of_iterations_exit) See above. Otherwise the overall idea sounds fine. Similarly here, simple exits are always conditionals. I see. Then the patch is ok (with the comment added). Thanks, I will add the comment and privatize free_loop_bounds for now. Honza Thanks, Richard.
Modernize loop_finite_p
Hi, this patch changes finite_loop_p to use max_loop_iterations. A long time ago I made finite_loop_p as a rip-off from max_loop_iterations, skipping parts that are not exactly related to the number of iteration estimates. It went out of date since then, completely missing the bounds derived from overflows and array bounds that are quite useful. So I guess these days it is better to simply implement it using max_loop_iterations; we do not save that much effort and we can store the result. Bootstrapped/regtested x86_64, OK? Honza * tree-ssa-loop-niter.c (finite_loop_p): Reorg to use max_loop_iterations. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192991) +++ tree-ssa-loop-niter.c (working copy) @@ -1993,11 +1990,7 @@ find_loop_niter (struct loop *loop, edge bool finite_loop_p (struct loop *loop) { - unsigned i; - VEC (edge, heap) *exits; - edge ex; - struct tree_niter_desc desc; - bool finite = false; + double_int nit; int flags; if (flag_unsafe_loop_optimizations) @@ -2011,26 +2004,22 @@ finite_loop_p (struct loop *loop) return true; } - exits = get_loop_exit_edges (loop); - FOR_EACH_VEC_ELT (edge, exits, i, ex) + if (loop->any_upper_bound) { - if (!just_once_each_iteration_p (loop, ex->src)) - continue; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Found loop %i to be finite: upper bound is recorded.\n", +loop->num); + return true; +} - if (number_of_iterations_exit (loop, ex, &desc, false)) -{ - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Found loop %i to be finite: iterating ", loop->num); - print_generic_expr (dump_file, desc.niter, TDF_SLIM); - fprintf (dump_file, " times\n"); - } - finite = true; - break; - } + if (max_loop_iterations (loop, &nit)) +{ + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Found loop %i to be finite: upper bound found.\n", +loop->num); + return true; } - VEC_free (edge, heap, exits); - return finite; + return false; } /*
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
On Wed, 31 Oct 2012, Jan Hubicka wrote: unroll you mean. Because unrolling mutates the CFG too much. Well - it was just a starting point, populating -Og with as little as possible and 100% profitable transforms (in both debug and speed metric). In late opts we only do (early opt queue is shared): Well, and what about early cunrolli? It's not there (and I would disable it for -Og). Richard.
Re: RFA: hookize ADJUST_INSN_LENGTH
Quoting Richard Sandiford rdsandif...@googlemail.com: It's about describing complex interactions of length adjustments that derive from branch shortening and length added for (un)alignment for scheduling purposes. Expressed naively, these can lead to cycles. But shorten_branches should be written to avoid cycles, and I thought your patch did that by making sure that the address of each instruction only increases. This actually gives a very poor branch shortening result for ARC. What I did before with the lock_length attribute was only locking in increases for the length of branches, and preserving a complex set of invariants to avoid cycles from the (mis)alignment padding. Saying that for some instructions, the length locking comes into effect only after a few iterations of not permanently increasing the size of other instructions is still cycle-safe. The length variations for the ARC are not all alike: there are branches that are subject to branch shortening in the usual way, but they might shrink when other things shrink. When we are iterating starting at minimal length and increasing lengths, these branches should be stopped from shrinking, as they likely will grow back again in the next iteration. OTOH there are potentially short instructions that are lengthened to achieve scheduling-dictated alignment or misalignment. These should be allowed to change freely back and forth. Unless we have a rare cycle that only depends on alignments. Hmm, but this is still talking in terms of shorten_branches, rather than the target property that we're trying to model. It sounds like the property is something along the lines of: some instructions have different encodings (or work-alikes with different encodings), and we want to make all the encodings available to the optimisers. Is that right? No, not really. There are lots of redundant encodings, but we have only one favourite encoding for each size.
We usually want the shortest version, unless we want to tweak the alignment of a following instruction, or in the case of unaligned conditional branches with unfilled delay slot, to hide the branch penalty (don't ask me why it has that effect...). If so, that sounds like a useful thing to model, but I think it should be modelled directly, rather than as a modification to the shorten_branches algorithm. The shorten_branches algorithm can't give the correct result without taking these things into account. Also, instruction lengths are needed to know the requirements. So, do you want me to run a scheduling pass in each iteration of shorten_branches? Is the alignment/misalignment explicitly modelled in the rtl? If so, how? Using labels, or something else? No, it works with instruction addresses during branch shortening, and extra bits in cfun->machine during instruction output to keep track of the current alignment.
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote: On Wed, 31 Oct 2012, Jan Hubicka wrote: unroll you mean. Because unrolling mutates the CFG too much. Well - it was just a starting point, populating -Og with as little as possible and 100% profitable transforms (in both debug and speed metric). In late opts we only do (early opt queue is shared): Well, and what about early cunrolli? It's not there (and I would disable it for -Og). Generally, most of the loop transforms are undesirable for -Og. Jakub
Re: Modernize loop_finite_p
On Wed, 31 Oct 2012, Jan Hubicka wrote: Hi, this patch changes finite_loop_p to use max_loop_iterations. A long time ago I made finite_loop_p as a rip-off from max_loop_iterations, skipping parts that are not exactly related to the number of iteration estimates. It went out of date since then, completely missing the bounds derived from overflows and array bounds that are quite useful. So I guess these days it is better to simply implement it using max_loop_iterations; we do not save that much effort and we can store the result. Bootstrapped/regtested x86_64, OK? Honza * tree-ssa-loop-niter.c (finite_loop_p): Reorg to use max_loop_iterations. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192991) +++ tree-ssa-loop-niter.c (working copy) @@ -1993,11 +1990,7 @@ find_loop_niter (struct loop *loop, edge bool finite_loop_p (struct loop *loop) { - unsigned i; - VEC (edge, heap) *exits; - edge ex; - struct tree_niter_desc desc; - bool finite = false; + double_int nit; int flags; if (flag_unsafe_loop_optimizations) @@ -2011,26 +2004,22 @@ finite_loop_p (struct loop *loop) return true; } - exits = get_loop_exit_edges (loop); - FOR_EACH_VEC_ELT (edge, exits, i, ex) + if (loop->any_upper_bound) { - if (!just_once_each_iteration_p (loop, ex->src)) - continue; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Found loop %i to be finite: upper bound is recorded.\n", + loop->num); + return true; +} This looks redundant with ... - if (number_of_iterations_exit (loop, ex, &desc, false)) -{ - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Found loop %i to be finite: iterating ", loop->num); - print_generic_expr (dump_file, desc.niter, TDF_SLIM); - fprintf (dump_file, " times\n"); - } - finite = true; - break; - } + if (max_loop_iterations (loop, &nit)) +{ ... this. If you want to short-circuit max_loop_iterations () then why not if (loop->any_upper_bound || max_loop_iterations (loop, &nit)) { ... ?
The different dumping seems not a good reason to obfuscate it. Richard. + if (dump_file (dump_flags TDF_DETAILS)) + fprintf (dump_file, Found loop %i to be finite: upper bound found.\n, + loop-num); + return true; } - VEC_free (edge, heap, exits); - return finite; + return false; } /*
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
On Wed, 31 Oct 2012, Jakub Jelinek wrote: On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote: On Wed, 31 Oct 2012, Jan Hubicka wrote: unroll you mean. Because unrolling mutates the CFG too much. Well - it was just a starting point, populating -Og with as little as possible and 100% profitable transforms (in both debug and speed metric). In late opts we only do (early opt queue is shared): Well, and what about early cunrolli? It's not there (and I would disable it for -Og). Generally, most of the loop transforms are undesirable for -Og. Maybe with the exception of un-looping once rolling loops. But yes, even loop header copying is undesirable in general. Richard.
Re: [PATCH] Fix PR53501
Your change caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55142 Please check whether it worked before Richard's fix (r188009). -- Eric Botcazou
Re: patch to fix constant math - 4th patch - the wide-int class.
On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not a power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int.
The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int is wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision.
the thing about wide int, is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine. almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than i do it and if you do who cares. the number of calls does not affect the performance in any negative way and in fact it is more efficient since common things that require more than one operation in double int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were
Re: RFA: hookize ADJUST_INSN_LENGTH
On Wed, Oct 31, 2012 at 1:33 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Sandiford rdsandif...@googlemail.com: It's about describing complex interactions of length adjustments that derive from branch shortening and length added for (un)alignment for scheduling purposes. Expressed naively, these can lead to cycles. But shorten_branches should be written to avoid cycles, and I thought your patch did that by making sure that the address of each instruction only increases. This actually gives a very poor branch shortening result for ARC. What I did before with the lock_length attribute was only locking in increases for the length of branches, and preserving a complex set of invariants to avoid cycles from the (mis)alignment padding. Saying that for some instructions, the length locking comes into effect only after a few iterations of not permanently increasing the size of other instructions is still cycle-safe. The length variations for the ARC are not all alike: there are branches that are subject to branch shortening in the usual way, but they might shrink when other things shrink. When we are iterating starting at minimal length and increasing lengths, these branches should be stopped from shrinking, as they likely will grow back again in the next iteration. OTOH there are potentially short instructions that are lengthened to achieve scheduling-dictated alignment or misalignment. These should be allowed to change freely back and forth. Unless we have a rare cycle that only depends on alignments. Hmm, but this is still talking in terms of shorten_branches, rather than the target property that we're trying to model. It sounds like the property is something along the lines of: some instructions have different encodings (or work-alikes with different encodings), and we want to make all the encodings available to the optimisers. Is that right? No, not really. There are lots of redundant encodings, but we have only one favourite encoding for each size.
We usually want the shortest version, unless we want to tweak the alignment of a following instruction, or in the case of unaligned conditional branches with an unfilled delay slot, to hide the branch penalty (don't ask me why it has that effect...). Maybe we should split the thing then into an adjust_insn_length attribute without the iteration parameter and a hook, similar to the various scheduler hooks, that is special for branch shortening. Richard. If so, that sounds like a useful thing to model, but I think it should be modelled directly, rather than as a modification to the shorten_branches algorithm. The shorten_branches algorithm can't give the correct result without taking these things into account. Also, instruction lengths are needed to know the requirements. So, do you want me to run a scheduling pass in each iteration of shorten_branches? Is the alignment/misalignment explicitly modelled in the rtl? If so, how? Using labels, or something else? No, it works with instruction addresses during branch shortening, and extra bits in cfun->machine during instruction output to keep track of the current alignment.
Re: Fix bugs introduced by switch-case profile propagation
On Fri, 2012-10-26 at 17:05 +0200, Jan Hubicka wrote: Hi, On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. Seems fine, but under what conditions do you get NULL here? Wasn't sure if this is an OK for the patch or if I need to address anything else. Actually I think you should make the except.c to set current_bb when expanding the switch instead. OK with this change. Is there any progress regarding this issue? It makes testing on SH against current trunk difficult. Would it be OK to at least commit this hunk?

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 192963)
+++ gcc/optabs.c	(working copy)
@@ -4270,8 +4270,8 @@
       && JUMP_P (insn)
       && any_condjump_p (insn))
     {
-      gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0));
-      add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
+      if (!find_reg_note (insn, REG_BR_PROB, 0))
+	add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
     }
 }

I've tested this on rev 192983 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb\} and it looks OK. Cheers, Oleg
Re: [PATCH] Fix PR53501
On Wed, Oct 31, 2012 at 5:42 AM, Eric Botcazou ebotca...@adacore.com wrote: Your change caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55142 Please check whether it worked before Richard's fix (r188009). It failed with revision 188008. -- H.J.
Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)
On 10/31/2012 20:01, Jonathan Wakely wrote: On 31 October 2012 11:23, JonY wrote: On 10/31/2012 19:12, Jonathan Wakely wrote: It looks like the workaround is in mingw not in GCC, so is it a problem that it won't be possible to use GCC 4.8 with existing mingw versions, or are users required to use a brand new mingw to use a new GCC? Should that be documented in http://gcc.gnu.org/gcc-4.8/changes.html ? They are required to use the latest mingw-w64; the problem was that the vswprintf that libstdc++ expects isn't the same as the one MS provides, so I wrote a redirector to use vsnwprintf, more precisely, the mingw C99-compliant __mingw_vsnwprintf. std::to_wstring and std::to_string work according to some simple tests. Excellent, the testsuite should automatically start running the relevant tests and we should be able to close http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52015 Yes, the correct way would be to check if the prototype given matches the ISO version. The MS version has one less argument. The workaround is only active when C++11 is used though, to maintain compatibility with older code. The actual guard in mingw-w64:

#if defined(__cplusplus) && (__cplusplus >= 201103L) && !defined(_GLIBCXX_HAVE_BROKEN_VSWPRINTF)
...Redirect to ISO C99 compliant adapter with 4 args...
#else
...Use MS version with 3 args...
#endif

So the tests and checks would just need to use -std=c++11 or gnu++11. I guess the current comment about requiring mingw-w64 trunk at least r5437 is OK for the changes page. It should probably note that this change is mingw-w64 specific, with w64 as the vendor key. i.e. *-w64-mingw* ? Or is *-w64-mingw32 more accurate? (I don't really know how the mingw target triplets work, sorry!) In practice, *-w64-mingw32 is used more commonly when specifying the triplet, though for flexibility, checking should use *-w64-mingw*. I'll put something in the changes page later. Thanks for fixing this. Thanks to Paolo for pushing it in too.
Re: Adapt one fold-const optimization for vectors
On Wed, 31 Oct 2012, Jakub Jelinek wrote: On Tue, Oct 30, 2012 at 01:14:40PM +0100, Marc Glisse wrote: On Tue, 30 Oct 2012, Marek Polacek wrote: On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote: Hello, one more optimization that needed help for vectors, it crashed on (xy)0. Because of PR 55001, testcases are awkward to add (I could do an x86-only one if needed). bootstrap+testsuite. 2012-10-30 Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors. (fold_binary_loc): call it for VEC_COND_EXPR. Capital C instead of lowercase. Ok. Patch missing? Indeed, thanks for the notice. This regressed FAIL: g++.dg/other/vector-compare.C (internal compiler error) FAIL: g++.dg/other/vector-compare.C (test for excess errors) on i686-linux. Thanks. I am surprised, I expected this problem to happen only for code where the optimization was previously crashing. Actually, before my patch, the code was apparently silently miscompiled. We appear to be missing code that would check that the result of a vector operation is a vector (in particular not a boolean). The problem is that tree-vect-generic.c doesn't expand BLKmode VEC_COND_EXPR piecewise. So, either fold-const on this optimization needs to test whether expand_vec_cond_expr_p is true for the chosen pair of comparison type and operand type, or tree-vect-generic.c needs to be taught to expand VEC_COND_EXPR piecewise if expand_vec_cond_expr_p is not true for the particular VEC_COND_EXPR. The second one is needed anyway (that's PR 55001, which I mention in my email). One quick hack as long as it isn't done would be to replace: cond_code = VEC_COND_EXPR; with return NULL_TREE; (and a comment linking to PR55001 and this conversation). Now I am a bit afraid that the vector lowering pass is not the last one, and some later pass might call fold and re-create unexpandable vector operations, in which case the first option might also be needed... but I hope not. -- Marc Glisse
Re: RFA: hookize ADJUST_INSN_LENGTH
Quoting Richard Biener richard.guent...@gmail.com: Maybe we should split the thing then into a adjust_insn_length attribute without the iteration parameter Attributes don't get any parameter but the instruction, and don't apply to delay slot SEQUENCEs.
Re: RFA: hookize ADJUST_INSN_LENGTH
On Wed, Oct 31, 2012 at 2:06 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Biener richard.guent...@gmail.com: Maybe we should split the thing then into a adjust_insn_length attribute without the iteration parameter Attributes don't get any parameter but the instruction, and don't apply to delay slot SEQUENCEs. hook I meant, not attribute. Richard.
Re: [Ada] Fix Makefile dependencies of C files
On Tue, Oct 30, 2012 at 6:50 PM, Eric Botcazou ebotca...@adacore.com wrote: This will hopefully fix the build failure reported by Diego. Apart from adding missing dependencies, this also removes redundant command lines. Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch. 2012-10-30 Eric Botcazou ebotca...@adacore.com * gcc-interface/Make-lang.in: Fix and clean up rules for C files. Thanks! I'll update my tree and give it a try. Diego.
Re: patch to fix constant math - 4th patch - the wide-int class.
Richard Biener richard.guent...@gmail.com writes: But that means that wide_int has to model a P-bit operation as a normal len*HOST_WIDE_INT operation and then fix up the result after the fact, which seems unnecessarily convoluted. It does that right now. The operations are carried out in a loop over len HOST_WIDE_INT parts, the last HWI is then special-treated to account for precision/size. (yes, 'len' is also used as optimization - the fact that len ends up being mutable is another thing I dislike about wide-int. If wide-ints are cheap then all ops should be non-mutating (at least to 'len')). But the point of having a mutating len is that things like zero and -1 are common even for OImode values. So if you're doing something potentially expensive like OImode multiplication, why do it to the number of HOST_WIDE_INTs needed for an OImode value when the value we're processing has only one significant HOST_WIDE_INT? I still don't see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision X*HOST_WIDE_INT operation for any X) has any special meaning. Well, the same reason as a HOST_WIDE_INT variable has a meaning. We use it to constrain what we (efficiently) want to work on. For example CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing bit-constant-propagation in loops (for TImode integers on a x86_64 host). But what about targets with modes wider than TImode? Would double_int still be appropriate then? If not, why does CCP have to use a templated type with a fixed number of HWIs (and all arithmetic done to a fixed number of HWIs) rather than one that can adapt to the runtime values, like wide_int can? Oh, and I don't necessarily see a use of double_int in its current form but for an integer representation on the host that is efficient to manipulate integer constants of a target dependent size.
For example the target detail that we have partial integer modes with bitsize > precision and that the bits > precision apparently have a meaning when looking at the bit-representation of a constant should not be part of the base class of wide-int (I doubt it belongs to wide-int at all, but I guess you know more about the reason we track bitsize in addition to precision - I think it's abstraction at the wrong level, the tree level does fine without knowing about bitsize). TBH I'm uneasy about the bitsize thing too. I think bitsize is only tracked for shift truncation, and if so, I agree it makes sense to do that separately. But anyway, this whole discussion seems to have reached a stalemate. Or I suppose a de-facto rejection, since you're the only person in a position to approve the thing :-) Richard
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Richi, Let me explain to you what a broken api is. I have spent the last week screwing around with tree-vrp and as of last night i finally got it to work. In tree-vrp, it is clear that double-int is the precise definition of a broken api. tree-vrp uses an infinite-precision view of arithmetic. However, that infinite precision is implemented on top of a finite, CARVED IN STONE, base that is and will always be, without a patch like this, 128 bits on an x86-64. However, as was pointed out earlier, tree-vrp needs 2 * the size of a type + 1 bit to work correctly. Until yesterday i did not fully understand the significance of that 1 bit. what this means is that tree-vrp does not work on an x86-64 with __int128 variables. There are no checks in tree-vrp to back off when it sees something too large; tree-vrp simply gets the wrong answer. To me, this is a broken api and is GCC at its very worst. The patches that required this SHOULD HAVE NEVER GONE INTO GCC. What you have with my patches is someone who is willing to fix a large and complex problem that should have been fixed years ago. I understand that you do not like several aspects of the wide-int api and i am willing to make some of those improvements. However, what i am worried about is that you are in some ways really attached to the style of programming where everything is dependent on the size of a HWI. I will continue to push back on those comments but have been working the rest in as i have been going along. To answer your other question, it will be a significant problem if i cannot get these patches in. They are very prone to patch rot and my customer wants a product without many patches to the base code. Also, i fear that your real reason that you want to wait is because you really do not like the fact these patches get rid of double int and that style of programming, and putting off that day serves no one well.
kenny On 10/31/2012 05:59 AM, Richard Sandiford wrote: Richard Biener richard.guent...@gmail.com writes: On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: jakub, i am hoping to get the rest of my wide integer conversion posted by nov 5. I am under some adverse conditions here: hurricane sandy hit here pretty badly. my house is hooked up to a small generator, and no one has any power for miles around. So far richi has promised to review them. he has sent some comments, but so far no reviews. Some time after i get the first round of them posted, i will do a second round that incorporates everyone's comments. But i would like a little slack here if possible. While this work is a show stopper for my private port, the patches address serious problems for many of the public ports, especially ones that have very flexible vector units. I believe that there is a significant set of latent problems currently with the existing ports that use TImode that these patches will fix. However, i will do everything in my power to get the first round of the patches posted by the nov 5 deadline. I suppose you are not going to merge your private port for 4.8 and thus the wide-int changes are not a show-stopper for you. That said, I considered the main conversion to be appropriate to be deferred for the next stage1. There is no advantage in disrupting the tree more at this stage. I would like the wide_int class and rtl stuff to go in 4.8 though. IMO it's a significant improvement in its own right, and Kenny submitted it well before the deadline. Richard
Re: [C++ Patch] PR 54583
On 10/30/2012 07:44 PM, Paolo Carlini wrote: straightforward fix. The below, which regtests fine, simply acts on any MULT_EXPR as TYPE_SIZE, which I think should be fine, but, in case the idea is basically Ok, we could also consider the more complex but more sophisticated variably_modified_type_p? I think I'd use EXPR_P rather than check for a particular tree code. Jason
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Wed, 2012-10-31 at 10:12 +0100, Richard Biener wrote: On Tue, Oct 30, 2012 at 9:23 PM, Peter Bergner berg...@vnet.ibm.com wrote: On Tue, 2012-10-30 at 20:37 +0100, Jakub Jelinek wrote: On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote: Ok, then I'll bootstrap and regtest your suggested change while we wait for richi to comment. I'm fine with whatever you and richi decide is best. The ObjC guys should probably test it though too. I assume you think we should change the current trunk code as well? Well, I haven't looked at the ObjC failures (guess they are darwin only anyway), so have no idea whether those set section name or not. So, if my proposed test instead of the trunk one doesn't work for darwin, perhaps it could be DECL_PRESERVED_P (decl) || (DECL_SECTION_NAME (decl) !...). I think DECL_USER_ALIGN test is undesirable for that though, it is just fine to increase alignment of anything, after all, it is still aligned properly then. I can confirm it bootstraps and regtests without any errors...and it fixes my problem. I'm fine with that, but please give it some time on trunk before backporting it. Great. Jakub, were you going to commit your change or did you want me to do that? Peter
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 08:11 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not a power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it.
Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int is wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int, is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine.
almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than i do it and if you do who cares. the number of calls does not affect the performance in any negative way and in fact it is more efficient since common things that require more than one operation in double int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour of separate and() and not() operations, but why not provide a fused one if there are clients who'll make use of it? I was more
Re: patch to fix constant math - 4th patch - the wide-int class.
On Wed, Oct 31, 2012 at 2:30 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: But that means that wide_int has to model a P-bit operation as a normal len*HOST_WIDE_INT operation and then fix up the result after the fact, which seems unnecessarily convoluted. It does that right now. The operations are carried out in a loop over len HOST_WIDE_INT parts, the last HWI is then special-treated to account for precision/size. (yes, 'len' is also used as optimization - the fact that len ends up being mutable is another thing I dislike about wide-int. If wide-ints are cheap then all ops should be non-mutating (at least to 'len')). But the point of having a mutating len is that things like zero and -1 are common even for OImode values. So if you're doing something potentially expensive like OImode multiplication, why do it to the number of HOST_WIDE_INTs needed for an OImode value when the value we're processing has only one significant HOST_WIDE_INT? I don't propose doing that. I propose that no wide-int member function may _change_ it's len (to something larger). Only that way you can avoid allocating wasted space for zero and -1. That way also the artificial limit on 2 * largest-int-mode-hwis goes. I still don't see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision X*HOST_WIDE_INT operation for any X) has any special meaning. Well, the same reason as a HOST_WIDE_INT variable has a meaning. We use it to constrain what we (efficiently) want to work on. For example CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing bit-constant-propagation in loops (for TImode integers on a x86_64 host). But what about targets with modes wider than TImode? Would double_int still be appropriate then? If not, why does CCP have to use a templated type with a fixed number of HWIs (and all arithmetic done to a fixed number of HWIs) rather than one that can adapt to the runtime values, like wide_int can?
Because nobody cares about accurate bit-tracking for modes larger than TImode. And because no convenient abstraction was available ;) Oh, and I don't necessarily see a use of double_int in its current form but for an integer representation on the host that is efficient to manipulate integer constants of a target dependent size. For example the target detail that we have partial integer modes with bitsize > precision and that the bits > precision apparently have a meaning when looking at the bit-representation of a constant should not be part of the base class of wide-int (I doubt it belongs to wide-int at all, but I guess you know more about the reason we track bitsize in addition to precision - I think it's abstraction at the wrong level, the tree level does fine without knowing about bitsize). TBH I'm uneasy about the bitsize thing too. I think bitsize is only tracked for shift truncation, and if so, I agree it makes sense to do that separately. So, can we please remove all traces of bitsize from wide-int then? But anyway, this whole discussion seems to have reached a stalemate. Or I suppose a de-facto rejection, since you're the only person in a position to approve the thing :-) There are many (silent) people that are able to approve the thing. But the point is I have too many issues with the current patch that I'm unable to point at a specific thing I want Kenny to change after which the patch would be fine. So I rely on some guesswork from Kenny giving my advices "leaner API, less fused ops, get rid of bitsize, think of abstracting the core HWI[len] operation, there should be no tree or RTL dependencies in the wide-int API" to produce an updated variant. Which of course takes time, which of course crosses my vacation, which in the end means it isn't going to make 4.8 (I _do_ like the idea of not having a dependence on host properties for integer constant representation).
Btw, a good hint at what a minimal wide-int API would look like is if you _just_ replace double-int users with it. Then you obviously have to implement only the double-int interface and conversion from/to double-int. Richard. Richard
Re: RFA: hookize ADJUST_INSN_LENGTH
Quoting Richard Biener richard.guent...@gmail.com: On Wed, Oct 31, 2012 at 2:06 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Biener richard.guent...@gmail.com: Maybe we should split the thing then into an adjust_insn_length attribute without the iteration parameter Attributes don't get any parameter but the instruction, and don't apply to delay slot SEQUENCEs. hook I meant, not attribute. All right. So I gather the first is still called adjust_insn_length. What do we call the one for the iteration count (or priority, if you want to put it that way) for the lock_length mechanism to become effective? int lock_length_iter_count (rtx insn, bool in_delay_slot) ? int lock_length_priority (rtx insn, bool iter_count) ?
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Wed, Oct 31, 2012 at 09:44:50AM -0400, Kenneth Zadeck wrote: The tree-vrp uses an infinite-precision view of arithmetic. However, that infinite precision is implemented on top of a finite, CARVED IN STONE, base that is and will always be, without a patch like this, 128 bits on an x86-64. However, as was pointed out earlier, tree-vrp needs 2 * the size of a type + 1 bit to work correctly. Until yesterday I did not fully understand the significance of that 1 bit. What this means is that tree-vrp does not work on an x86-64 with __int128 variables. If you see a VRP bug, please file a PR with a testcase, or point to an existing PR. I agree with richi that it would be better to add a clean wide_int implementation for 4.9, rather than rushing something in, introducing lots of bugs, just for a port that hasn't been submitted. Nor do I understand why __int128 integer types are so crucial to your port; the vector support doesn't generally need very large integers, even if your vector unit is 256-bit, 512-bit or larger. Jakub
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Wed, Oct 31, 2012 at 08:53:31AM -0500, Peter Bergner wrote: Great. Jakub, were you going to commit your change or did you want me to do that? I haven't tested it, you did, so please do that yourself. Thanks. Jakub
Re: [Patch] Potential fix for PR55033
Hello Alan,

maybe it is better to use a require effective target instead of the { target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* } patterns scattered around in the testsuite? One problem with this is that test cases for one of these will likely also work with powerpc*-*-rtems*. I want to look at the remaining 104 test cases which are marked as UNSUPPORTED for RTEMS some time in the future. Please have a look at the attached patches.

On 10/31/2012 01:12 AM, Alan Modra wrote: On Tue, Oct 30, 2012 at 02:45:40PM +0100, Sebastian Huber wrote: On 10/26/2012 02:22 PM, Sebastian Huber wrote: Hello, here is a test case for PR55033. Is there something wrong with this test case? It compiles well with Alan's patch. It looks OK to me if you replace your dg-do compile line with the following two lines to avoid failures on powerpc targets that don't support -meabi -msdata.

/* { dg-do compile { target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* } } */
/* { dg-require-effective-target ilp32 } */

-- Sebastian Huber, embedded brains GmbH

From a57f179c89b5ac1bb6bf8ecfa8fa47a6e8e7c6c0 Mon Sep 17 00:00:00 2001
From: Sebastian Huber sebastian.hu...@embedded-brains.de
Date: Wed, 31 Oct 2012 10:36:49 +0100
Subject: [PATCH 1/2] PowerPC testsuite clean up

2012-10-31  Sebastian Huber  sebastian.hu...@embedded-brains.de

	* lib/target-supports.exp (check_effective_target_powerpc_canonical): New.
	(check_effective_target_powerpc_eabi_ok): New.
	* gcc.target/powerpc/ppc-eabi.c: Use require effective target
	powerpc_eabi_ok.
	* gcc.target/powerpc/ppc-sdata-1.c: Likewise.
	* gcc.target/powerpc/spe-small-data-2.c: Likewise.
	* gcc.target/powerpc/ppc-sdata-2.c: Use require effective target
	powerpc_canonical and ilp32.
	* gcc.target/powerpc/pr51623.c: Likewise.
	* gcc.target/powerpc/ppc-stackalign-1.c: Use require effective target
	powerpc_canonical.
	* gcc.target/powerpc/ppc-ldstruct.c: Run for powerpc*-*-rtems*.
	* gcc.target/powerpc/spe-small-data-2.c: Do not run, compile only.
---
 gcc/testsuite/gcc.target/powerpc/ppc-eabi.c     |  3 +-
 gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c  |  3 +-
 gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c  |  4 ++-
 .../gcc.target/powerpc/ppc-stackalign-1.c       |  3 +-
 gcc/testsuite/gcc.target/powerpc/pr51623.c      |  4 ++-
 .../gcc.target/powerpc/spe-small-data-2.c       |  3 +-
 gcc/testsuite/lib/target-supports.exp           | 23
 8 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c b/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c
index 47ba1a7..cd15586 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c
@@ -1,4 +1,5 @@
 /* PR target/16952 */
-/* { dg-do compile { target { powerpc*-*-linux* ilp32 } } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_eabi_ok } */
 /* { dg-options "-meabi -mrelocatable" } */
 char *s = "boo";
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c b/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c
index da6001f..ffb4264 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* } } */
+/* { dg-do run { target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* powerpc*-*-rtems* } } */
 /* { dg-options "-O -mlong-double-128" } */
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c
index 5f39d86..efd5a5e 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c
@@ -1,5 +1,6 @@
-/* { dg-do compile { target { { powerpc*-*-linux* ilp32 } || { powerpc-*-eabi* } } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fno-common -G 8 -meabi -msdata=eabi" } */
+/* { dg-require-effective-target powerpc_eabi_ok } */
 /* { dg-require-effective-target nonpic } */
 /* { dg-final { scan-assembler "\\.section\[ \t\]\\.sdata," } } */
 /* { dg-final { scan-assembler "\\.section\[ \t\]\\.sdata2," } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c
index f102b1d..fdc4030 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c
@@ -1,5 +1,7 @@
-/* { dg-do compile { target { { powerpc*-*-linux* ilp32 } || { powerpc-*-eabi* } } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fno-common -G 8 -msdata=sysv" } */
+/* {
Re: Modernize loop_finite_p
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+  if (loop->any_upper_bound)
     {
-      if (!just_once_each_iteration_p (loop, ex->src))
-        continue;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file, "Found loop %i to be finite: upper bound is recorded.\n",
+                 loop->num);
+      return true;
+    }

This looks redundant with ...

-      if (number_of_iterations_exit (loop, ex, &desc, false))
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            {
-              fprintf (dump_file, "Found loop %i to be finite: iterating ", loop->num);
-              print_generic_expr (dump_file, desc.niter, TDF_SLIM);
-              fprintf (dump_file, " times\n");
-            }
-          finite = true;
-          break;
-        }
+  if (max_loop_iterations (loop, &nit))
+    {

... this. If you want to short-circuit max_loop_iterations () then why not

  if (loop->any_upper_bound || max_loop_iterations (loop, &nit))
    {
...

? The different dumping seems not a good reason to obfuscate it. Sounds good to me. I only wanted to avoid max_loop_iterations re-trying to derive the bound when it is already known to be finite. I think I will eventually make max_loop_iterations and friends do the hard work only once per loop optimizer queue or when asked to collect the current bounds (since the pointers to statements are hard to maintain). OK with that change? Honza
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
jakub, my port has 256-bit integers. They are done by strapping together all of the elements of a vector unit. If one looks at where Intel is going, they are doing exactly the same thing. The difference is that they like to add the operations one at a time rather than just do a clean implementation like we did. Soon they will get there, it is just a matter of time. I understand the tree-vrp code well enough to say that this operation does not work if you have TImode, but I do not know how to translate that back into C to generate a test case. My patch to tree-vrp is adaptable in that it looks at the types in the program and adjusts its definition of infinite precision based on the code that it sees. I can point people to that code in tree-vrp and am happy to do that, but that is not my priority now. Also, richi pointed out that there are places in the tree level constant propagators that require infinite precision, so he is really the person who both should know about this and generate proper tests. kenny On 10/31/2012 09:55 AM, Jakub Jelinek wrote: On Wed, Oct 31, 2012 at 09:44:50AM -0400, Kenneth Zadeck wrote: The tree-vrp uses an infinite-precision view of arithmetic. However, that infinite precision is implemented on top of a finite, CARVED IN STONE, base that is and will always be, without a patch like this, 128 bits on an x86-64. However, as was pointed out earlier, tree-vrp needs 2 * the size of a type + 1 bit to work correctly. Until yesterday I did not fully understand the significance of that 1 bit. What this means is that tree-vrp does not work on an x86-64 with __int128 variables. If you see a VRP bug, please file a PR with a testcase, or point to an existing PR.
I agree with richi that it would be better to add a clean wide_int implementation for 4.9, rather than rushing something in, introducing lots of bugs, just for a port that hasn't been submitted. Nor do I understand why __int128 integer types are so crucial to your port; the vector support doesn't generally need very large integers, even if your vector unit is 256-bit, 512-bit or larger. Jakub
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote: On Wed, 31 Oct 2012, Jan Hubicka wrote: unroll you mean. Because unrolling mutates the CFG too much. Well - it was just a starting point, populating -Og with as little as possible and 100% profitable transforms (in both debug and speed metric). In late opts we only do (early opt queue is shared): Well, and what about early cunrolli? It's not there (and I would disable it for -Og). Generally, most of the loop transforms are undesirable for -Og. Hmm, do we have some guidelines on what is and is not desirable for -Og? Unrolling/header copying, being code-duplicating optimizations, do not really throw away any info by themselves; they only require the debugger to understand that some code may get duplicated, which is no different from e.g. inlining. Of course the subsequent const prop will make it impossible to write into the iv variable, but do we want to support reliable writing into user vars (i.e. disable cprop on user vars) at -Og? Honza Jakub
Re: patch to fix constant math - 4th patch - the wide-int class.
On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/31/2012 08:11 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not powers of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double_int.
The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter. double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the API changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. I do not know why you believe this about the speed. double_int always does synthetic math since you do everything at 128-bit precision.
The thing about wide int is that, since it does math to the precision's size, it almost never uses synthetic operations, since the sizes for almost every instance can be done using the native math on the machine. Almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than I do it, and if you do, who cares. The number of calls does not affect the performance in any negative way, and in fact it is more efficient since common things that require more than one operation in double_int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were
Re: Modernize loop_finite_p
On Wed, 31 Oct 2012, Jan Hubicka wrote:

-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+  if (loop->any_upper_bound)
     {
-      if (!just_once_each_iteration_p (loop, ex->src))
-        continue;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file, "Found loop %i to be finite: upper bound is recorded.\n",
+                 loop->num);
+      return true;
+    }

This looks redundant with ...

-      if (number_of_iterations_exit (loop, ex, &desc, false))
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            {
-              fprintf (dump_file, "Found loop %i to be finite: iterating ", loop->num);
-              print_generic_expr (dump_file, desc.niter, TDF_SLIM);
-              fprintf (dump_file, " times\n");
-            }
-          finite = true;
-          break;
-        }
+  if (max_loop_iterations (loop, &nit))
+    {

... this. If you want to short-circuit max_loop_iterations () then why not

  if (loop->any_upper_bound || max_loop_iterations (loop, &nit))
    {
...

? The different dumping seems not a good reason to obfuscate it. Sounds good to me. I only wanted to avoid max_loop_iterations re-trying to derive the bound when it is already known to be finite. I think I will eventually make max_loop_iterations and friends do the hard work only once per loop optimizer queue or when asked to collect the current bounds (since the pointers to statements are hard to maintain). OK with that change? Ok. Thanks, Richard.
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Wed, 2012-10-31 at 14:55 +0100, Jakub Jelinek wrote: On Wed, Oct 31, 2012 at 08:53:31AM -0500, Peter Bergner wrote: Great. Jakub, were you going to commit your change or did you want me to do that? I haven't tested it, you did, so please do that yourself. Thanks. I tested it on FSF 4.7. I'll do a quick trunk bootstrap/regtest and commit it when it passes. Thanks. Peter
Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4
On Wed, 31 Oct 2012, Jan Hubicka wrote: On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote: On Wed, 31 Oct 2012, Jan Hubicka wrote: unroll you mean. Because unrolling mutates the CFG too much. Well - it was just a starting point, populating -Og with as little as possible and 100% profitable transforms (in both debug and speed metric). In late opts we only do (early opt queue is shared): Well, and what about early cunrolli? It's not there (and I would disable it for -Og). Generally, most of the loop transforms are undesirable for -Og. Hmm, do we have some guidelines on what is and is not desirable for -Og? Unrolling/header copying, being code-duplicating optimizations, do not really throw away any info by themselves; they only require the debugger to understand that some code may get duplicated, which is no different from e.g. inlining. Of course the subsequent const prop will make it impossible to write into the iv variable, but do we want to support reliable writing into user vars (i.e. disable cprop on user vars) at -Og? No, we don't support reliable writing. We should first sort out issues with the current passes and debug info, look at compile-time and runtime performance and then add new passes. Richard.
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 10:05 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/31/2012 08:11 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not powers of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double_int.
The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter. double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the API changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. I do not know why you believe this about the speed. double_int always does synthetic math since you do everything at 128-bit precision.
The thing about wide int is that, since it does math to the precision's size, it almost never uses synthetic operations, since the sizes for almost every instance can be done using the native math on the machine. Almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than I do it, and if you do, who cares. The number of calls does not affect the performance in any negative way, and in fact it is more efficient since common things that require more than one operation in double_int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour
Re: patch to fix constant math - 4th patch - the wide-int class.
On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/31/2012 10:05 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/31/2012 08:11 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not powers of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this.
This was not meant to be a plug-in replacement for double_int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter. double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the API changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. I do not know why you believe this about the speed. double_int always does synthetic math since you do everything at 128-bit precision.
The thing about wide int is that, since it does math to the precision's size, it almost never uses synthetic operations, since the sizes for almost every instance can be done using the native math on the machine. Almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than I do it, and if you do, who cares. The number of calls does not affect the performance in any negative way, and in fact it is more efficient since common things that require more than one operation in double_int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK,
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 09:54 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 2:30 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: But that means that wide_int has to model a P-bit operation as a normal len*HOST_WIDE_INT operation and then fix up the result after the fact, which seems unnecessarily convoluted. It does that right now. The operations are carried out in a loop over len HOST_WIDE_INT parts, the last HWI is then special-treated to account for precision/size. (Yes, 'len' is also used as an optimization - the fact that len ends up being mutable is another thing I dislike about wide-int. If wide-ints are cheap then all ops should be non-mutating (at least to 'len').) But the point of having a mutating len is that things like zero and -1 are common even for OImode values. So if you're doing something potentially expensive like OImode multiplication, why do it to the number of HOST_WIDE_INTs needed for an OImode value when the value we're processing has only one significant HOST_WIDE_INT? I don't propose doing that. I propose that no wide-int member function may _change_ its len (to something larger). Only that way you can avoid allocating wasted space for zero and -1. That way also the artificial limit on 2 * largest-int-mode-hwis goes. It is now 4x not 2x to accommodate the extra bit in tree-vrp. Remember that the space burden is minimal. Wide-ints are not persistent and there are never more than a handful at a time. I still don't see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision X*HOST_WIDE_INT operation for any X) has any special meaning. Well, the same reason as a HOST_WIDE_INT variable has a meaning. We use it to constrain what we (efficiently) want to work on. For example CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing bit-constant-propagation in loops (for TImode integers on an x86_64 host). But what about targets with modes wider than TImode?
Would double_int still be appropriate then? If not, why does CCP have to use a templated type with a fixed number of HWIs (and all arithmetic done to a fixed number of HWIs) rather than one that can adapt to the runtime values, like wide_int can? Because nobody cares about accurate bit-tracking for modes larger than TImode. And because no convenient abstraction was available ;) Yes, but tree-vrp does not even work for TImode, and there are no tests to scale it back when it does see TImode. I understand that these can be added, but so far they have not been. I would also point out that I was corrected on this point by (I believe) Lawrence. He points out that tree-vrp is still important for converting signed to unsigned for larger modes. Oh, and I don't necessarily see a use of double_int in its current form but for an integer representation on the host that is efficient to manipulate integer constants of a target dependent size. For example the target detail that we have partial integer modes with bitsize > precision and that the bits > precision apparently have a meaning when looking at the bit-representation of a constant should not be part of the base class of wide-int (I doubt it belongs to wide-int at all, but I guess you know more about the reason we track bitsize in addition to precision - I think it's abstraction at the wrong level, the tree level does fine without knowing about bitsize). TBH I'm uneasy about the bitsize thing too. I think bitsize is only tracked for shift truncation, and if so, I agree it makes sense to do that separately. So, can we please remove all traces of bitsize from wide-int then? But anyway, this whole discussion seems to have reached a stalemate. Or I suppose a de-facto rejection, since you're the only person in a position to approve the thing :-) There are many (silent) people that are able to approve the thing.
But the point is I have too many issues with the current patch that I'm unable to point at a specific thing I want Kenny to change after which the patch would be fine. So I rely on some guesswork from Kenny following my advice - leaner API, fewer fused ops, get rid of bitsize, think of abstracting the core HWI[len] operation, no tree or RTL dependencies in the wide-int API - to produce an updated variant. Which of course takes time, which of course crosses my vacation, which in the end means it isn't going to make 4.8 (I _do_ like the idea of not having a dependence on host properties for integer constant representation). Btw, a good hint at what a minimal wide-int API would look like is if you _just_ replace double-int users with it. Then you obviously have to implement only the double-int interface and conversion from/to double-int. Richard. Richard
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On 10/30/2012 05:49 PM, Sriraman Tallam wrote: AFAIU, this should not be a problem. For duplicate declarations, duplicate_decls should merge them and they should never be seen here. Did I miss something? With extern "C" functions you can have multiple declarations of the same function in different namespaces that are not duplicates, but still match. And I can't think what that test is supposed to be catching, anyway. No, I thought about this but I did not want to handle this case in this iteration. The dispatcher is created only once and if more functions are declared later, they will not be dispatched, at least in this iteration. I still think that instead of collecting the set of functions in overload resolution, they should be collected at declaration time and added to a vector in the cgraph information for use when generating the body of the dispatcher. You talked about doing the dispatcher building later, but I did it here since I am doing it only once. I still don't think this is the right place for it. dispatcher_node does not have a body until it is generated in cgraphunit.c, so cgraph does not mark this field before this is processed in cgraph_analyze_function. That seems like something to address in your cgraph changes. Jason
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 10:24 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/31/2012 10:05 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/31/2012 08:11 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it.
I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the API changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. I do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision.
the thing about wide int is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine. Almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TImode math much faster than I do it, and if you do, who cares. The number of calls does not affect the performance in any negative way, and in fact it is more efficient since common things that require more than one operation in double-int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Wed, Oct 31, 2012 at 10:04:58AM -0400, Kenneth Zadeck wrote: if one looks at where Intel is going, they are doing exactly the same thing. The difference is that they like to add the operations one at a time rather than just do a clean implementation like we did. Soon they will get there, it is just a matter of time. All I see on Intel is whole vector register shifts (and like on many other ports and/or/xor/andn could be considered whole register too). And, even if your port has 256-bit integer arithmetics, there is no mangling for __int256_t or similar, so I don't see how you can offer such data type as supported in the 4.8 timeframe. Jakub
Re: [C++ Patch] PR 54583
Hi, On 10/31/2012 02:50 PM, Jason Merrill wrote: On 10/30/2012 07:44 PM, Paolo Carlini wrote: straightforward fix. The below, which regtests fine, simply acts on any MULT_EXPR as TYPE_SIZE, which I think should be fine, but, in case the idea is basically Ok, we could also consider the more complex but more sophisticated variably_modified_type_p? I think I'd use EXPR_P rather than check for a particular tree code. Yes, I also wondered whether that would be better. The below also passes testing. Thanks, Paolo.

Index: testsuite/g++.dg/ext/vla13.C
===================================================================
--- testsuite/g++.dg/ext/vla13.C	(revision 0)
+++ testsuite/g++.dg/ext/vla13.C	(working copy)
@@ -0,0 +1,8 @@
+// PR c++/54583
+// { dg-options "-Wunused-value" }
+
+void fred()
+{
+  int n=10;
+  double (*x)[n];
+}
Index: cp/tree.c
===================================================================
--- cp/tree.c	(revision 193033)
+++ cp/tree.c	(working copy)
@@ -824,6 +824,10 @@ build_cplus_array_type (tree elt_type, tree index_
 	}
     }
 
+  /* Avoid spurious warnings with VLAs (c++/54583).  */
+  if (TYPE_SIZE (t) && EXPR_P (TYPE_SIZE (t)))
+    TREE_NO_WARNING (TYPE_SIZE (t)) = 1;
+
   /* Push these needs up so that initialization takes place more easily.  */
   TYPE_NEEDS_CONSTRUCTING (t)
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 08:44 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck zad...@naturalbridge.com wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host. I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits. And mode bitsizes are always power-of-two? I suppose so. Actually, no, they are not. Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should do it. I still would like to have the ability to provide specializations of wide_int for small sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int; If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int.
The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want. Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead. Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the API changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that. double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard. I do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision.
the thing about wide int is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine. Almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TImode math much faster than I do it, and if you do, who cares. The number of calls does not affect the performance in any negative way, and in fact it is more efficient since common things that require more than one operation in double-int are typically done in a single operation. Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
  double_int result;
  result.low = low & ~b.low;
  result.high = high & ~b.high;
  return result;
}

are always going to be faster than conditionally executing only one operation (but inside an offline function). OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in
Re: [PATCH] Fix PR53501
It failed with revision 188008. OK, thanks. So the testcase never compiled on the trunk (except for about 24 hours between 188009 and 188118) or did it compile before 188008 at some point? -- Eric Botcazou
committed: fix compilation failure for ports that don't use LEN parameter of MOVE_BY_PIECES_P
Committed as obvious. 2012-10-31 Joern Rennecke joern.renne...@embecosm.com * expr.c (can_move_by_pieces): Apply ATTRIBUTE_UNUSED to len.

Index: expr.c
===================================================================
--- expr.c	(revision 193034)
+++ expr.c	(working copy)
@@ -841,7 +841,7 @@ widest_int_mode_for_size (unsigned int s
    succeed.  */
 
 int
-can_move_by_pieces (unsigned HOST_WIDE_INT len,
+can_move_by_pieces (unsigned HOST_WIDE_INT len ATTRIBUTE_UNUSED,
 		    unsigned int align ATTRIBUTE_UNUSED)
 {
   return MOVE_BY_PIECES_P (len, align);
Re: [PATCH] Update source location for PRE inserted stmt
On Wed, Oct 31, 2012 at 2:34 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 31, 2012 at 12:57 AM, Xinliang David Li davi...@google.com wrote: It will make the location info for the newly synthesized stmt more deterministic, I think. Maybe, but it will increase the jumpiness in the debugger without actually being accurate, no? For example if the partially redundant expression is Sometimes, it is necessary to have a jump in gdb, especially when control flow changes. For the test case I gave, people may want to know if the branch in line 10 is taken or not. Without this patch, gdb simply jumps from line 10 to line 15 if the branch is not taken. But with the patch, people get the correct control flow. One more thing, for AutoFDO to work, we want to make sure debug info is correct at control flow boundaries. That is why we would prefer deterministic location info, instead of randomly inheriting debug info from the previous BB. Thanks, Dehao i + j; then when computed at the insertion point the values of i and j do not necessarily reflect the computed value! Instead we may compute the result of i + j using completely different components / operations. Thus I think inserted expressions should not have any debug information at all because they do not correspond to a source line. Richard. David On Tue, Oct 30, 2012 at 4:38 PM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Oct 31, 2012 at 12:00 AM, Dehao Chen wrote: This patch aims to improve debugging of optimized code. It ensures that PRE inserted statements have the same source location as the statement at the insertion point, instead of UNKNOWN_LOCATION. Wrong patch attached. However, is it really better to have the location of the insertion point than to have UNKNOWN_LOCATION? It's not where the value is computed in the source program... Ciao! Steven
Re: [PATCH] Fix debug info for expr and jump stmt
Yeah. But please also check the gdb testsuite for this kind of patch. This patch also passed the gdb testsuite. Thanks, Dehao Jakub
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
I was not planning to do that mangling for 4.8. My primary justification for getting it in publicly now is that there are a large number of places where the current compiler (both at the tree and rtl levels) does not do optimization if the value is larger than a single HWI. My code generalizes all of these places so that they do the transformations independent of the size of the HWI. (in some cases at the rtl level, the transformations were only done on 32 bit or smaller types, but I have seen nothing like that at the tree level.) This provides benefits for cross compilers and for ports that support TImode now. The fact that I have chosen to do it in such a way that we will never have this problem again is the part of the patch that Richi seems to object to. We have patches that do the mangling for 256 for the front ends but we figured that we would post those for comments. These are likely to be controversial because they require extensions to the syntax to accept large constants. But there is no reason why the patches that fix the existing problems in a general way should not be considered for this release. Kenny On 10/31/2012 10:27 AM, Jakub Jelinek wrote: On Wed, Oct 31, 2012 at 10:04:58AM -0400, Kenneth Zadeck wrote: if one looks at where Intel is going, they are doing exactly the same thing. The difference is that they like to add the operations one at a time rather than just do a clean implementation like we did. Soon they will get there, it is just a matter of time. All I see on Intel is whole vector register shifts (and like on many other ports and/or/xor/andn could be considered whole register too). And, even if your port has 256-bit integer arithmetics, there is no mangling for __int256_t or similar, so I don't see how you can offer such data type as supported in the 4.8 timeframe. Jakub
Re: RFA: hookize ADJUST_INSN_LENGTH
Joern Rennecke joern.renne...@embecosm.com writes: Quoting Richard Sandiford rdsandif...@googlemail.com: The length variations for the ARC are not alike: there are branches that are subject to branch shortening in the usual way, but they might shrink when other things shrink. When we are iterating starting at minimal length and increasing lengths, these branches should be stopped from shrinking, as they likely will grow back again in the next iteration. OTOH there are potentially short instructions that are lengthened to achieve scheduling-dictated alignment or misalignment. These should be allowed to change freely back and forth. Unless we have a rare cycle that only depends on alignments. Hmm, but this is still talking in terms of shorten_branches, rather than the target property that we're trying to model. It sounds like the property is something along the lines of: some instructions have different encodings (or work-alikes with different encodings), and we want to make all the encodings available to the optimisers. Is that right? No, not really. There are lots of redundant encodings, but we have only one favourite encoding for each size. We usually want the shortest version, unless we want to tweak the alignment of a following instruction, or in the case of unaligned conditional branches with unfilled delay slot, to hide the branch penalty (don't ask me why it has that effect...). But what I'm trying to get at is: why can't the backend tell shorten_branches about the amount of alignment/misalignment that the target wants, and where? Via an attribute or a hook, I don't mind which. But it should be declarative, rather than a part of the shorten_branches algorithm. Also... If so, that sounds like a useful thing to model, but I think it should be modelled directly, rather than as a modification to the shorten_branches algorithm. The shorten_branches algorithm can't give the correct result without taking these things into account.
Also, instruction lengths are needed to know the requirements. So, do you want me to run a scheduling pass in each iteration of shorten_branches? ...it looks like arc_reorg already runs shorten_branches in a loop. Is that right? So in a sense we already have iteration involving shorten_branches and the backend. I realise we want to cut down the number of iterations where possible, but at the same time, it seems like you're trying to do conditional execution conversion on the fly during shorten_branches, such that the rtl doesn't actually represent the instructions we output. And the performance problem you're seeing is that those behind-the-scenes changes are too confusing for the current shorten_branches code to handle well. E.g. sometimes you decide to delete branches that stay in the rtl stream as branches, and that got a length in a previous iteration. At some point, as you say, you have to commit to using a particular form (conditionally executed or not) in order to avoid cycles. But that seems like a conditional execution decision rather than a shorten_branches decision. (It'd also be good to represent it in the rtl if possible.) Richard
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 09:30 AM, Richard Sandiford wrote: Richard Biener richard.guent...@gmail.com writes: But that means that wide_int has to model a P-bit operation as a normal len*HOST_WIDE_INT operation and then fix up the result after the fact, which seems unnecessarily convoluted. It does that right now. The operations are carried out in a loop over len HOST_WIDE_INT parts, the last HWI is then special-treated to account for precision/size. (yes, 'len' is also used as optimization - the fact that len ends up being mutable is another thing I dislike about wide-int. If wide-ints are cheap then all ops should be non-mutating (at least to 'len')). But the point of having a mutating len is that things like zero and -1 are common even for OImode values. So if you're doing something potentially expensive like OImode multiplication, why do it to the number of HOST_WIDE_INTs needed for an OImode value when the value we're processing has only one significant HOST_WIDE_INT? I think with a little thought I can add some special constructors and get rid of the mutating aspects of the interface. I still don't see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision X*HOST_WIDE_INT operation for any X) has any special meaning. Well, the same reason as a HOST_WIDE_INT variable has a meaning. We use it to constrain what we (efficiently) want to work on. For example CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing bit-constant-propagation in loops (for TImode integers on an x86_64 host). But what about targets with modes wider than TImode? Would double_int still be appropriate then? If not, why does CCP have to use a templated type with a fixed number of HWIs (and all arithmetic done to a fixed number of HWIs) rather than one that can adapt to the runtime values, like wide_int can?
Oh, and I don't necessarily see a use of double_int in its current form but for an integer representation on the host that is efficient to manipulate integer constants of a target dependent size. For example the target detail that we have partial integer modes with bitsize precision and that the bits precision apparently have a meaning when looking at the bit-representation of a constant should not be part of the base class of wide-int (I doubt it belongs to wide-int at all, but I guess you know more about the reason we track bitsize in addition to precision - I think it's abstraction at the wrong level, the tree level does fine without knowing about bitsize). TBH I'm uneasy about the bitsize thing too. I think bitsize is only tracked for shift truncation, and if so, I agree it makes sense to do that separately. But anyway, this whole discussion seems to have reached a stalemate. Or I suppose a de-facto rejection, since you're the only person in a position to approve the thing :-) Richard