[PATCH] options: Make --help= to emit values post-overrided
Hi Segher, on 2020/8/7 下午10:42, Segher Boessenkool wrote: > Hi! > > On Fri, Aug 07, 2020 at 10:44:10AM +0800, Kewen.Lin wrote: >>> I think this makes a lot of sense. >>> btw, not sure whether it's a good idea to move target_option_override_hook call into print_specific_help and use one function local static variable to control it's called once for all kinds of help dumping (possible combination), then can remove the calls in function common_handle_option. >>> >>> I cannot easily imagine what that will look like... it could easily be >>> worse than what you have here (callbacks aren't so nice, but there are >>> worse things). >>> >> >> I attached opts_alt2.diff to be more specific for this, both alt1 and alt2 >> follow the existing callback scheme, alt2 aims to avoid possible multiple >> times target_option_override_hook calls when we have several --help= or >> similar, but I guess alt1 is also fine since the hook should be allowed to >> be called more than once. > > It could take quadratic time in alt1... Mostly a theoretical problem I > think, but still. > > All options look fine to me, but you need someone else to approve this ;-) Yeah, CC Joseph. Thanks! > > One thing: > >> @@ -1664,6 +1665,14 @@ print_specific_help (unsigned int include_flags, >> } >> } >> >> + static bool call_override_once_p = false; >> + if (!call_override_once_p) >> +{ >> + gcc_assert (target_option_override_hook); >> + target_option_override_hook (); >> + call_override_once_p = true; >> +} > > That assert is pretty random, nothing else using the hook assert it > isn't zero; it immediately and always calls the hook right away, so you > will get a nice ICE with backtrace if it is zero *anyway*. > Good point! Removed it. Bootstrapped/regtested on powerpc64le-linux-gnu P8. BR, Kewen *opts_alt1.diff* gcc/ChangeLog: * opts-global.c (decode_options): Adjust call to print_help. * opts.c (print_help): Add one function point argument target_option_override_hook and call it before print_specific_help. * opts.h (print_help): Add one more argument to function declare. *opts_alt2.diff* gcc/ChangeLog: * opts-global.c (decode_options): Adjust call to print_help. * opts.c (print_specific_help): Add one function point argument target_option_override_hook and call it once. (print_help): Add one function point argument target_option_override_hook and pass it in print_specific_help call. (common_handle_option): Adjust calls to print_specific_help, refactor calls to target_option_override_hook. * opts.h (print_help): Add one more argument to function declare. diff --git a/gcc/opts-global.c b/gcc/opts-global.c index b1a8429dc3c..ec960c87c9a 100644 --- a/gcc/opts-global.c +++ b/gcc/opts-global.c @@ -328,7 +328,7 @@ decode_options (struct gcc_options *opts, struct gcc_options *opts_set, const char *arg; FOR_EACH_VEC_ELT (help_option_arguments, i, arg) -print_help (opts, lang_mask, arg); +print_help (opts, lang_mask, arg, target_option_override_hook); } /* Hold command-line options associated with stack limitation. */ diff --git a/gcc/opts.c b/gcc/opts.c index 499eb900643..a83b2f837dd 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -2017,7 +2017,8 @@ check_alignment_argument (location_t loc, const char *flag, const char *name) void print_help (struct gcc_options *opts, unsigned int lang_mask, - const char *help_option_argument) + const char *help_option_argument, + void (*target_option_override_hook) (void)) { const char *a = help_option_argument; unsigned int include_flags = 0; @@ -2146,8 +2147,10 @@ print_help (struct gcc_options *opts, unsigned int lang_mask, exclude_flags |= CL_PARAMS; if (include_flags) -print_specific_help (include_flags, exclude_flags, 0, opts, -lang_mask); +{ + target_option_override_hook (); + print_specific_help (include_flags, exclude_flags, 0, opts, lang_mask); +} } /* Handle target- and language-independent options. Return zero to diff --git a/gcc/opts.h b/gcc/opts.h index 8f594b46e33..9a837305af1 100644 --- a/gcc/opts.h +++ b/gcc/opts.h @@ -419,8 +419,9 @@ extern bool target_handle_option (struct gcc_options *opts, extern void finish_options (struct gcc_options *opts, struct gcc_options *opts_set, location_t loc); -extern void print_help (struct gcc_options *opts, unsigned int lang_mask, const - char *help_option_argument); +extern void print_help (struct gcc_options *opts, unsigned int lang_mask, + const char *help_option_argument, + void (*target_option_override_hook) (void)); extern void default_options_optimization (struct gcc_options *opts, struct
Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling
Hi Bin, Thanks for the review!! on 2020/8/8 下午4:01, Bin.Cheng wrote: > Hi Kewen, > Sorry for the late reply. > The patch's most important change is below cost computation: > >> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, struct >> iv_cand *cand) >> cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base))); >> cost = cost_step + adjust_setup_cost (data, cost_base.cost); >> >> + /* Consider additional step updates during unrolling. */ >> + if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p) >> +cost += (data->current_loop->estimated_unroll - 1) * cost_step; > This is a bit strange, to me the add instructions are additional > computation caused by unrolling+addressing_mode, rather than a native > part in candidate itself. Specifically, an additional cost is needed > if candidates (without reg_offset_p) are chosen for the address type > group/uses. Good point, ideally it should be one additional cost for each cand set, when we select one cand for one group, we need to check this pair need more (estimated_unroll - 1) step costs, we probably need to care about this during remove/replace etc. IIUC the current IVOPTs cost framework doesn't support this and it could increase the selection complexity and time. I hesitated to do it and put it to cand step cost initially instead. I was thinking those candidates with reg_offset_p should be only used for those reg_offset_p groups in most cases (very limited) meanwhile the others are simply scaled up like before. But indeed this can cover some similar cases like one cand is only used for the compare type group which is for loop closing, then it doesn't need more step costs for unrolling. Do you prefer me to improve the current cost framework? >> + >> /* Prefer the original ivs unless we may gain something by replacing it. >> The reason is to make debugging simpler; so this is not relevant for >> artificial ivs created by other optimization passes. */ >> > >> @@ -3654,6 +3729,14 @@ set_group_iv_cost (struct ivopts_data *data, >> return; >> } >> >> + /* Since we priced more on non reg_offset IV cand step cost, we should >> scale >> + up the appropriate IV group costs. Simply consider USE_COMPARE at the >> + loop exit, FIXME if multiple exits supported or no loop exit >> comparisons >> + matter. */ >> + if (data->consider_reg_offset_for_unroll_p >> + && group->vuses[0]->type != USE_COMPARE) >> +cost *= (HOST_WIDE_INT) data->current_loop->estimated_unroll; > Not quite follow here, giving "pricing more on on-reg_offset IV cand" > doesn't make much sense to me. Also why generic type uses are not > skipped? We want to model the cost required for address computation, > however, for generic type uses there is no way to save the computation > in "address expression". Once unrolled, the computation is always > there? > The main intention is to scale up the group/cand cost for unrolling since we have scaled up the step costs. The assumption is that the original costing (without this patch) can be viewed as either for all unrolled iterations or just one single iteration. Since IVOPTs doesn't support fractional costing, I interpreted it as single iterations, to emulate unrolling scenario based on the previous step cost scaling, we need to scale up the cost for all computation. In most cases, the compare type use is for loop closing, there is still only one computation even unrolling happens, so I excluded it here. As "FIXME", if we find some cases are off, we can further restrict it to those USE_COMPARE uses which is exactly for loop closing. > And what's the impact on targets supporting [base + index + offset] > addressing mode? Good question, I didn't notice it since power doesn't support it. I noticed the comments of function addr_offset_valid_p only mentioning [base + offset], I guess it excludes [base + index + offset]? But I guess the address-based IV can work for this mode? > > Given the patch is not likely to harm because rtl loop unrolling is > (or was?) by default disabled, so I am OK once the above comments are > addressed. > Yes, it needs explicit unrolling options, excepting for some targets wants to enable it for some cases with specific loop_unroll_adjust checks. > I wonder if it's possible to get 10% of (all which should be unrolled) > loops unrolled (conservatively) on gimple and enable it by default at > O3, rather than teaching ivopts to model a future pass which not > likely be used outside of benchmarks? > Yeah, it would be nice if the unrolling happen before IVOPTs and won't have any future unrollings to get it off. PR[1] seems to have some discussion on gimple unrolling. Richi suggested driven-by-need gimple unrolling in the previous discussion[2] on the RFC of this. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-January/537645.html BR, Kewen
Re: [patch] multi-range implementation for value_range (irange)
On 8/5/20 4:27 PM, Gerald Pfeifer wrote: Hi Aldy, On Fri, 31 Jul 2020, Aldy Hernandez via Gcc-patches wrote: Jeff approved this patch off-list. I will re-run tests once again and commit by Monday. I believe this has broken the bootstrap with clang (specifically FreeBSD clang version 10.0.0): In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/c/gimple-parser.c:44: In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/tree-vrp.h:23: /scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:347:1: error: static declaration of 'gt_ggc_mx' follows non-static declaration gt_ggc_mx (int_range *x) /scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:150:37: note: previous declaration is here template friend void gt_ggc_mx (int_range *); ^ /scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:358:1: error: static declaration of 'gt_pch_nx' follows non-static declaration gt_pch_nx (int_range *x) /scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:151:37: note: previous declaration is here template friend void gt_pch_nx (int_range *); ^ My daily tester started to 20200803T1640, so the root cause of this must have entered GCC trunk between Sunday 16:40 UTC and Monday 16:40 UTC. Gerald Hey. This one is still broken. Can you please Aldy take a look? It's good that GCC code base can be compiled with clang ;) Martin
Re: [PATCH] rs6000: MMA built-ins reject typedefs of MMA types
On 8/8/20 11:59 AM, Peter Bergner wrote: > Testing was clean, so I pushed this to trunk. Will wait before > backporting. Thanks! Scanning through Bill Seurer's testsuite runs for POWER, I see no fallout, so I have pushed this to the GCC 10 branch. Peter
Re: [PATCH] rs6000: ICE when using an MMA type as a function param
On 8/9/20 10:03 PM, Peter Bergner wrote: > gcc/ > PR target/96506 > * config/rs6000/rs6000-call.c (rs6000_promote_function_mode): > (rs6000_function_arg): Oops, missed the ChangeLog entries: gcc/ PR target/96506 * config/rs6000/rs6000-call.c (rs6000_promote_function_mode): Disallow MMA types as return values. (rs6000_function_arg): Disallow MMA types as function arguments. Peter
Re: RFC: Monitoring old PRs, new dg directives
On 8/5/20 12:22 AM, Marek Polacek wrote: On Thu, Jul 30, 2020 at 11:08:03AM +0200, Martin Liška wrote: Hello. I support the initiative! What would be nice to add leading 'PR component/12345' to a git commit so that these test additions are linked to bugzilla issues. Thanks! Yes, it should be clear which test tests a PR that has the monitored keyword. That may get lost when adding a lot of tests in one commit, but can always be clarified in a comment. Note that contrib/mklog.py can correctly extract leading 'PR component/12345' lines from test-cases and add them to a ChangeLog entry. Martin Or just grep 'PR component/12345' in the testsuite; new tests should have this as their first line. Marek
[PATCH] rs6000: ICE when using an MMA type as a function param
PR96506 shows a problem where we ICE on illegal usage, namely using MMA types for function arguments and return values. The solution is to flag these illegal usages as errors early, before we ICE. The patch below is currently bootstrapping and regtesting. Ok for trunk once that comes back clean? Ok for GCC 10 after some bake in? Peter gcc/ PR target/96506 * config/rs6000/rs6000-call.c (rs6000_promote_function_mode): (rs6000_function_arg): gcc/testsuite/ PR target/96506 * gcc.target/powerpc/pr96506.c: New test. diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index 189497efb45..e4ed88cd2f8 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -6444,8 +6444,23 @@ machine_mode rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED, machine_mode mode, int *punsignedp ATTRIBUTE_UNUSED, - const_tree, int) + const_tree, int for_return) { + static struct function *fn = NULL; + + /* We do not allow MMA types being used as return values. Only report + the invalid return value usage the first time we encounter it. */ + if (for_return + && fn != cfun + && (mode == POImode || mode == PXImode)) +{ + fn = cfun; + if (TYPE_CANONICAL (type) != NULL_TREE) + type = TYPE_CANONICAL (type); + error ("invalid use of MMA type %qs as a function return value", +IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type; +} + PROMOTE_MODE (mode, *punsignedp, type); return mode; @@ -7396,6 +7411,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info ) machine_mode elt_mode; int n_elts; + /* We do not allow MMA types being used as function arguments. */ + if (mode == POImode || mode == PXImode) +{ + if (TYPE_CANONICAL (type) != NULL_TREE) + type = TYPE_CANONICAL (type); + error ("invalid use of MMA operand of type %qs as a function parameter", +IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type; + return NULL_RTX; +} + /* Return a marker to indicate whether CR1 needs to set or clear the bit that V.4 uses to say fp args were passed in registers. Assume that we don't need the marker for software floating point, diff --git a/gcc/testsuite/gcc.target/powerpc/pr96506.c b/gcc/testsuite/gcc.target/powerpc/pr96506.c new file mode 100644 index 000..4ed31bc55fa --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr96506.c @@ -0,0 +1,61 @@ +/* PR target/96506 */ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2 -w" } */ + +typedef __vector_pair vpair_t; +typedef __vector_quad vquad_t; + +/* Verify we flag errors on the following. */ + +void +foo0 (void) +{ + __vector_pair v; + bar0 (v); /* { dg-error "invalid use of MMA operand of type .__vector_pair. as a function parameter" } */ +} + +void +foo1 (void) +{ + vpair_t v; + bar1 (v); /* { dg-error "invalid use of MMA operand of type .__vector_pair. as a function parameter" } */ +} + +void +foo2 (void) +{ + __vector_quad v; + bar2 (v); /* { dg-error "invalid use of MMA operand of type .__vector_quad. as a function parameter" } */ +} + +void +foo3 (void) +{ + vquad_t v; + bar3 (v); /* { dg-error "invalid use of MMA operand of type .__vector_quad. as a function parameter" } */ +} + +__vector_pair +foo4 (__vector_pair *src) /* { dg-error "invalid use of MMA type .__vector_pair. as a function return value" } */ +{ + return *src; +} + +vpair_t +foo5 (vpair_t *src) /* { dg-error "invalid use of MMA type .__vector_pair. as a function return value" } */ +{ + return *src; +} + +__vector_quad +foo6 (__vector_quad *src) /* { dg-error "invalid use of MMA type .__vector_quad. as a function return value" } */ +{ + return *src; +} + +vquad_t +foo7 (vquad_t *src) /* { dg-error "invalid use of MMA type .__vector_quad. as a function return value" } */ +{ + return *src; +}
Re: Fix remove_predictions_associated_with_edge
On 8/2/20 3:02 PM, Jan Hubicka wrote: Hi, remove_predictions_associated_with_edge currently calls filter_predicitons passing it equal_edge_p. Becase filter_predictions removes all edges where filter returns false, the function does exact oposite. Fixed thus. Bootstrapped/regtested x86_64-linux. Thanks for the patch. It seems to me obvious, please install it. Martin gcc/ChangeLog: 2020-08-02 Jan Hubicka * predict.c (filter_predictions): Document semantics of filter. (equal_edge_p): Rename to ... (not_equal_edge_p): ... this; reverse semantics. (remove_predictions_associated_with_edge): Fix. diff --git a/gcc/predict.c b/gcc/predict.c index 0a317a7a4ac..2164a06e083 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -595,10 +595,11 @@ gimple_predict_edge (edge e, enum br_predictor predictor, int probability) } } -/* Filter edge predictions PREDS by a function FILTER. DATA are passed - to the filter function. */ +/* Filter edge predictions PREDS by a function FILTER: if FILTER return false + the prediction is removed. + DATA are passed to the filter function. */ -void +static void filter_predictions (edge_prediction **preds, bool (*filter) (edge_prediction *, void *), void *data) { @@ -627,10 +628,10 @@ filter_predictions (edge_prediction **preds, /* Filter function predicate that returns true for a edge predicate P if its edge is equal to DATA. */ -bool -equal_edge_p (edge_prediction *p, void *data) +static bool +not_equal_edge_p (edge_prediction *p, void *data) { - return p->ep_edge == (edge)data; + return p->ep_edge != (edge)data; } /* Remove all predictions on given basic block that are attached @@ -642,7 +643,7 @@ remove_predictions_associated_with_edge (edge e) return; edge_prediction **preds = bb_predictions->get (e->src); - filter_predictions (preds, equal_edge_p, e); + filter_predictions (preds, not_equal_edge_p, e); } /* Clears the list of predictions stored for BB. */
Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]
On Fri, Aug 7, 2020 at 11:02 PM Kirill Yukhin wrote: > > Hello, > > On 05 авг 09:29, Hongtao Liu wrote: > > On Tue, Aug 4, 2020 at 6:28 PM Kirill Yukhin > > wrote: > > > > > > On 04 авг 13:26, Kirill Yukhin wrote: > > > > Could you please clarify, how your patch relared to [1]? > > > > I see from the bug that it describes perf issue w.r.t. scalar > > > > operations. > > > > > Sorry for Typo, it's pr96243. > > Please, don't forget to update ChangeLog entry. > Yes. > It's a pity that we don't support vector comparisons in CSE, > hope will fix in future. > > Patch LGTM. > > -- > K Thanks. -- BR, Hongtao
Simplify X * C1 == C2 with wrapping overflow
Odd numbers are invertible in Z / 2^n Z, so X * C1 == C2 can be rewritten as X == C2 * inv(C1) when overflow wraps. mod_inv should probably be updated to better match the other wide_int functions, but that's a separate issue. Bootstrap+regtest on x86_64-pc-linux-gnu. 2020-08-10 Marc Glisse PR tree-optimization/95433 * match.pd (X * C1 == C2): Handle wrapping overflow. * expr.c (maybe_optimize_mod_cmp): Qualify call to mod_inv. (mod_inv): Move... * wide-int.cc (mod_inv): ... here. * wide-int.h (mod_inv): Declare it. * gcc.dg/tree-ssa/pr95433-2.c: New file. -- Marc Glissediff --git a/gcc/expr.c b/gcc/expr.c index a150fa0d3b5..ebf0c9e4797 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -11859,38 +11859,6 @@ string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl) return init; } -/* Compute the modular multiplicative inverse of A modulo M - using extended Euclid's algorithm. Assumes A and M are coprime. */ -static wide_int -mod_inv (const wide_int , const wide_int ) -{ - /* Verify the assumption. */ - gcc_checking_assert (wi::eq_p (wi::gcd (a, b), 1)); - - unsigned int p = a.get_precision () + 1; - gcc_checking_assert (b.get_precision () + 1 == p); - wide_int c = wide_int::from (a, p, UNSIGNED); - wide_int d = wide_int::from (b, p, UNSIGNED); - wide_int x0 = wide_int::from (0, p, UNSIGNED); - wide_int x1 = wide_int::from (1, p, UNSIGNED); - - if (wi::eq_p (b, 1)) -return wide_int::from (1, p, UNSIGNED); - - while (wi::gt_p (c, 1, UNSIGNED)) -{ - wide_int t = d; - wide_int q = wi::divmod_trunc (c, d, UNSIGNED, ); - c = t; - wide_int s = x0; - x0 = wi::sub (x1, wi::mul (q, x0)); - x1 = s; -} - if (wi::lt_p (x1, 0, SIGNED)) -x1 += d; - return x1; -} - /* Optimize x % C1 == C2 for signed modulo if C1 is a power of two and C2 is non-zero and C3 ((1<<(prec-1)) | (C1 - 1)): for C2 > 0 to x & C3 == C2 @@ -12101,7 +12069,7 @@ maybe_optimize_mod_cmp (enum tree_code code, tree *arg0, tree *arg1) w = wi::lrshift (w, shift); wide_int a = wide_int::from (w, prec + 1, UNSIGNED); wide_int b = wi::shifted_mask (prec, 1, false, prec + 1); - wide_int m = wide_int::from (mod_inv (a, b), prec, UNSIGNED); + wide_int m = wide_int::from (wi::mod_inv (a, b), prec, UNSIGNED); tree c3 = wide_int_to_tree (type, m); tree c5 = NULL_TREE; wide_int d, e; diff --git a/gcc/match.pd b/gcc/match.pd index 7e5c5a6eae6..c3b88168ac4 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3828,7 +3828,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (cmp @0 @2)) /* For integral types with undefined overflow fold - x * C1 == C2 into x == C2 / C1 or false. */ + x * C1 == C2 into x == C2 / C1 or false. + If overflow wraps and C1 is odd, simplify to x == C2 / C1 in the ring + Z / 2^n Z. */ (for cmp (eq ne) (simplify (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2) @@ -3839,7 +3841,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (wi::multiple_of_p (wi::to_widest (@2), wi::to_widest (@1), TYPE_SIGN (TREE_TYPE (@0)), )) (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), quot); }) - { constant_boolean_node (cmp == NE_EXPR, type); }) + { constant_boolean_node (cmp == NE_EXPR, type); })) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)) + && (wi::bit_and (wi::to_wide (@1), 1) == 1)) +(cmp @0 + { + tree itype = TREE_TYPE (@0); + int p = TYPE_PRECISION (itype); + wide_int m = wi::one (p + 1) << p; + wide_int a = wide_int::from (wi::to_wide (@1), p + 1, UNSIGNED); + wide_int i = wide_int::from (wi::mod_inv (a, m), +p, TYPE_SIGN (itype)); + wide_int_to_tree (itype, wi::mul (i, wi::to_wide (@2))); + }) /* Simplify comparison of something with itself. For IEEE floating-point, we can only do some of these simplifications. */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c new file mode 100644 index 000..c830a3d195f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fwrapv -fdump-tree-gimple" } */ + +typedef __INT32_TYPE__ int32_t; +typedef unsigned __INT32_TYPE__ uint32_t; + +int e(int32_t x){return 3*x==5;} +int f(int32_t x){return 3*x==-5;} +int g(int32_t x){return -3*x==5;} +int h(int32_t x){return 7*x==3;} +int i(uint32_t x){return 7*x==3;} + +/* { dg-final { scan-tree-dump-times "== 1431655767" 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "== -1431655767" 2 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "== 613566757" 2 "gimple" } } */ diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc index 03be0074816..f4d949c38a0 100644 --- a/gcc/wide-int.cc +++ b/gcc/wide-int.cc @@ -2223,6 +2223,39 @@ wi::round_up_for_mask (const wide_int , const wide_int ) return (val | tmp) & -tmp; } +/* Compute the modular
Re: [PATCH] middle-end: Correct calculation of mul_widen_cost and mul_highpart_cost.
On August 9, 2020 12:30:32 PM GMT+02:00, Roger Sayle wrote: > >This patch fixes a subtle bug in the depths of GCC's synth_mult, >where the middle-end queries whether (how well) the target supports >widening and highpart multiplications by calling targetm.rtx_costs. >The code in init_expmed and init_expmed_one_mode iterates over various >RTL patterns querying the cost of each. To avoid generating & garbage >collecting too much junk, it reuses the same RTL over and over, but >adjusting the modes between each call. > >Alas this reuse of state is a little fragile, and at some point a >change to init_expmed_one_conv has resulted in the state (mode of >a register) being changed, but not reset before being used again. > >The fallout is that GCC currently queries the backend for the cost >of non-sense RTL such as: > >(mult:HI (zero_extend:HI (reg:TI 82)) >(zero_extend:HI (reg:TI 82))) > >and > >(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82)) >(zero_extend:HI (reg:TI 82))) >(const_int 8 [0x8])) > >The fix is to set the mode of the register back to its assumed >state, as (reg:QI 82) in the above patterns makes much more sense. > >Using the old software engineering/defensive programming maxim of >"why fix a bug just once, if it can be fixed in multiple places", >this patch both restores the original value in init_expmed_one_conv, >and also sets it to the expected value in init_expmed_one_mode. >This should hopefully signal the need to be careful of invariants for >anyone modifying this code in future. > >Alas things are rarely simple... Fixing this obviously incorrect >logic causes a failure of gcc.target/i386/pr71321.c that tests for >a particular expansion from synth_mult. The issue here is that this >test is checking for a specific multiplication expansion, when it >should really be checking that we don't generate the inefficient >"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as >reported in the PR target/71321. Now that we use correct costs, >GCC uses a multiply instruction matching icc, LLVM and GCC prior >to v4.8. I've even microbenchmarked this function (over several >minutes) with (disappointingly) no difference in timings. Three >dependent leas has 3-cycle latency, exactly the same as a widening >byte multiply (on Haswell), so the shorter code splits the tie. >[I have a follow-up patch that may improve things further]. > >Before: >movzbl %dil, %eax >leal(%rax,%rax,4), %edx >leal(%rax,%rdx,8), %edx >leal(%rdx,%rdx,4), %edx >shrw$11, %dx >leal(%rdx,%rdx,4), %eax >... > >After: >movl$-51, %edx >movl%edx, %eax >mulb%dil >movl%eax, %edx >shrw$11, %dx >leal(%rdx,%rdx,4), %eax >... > > >This patch has been tested on x86_64-pc-linux-gnu with a "make >bootstrap" and "make -k check" with no new failures. >Ok for mainline? OK. Richard. > >2020-08-09 Roger Sayle > >gcc/ChangeLog > * expmed.c (init_expmed_one_conv): Restore all->reg's mode. > (init_expmed_one_mode): Set all->reg to desired mode. > >gcc/testsuite/ChangeLog > PR target/71321 > * gcc.target/i386/pr71321.c: Check that the code doesn't use > the 4B zero displacement lea, not that it uses lea. > >Thanks in advance, >Roger >-- >Roger Sayle >NextMove Software >Cambridge, UK
[PATCH] middle-end: Correct calculation of mul_widen_cost and mul_highpart_cost.
This patch fixes a subtle bug in the depths of GCC's synth_mult, where the middle-end queries whether (how well) the target supports widening and highpart multiplications by calling targetm.rtx_costs. The code in init_expmed and init_expmed_one_mode iterates over various RTL patterns querying the cost of each. To avoid generating & garbage collecting too much junk, it reuses the same RTL over and over, but adjusting the modes between each call. Alas this reuse of state is a little fragile, and at some point a change to init_expmed_one_conv has resulted in the state (mode of a register) being changed, but not reset before being used again. The fallout is that GCC currently queries the backend for the cost of non-sense RTL such as: (mult:HI (zero_extend:HI (reg:TI 82)) (zero_extend:HI (reg:TI 82))) and (lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82)) (zero_extend:HI (reg:TI 82))) (const_int 8 [0x8])) The fix is to set the mode of the register back to its assumed state, as (reg:QI 82) in the above patterns makes much more sense. Using the old software engineering/defensive programming maxim of "why fix a bug just once, if it can be fixed in multiple places", this patch both restores the original value in init_expmed_one_conv, and also sets it to the expected value in init_expmed_one_mode. This should hopefully signal the need to be careful of invariants for anyone modifying this code in future. Alas things are rarely simple... Fixing this obviously incorrect logic causes a failure of gcc.target/i386/pr71321.c that tests for a particular expansion from synth_mult. The issue here is that this test is checking for a specific multiplication expansion, when it should really be checking that we don't generate the inefficient "leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as reported in the PR target/71321. Now that we use correct costs, GCC uses a multiply instruction matching icc, LLVM and GCC prior to v4.8. I've even microbenchmarked this function (over several minutes) with (disappointingly) no difference in timings. Three dependent leas has 3-cycle latency, exactly the same as a widening byte multiply (on Haswell), so the shorter code splits the tie. [I have a follow-up patch that may improve things further]. Before: movzbl %dil, %eax leal(%rax,%rax,4), %edx leal(%rax,%rdx,8), %edx leal(%rdx,%rdx,4), %edx shrw$11, %dx leal(%rdx,%rdx,4), %eax ... After: movl$-51, %edx movl%edx, %eax mulb%dil movl%eax, %edx shrw$11, %dx leal(%rdx,%rdx,4), %eax ... This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and "make -k check" with no new failures. Ok for mainline? 2020-08-09 Roger Sayle gcc/ChangeLog * expmed.c (init_expmed_one_conv): Restore all->reg's mode. (init_expmed_one_mode): Set all->reg to desired mode. gcc/testsuite/ChangeLog PR target/71321 * gcc.target/i386/pr71321.c: Check that the code doesn't use the 4B zero displacement lea, not that it uses lea. Thanks in advance, Roger -- Roger Sayle NextMove Software Cambridge, UK diff --git a/gcc/expmed.c b/gcc/expmed.c index 3d2d234..d34f0fb 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -155,6 +155,8 @@ init_expmed_one_conv (struct init_expmed_rtl *all, scalar_int_mode to_mode, PUT_MODE (all->reg, from_mode); set_convert_cost (to_mode, from_mode, speed, set_src_cost (which, to_mode, speed)); + /* Restore all->reg's mode. */ + PUT_MODE (all->reg, to_mode); } static void @@ -229,6 +231,7 @@ init_expmed_one_mode (struct init_expmed_rtl *all, if (GET_MODE_CLASS (int_mode_to) == MODE_INT && GET_MODE_WIDER_MODE (int_mode_to).exists (_mode)) { + PUT_MODE (all->reg, mode); PUT_MODE (all->zext, wider_mode); PUT_MODE (all->wide_mult, wider_mode); PUT_MODE (all->wide_lshr, wider_mode); diff --git a/gcc/testsuite/gcc.target/i386/pr71321.c b/gcc/testsuite/gcc.target/i386/pr71321.c index 86ad812..24d144b 100644 --- a/gcc/testsuite/gcc.target/i386/pr71321.c +++ b/gcc/testsuite/gcc.target/i386/pr71321.c @@ -12,5 +12,4 @@ unsigned cvt_to_2digit_ascii(uint8_t i) { return cvt_to_2digit(i, 10) + 0x0a3030; } -/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 } } */ -/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 } } */ +/* { dg-final { scan-assembler-not "lea.*0" } } */
Re: [Patch, fortran] PR96102 - ICE in check_host_association, at fortran/resolve.c:5994
Hi Paul, Dominique has kindly pointed out that the error message in the patch does not correspond to the errors in the testcase. This came about because the submitted patch was an earlier version of that on my tree. Please find attached the correct version of the patch. The principle is the same but the error is different... if you see what I mean :-) OK for master. Best regards Thomas
Re: [Patch, fortran] PR96312 - [10/11 Regression] Reallocation on assignment uses undefined variables
Hi Paul, Regtests OK on FC31/x86_64 - OK for master? OK. This bug is also exposed by a matmul optimization that I introduced quite a long time ago, so a backport would also be good (I can do that if you prefer). Best regards Thomas
[Patch, fortran] PR96312 - [10/11 Regression] Reallocation on assignment uses undefined variables
Hi All, trans-expr.c(fcncall_realloc_result) unconditionally compared the shapes of the function result and the lhs. This is clearly wrong when the lhs is not allocated since the bounds could be used uninitialized as found by the reporter. The patch does the obvious thing by checking the allocation status before doing the comparison. Regtests OK on FC31/x86_64 - OK for master? Paul This patch fixes PR96312. Cures a used uninitialized warning. 2020-08-9 Paul Thomas gcc/fortran PR fortran/96312 * trans-expr.c (fcncall_realloc_result): Only compare shapes if lhs was allocated.. gcc/testsuite/ PR fortran/96312 * gfortran.dg/pr96312.f90: New test. diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index b7c568e90e6..36ff9b5cbc6 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -9936,6 +9936,8 @@ fcncall_realloc_result (gfc_se *se, int rank) tree tmp; tree offset; tree zero_cond; + tree not_same_shape; + stmtblock_t shape_block; int n; /* Use the allocation done by the library. Substitute the lhs @@ -9965,7 +9967,11 @@ fcncall_realloc_result (gfc_se *se, int rank) tmp = gfc_conv_descriptor_data_get (res_desc); gfc_conv_descriptor_data_set (>post, desc, tmp); - /* Check that the shapes are the same between lhs and expression. */ + /* Check that the shapes are the same between lhs and expression. + The evaluation of the shape is done in 'shape_block' to avoid + unitialized warnings from the lhs bounds. */ + not_same_shape = boolean_false_node; + gfc_start_block (_block); for (n = 0 ; n < rank; n++) { tree tmp1; @@ -9982,15 +9988,24 @@ fcncall_realloc_result (gfc_se *se, int rank) tmp = fold_build2_loc (input_location, NE_EXPR, logical_type_node, tmp, gfc_index_zero_node); - tmp = gfc_evaluate_now (tmp, >post); - zero_cond = fold_build2_loc (input_location, TRUTH_OR_EXPR, - logical_type_node, tmp, - zero_cond); + tmp = gfc_evaluate_now (tmp, _block); + if (n == 0) + not_same_shape = tmp; + else + not_same_shape = fold_build2_loc (input_location, TRUTH_OR_EXPR, + logical_type_node, tmp, + not_same_shape); } /* 'zero_cond' being true is equal to lhs not being allocated or the shapes being different. */ - zero_cond = gfc_evaluate_now (zero_cond, >post); + tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR, logical_type_node, + zero_cond, not_same_shape); + gfc_add_modify (_block, zero_cond, tmp); + tmp = gfc_finish_block (_block); + tmp = build3_v (COND_EXPR, zero_cond, + build_empty_stmt (input_location), tmp); + gfc_add_expr_to_block (>post, tmp); /* Now reset the bounds returned from the function call to bounds based on the lhs lbounds, except where the lhs is not allocated or the shapes ! { dg-do compile } ! { dg-options "-O1 -Wall" } ! ! PR fortran/96312. The line with the call to 'matmul' gave the warning ! ‘tmp.dim[0].lbound’ is used uninitialized in this function ! ! Contributed by Thomas Koenig ! module moda contains PURE SUBROUTINE funca(arr, sz) REAL, ALLOCATABLE, DIMENSION(:, :), INTENT(OUT) :: arr integer, intent(in) :: sz allocate(arr(sz, sz)) arr(:, :) = 0. END SUBROUTINE end module module modc use moda, only: funca contains PURE SUBROUTINE funcb(oarr) REAL, DIMENSION(:), INTENT(OUT):: oarr REAL, ALLOCATABLE, DIMENSION(:, :) :: arr real, allocatable, dimension(:) :: tmp CALL funca(arr, ubound(oarr, 1)) tmp = matmul(transpose(arr),oarr) oarr = tmp*1. END SUBROUTINE funcb end module
[pushed] testsuite, Darwin: XFAIL runs for two timode conversion tests.
Hi, X86 Darwin fails two TI mode conversions at present, because (to work around PR80556) we insert libSystem ahead of libgcc. The libSystem implementation has a similar bug to one that was fixed for GCC. We need to fix 80556 properly, and then this issue will go away - we will be able to use the libgcc impl as intended. XFAIL the run for now, to reduce testsuite noise. tested on x86_64 darwin and linux applied to master, thanks Iain gcc/testsuite/ChangeLog: * gcc.dg/torture/fp-int-convert-timode-3.c: XFAIL run. * gcc.dg/torture/fp-int-convert-timode-4.c: Likewise. --- gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c | 1 + gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c | 1 + 2 files changed, 2 insertions(+) diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c index 707d539335f..10702302bf8 100644 --- a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c +++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c @@ -4,6 +4,7 @@ /* { dg-require-effective-target int128 } */ /* { dg-require-effective-target fenv } */ /* { dg-options "-frounding-math" } */ +/* { dg-xfail-run-if "see PR80556 c63" { x86_64-*-darwin* i?86-*-darwin* } { "*" } { "" } } */ #include #include diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c index 09600f90903..3facf32fb8b 100644 --- a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c +++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c @@ -4,6 +4,7 @@ /* { dg-require-effective-target int128 } */ /* { dg-require-effective-target fenv } */ /* { dg-options "-frounding-math" } */ +/* { dg-xfail-run-if "see PR80556 c63" { x86_64-*-darwin* i?86-*-darwin* } { "*" } { "" } } */ #include #include -- 2.24.1