Re: [PATCH] popcount{64,32}c pattern matching fixes (PR tree-optimization/93098)
On Tue, Dec 31, 2019 at 05:47:54PM +0100, Richard Biener wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Ok.

Thanks.

> > One thing I haven't done anything about yet is that there is
> > FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> > before/after this patch with -m32/-march=skylake-avx512.  That is because
> > the popcountll effective target tests that we don't emit a call for
> > __builtin_popcountll, which we don't on ia32 skylake-avx512, but
> > direct_internal_fn_supported_p isn't true - that is because we expand the
> > double word popcount using 2 word popcounts + addition.  Shall the match.pd
> > case handle that case too by allowing the optimization even if there is a
> > type with half precision for which direct_internal_fn_supported_p?
>
> You mean emitting a single builtin call or an add of two ifns?

I meant to do in the match.pd condition what expand_unop will do, i.e.

-       && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-                                          OPTIMIZE_FOR_BOTH))
+       && (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+                                           OPTIMIZE_FOR_BOTH)
+           /* expand_unop can handle double-word popcount using
+              two word popcounts and addition.  */
+           || (TREE_CODE (type) == INTEGRAL_TYPE
+               && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
+               && (optab_handler (popcount_optab, word_mode)
+                   != CODE_FOR_nothing))))

or so.

	Jakub
Re: [PATCH] popcount{64,32}c pattern matching fixes (PR tree-optimization/93098)
On December 31, 2019 12:00:56 PM GMT+01:00, Jakub Jelinek wrote:
> Hi!
>
> This patch fixes various issues in the popcount{64,32}c matcher.
>
> The first issue is that it blindly uses tree_to_uhwi on all the
> INTEGER_CST operands.  That is fine for those where we know their type,
> after the prec <= 64 && TYPE_UNSIGNED (type) verification, but shift
> counts can have different types and so we need to handle e.g. invalid
> shifts by negative amounts.
>
> Another issue is that the transformation is I believe only valid for
> 16, 32 and 64-bit precision, e.g. if it would be done in 24-bit or
> 62-bit, it wouldn't be a popcount.  For 8-bit, it would likely not
> match due to >> 0, etc.
>
> Yet another issue is that the >> shift computations could very well
> shift by negative amounts e.g. for 128-bit precision, invoking UB in
> the compiler until we actually check the precision later.  And, I
> think we want to use the HOST_WIDE_INT_UC macro instead of hardcoding
> ULL suffixes.
>
> The formatting didn't match the match.pd formatting style either.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

> One thing I haven't done anything about yet is that there is
> FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> before/after this patch with -m32/-march=skylake-avx512.  That is
> because the popcountll effective target tests that we don't emit a call
> for __builtin_popcountll, which we don't on ia32 skylake-avx512, but
> direct_internal_fn_supported_p isn't true - that is because we expand
> the double word popcount using 2 word popcounts + addition.  Shall the
> match.pd case handle that case too by allowing the optimization even if
> there is a type with half precision for which direct_internal_fn_supported_p?

You mean emitting a single builtin call or an add of two ifns?
> 2019-12-31  Jakub Jelinek
>
> 	PR tree-optimization/93098
> 	* match.pd (popcount): For shift amounts, use integer_onep
> 	or wi::to_widest () == cst instead of tree_to_uhwi () == cst
> 	tests.  Make sure that precision is power of two larger than or
> 	equal to 16.  Ensure shift is never negative.  Use
> 	HOST_WIDE_INT_UC macro instead of ULL suffixed constants.
> 	Formatting fixes.
>
> 	* gcc.c-torture/compile/pr93098.c: New test.
>
> --- gcc/match.pd.jj	2019-12-09 11:12:48.351010794 +0100
> +++ gcc/match.pd	2019-12-30 09:52:05.423499909 +0100
> @@ -5786,43 +5786,50 @@ (define_operator_list COND_TERNARY
>      return (x * 0x01010101) >> 24;
>  } */
>  (simplify
> -  (rshift
> -    (mult
> -      (bit_and
> -	(plus:c
> -	  (rshift @8 INTEGER_CST@5)
> -	  (plus:c@8
> -	    (bit_and @6 INTEGER_CST@7)
> -	    (bit_and
> -	      (rshift
> -		(minus@6
> -		  @0
> -		  (bit_and
> -		    (rshift @0 INTEGER_CST@4)
> -		    INTEGER_CST@11))
> -		INTEGER_CST@10)
> -	      INTEGER_CST@9)))
> -	INTEGER_CST@3)
> -      INTEGER_CST@2)
> -    INTEGER_CST@1)
> + (rshift
> +  (mult
> +   (bit_and
> +    (plus:c
> +     (rshift @8 INTEGER_CST@5)
> +     (plus:c@8
> +      (bit_and @6 INTEGER_CST@7)
> +      (bit_and
> +       (rshift
> +	(minus@6 @0
> +	 (bit_and (rshift @0 INTEGER_CST@4) INTEGER_CST@11))
> +	INTEGER_CST@10)
> +       INTEGER_CST@9)))
> +    INTEGER_CST@3)
> +   INTEGER_CST@2)
> +  INTEGER_CST@1)
>   /* Check constants and optab.  */
> - (with
> -  {
> -    unsigned prec = TYPE_PRECISION (type);
> -    int shift = 64 - prec;
> -    const unsigned HOST_WIDE_INT c1 = 0x0101010101010101ULL >> shift,
> -				  c2 = 0x0F0F0F0F0F0F0F0FULL >> shift,
> -				  c3 = 0x3333333333333333ULL >> shift,
> -				  c4 = 0x5555555555555555ULL >> shift;
> -  }
> -  (if (prec <= 64 && TYPE_UNSIGNED (type) && tree_to_uhwi (@4) == 1
> -       && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4
> -       && tree_to_uhwi (@1) == prec - 8 && tree_to_uhwi (@2) == c1
> -       && tree_to_uhwi (@3) == c2 && tree_to_uhwi (@9) == c3
> -       && tree_to_uhwi (@7) == c3 && tree_to_uhwi (@11) == c4
> -       && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
> -					  OPTIMIZE_FOR_BOTH))
> -   (convert (IFN_POPCOUNT:type @0))))
> + (with { unsigned prec = TYPE_PRECISION (type);
> +	 int shift = (64 - prec) & 63;
> +	 unsigned HOST_WIDE_INT c1
> +	   = HOST_WIDE_INT_UC (0x0101010101010101) >> shift;
> +	 unsigned HOST_WIDE_INT c2
> +	   = HOST_WIDE_INT_UC (0x0F0F0F0F0F0F0F0F) >> shift;
> +	 unsigned HOST_WIDE_INT c3
> +	   = HOST_WIDE_INT_UC (0x3333333333333333) >> shift;
> +	 unsigned HOST_WIDE_INT c4
> +	   = HOST_WIDE_INT_UC (0x5555555555555555) >> shift;
> +       }
> +  (if (prec >= 16
> +       && prec <= 64
> +       && pow2p_hwi (prec)
> +       && TYPE_UNSIGNED (type)
> +       && integer_onep (@4)
> +       && wi::to_widest (@10) == 2
[committed] Fix EXTRACT_LAST_REDUCTION segfault
This code:

  /* Make sure we don't accidentally use the old condition.  */
  cond_expr = NULL_TREE;

was misplaced, since it triggered even when we needed to force the
original unmodified cond_expr into a mask temporary and then invert it.

Tested on aarch64-linux-gnu and applied as obvious.

Richard

2019-12-31  Richard Sandiford

gcc/
	* tree-vect-stmts.c (vectorizable_condition): Only nullify
	cond_expr if we've created a new condition.  Don't nullify it
	if we've decided to keep it and then invert the result.

gcc/testsuite/
	* gcc.dg/vect/vect-cond-reduc-6.c: New test.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-12-29 09:27:46.623861776 +0000
+++ gcc/tree-vect-stmts.c	2019-12-31 15:31:37.22625 +0000
@@ -10033,10 +10033,12 @@ vectorizable_condition (stmt_vec_info st
 	  if (new_code == ERROR_MARK)
 	    must_invert_cmp_result = true;
 	  else
-	    cond_code = new_code;
+	    {
+	      cond_code = new_code;
+	      /* Make sure we don't accidentally use the old condition.  */
+	      cond_expr = NULL_TREE;
+	    }
 	}
-      /* Make sure we don't accidentally use the old condition.  */
-      cond_expr = NULL_TREE;
       std::swap (then_clause, else_clause);
     }
Index: gcc/testsuite/gcc.dg/vect/vect-cond-reduc-6.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-reduc-6.c	2019-12-31 15:31:37.22625 +0000
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int
+f (int *y)
+{
+  int res = 0;
+  for (int i = 0; i < 100; ++i)
+    res = (y[i] & 1) == 0 && (y[i] < 10) ? res : 1;
+  return res;
+}
[C++ PATCH] Fix up building of GCC 8 and earlier with GCC 9/10 (PR c/90677)
Hi!

My PR90677 fix actually made building older GCCs with newer ones worse.
The problem is that identifier_global_value, used earlier, returned
either the type decl on success or NULL_TREE on failure, and the caller
in that case just defers handling it until later; that is also what the
C identifier_global_tag implementation does.  But C++ uses
lookup_qualified_name, which returns error_mark_node if not found rather
than NULL_TREE, and the c-format.c caller is unprepared to handle that
and diagnoses an error.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
additionally tested by building GCC 8 with unmodified trunk (failed with

In file included from ./config.h:8,
                 from ../../gcc/gimple-streamer-in.c:22:
../../gcc/../include/ansidecl.h:169:64: error: ‘cgraph_node’ is not defined as a type
  169 | # define ATTRIBUTE_NONNULL(m) __attribute__ ((__nonnull__ (m)))
      |                                                                ^
...

) and finally building GCC 8 with patched trunk, which succeeded.

Ok for trunk/9.3?

2019-12-31  Jakub Jelinek

	PR c/90677
	* cp-objcp-common.c (identifier_global_tag): Return NULL_TREE if
	name has not been found, rather than error_mark_node.

	* c-c++-common/pr90677-2.c: New test.
--- gcc/cp/cp-objcp-common.c.jj	2019-11-22 22:44:02.124600331 +0100
+++ gcc/cp/cp-objcp-common.c	2019-12-30 18:29:01.386576418 +0100
@@ -354,8 +354,11 @@ identifier_global_value (tree name)
 tree
 identifier_global_tag (tree name)
 {
-  return lookup_qualified_name (global_namespace, name, /*prefer_type*/2,
-				/*complain*/false);
+  tree ret = lookup_qualified_name (global_namespace, name, /*prefer_type*/2,
+				    /*complain*/false);
+  if (ret == error_mark_node)
+    return NULL_TREE;
+  return ret;
 }
 
 /* Returns true if NAME refers to a built-in function or function-like
--- gcc/testsuite/c-c++-common/pr90677-2.c.jj	2019-12-30 18:30:50.318911231 +0100
+++ gcc/testsuite/c-c++-common/pr90677-2.c	2019-12-30 18:30:36.303125485 +0100
@@ -0,0 +1,8 @@
+/* PR c/90677 */
+/* { dg-do compile } */
+/* { dg-options "-W -Wall" } */
+
+extern void foo (int, int, const char *, ...)
+  __attribute__ ((__format__ (__gcc_tdiag__, 3, 4)));
+struct cgraph_node;
+extern void bar (struct cgraph_node *);

	Jakub
[PATCH] popcount{64,32}c pattern matching fixes (PR tree-optimization/93098)
Hi!

This patch fixes various issues in the popcount{64,32}c matcher.

The first issue is that it blindly uses tree_to_uhwi on all the
INTEGER_CST operands.  That is fine for those where we know their type,
after the prec <= 64 && TYPE_UNSIGNED (type) verification, but shift
counts can have different types and so we need to handle e.g. invalid
shifts by negative amounts.

Another issue is that the transformation is I believe only valid for
16, 32 and 64-bit precision, e.g. if it would be done in 24-bit or
62-bit, it wouldn't be a popcount.  For 8-bit, it would likely not match
due to >> 0, etc.

Yet another issue is that the >> shift computations could very well
shift by negative amounts e.g. for 128-bit precision, invoking UB in the
compiler until we actually check the precision later.  And, I think we
want to use the HOST_WIDE_INT_UC macro instead of hardcoding ULL
suffixes.

The formatting didn't match the match.pd formatting style either.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

One thing I haven't done anything about yet is that there is
FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
before/after this patch with -m32/-march=skylake-avx512.  That is
because the popcountll effective target tests that we don't emit a call
for __builtin_popcountll, which we don't on ia32 skylake-avx512, but
direct_internal_fn_supported_p isn't true - that is because we expand
the double word popcount using 2 word popcounts + addition.  Shall the
match.pd case handle that case too by allowing the optimization even if
there is a type with half precision for which direct_internal_fn_supported_p?

2019-12-31  Jakub Jelinek

	PR tree-optimization/93098
	* match.pd (popcount): For shift amounts, use integer_onep
	or wi::to_widest () == cst instead of tree_to_uhwi () == cst
	tests.  Make sure that precision is power of two larger than or
	equal to 16.  Ensure shift is never negative.  Use
	HOST_WIDE_INT_UC macro instead of ULL suffixed constants.
	Formatting fixes.

	* gcc.c-torture/compile/pr93098.c: New test.

--- gcc/match.pd.jj	2019-12-09 11:12:48.351010794 +0100
+++ gcc/match.pd	2019-12-30 09:52:05.423499909 +0100
@@ -5786,43 +5786,50 @@ (define_operator_list COND_TERNARY
     return (x * 0x01010101) >> 24;
 } */
 (simplify
-  (rshift
-    (mult
-      (bit_and
-	(plus:c
-	  (rshift @8 INTEGER_CST@5)
-	  (plus:c@8
-	    (bit_and @6 INTEGER_CST@7)
-	    (bit_and
-	      (rshift
-		(minus@6
-		  @0
-		  (bit_and
-		    (rshift @0 INTEGER_CST@4)
-		    INTEGER_CST@11))
-		INTEGER_CST@10)
-	      INTEGER_CST@9)))
-	INTEGER_CST@3)
-      INTEGER_CST@2)
-    INTEGER_CST@1)
+ (rshift
+  (mult
+   (bit_and
+    (plus:c
+     (rshift @8 INTEGER_CST@5)
+     (plus:c@8
+      (bit_and @6 INTEGER_CST@7)
+      (bit_and
+       (rshift
+	(minus@6 @0
+	 (bit_and (rshift @0 INTEGER_CST@4) INTEGER_CST@11))
+	INTEGER_CST@10)
+       INTEGER_CST@9)))
+    INTEGER_CST@3)
+   INTEGER_CST@2)
+  INTEGER_CST@1)
  /* Check constants and optab.  */
- (with
-  {
-    unsigned prec = TYPE_PRECISION (type);
-    int shift = 64 - prec;
-    const unsigned HOST_WIDE_INT c1 = 0x0101010101010101ULL >> shift,
-				  c2 = 0x0F0F0F0F0F0F0F0FULL >> shift,
-				  c3 = 0x3333333333333333ULL >> shift,
-				  c4 = 0x5555555555555555ULL >> shift;
-  }
-  (if (prec <= 64 && TYPE_UNSIGNED (type) && tree_to_uhwi (@4) == 1
-       && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4
-       && tree_to_uhwi (@1) == prec - 8 && tree_to_uhwi (@2) == c1
-       && tree_to_uhwi (@3) == c2 && tree_to_uhwi (@9) == c3
-       && tree_to_uhwi (@7) == c3 && tree_to_uhwi (@11) == c4
-       && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-					  OPTIMIZE_FOR_BOTH))
-   (convert (IFN_POPCOUNT:type @0))))
+ (with { unsigned prec = TYPE_PRECISION (type);
+	 int shift = (64 - prec) & 63;
+	 unsigned HOST_WIDE_INT c1
+	   = HOST_WIDE_INT_UC (0x0101010101010101) >> shift;
+	 unsigned HOST_WIDE_INT c2
+	   = HOST_WIDE_INT_UC (0x0F0F0F0F0F0F0F0F) >> shift;
+	 unsigned HOST_WIDE_INT c3
+	   = HOST_WIDE_INT_UC (0x3333333333333333) >> shift;
+	 unsigned HOST_WIDE_INT c4
+	   = HOST_WIDE_INT_UC (0x5555555555555555) >> shift;
+       }
+  (if (prec >= 16
+       && prec <= 64
+       && pow2p_hwi (prec)
+       && TYPE_UNSIGNED (type)
+       && integer_onep (@4)
+       && wi::to_widest (@10) == 2
+       && wi::to_widest (@5) == 4
+       && wi::to_widest (@1) == prec - 8
+       && tree_to_uhwi (@2) == c1
+       && tree_to_uhwi (@3) == c2
+       && tree_to_uhwi (@9) == c3
+       && tree_to_uhwi (@7) == c3
+       && tree_to_uhwi (@11) == c4
+       && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+					  OPTIMIZE_FOR_BOTH))
+   (convert (IFN_POPCOUNT:type @0)))))
Re: [patch, libfortran] Fortran 2018: Support d0.d, e0.d, es0.d, en0.d, g0.d and ew.d e0 edit descriptors
Hi Jerry,

> OK for trunk?

Looks good.  I also think that your approach that the DEC stuff should
be checked later is good.  If it passes the testsuite, that's good
enough for a commit.

Thanks for the patch!

Regards

	Thomas
Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808
The first level is an ordinary clone, corresponding to a non-self
caller, and the param_ipa_cp_max_recursive_depth-1 is for recursive
cloning.  Then it's ok that the least value is 1, with which disabling
does happen.

Feng

________________________________________
From: luoxhu
Sent: Tuesday, December 31, 2019 3:43 PM
To: Feng Xue OS; Jan Hubicka; Martin Jambor
Cc: Martin Liška; gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org
Subject: Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

On 2019/12/31 14:43, Feng Xue OS wrote:
> One comment: it's better to allow zero value for
> param_ipa_cp_max_recursive_depth, this can be used to disable
> recursive cloning.

Thanks, "1" means no recursive cloning but only constant propagation
from caller to callee in your code?

ipa-cp.c, line 2019:

  /* Recursively generate lattice values with a limited count.  */
  FOR_EACH_VEC_ELT (val_seeds, i, src_val)
    {
      for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++)
	{
	  tree cstval = get_val_across_arith_op (opcode, opnd1_type,
						 opnd2, src_val, res_type);
	  if (!cstval)
	    break;

	  ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
				      src_offset, &src_val, true);
	  gcc_checking_assert (src_val);
	}
    }

XiongHu

>
> Feng
>
> ________________________________________
> From: luoxhu
> Sent: Monday, December 30, 2019 4:11 PM
> To: Jan Hubicka; Martin Jambor
> Cc: Martin Liška; gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org; Feng Xue OS
> Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808
>
> v2 Changes:
>  1. Enable proportion orig_sum to the new nodes for self recursive node:
>     new_sum = (orig_sum + new_sum) \
>       * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
>  2. Add value range for param_ipa_cp_max_recursive_depth.
>
> The performance of exchange2 built with PGO will decrease ~28% by
> r278808 due to profile count set incorrectly.  The cloned nodes are
> updated to a very small count caused later pass cunroll fail to unroll
> the recursive function in exchange2.
>
> digits_2 ->
> digits_2.constprop.0, digits_2.constprop.1, etc.
>
> gcc/ChangeLog:
>
> 2019-12-26  Luo Xiong Hu
>
> 	* ipa-cp.c (update_profiling_info): Check self_scc node.
> 	* params.opt (ipa-cp-max-recursive-depth): Set param range.
> ---
>  gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
>  gcc/params.opt |  2 +-
>  2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 14064ae0034..947bf7c7199 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
>  					 false);
>    new_sum = stats.count_sum;
>
> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
> +  if (info && info->node_is_self_scc)
> +    {
> +      profile_count self_recursive_count;
> +
> +      /* The self recursive edge is on the orig_node.  */
> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)
> +	if (ipa_edge_within_scc (cs))
> +	  {
> +	    cgraph_node *callee = cs->callee->function_symbol ();
> +	    if (callee && cs->caller == cs->callee)
> +	      {
> +		self_recursive_count = cs->count;
> +		break;
> +	      }
> +	  }
> +
> +      /* Proportion count for self recursive node from all callers.  */
> +      new_sum
> +	= (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
> +
> +      /* Proportion count for specialized cloned node.  */
> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
> +    }
> +
>    if (orig_node_count < orig_sum + new_sum)
>      {
>        if (dump_file)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index d88ae0c468b..40a03b04580 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>  Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
>
>  -param=ipa-cp-max-recursive-depth=
> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
>  Maximum depth of recursive cloning for self-recursive function.
>
>  -param=ipa-cp-min-recursive-probability=
> --
> 2.21.0.777.g83232e3864
>