Re: [PATCH] popcount{64,32}c pattern matching fixes (PR tree-optimization/93098)

2019-12-31 Thread Jakub Jelinek
On Tue, Dec 31, 2019 at 05:47:54PM +0100, Richard Biener wrote:
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Ok. 

Thanks.

> >One thing I haven't done anything about yet is that there is
> >FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> >before/after this patch with -m32/-march=skylake-avx512.  That is
> >because the popcountll effective target tests that we don't emit a
> >call for __builtin_popcountll, which we don't on ia32 skylake-avx512,
> >but direct_internal_fn_supported_p isn't true - that is because we
> >expand the double word popcount using 2 word popcounts + addition.
> >Shall the match.pd pattern handle that case too, by allowing the
> >optimization when there is a type with half the precision for which
> >direct_internal_fn_supported_p holds?
> 
> You mean emitting a single builtin call, or an add of two ifns?

I meant to do in the match.pd condition what expand_unop will do, i.e.
-	  && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-					     OPTIMIZE_FOR_BOTH))
+	  && (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+					      OPTIMIZE_FOR_BOTH)
+	      /* expand_unop can handle double-word popcount using
+		 two word popcounts and addition.  */
+	      || (TREE_CODE (type) == INTEGER_TYPE
+		  && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
+		  && (optab_handler (popcount_optab, word_mode)
+		      != CODE_FOR_nothing))))
or so.
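
For reference, here is a minimal sketch (plain C, not GCC internals) of
what expand_unop's double-word fallback computes on a 32-bit target,
with __builtin_popcount standing in for the word_mode popcount
instruction:

#include <stdio.h>

/* Split the double-word operand into two words, popcount each, add.  */
static unsigned int
popcountll_via_words (unsigned long long x)
{
  unsigned int lo = (unsigned int) x;
  unsigned int hi = (unsigned int) (x >> 32);
  return __builtin_popcount (lo) + __builtin_popcount (hi);
}

int
main (void)
{
  printf ("%u\n", popcountll_via_words (0xF0F0F0F0F0F0F0F0ULL)); /* 32 */
  return 0;
}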

Jakub



Re: [PATCH] popcount{64,32}c pattern matching fixes (PR tree-optimization/93098)

2019-12-31 Thread Richard Biener
On December 31, 2019 12:00:56 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>This patch fixes various issues in the popcount{64,32}c matcher.
>
>The first issue is that it blindly uses tree_to_uhwi on all the
>INTEGER_CST operands.  That is fine for those where we know their
>type, after the prec <= 64 && TYPE_UNSIGNED (type) verification, but
>shift counts can have different types and so we need to handle
>e.g. invalid shifts by negative amounts.
>
>Another issue is that the transformation is, I believe, only valid for
>16, 32 and 64-bit precision, e.g. if it were done in 24-bit or 62-bit,
>it wouldn't be a popcount.  For 8-bit, it would likely not match due
>to >> 0, etc.
>
>Yet another issue is that the >> shift computations could very well
>shift by negative amounts e.g. for 128-bit precision, invoking UB in
>the compiler until we actually check the precision later.  And, I
>think we want to use the HOST_WIDE_INT_UC macro instead of hardcoding
>ULL suffixes.
>
>The formatting didn't match the match.pd formatting style either.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

>One thing I haven't done anything about yet is that there is
>FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
>before/after this patch with -m32/-march=skylake-avx512.  That is
>because the popcountll effective target tests that we don't emit a
>call for __builtin_popcountll, which we don't on ia32 skylake-avx512,
>but direct_internal_fn_supported_p isn't true - that is because we
>expand the double word popcount using 2 word popcounts + addition.
>Shall the match.pd pattern handle that case too, by allowing the
>optimization when there is a type with half the precision for which
>direct_internal_fn_supported_p holds?

You mean emitting a single builtin call, or an add of two ifns?


[committed] Fix EXTRACT_LAST_REDUCTION segfault

2019-12-31 Thread Richard Sandiford
This code:

  /* Make sure we don't accidentally use the old condition.  */
  cond_expr = NULL_TREE;

was misplaced, since it triggered even when we needed to force the
original unmodified cond_expr into a mask temporary and then invert it.
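
For intuition, a scalar sketch of the two paths (plain C, not the
vectorizer code): either the condition is rewritten to its negation up
front, or the original condition is kept, computed into a mask, and the
result inverted afterwards; the misplaced nullification broke the
second path:

#include <stdio.h>

/* Sketch of the EXTRACT_LAST fallback: if the negated comparison code
   is unavailable (new_code == ERROR_MARK), keep the original compare
   and invert the resulting mask instead.  */
int
main (void)
{
  int a = 3, b = 5;
  int direct = !(a < b);   /* what a rewritten negated compare computes */
  int mask = (a < b);      /* original condition kept... */
  int inverted = mask ^ 1; /* ...then inverted afterwards */
  printf ("%d %d\n", direct, inverted); /* equal: 0 0 */
  return 0;
}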

Tested on aarch64-linux-gnu and applied as obvious.

Richard


2019-12-31  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_condition): Only nullify cond_expr
if we've created a new condition.  Don't nullify it if we've decided
to keep it and then invert the result.

gcc/testsuite/
* gcc.dg/vect/vect-cond-reduc-6.c: New test.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-12-29 09:27:46.623861776 +0000
+++ gcc/tree-vect-stmts.c	2019-12-31 15:31:37.22625 +0000
@@ -10033,10 +10033,12 @@ vectorizable_condition (stmt_vec_info st
  if (new_code == ERROR_MARK)
must_invert_cmp_result = true;
  else
-   cond_code = new_code;
+   {
+ cond_code = new_code;
+ /* Make sure we don't accidentally use the old condition.  */
+ cond_expr = NULL_TREE;
+   }
}
-  /* Make sure we don't accidentally use the old condition.  */
-  cond_expr = NULL_TREE;
   std::swap (then_clause, else_clause);
 }
 
Index: gcc/testsuite/gcc.dg/vect/vect-cond-reduc-6.c
===================================================================
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-reduc-6.c	2019-12-31 15:31:37.22625 +0000
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int
+f (int *y)
+{
+  int res = 0;
+  for (int i = 0; i < 100; ++i)
+res = (y[i] & 1) == 0 && (y[i] < 10) ? res : 1;
+  return res;
+}


[C++ PATCH] Fix up building of GCC 8 and earlier with GCC 9/10 (PR c/90677)

2019-12-31 Thread Jakub Jelinek
Hi!

My PR90677 fix actually made building older GCCs with newer ones worse.
The problem is that identifier_global_value, used earlier, returned
either the type decl on success or NULL_TREE on failure, and the caller
in that case just defers handling it until later; that is also what the
C identifier_global_tag implementation does.  C++, however, uses
lookup_qualified_name, which returns error_mark_node rather than
NULL_TREE if the name is not found, and the c-format.c caller is
unprepared to handle that and diagnoses an error.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
additionally tested by building GCC 8 with unmodified trunk (failed with
In file included from ./config.h:8,
                 from ../../gcc/gimple-streamer-in.c:22:
../../gcc/../include/ansidecl.h:169:64: error: ‘cgraph_node’ is not defined as a type
  169 | #  define ATTRIBUTE_NONNULL(m) __attribute__ ((__nonnull__ (m)))
      |                                                                ^
...
) and finally building GCC 8 with patched trunk, which succeeded.
Ok for trunk/9.3?

2019-12-31  Jakub Jelinek  

PR c/90677
* cp-objcp-common.c (identifier_global_tag): Return NULL_TREE if name
has not been found, rather than error_mark_node.

* c-c++-common/pr90677-2.c: New test.

--- gcc/cp/cp-objcp-common.c.jj 2019-11-22 22:44:02.124600331 +0100
+++ gcc/cp/cp-objcp-common.c	2019-12-30 18:29:01.386576418 +0100
@@ -354,8 +354,11 @@ identifier_global_value (tree name)
 tree
 identifier_global_tag (tree name)
 {
-  return lookup_qualified_name (global_namespace, name, /*prefer_type*/2,
-   /*complain*/false);
+  tree ret = lookup_qualified_name (global_namespace, name, /*prefer_type*/2,
+   /*complain*/false);
+  if (ret == error_mark_node)
+return NULL_TREE;
+  return ret;
 }
 
 /* Returns true if NAME refers to a built-in function or function-like
--- gcc/testsuite/c-c++-common/pr90677-2.c.jj	2019-12-30 18:30:50.318911231 +0100
+++ gcc/testsuite/c-c++-common/pr90677-2.c	2019-12-30 18:30:36.303125485 +0100
@@ -0,0 +1,8 @@
+/* PR c/90677 */
+/* { dg-do compile } */
+/* { dg-options "-W -Wall" } */
+
+extern void foo (int, int, const char *, ...)
+  __attribute__ ((__format__ (__gcc_tdiag__, 3, 4)));
+struct cgraph_node;
+extern void bar (struct cgraph_node *);

Jakub



[PATCH] popcount{64,32}c pattern matching fixes (PR tree-optimization/93098)

2019-12-31 Thread Jakub Jelinek
Hi!

This patch fixes various issues in the popcount{64,32}c matcher.

The first issue is that it blindly uses tree_to_uhwi on all the INTEGER_CST
operands.  That is fine for those where we know their type, after the
prec <= 64 && TYPE_UNSIGNED (type)
verification, but shift counts can have different types and so we need to
handle e.g. invalid shifts by negative amounts.

Another issue is that the transformation is, I believe, only valid for
16, 32 and 64-bit precision, e.g. if it were done in 24-bit or 62-bit,
it wouldn't be a popcount.  For 8-bit, it would likely not match due
to >> 0, etc.
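
For context, the user-level sequence the matcher recognizes is the
classic SWAR popcount; here is a 64-bit sketch (the 32-bit variant
shown in the match.pd comment is analogous), with the pattern's
captures noted in comments:

/* popcount64c: shift amounts 1, 2, 4 and the final >> (prec - 8),
   plus the masks and multiplier the match.pd condition checks.  */
unsigned int
popcount64c (unsigned long long x)
{
  x -= (x >> 1) & 0x5555555555555555ULL;      /* @4 == 1, @11 == c4 */
  x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);   /* @10 == 2, @7/@9 == c3 */
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL; /* @5 == 4, @3 == c2 */
  return (x * 0x0101010101010101ULL) >> 56;   /* @2 == c1, @1 == prec - 8 */
}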

Yet another issue is that the >> shift computations could very well shift
by negative amounts e.g. for 128-bit precision, invoking UB in the compiler
until we actually check the precision later.  And, I think we want to use
the HOST_WIDE_INT_UC macro instead of hardcoding ULL suffixes.
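
To illustrate the negative-shift issue, a standalone sketch of the
patched computation (prec is unsigned, as in the match.pd code):

#include <stdio.h>

int
main (void)
{
  /* The old code computed int shift = 64 - prec; for prec == 128 that
     is a negative shift count, which is UB in the compiler even though
     the match is rejected later.  Since prec is unsigned, 64 - prec
     wraps, and masking with 63 keeps the count in [0, 63].  */
  unsigned prec = 128;
  int shift = (64 - prec) & 63;
  printf ("%d\n", shift); /* Prints 0, a valid shift count; the
			     prec <= 64 test still rejects the match.  */
  return 0;
}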

The formatting didn't match the match.pd formatting style either.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

One thing I haven't done anything about yet is that there is
FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
before/after this patch with -m32/-march=skylake-avx512.  That is because
the popcountll effective target tests that we don't emit a call for
__builtin_popcountll, which we don't on ia32 skylake-avx512, but
direct_internal_fn_supported_p isn't true - that is because we expand the
double word popcount using 2 word popcounts + addition.  Shall the match.pd
pattern handle that case too, by allowing the optimization when there is a
type with half the precision for which direct_internal_fn_supported_p holds?

2019-12-31  Jakub Jelinek  

PR tree-optimization/93098
* match.pd (popcount): For shift amounts, use integer_onep
or wi::to_widest () == cst instead of tree_to_uhwi () == cst
tests.  Make sure that precision is power of two larger than or equal
to 16.  Ensure shift is never negative.  Use HOST_WIDE_INT_UC macro
instead of ULL suffixed constants.  Formatting fixes.

* gcc.c-torture/compile/pr93098.c: New test.

--- gcc/match.pd.jj 2019-12-09 11:12:48.351010794 +0100
+++ gcc/match.pd	2019-12-30 09:52:05.423499909 +0100
@@ -5786,43 +5786,50 @@ (define_operator_list COND_TERNARY
  return (x * 0x01010101) >> 24;
}  */
 (simplify
-  (rshift
-(mult
-  (bit_and
-   (plus:c
- (rshift @8 INTEGER_CST@5)
- (plus:c@8
-   (bit_and @6 INTEGER_CST@7)
-   (bit_and
- (rshift
-   (minus@6
- @0
- (bit_and
-   (rshift @0 INTEGER_CST@4)
-   INTEGER_CST@11))
- INTEGER_CST@10)
-   INTEGER_CST@9)))
-   INTEGER_CST@3)
-  INTEGER_CST@2)
-INTEGER_CST@1)
+ (rshift
+  (mult
+   (bit_and
+(plus:c
+ (rshift @8 INTEGER_CST@5)
+  (plus:c@8
+   (bit_and @6 INTEGER_CST@7)
+   (bit_and
+(rshift
+ (minus@6 @0
+  (bit_and (rshift @0 INTEGER_CST@4) INTEGER_CST@11))
+ INTEGER_CST@10)
+INTEGER_CST@9)))
+INTEGER_CST@3)
+   INTEGER_CST@2)
+  INTEGER_CST@1)
   /* Check constants and optab.  */
-  (with
- {
-   unsigned prec = TYPE_PRECISION (type);
-   int shift = 64 - prec;
-   const unsigned HOST_WIDE_INT c1 = 0x0101010101010101ULL >> shift,
-   c2 = 0x0F0F0F0F0F0F0F0FULL >> shift,
-   c3 = 0x3333333333333333ULL >> shift,
-   c4 = 0x5555555555555555ULL >> shift;
- }
-(if (prec <= 64 && TYPE_UNSIGNED (type) && tree_to_uhwi (@4) == 1
-  && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4
-  && tree_to_uhwi (@1) == prec - 8 && tree_to_uhwi (@2) == c1
-  && tree_to_uhwi (@3) == c2 && tree_to_uhwi (@9) == c3
-  && tree_to_uhwi (@7) == c3 && tree_to_uhwi (@11) == c4
-  && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-   OPTIMIZE_FOR_BOTH))
-(convert (IFN_POPCOUNT:type @0)
+  (with { unsigned prec = TYPE_PRECISION (type);
+ int shift = (64 - prec) & 63;
+ unsigned HOST_WIDE_INT c1
+   = HOST_WIDE_INT_UC (0x0101010101010101) >> shift;
+ unsigned HOST_WIDE_INT c2
+   = HOST_WIDE_INT_UC (0x0F0F0F0F0F0F0F0F) >> shift;
+ unsigned HOST_WIDE_INT c3
+   = HOST_WIDE_INT_UC (0x3333333333333333) >> shift;
+ unsigned HOST_WIDE_INT c4
+   = HOST_WIDE_INT_UC (0x5555555555555555) >> shift;
+   }
+   (if (prec >= 16
+   && prec <= 64
+   && pow2p_hwi (prec)
+   && TYPE_UNSIGNED (type)
+   && integer_onep (@4)
+   && wi::to_widest (@10) == 2
+   && wi::to_widest (@5) == 4
+   && wi::to_widest (@1) == prec - 8
+   && tree_to_uhwi (@2) == c1
+   && tree_to_uhwi (@3) == c2
+   && tree_to_uhwi (@9) == c3
+   && tree_to_uhwi (@7) == c3
+   && 

Re: [patch, libfortran] Fortran 2018: Support d0.d, e0.d, es0.d, en0.d, g0.d and ew.d e0 edit descriptors

2019-12-31 Thread Thomas Koenig

Hi Jerry,


> OK for trunk?

Looks good.  I also think your approach of checking the DEC stuff
later is the right one.  If it passes the testsuite, that's good
enough for a commit.

Thanks for the patch!

Regards

Thomas



Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

2019-12-31 Thread Feng Xue OS
The first level is the ordinary clone, corresponding to a non-self
caller, and the remaining param_ipa_cp_max_recursive_depth-1 levels are
for recursive cloning.  So the least value of 1 is fine: with it,
recursive cloning is indeed disabled.

Feng


From: luoxhu 
Sent: Tuesday, December 31, 2019 3:43 PM
To: Feng Xue OS; Jan Hubicka; Martin Jambor
Cc: Martin Liška; gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; 
wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org
Subject: Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808


On 2019/12/31 14:43, Feng Xue OS wrote:
> One comment: it's better to allow a zero value for
> param_ipa_cp_max_recursive_depth, which can be used to disable
> recursive cloning.

Thanks, "1" means no recursive cloning but only constant propagation from
caller to callee in your code?

ipa-cp.c, line 2019:

  /* Recursively generate lattice values with a limited count.  */
  FOR_EACH_VEC_ELT (val_seeds, i, src_val)
    {
      for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++)
	{
	  tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
						 src_val, res_type);
	  if (!cstval)
	    break;

	  ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
				      src_offset, &src_val, true);
	  gcc_checking_assert (src_val);
	}
    }
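
As a minimal demonstration (not GCC code; levels () is a hypothetical
helper mirroring the loop above), the parameter bounds the loop like
this, which is why 1 disables recursive cloning as Feng notes above:

#include <stdio.h>

/* Count how many recursive lattice levels the loop above generates
   for a given parameter value.  */
static int
levels (int param_ipa_cp_max_recursive_depth)
{
  int n = 0;
  for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++)
    n++;
  return n;
}

int
main (void)
{
  printf ("param=1 -> %d recursive levels\n", levels (1)); /* 0: disabled */
  printf ("param=8 -> %d recursive levels\n", levels (8)); /* 7 */
  return 0;
}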

XiongHu

>
> Feng
> 
> From: luoxhu 
> Sent: Monday, December 30, 2019 4:11 PM
> To: Jan Hubicka; Martin Jambor
> Cc: Martin Liška; gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; 
> wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org; Feng Xue OS
> Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808
>
> v2 Changes:
> 1. Proportion orig_sum into the new nodes for a self-recursive node:
> new_sum = (orig_sum + new_sum) \
> * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
> 2. Add value range for param_ipa_cp_max_recursive_depth.
>
> The performance of exchange2 built with PGO decreases ~28% with r278808
> because the profile count is set incorrectly.  The cloned nodes are
> updated to a very small count, which causes the later pass cunroll to
> fail to unroll the recursive function in exchange2.
>
> digits_2 ->
> digits_2.constprop.0, digits_2.constprop.1, etc.
>
> gcc/ChangeLog:
>
>  2019-12-26  Luo Xiong Hu  
>
>  * ipa-cp.c (update_profiling_info): Check self_scc node.
>  * params.opt (ipa-cp-max-recursive-depth): Set param range.
> ---
>   gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
>   gcc/params.opt |  2 +-
>   2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 14064ae0034..947bf7c7199 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
>false);
> new_sum = stats.count_sum;
>
> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
> +  if (info && info->node_is_self_scc)
> +{
> +  profile_count self_recursive_count;
> +
> +  /* The self recursive edge is on the orig_node.  */
> +  for (cs = orig_node->callees; cs; cs = cs->next_callee)
> +   if (ipa_edge_within_scc (cs))
> + {
> +   cgraph_node *callee = cs->callee->function_symbol ();
> +   if (callee && cs->caller == cs->callee)
> + {
> +   self_recursive_count = cs->count;
> +   break;
> + }
> + }
> +
> +  /* Proportion count for self recursive node from all callers.  */
> +  new_sum
> +   = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
> +
> +  /* Proportion count for specialized cloned node.  */
> +  new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
> +}
> +
> if (orig_node_count < orig_sum + new_sum)
>   {
> if (dump_file)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index d88ae0c468b..40a03b04580 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
>
>   -param=ipa-cp-max-recursive-depth=
> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
>   Maximum depth of recursive cloning for self-recursive function.
>
>   -param=ipa-cp-min-recursive-probability=
> --
> 2.21.0.777.g83232e3864
>
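
For reference, a worked example of the scaling in the
update_profiling_info hunk quoted above, with made-up counts and plain
integers standing in for profile_count:

#include <stdio.h>

int
main (void)
{
  long orig_sum = 1000;            /* count left on orig_node's entry */
  long new_sum = 200;              /* count gathered for the clones */
  long self_recursive_count = 900; /* count on the self-recursive edge */
  int depth = 8;                   /* param_ipa_cp_max_recursive_depth */

  /* Proportion the combined count by the self-recursion probability,
     then split it across the recursion depth levels.  */
  long scaled = (orig_sum + new_sum) * self_recursive_count / orig_sum;
  scaled /= depth;
  printf ("new_sum = %ld\n", scaled); /* 135 */
  return 0;
}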