[PATCH] options: Make --help= emit values post-override

2020-08-09 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2020/8/7 10:42 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Aug 07, 2020 at 10:44:10AM +0800, Kewen.Lin wrote:
>>> I think this makes a lot of sense.
>>>
>>>> btw, not sure whether it's a good idea to move target_option_override_hook
>>>> call into print_specific_help and use one function local static
>>>> variable to control it's called once for all kinds of help dumping
>>>> (possible combination), then can remove the calls in function
>>>> common_handle_option.
>>>
>>> I cannot easily imagine what that will look like...  it could easily be
>>> worse than what you have here (callbacks aren't so nice, but there are
>>> worse things).
>>>
>>
>> I attached opts_alt2.diff to be more specific for this, both alt1 and alt2
>> follow the existing callback scheme, alt2 aims to avoid possible multiple
>> times target_option_override_hook calls when we have several --help= or
>> similar, but I guess alt1 is also fine since the hook should be allowed to
>> be called more than once.
> 
> It could take quadratic time in alt1...  Mostly a theoretical problem I
> think, but still.
> 
> All options look fine to me, but you need someone else to approve this ;-)

Yeah, CC Joseph.  Thanks!

> 
> One thing:
> 
>> @@ -1664,6 +1665,14 @@ print_specific_help (unsigned int include_flags,
>>  }
>>  }
>>  
>> +  static bool call_override_once_p = false;
>> +  if (!call_override_once_p)
>> +{
>> +  gcc_assert (target_option_override_hook);
>> +  target_option_override_hook ();
>> +  call_override_once_p = true;
>> +}
> 
> That assert is pretty random, nothing else using the hook assert it
> isn't zero; it immediately and always calls the hook right away, so you
> will get a nice ICE with backtrace if it is zero *anyway*.
> 

Good point!  Removed it.

Bootstrapped/regtested on powerpc64le-linux-gnu P8.
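
For reference, the behavior change can be seen with an invocation like
(just an illustrative command, not part of the patch):

  gcc -Q --help=target -O2

With the target override hook called before print_specific_help, the option
values printed there reflect what the backend settles on after its own
overrides, rather than the pre-override defaults.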

BR,
Kewen

*opts_alt1.diff*

gcc/ChangeLog:

* opts-global.c (decode_options): Adjust call to print_help.
* opts.c (print_help): Add one function pointer argument
target_option_override_hook and call it before print_specific_help.
* opts.h (print_help): Add one more argument to the function declaration.

*opts_alt2.diff*

gcc/ChangeLog:

* opts-global.c (decode_options): Adjust call to print_help.
* opts.c (print_specific_help): Add one function pointer argument
target_option_override_hook and call it once.
(print_help): Add one function pointer argument
target_option_override_hook and pass it in the print_specific_help call.
(common_handle_option): Adjust calls to print_specific_help, refactor
calls to target_option_override_hook.
* opts.h (print_help): Add one more argument to the function declaration.

diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index b1a8429dc3c..ec960c87c9a 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -328,7 +328,7 @@ decode_options (struct gcc_options *opts, struct gcc_options *opts_set,
   const char *arg;
 
   FOR_EACH_VEC_ELT (help_option_arguments, i, arg)
-print_help (opts, lang_mask, arg);
+print_help (opts, lang_mask, arg, target_option_override_hook);
 }
 
 /* Hold command-line options associated with stack limitation.  */
diff --git a/gcc/opts.c b/gcc/opts.c
index 499eb900643..a83b2f837dd 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2017,7 +2017,8 @@ check_alignment_argument (location_t loc, const char *flag, const char *name)
 
 void
 print_help (struct gcc_options *opts, unsigned int lang_mask,
-   const char *help_option_argument)
+   const char *help_option_argument,
+   void (*target_option_override_hook) (void))
 {
   const char *a = help_option_argument;
   unsigned int include_flags = 0;
@@ -2146,8 +2147,10 @@ print_help (struct gcc_options *opts, unsigned int lang_mask,
 exclude_flags |= CL_PARAMS;
 
   if (include_flags)
-print_specific_help (include_flags, exclude_flags, 0, opts,
-lang_mask);
+{
+  target_option_override_hook ();
+  print_specific_help (include_flags, exclude_flags, 0, opts, lang_mask);
+}
 }
 
 /* Handle target- and language-independent options.  Return zero to
diff --git a/gcc/opts.h b/gcc/opts.h
index 8f594b46e33..9a837305af1 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -419,8 +419,9 @@ extern bool target_handle_option (struct gcc_options *opts,
 extern void finish_options (struct gcc_options *opts,
struct gcc_options *opts_set,
location_t loc);
-extern void print_help (struct gcc_options *opts, unsigned int lang_mask, const
-   char *help_option_argument);
+extern void print_help (struct gcc_options *opts, unsigned int lang_mask,
+   const char *help_option_argument,
+   void (*target_option_override_hook) (void));
 extern void default_options_optimization (struct gcc_options *opts,
  struct 

Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling

2020-08-09 Thread Kewen.Lin via Gcc-patches
Hi Bin,

Thanks for the review!!

on 2020/8/8 4:01 PM, Bin.Cheng wrote:
> Hi Kewen,
> Sorry for the late reply.
> The patch's most important change is below cost computation:
> 
>> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
>> cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
>>   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
>>
>> +  /* Consider additional step updates during unrolling.  */
>> +  if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p)
>> +cost += (data->current_loop->estimated_unroll - 1) * cost_step;
> This is a bit strange; to me the add instructions are additional
> computation caused by unrolling+addressing_mode, rather than a native
> part of the candidate itself.  Specifically, an additional cost is needed
> if candidates (without reg_offset_p) are chosen for the address type
> group/uses.

Good point.  Ideally it should be one additional cost for each cand/group
pair: when we select one cand for one group, we need to check whether this
pair needs the extra (estimated_unroll - 1) step costs, and we would probably
need to track that during remove/replace etc.  IIUC the current IVOPTs cost
framework doesn't support this, and it could increase the selection
complexity and time.  I hesitated to do that and put it into the cand step
cost initially instead.

I was thinking that candidates with reg_offset_p would mostly be used only
for the reg_offset_p groups (which are very limited), while the others are
simply scaled up as before.  But indeed your approach can also cover some
similar cases, e.g. a cand used only for the compare type group which closes
the loop, and which therefore doesn't need extra step costs for unrolling.

Do you prefer me to improve the current cost framework?

>> +
>>   /* Prefer the original ivs unless we may gain something by replacing it.
>>  The reason is to make debugging simpler; so this is not relevant for
>>  artificial ivs created by other optimization passes.  */
>>
> 
>> @@ -3654,6 +3729,14 @@ set_group_iv_cost (struct ivopts_data *data,
>>   return;
>> }
>>
>> +  /* Since we priced more on non reg_offset IV cand step cost, we should scale
>> + up the appropriate IV group costs.  Simply consider USE_COMPARE at the
>> + loop exit, FIXME if multiple exits supported or no loop exit comparisons
>> + matter.  */
>> +  if (data->consider_reg_offset_for_unroll_p
>> +  && group->vuses[0]->type != USE_COMPARE)
>> +cost *= (HOST_WIDE_INT) data->current_loop->estimated_unroll;
> I don't quite follow here; giving "pricing more on non-reg_offset IV cand"
> doesn't make much sense to me.  Also, why are generic type uses not
> skipped?  We want to model the cost required for address computation;
> however, for generic type uses there is no way to save the computation
> in "address expression".  Once unrolled, the computation is always
> there?
> 

The main intention is to scale up the group/cand cost for unrolling, since
we have scaled up the step costs.  The assumption is that the original
costing (without this patch) can be viewed as covering either all unrolled
iterations or just one single iteration.  Since IVOPTs doesn't support
fractional costing, I interpreted it as a single iteration; to emulate the
unrolling scenario consistently with the step cost scaling above, we need to
scale up the cost for all computations.

In most cases the compare type use is for loop closing, and there is still
only one such computation even when unrolling happens, so I excluded it here.
As the "FIXME" notes, if some cases turn out to be off, we can further
restrict this to those USE_COMPARE uses that are exactly for loop closing.
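
To make the scaling concrete with made-up numbers: with estimated_unroll = 4,
cost_step = 4 and an adjusted setup cost of 8, a non-reg_offset candidate is
now costed 8 + 4 + (4 - 1) * 4 = 24 instead of 12; an address or generic
group costed at 6 is scaled to 4 * 6 = 24; and a USE_COMPARE group for the
loop exit keeps its cost of 6, since its computation is not replicated by
unrolling.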

> And what's the impact on targets supporting [base + index + offset]
> addressing mode?

Good question, I didn't consider it since power doesn't support that mode.
The comment on function addr_offset_valid_p only mentions [base + offset],
so I guess it excludes [base + index + offset]?  But I suppose the
address-based IV can still work for this mode?

> 
> Given the patch is not likely to do harm, because rtl loop unrolling is
> (or was?) disabled by default, I am OK once the above comments are
> addressed.
> 

Yes, it needs explicit unrolling options, except for targets that want to
enable it for some cases with specific loop_unroll_adjust checks.

> I wonder if it's possible to get 10% of (all which should be unrolled)
> loops unrolled (conservatively) on gimple and enable it by default at
> O3, rather than teaching ivopts to model a future pass which is not
> likely to be used outside of benchmarks?
> 

Yeah, it would be nice if the unrolling happened before IVOPTs, so that no
later unrolling could invalidate its decisions.  PR [1] seems to have some
discussion on gimple unrolling.

Richi suggested need-driven gimple unrolling in the previous discussion [2]
on the RFC for this.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-January/537645.html

BR,
Kewen


Re: [patch] multi-range implementation for value_range (irange)

2020-08-09 Thread Martin Liška

On 8/5/20 4:27 PM, Gerald Pfeifer wrote:

Hi Aldy,

On Fri, 31 Jul 2020, Aldy Hernandez via Gcc-patches wrote:

Jeff approved this patch off-list.  I will re-run tests once again and
commit by Monday.


I believe this has broken the bootstrap with clang (specifically
FreeBSD clang version 10.0.0):

In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/c/gimple-parser.c:44:
In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/tree-vrp.h:23:
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:347:1: error: static declaration of
'gt_ggc_mx' follows non-static declaration
gt_ggc_mx (int_range<N> *x)
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:150:37: note: previous
declaration is here
   template <unsigned N> friend void gt_ggc_mx (int_range<N> *);
                                     ^

/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:358:1: error: static declaration of
'gt_pch_nx' follows non-static declaration
gt_pch_nx (int_range<N> *x)
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:151:37: note: previous
declaration is here
   template <unsigned N> friend void gt_pch_nx (int_range<N> *);
                                     ^
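
If it helps, a reduced sketch of the shape of the problem (not the actual
GCC headers): the friend declaration is the first declaration of gt_ggc_mx,
so it declares a namespace-scope function with external linkage, and the
later static definition then conflicts.

template <unsigned N> class int_range
{
  /* First declaration: a namespace-scope function, external linkage.  */
  template <unsigned M> friend void gt_ggc_mx (int_range<M> *);
};

/* Later internal-linkage definition: clang rejects this as
   "static declaration ... follows non-static declaration".  */
template <unsigned N>
static void
gt_ggc_mx (int_range<N> *x)
{
}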

My daily tester started to fail with the 20200803T1640 build, so the root
cause of this must have entered GCC trunk between Sunday 16:40 UTC and
Monday 16:40 UTC.

Gerald



Hey.

This one is still broken.  Aldy, can you please take a look?
It's good that the GCC code base can be compiled with clang ;)

Martin


Re: [PATCH] rs6000: MMA built-ins reject typedefs of MMA types

2020-08-09 Thread Peter Bergner via Gcc-patches
On 8/8/20 11:59 AM, Peter Bergner wrote:
> Testing was clean, so I pushed this to trunk.  Will wait before
> backporting.  Thanks!

Scanning through Bill Seurer's testsuite runs for POWER, I see no fallout,
so I have pushed this to the GCC 10 branch.

Peter




Re: [PATCH] rs6000: ICE when using an MMA type as a function param

2020-08-09 Thread Peter Bergner via Gcc-patches
On 8/9/20 10:03 PM, Peter Bergner wrote:
> gcc/
>   PR target/96506
>   * config/rs6000/rs6000-call.c (rs6000_promote_function_mode):
>   (rs6000_function_arg):

Oops, missed the ChangeLog entries:

gcc/
PR target/96506
* config/rs6000/rs6000-call.c (rs6000_promote_function_mode): Disallow
MMA types as return values.
(rs6000_function_arg): Disallow MMA types as function arguments.


Peter


Re: RFC: Monitoring old PRs, new dg directives

2020-08-09 Thread Martin Liška

On 8/5/20 12:22 AM, Marek Polacek wrote:

On Thu, Jul 30, 2020 at 11:08:03AM +0200, Martin Liška wrote:

Hello.

I support the initiative!
What would be nice is to add a leading 'PR component/12345' line
to the git commit so that these test additions are linked to Bugzilla issues.


Thanks!  Yes, it should be clear which test tests a PR that has the monitored
keyword.  That may get lost when adding a lot of tests in one commit, but can
always be clarified in a comment.


Note that contrib/mklog.py can correctly extract leading 'PR component/12345'
lines from test-cases and add them to a ChangeLog entry.

Martin


Or just grep 'PR component/12345' in the
testsuite; new tests should have this as their first line.
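
For instance, the first lines of such a test might look like (just an
illustration, not an existing test):

/* PR tree-optimization/12345 */
/* { dg-do compile } */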

Marek





[PATCH] rs6000: ICE when using an MMA type as a function param

2020-08-09 Thread Peter Bergner via Gcc-patches
PR96506 shows a problem where we ICE on illegal usage, namely using MMA
types for function arguments and return values.  The solution is to flag
these illegal usages as errors early, before we ICE.

The patch below is currently bootstrapping and regtesting.  Ok for trunk
once that comes back clean?  Ok for GCC 10 after some bake in?

Peter


gcc/
PR target/96506
* config/rs6000/rs6000-call.c (rs6000_promote_function_mode):
(rs6000_function_arg):

gcc/testsuite/
PR target/96506
* gcc.target/powerpc/pr96506.c: New test.


diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 189497efb45..e4ed88cd2f8 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -6444,8 +6444,23 @@ machine_mode
 rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
  machine_mode mode,
  int *punsignedp ATTRIBUTE_UNUSED,
- const_tree, int)
+ const_tree, int for_return)
 {
+  static struct function *fn = NULL;
+
+  /* We do not allow MMA types being used as return values.  Only report
+ the invalid return value usage the first time we encounter it.  */
+  if (for_return
+  && fn != cfun
+  && (mode == POImode || mode == PXImode))
+{
+  fn = cfun;
+  if (TYPE_CANONICAL (type) != NULL_TREE)
+   type = TYPE_CANONICAL (type);
+  error ("invalid use of MMA type %qs as a function return value",
+	 IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
+}
+
   PROMOTE_MODE (mode, *punsignedp, type);
 
   return mode;
@@ -7396,6 +7411,16 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
   machine_mode elt_mode;
   int n_elts;
 
+  /* We do not allow MMA types being used as function arguments.  */
+  if (mode == POImode || mode == PXImode)
+{
+  if (TYPE_CANONICAL (type) != NULL_TREE)
+   type = TYPE_CANONICAL (type);
+  error ("invalid use of MMA operand of type %qs as a function parameter",
+	 IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
+  return NULL_RTX;
+}
+
   /* Return a marker to indicate whether CR1 needs to set or clear the
  bit that V.4 uses to say fp args were passed in registers.
  Assume that we don't need the marker for software floating point,
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96506.c 
b/gcc/testsuite/gcc.target/powerpc/pr96506.c
new file mode 100644
index 000..4ed31bc55fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96506.c
@@ -0,0 +1,61 @@
+/* PR target/96506 */
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -w" } */
+
+typedef __vector_pair vpair_t;
+typedef __vector_quad vquad_t;
+
+/* Verify we flag errors on the following.  */
+
+void
+foo0 (void)
+{
+  __vector_pair v;
+  bar0 (v); /* { dg-error "invalid use of MMA operand of type .__vector_pair. 
as a function parameter" } */
+}
+
+void
+foo1 (void)
+{
+  vpair_t v;
+  bar1 (v); /* { dg-error "invalid use of MMA operand of type .__vector_pair. 
as a function parameter" } */
+}
+
+void
+foo2 (void)
+{
+  __vector_quad v;
+  bar2 (v); /* { dg-error "invalid use of MMA operand of type .__vector_quad. 
as a function parameter" } */
+}
+
+void
+foo3 (void)
+{
+  vquad_t v;
+  bar3 (v); /* { dg-error "invalid use of MMA operand of type .__vector_quad. 
as a function parameter" } */
+}
+
+__vector_pair
+foo4 (__vector_pair *src) /* { dg-error "invalid use of MMA type 
.__vector_pair. as a function return value" } */
+{
+  return *src;
+}
+
+vpair_t
+foo5 (vpair_t *src) /* { dg-error "invalid use of MMA type .__vector_pair. as 
a function return value" } */
+{
+  return *src;
+}
+
+__vector_quad
+foo6 (__vector_quad *src) /* { dg-error "invalid use of MMA type 
.__vector_quad. as a function return value" } */
+{
+  return *src;
+}
+
+vquad_t
+foo7 (vquad_t *src) /* { dg-error "invalid use of MMA type .__vector_quad. as 
a function return value" } */
+{
+  return *src;
+}


Re: Fix remove_predictions_associated_with_edge

2020-08-09 Thread Martin Liška

On 8/2/20 3:02 PM, Jan Hubicka wrote:

Hi,
remove_predictions_associated_with_edge currently calls filter_predictions,
passing it equal_edge_p.  Because filter_predictions removes all predictions
for which the filter returns false, the function does the exact opposite of
what is intended.  Fixed thus.
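
In other words, filter_predictions keeps exactly the entries for which the
callback returns true, so the predicate used when removing an edge has to
return false for that edge and true for everything else.  A minimal sketch
of that keep-if-true contract, with made-up names rather than the real GCC
types:

#include <stdbool.h>

/* Keep the elements for which KEEP returns true; drop the rest.  */
static void
filter_ints (int *vals, int *n, bool (*keep) (int, void *), void *data)
{
  int m = 0;
  for (int i = 0; i < *n; i++)
    if (keep (vals[i], data))
      vals[m++] = vals[i];
  *n = m;
}

/* To remove VAL we must keep everything that is *not* equal to it.  */
static bool
not_equal_p (int v, void *data)
{
  return v != *(int *) data;
}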

Bootstrapped/regtested x86_64-linux.


Thanks for the patch. It seems to me obvious, please install it.

Martin



gcc/ChangeLog:

2020-08-02  Jan Hubicka  

* predict.c (filter_predictions): Document semantics of filter.
(equal_edge_p): Rename to ...
(not_equal_edge_p): ... this; reverse semantics.
(remove_predictions_associated_with_edge): Fix.

diff --git a/gcc/predict.c b/gcc/predict.c
index 0a317a7a4ac..2164a06e083 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -595,10 +595,11 @@ gimple_predict_edge (edge e, enum br_predictor predictor, 
int probability)
  }
  }
  
-/* Filter edge predictions PREDS by a function FILTER.  DATA are passed
-   to the filter function.  */
+/* Filter edge predictions PREDS by a function FILTER: if FILTER return false
+   the prediction is removed.
+   DATA are passed to the filter function.  */
 
-void
+static void
  filter_predictions (edge_prediction **preds,
bool (*filter) (edge_prediction *, void *), void *data)
  {
@@ -627,10 +628,10 @@ filter_predictions (edge_prediction **preds,
  /* Filter function predicate that returns true for a edge predicate P
 if its edge is equal to DATA.  */
  
-bool
-equal_edge_p (edge_prediction *p, void *data)
+static bool
+not_equal_edge_p (edge_prediction *p, void *data)
  {
-  return p->ep_edge == (edge)data;
+  return p->ep_edge != (edge)data;
  }
  
  /* Remove all predictions on given basic block that are attached

@@ -642,7 +643,7 @@ remove_predictions_associated_with_edge (edge e)
  return;
  
edge_prediction **preds = bb_predictions->get (e->src);

-  filter_predictions (preds, equal_edge_p, e);
+  filter_predictions (preds, not_equal_edge_p, e);
  }
  
  /* Clears the list of predictions stored for BB.  */






Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-09 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 7, 2020 at 11:02 PM Kirill Yukhin  wrote:
>
> Hello,
>
> > On 05 Aug 09:29, Hongtao Liu wrote:
> > On Tue, Aug 4, 2020 at 6:28 PM Kirill Yukhin  
> > wrote:
> > >
> > > > On 04 Aug 13:26, Kirill Yukhin wrote:
> > > > > Could you please clarify how your patch is related to [1]?
> > > > I see from the bug that it describes perf issue w.r.t. scalar
> > > > operations.
> > >
> > Sorry for the typo, it's pr96243.
>
> Please, don't forget to update ChangeLog entry.
>
Yes.
> It's a pity that we don't support vector comparisons in CSE,
> hope we will fix that in the future.
>
> Patch LGTM.
>
> --
> K
Thanks.


-- 
BR,
Hongtao


Simplify X * C1 == C2 with wrapping overflow

2020-08-09 Thread Marc Glisse
Odd numbers are invertible in Z / 2^n Z, so X * C1 == C2 can be rewritten 
as X == C2 * inv(C1) when overflow wraps.
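
As a quick sanity check of that identity (a standalone snippet; the numbers
match the new testcase below):

#include <stdint.h>
#include <assert.h>

int
main (void)
{
  /* inv(3) mod 2^32 is 0xaaaaaaab, so 3*x == 5 becomes
     x == 5 * 0xaaaaaaab == 1431655767 (mod 2^32).  */
  uint32_t inv3 = 0xaaaaaaabu;
  assert ((uint32_t) (3u * inv3) == 1u);
  assert ((uint32_t) (5u * inv3) == 1431655767u);
  return 0;
}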


mod_inv should probably be updated to better match the other wide_int 
functions, but that's a separate issue.


Bootstrap+regtest on x86_64-pc-linux-gnu.

2020-08-10  Marc Glisse  

PR tree-optimization/95433
* match.pd (X * C1 == C2): Handle wrapping overflow.
* expr.c (maybe_optimize_mod_cmp): Qualify call to mod_inv.
(mod_inv): Move...
* wide-int.cc (mod_inv): ... here.
* wide-int.h (mod_inv): Declare it.

* gcc.dg/tree-ssa/pr95433-2.c: New file.

--
Marc Glisse

diff --git a/gcc/expr.c b/gcc/expr.c
index a150fa0d3b5..ebf0c9e4797 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11859,38 +11859,6 @@ string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
   return init;
 }
 
-/* Compute the modular multiplicative inverse of A modulo M
-   using extended Euclid's algorithm.  Assumes A and M are coprime.  */
-static wide_int
-mod_inv (const wide_int &a, const wide_int &b)
-{
-  /* Verify the assumption.  */
-  gcc_checking_assert (wi::eq_p (wi::gcd (a, b), 1));
-
-  unsigned int p = a.get_precision () + 1;
-  gcc_checking_assert (b.get_precision () + 1 == p);
-  wide_int c = wide_int::from (a, p, UNSIGNED);
-  wide_int d = wide_int::from (b, p, UNSIGNED);
-  wide_int x0 = wide_int::from (0, p, UNSIGNED);
-  wide_int x1 = wide_int::from (1, p, UNSIGNED);
-
-  if (wi::eq_p (b, 1))
-    return wide_int::from (1, p, UNSIGNED);
-
-  while (wi::gt_p (c, 1, UNSIGNED))
-    {
-      wide_int t = d;
-      wide_int q = wi::divmod_trunc (c, d, UNSIGNED, &d);
-      c = t;
-      wide_int s = x0;
-      x0 = wi::sub (x1, wi::mul (q, x0));
-      x1 = s;
-    }
-  if (wi::lt_p (x1, 0, SIGNED))
-    x1 += d;
-  return x1;
-}
-
 /* Optimize x % C1 == C2 for signed modulo if C1 is a power of two and C2
is non-zero and C3 ((1<<(prec-1)) | (C1 - 1)):
for C2 > 0 to x & C3 == C2
@@ -12101,7 +12069,7 @@ maybe_optimize_mod_cmp (enum tree_code code, tree *arg0, tree *arg1)
   w = wi::lrshift (w, shift);
   wide_int a = wide_int::from (w, prec + 1, UNSIGNED);
   wide_int b = wi::shifted_mask (prec, 1, false, prec + 1);
-  wide_int m = wide_int::from (mod_inv (a, b), prec, UNSIGNED);
+  wide_int m = wide_int::from (wi::mod_inv (a, b), prec, UNSIGNED);
   tree c3 = wide_int_to_tree (type, m);
   tree c5 = NULL_TREE;
   wide_int d, e;
diff --git a/gcc/match.pd b/gcc/match.pd
index 7e5c5a6eae6..c3b88168ac4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3828,7 +3828,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (cmp @0 @2))
 
 /* For integral types with undefined overflow fold
-   x * C1 == C2 into x == C2 / C1 or false.  */
+   x * C1 == C2 into x == C2 / C1 or false.
+   If overflow wraps and C1 is odd, simplify to x == C2 / C1 in the ring
+   Z / 2^n Z.  */
 (for cmp (eq ne)
  (simplify
   (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2)
@@ -3839,7 +3841,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (wi::multiple_of_p (wi::to_widest (@2), wi::to_widest (@1),
			TYPE_SIGN (TREE_TYPE (@0)), &quot))
  (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), quot); })
- { constant_boolean_node (cmp == NE_EXPR, type); })
+ { constant_boolean_node (cmp == NE_EXPR, type); }))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+	&& (wi::bit_and (wi::to_wide (@1), 1) == 1))
+(cmp @0
+ {
+   tree itype = TREE_TYPE (@0);
+   int p = TYPE_PRECISION (itype);
+   wide_int m = wi::one (p + 1) << p;
+   wide_int a = wide_int::from (wi::to_wide (@1), p + 1, UNSIGNED);
+   wide_int i = wide_int::from (wi::mod_inv (a, m),
+p, TYPE_SIGN (itype));
+   wide_int_to_tree (itype, wi::mul (i, wi::to_wide (@2)));
+ })
 
 /* Simplify comparison of something with itself.  For IEEE
floating-point, we can only do some of these simplifications.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c
new file mode 100644
index 000..c830a3d195f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fwrapv -fdump-tree-gimple" } */
+
+typedef __INT32_TYPE__ int32_t;
+typedef unsigned __INT32_TYPE__ uint32_t;
+
+int e(int32_t x){return 3*x==5;}
+int f(int32_t x){return 3*x==-5;}
+int g(int32_t x){return -3*x==5;}
+int h(int32_t x){return 7*x==3;}
+int i(uint32_t x){return 7*x==3;}
+
+/* { dg-final { scan-tree-dump-times "== 1431655767" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "== -1431655767" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "== 613566757" 2 "gimple" } } */
diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 03be0074816..f4d949c38a0 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -2223,6 +2223,39 @@ wi::round_up_for_mask (const wide_int , const wide_int )
   return (val | tmp) & -tmp;
 }
 
+/* Compute the modular 

Re: [PATCH] middle-end: Correct calculation of mul_widen_cost and mul_highpart_cost.

2020-08-09 Thread Richard Biener via Gcc-patches
On August 9, 2020 12:30:32 PM GMT+02:00, Roger Sayle 
 wrote:
>
>This patch fixes a subtle bug in the depths of GCC's synth_mult,
>where the middle-end queries whether (how well) the target supports
>widening and highpart multiplications by calling targetm.rtx_costs.
>The code in init_expmed and init_expmed_one_mode iterates over various
>RTL patterns querying the cost of each.  To avoid generating & garbage
>collecting too much junk, it reuses the same RTL over and over, but
>adjusting the modes between each call.
>
>Alas this reuse of state is a little fragile, and at some point a
>change to init_expmed_one_conv has resulted in the state (mode of
>a register) being changed, but not reset before being used again.
>
>The fallout is that GCC currently queries the backend for the cost
>of non-sense RTL such as:
>
>(mult:HI (zero_extend:HI (reg:TI 82))
>(zero_extend:HI (reg:TI 82)))
>
>and
>
>(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
>(zero_extend:HI (reg:TI 82)))
>(const_int 8 [0x8]))
>
>The fix is to set the mode of the register back to its assumed
>state, as (reg:QI 82) in the above patterns makes much more sense.
>
>Using the old software engineering/defensive programming maxim of
>"why fix a bug just once, if it can be fixed in multiple places",
>this patch both restores the original value in init_expmed_one_conv,
>and also sets it to the expected value in init_expmed_one_mode.
>This should hopefully signal the need to be careful of invariants for
>anyone modifying this code in future.
>
>Alas things are rarely simple...  Fixing this obviously incorrect
>logic causes a failure of gcc.target/i386/pr71321.c that tests for
>a particular expansion from synth_mult.  The issue here is that this
>test is checking for a specific multiplication expansion, when it
>should really be checking that we don't generate the inefficient
>"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
>reported in the PR target/71321.  Now that we use correct costs,
>GCC uses a multiply instruction matching icc, LLVM and GCC prior
>to v4.8.  I've even microbenchmarked this function (over several
>minutes) with (disappointingly) no difference in timings.  Three
>dependent leas has 3-cycle latency, exactly the same as a widening
>byte multiply (on Haswell), so the shorter code splits the tie.
>[I have a follow-up patch that may improve things further].
>
>Before:
>movzbl  %dil, %eax
>leal    (%rax,%rax,4), %edx
>leal    (%rax,%rdx,8), %edx
>leal    (%rdx,%rdx,4), %edx
>shrw    $11, %dx
>leal    (%rdx,%rdx,4), %eax
>...
>
>After:
>movl    $-51, %edx
>movl    %edx, %eax
>mulb    %dil
>movl    %eax, %edx
>shrw    $11, %dx
>leal    (%rdx,%rdx,4), %eax
>...
>
>
>This patch has been tested on x86_64-pc-linux-gnu with a "make
>bootstrap" and "make -k check" with no new failures.
>Ok for mainline?

OK. 

Richard. 

>
>2020-08-09  Roger Sayle  
>
>gcc/ChangeLog
>   * expmed.c (init_expmed_one_conv): Restore all->reg's mode.
>   (init_expmed_one_mode): Set all->reg to desired mode.
>
>gcc/testsuite/ChangeLog
>   PR target/71321
>   * gcc.target/i386/pr71321.c: Check that the code doesn't use
>   the 4B zero displacement lea, not that it uses lea.
>
>Thanks in advance,
>Roger
>--
>Roger Sayle
>NextMove Software
>Cambridge, UK



[PATCH] middle-end: Correct calculation of mul_widen_cost and mul_highpart_cost.

2020-08-09 Thread Roger Sayle

This patch fixes a subtle bug in the depths of GCC's synth_mult,
where the middle-end queries whether (how well) the target supports
widening and highpart multiplications by calling targetm.rtx_costs.
The code in init_expmed and init_expmed_one_mode iterates over various
RTL patterns querying the cost of each.  To avoid generating & garbage
collecting too much junk, it reuses the same RTL over and over, but
adjusting the modes between each call.

Alas this reuse of state is a little fragile, and at some point a
change to init_expmed_one_conv has resulted in the state (mode of
a register) being changed, but not reset before being used again.

The fallout is that GCC currently queries the backend for the cost
of non-sense RTL such as:

(mult:HI (zero_extend:HI (reg:TI 82))
(zero_extend:HI (reg:TI 82)))

and

(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
(zero_extend:HI (reg:TI 82)))
(const_int 8 [0x8]))

The fix is to set the mode of the register back to its assumed
state, as (reg:QI 82) in the above patterns makes much more sense.
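
To make the reuse hazard concrete, here is a rough sketch of the pattern
involved (simplified names and flow, not the actual init_expmed code):

  /* One register RTX is built once and re-used for every cost query.  */
  rtx reg = gen_rtx_REG (QImode, 82);
  rtx zext = gen_rtx_ZERO_EXTEND (HImode, reg);
  rtx wide_mult = gen_rtx_MULT (HImode, zext, zext);

  /* A conversion-cost query temporarily re-modes the register ...  */
  PUT_MODE (reg, TImode);
  /* ... and if it forgets to undo that, the next query asks the target
     to cost (mult:HI (zero_extend:HI (reg:TI 82)) ...), which is nonsense.
     The fix restores the expected mode:  */
  PUT_MODE (reg, QImode);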

Using the old software engineering/defensive programming maxim of
"why fix a bug just once, if it can be fixed in multiple places",
this patch both restores the original value in init_expmed_one_conv,
and also sets it to the expected value in init_expmed_one_mode.
This should hopefully signal the need to be careful of invariants for
anyone modifying this code in future.

Alas things are rarely simple...  Fixing this obviously incorrect
logic causes a failure of gcc.target/i386/pr71321.c that tests for
a particular expansion from synth_mult.  The issue here is that this
test is checking for a specific multiplication expansion, when it
should really be checking that we don't generate the inefficient
"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
reported in the PR target/71321.  Now that we use correct costs,
GCC uses a multiply instruction matching icc, LLVM and GCC prior
to v4.8.  I've even microbenchmarked this function (over several
minutes) with (disappointingly) no difference in timings.  Three
dependent leas has 3-cycle latency, exactly the same as a widening
byte multiply (on Haswell), so the shorter code splits the tie.
[I have a follow-up patch that may improve things further].

Before:
movzbl  %dil, %eax
leal    (%rax,%rax,4), %edx
leal    (%rax,%rdx,8), %edx
leal    (%rdx,%rdx,4), %edx
shrw    $11, %dx
leal    (%rdx,%rdx,4), %eax
...

After:
movl    $-51, %edx
movl    %edx, %eax
mulb    %dil
movl    %eax, %edx
shrw    $11, %dx
leal    (%rdx,%rdx,4), %eax
...


This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.
Ok for mainline?


2020-08-09  Roger Sayle  

gcc/ChangeLog
* expmed.c (init_expmed_one_conv): Restore all->reg's mode.
(init_expmed_one_mode): Set all->reg to desired mode.

gcc/testsuite/ChangeLog
PR target/71321
* gcc.target/i386/pr71321.c: Check that the code doesn't use
the 4B zero displacement lea, not that it uses lea.

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3d2d234..d34f0fb 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -155,6 +155,8 @@ init_expmed_one_conv (struct init_expmed_rtl *all, 
scalar_int_mode to_mode,
   PUT_MODE (all->reg, from_mode);
   set_convert_cost (to_mode, from_mode, speed,
set_src_cost (which, to_mode, speed));
+  /* Restore all->reg's mode.  */
+  PUT_MODE (all->reg, to_mode);
 }
 
 static void
@@ -229,6 +231,7 @@ init_expmed_one_mode (struct init_expmed_rtl *all,
   if (GET_MODE_CLASS (int_mode_to) == MODE_INT
  && GET_MODE_WIDER_MODE (int_mode_to).exists (_mode))
{
+ PUT_MODE (all->reg, mode);
  PUT_MODE (all->zext, wider_mode);
  PUT_MODE (all->wide_mult, wider_mode);
  PUT_MODE (all->wide_lshr, wider_mode);
diff --git a/gcc/testsuite/gcc.target/i386/pr71321.c 
b/gcc/testsuite/gcc.target/i386/pr71321.c
index 86ad812..24d144b 100644
--- a/gcc/testsuite/gcc.target/i386/pr71321.c
+++ b/gcc/testsuite/gcc.target/i386/pr71321.c
@@ -12,5 +12,4 @@ unsigned cvt_to_2digit_ascii(uint8_t i)
 {
   return cvt_to_2digit(i, 10) + 0x0a3030;
 }
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 
} } */
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 
} } */
+/* { dg-final { scan-assembler-not "lea.*0" } } */


Re: [Patch, fortran] PR96102 - ICE in check_host_association, at fortran/resolve.c:5994

2020-08-09 Thread Thomas Koenig via Gcc-patches

Hi Paul,


Dominique has kindly pointed out that the error message in the patch does
not correspond to the errors in the testcase. This came about because the
submitted patch was an earlier version of that on my tree. Please find
attached the correct version of the patch. The principle is the same but
the error is different... if you see what I mean :-)


OK for master.

Best regards

Thomas


Re: [Patch, fortran] PR96312 - [10/11 Regression] Reallocation on assignment uses undefined variables

2020-08-09 Thread Thomas Koenig via Gcc-patches

Hi Paul,


Regtests OK on FC31/x86_64 - OK for master?


OK. This bug is also exposed by a matmul optimization
that I introduced quite a long time ago, so a backport
would also be good (I can do that if you prefer).

Best regards

Thomas



[Patch, fortran] PR96312 - [10/11 Regression] Reallocation on assignment uses undefined variables

2020-08-09 Thread Paul Richard Thomas via Gcc-patches
Hi All,

trans-expr.c(fcncall_realloc_result) unconditionally compared the shapes of
the function result and the lhs. This is clearly wrong when the lhs is not
allocated since the bounds could be used uninitialized as found by the
reporter. The patch does the obvious thing by checking the allocation
status before doing the comparison.

Regtests OK on FC31/x86_64 - OK for master?

Paul

This patch fixes PR96312. Cures a used uninitialized warning.

2020-08-9  Paul Thomas  

gcc/fortran
PR fortran/96312
* trans-expr.c (fcncall_realloc_result): Only compare shapes if
lhs was allocated.

gcc/testsuite/
PR fortran/96312
* gfortran.dg/pr96312.f90: New test.
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index b7c568e90e6..36ff9b5cbc6 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -9936,6 +9936,8 @@ fcncall_realloc_result (gfc_se *se, int rank)
   tree tmp;
   tree offset;
   tree zero_cond;
+  tree not_same_shape;
+  stmtblock_t shape_block;
   int n;
 
   /* Use the allocation done by the library.  Substitute the lhs
@@ -9965,7 +9967,11 @@ fcncall_realloc_result (gfc_se *se, int rank)
   tmp = gfc_conv_descriptor_data_get (res_desc);
   gfc_conv_descriptor_data_set (&se->post, desc, tmp);
 
-  /* Check that the shapes are the same between lhs and expression.  */
+  /* Check that the shapes are the same between lhs and expression.
+ The evaluation of the shape is done in 'shape_block' to avoid
+ unitialized warnings from the lhs bounds. */
+  not_same_shape = boolean_false_node;
+  gfc_start_block (&shape_block);
   for (n = 0 ; n < rank; n++)
 {
   tree tmp1;
@@ -9982,15 +9988,24 @@ fcncall_realloc_result (gfc_se *se, int rank)
   tmp = fold_build2_loc (input_location, NE_EXPR,
 			 logical_type_node, tmp,
 			 gfc_index_zero_node);
-  tmp = gfc_evaluate_now (tmp, &se->post);
-  zero_cond = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-			       logical_type_node, tmp,
-			       zero_cond);
+  tmp = gfc_evaluate_now (tmp, &shape_block);
+  if (n == 0)
+	not_same_shape = tmp;
+  else
+	not_same_shape = fold_build2_loc (input_location, TRUTH_OR_EXPR,
+	  logical_type_node, tmp,
+	  not_same_shape);
 }
 
   /* 'zero_cond' being true is equal to lhs not being allocated or the
  shapes being different.  */
-  zero_cond = gfc_evaluate_now (zero_cond, &se->post);
+  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR, logical_type_node,
+			 zero_cond, not_same_shape);
+  gfc_add_modify (&shape_block, zero_cond, tmp);
+  tmp = gfc_finish_block (&shape_block);
+  tmp = build3_v (COND_EXPR, zero_cond,
+		  build_empty_stmt (input_location), tmp);
+  gfc_add_expr_to_block (&se->post, tmp);
 
   /* Now reset the bounds returned from the function call to bounds based
  on the lhs lbounds, except where the lhs is not allocated or the shapes
! { dg-do compile }
! { dg-options "-O1 -Wall" }
!
! PR fortran/96312. The line with the call to 'matmul' gave the warning
! ‘tmp.dim[0].lbound’ is used uninitialized in this function
!
! Contributed by Thomas Koenig  
!
module moda
contains
   PURE SUBROUTINE funca(arr, sz)
  REAL, ALLOCATABLE, DIMENSION(:, :), INTENT(OUT) :: arr
  integer, intent(in) :: sz
  allocate(arr(sz, sz))
  arr(:, :) = 0.
   END SUBROUTINE
end module

module modc
use moda, only: funca
contains
   PURE SUBROUTINE funcb(oarr)
  REAL, DIMENSION(:), INTENT(OUT):: oarr
  REAL, ALLOCATABLE, DIMENSION(:, :) :: arr
  real, allocatable, dimension(:) :: tmp
  CALL funca(arr, ubound(oarr, 1))
  tmp = matmul(transpose(arr),oarr)
  oarr = tmp*1.
   END SUBROUTINE funcb
end module


[pushed] testsuite, Darwin: XFAIL runs for two timode conversion tests.

2020-08-09 Thread Iain Sandoe
Hi,

X86 Darwin fails two TI mode conversions at present, because (to work
around PR80556) we insert libSystem ahead of libgcc.  The libSystem
implementation has a similar bug to one that was fixed for GCC.  We need
to fix 80556 properly, and then this issue will go away - we will be able
to use the libgcc impl as intended.

XFAIL the run for now, to reduce testsuite noise.

tested on x86_64 darwin and linux
applied to master,
thanks
Iain

gcc/testsuite/ChangeLog:

* gcc.dg/torture/fp-int-convert-timode-3.c: XFAIL run.
* gcc.dg/torture/fp-int-convert-timode-4.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c | 1 +
 gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c 
b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c
index 707d539335f..10702302bf8 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c
@@ -4,6 +4,7 @@
 /* { dg-require-effective-target int128 } */
 /* { dg-require-effective-target fenv } */
 /* { dg-options "-frounding-math" } */
+/* { dg-xfail-run-if "see PR80556 c63" { x86_64-*-darwin* i?86-*-darwin* } { 
"*" } { "" } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c 
b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c
index 09600f90903..3facf32fb8b 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c
@@ -4,6 +4,7 @@
 /* { dg-require-effective-target int128 } */
 /* { dg-require-effective-target fenv } */
 /* { dg-options "-frounding-math" } */
+/* { dg-xfail-run-if "see PR80556 c63" { x86_64-*-darwin* i?86-*-darwin* } { 
"*" } { "" } } */
 
 #include 
 #include 
-- 
2.24.1