Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-05-05 Thread Feng Xue OS
Hi Richard,


   Since gcc 9 has been released, will you get some time to take a look at this 
patch? Thanks.


Feng


From: Richard Biener 
Sent: Tuesday, March 12, 2019 4:31:49 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR 
tree-optimization/89134)

On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS  wrote:
>
> This patch is composed to implement a loop transformation on one of its 
> conditional statements, which we call it semi-invariant, in that its 
> computation is impacted in only one of its branches.
>
> Suppose a loop as:
>
> void f (std::map m)
> {
> for (auto it = m.begin (); it != m.end (); ++it) {
> /* if (b) is semi-invariant. */
> if (b) {
> b = do_something();/* Has effect on b */
> } else {
> /* No effect on b */
> }
> statements;  /* Also no effect on b */
> }
> }
>
> A transformation, kind of loop split, could be:
>
> void f (std::map m)
> {
> for (auto it = m.begin (); it != m.end (); ++it) {
> if (b) {
> b = do_something();
> } else {
> ++it;
> statements;
> break;
> }
> statements;
> }
>
> for (; it != m.end (); ++it) {
> statements;
> }
> }
>
> If "statements" contains nothing, the second loop becomes an empty one, which 
> can be removed. (This part will be given in another patch). And if 
> "statements" are straight line instructions, we get an opportunity to 
> vectorize the second loop. In practice, this optimization is found to improve 
> some real application by %7.
>
> Since it is just a kind of loop split, the codes are mainly placed in 
> existing tree-ssa-loop-split module, and is controlled by -fsplit-loop, and 
> is enabled with -O3.

Note the transform itself is jump-threading with the threading
duplicating a whole CFG cycle.

I didn't look at the patch details yet since this is suitable for GCC 10 only.

Thanks for implementing this.
Richard.

> Feng
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 64bf6017d16..a6c2878d652 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,23 @@
> +2019-03-12  Feng Xue 
> +
> +   PR tree-optimization/89134
> +* doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> +   (min-cond-loop-split-prob): Likewise.
> +   * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> +   * passes.def (pass_cond_loop_split) : New pass.
> +   * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> +   * tree-pass.h (make_pass_cond_loop_split): New declaration.
> +   * tree-ssa-loop-split.c (split_info): New class.
> +   (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> +   (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> +   (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> +   (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> +   (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> +   (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> +   (pass_data_cond_loop_split): New variable.
> +   (pass_cond_loop_split): New class.
> +   (make_pass_cond_loop_split): New function.
> +
>  2019-03-11  Jakub Jelinek  
>
> PR middle-end/89655
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index df0883f2fc9..f5e09bd71fd 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a 
> single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.
> +
> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invaraint condition
> +statement to trigger loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 3f1576448be..2e067526958 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> "The maximum number of unswitchings in a single loop.",
> 3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +   "max-cond-loop-split-insns",
> +   "The maximum number of insns to be increased due to loop split on 
> semi-invariant condition statement.",
> +  

Re: [PATCH, RFC, rs6000] PR80791 Consider doloop in ivopts

2019-05-05 Thread Bin.Cheng
On Sun, May 5, 2019 at 2:02 PM Kewen.Lin  wrote:
>
> on 2019/5/5 下午12:04, Bin.Cheng wrote:
> > On Sun, May 5, 2019 at 11:23 AM Kewen.Lin  wrote:
>  +  /* Some compare iv_use is probably useless once the doloop 
>  optimization
>  + performs.  */
>  +  if (tailor_cmp_p)
>  +tailor_cmp_uses (data);
> >>> Function tailor_cmp_uses sets iv_use->zero_cost_p under some
> >>> conditions.  Once the flag is set, though the iv_use participates cost
> >>> computation in determine_group_iv_cost_cond, the result cost is
> >>> hard-wired to ZERO (which means cost computation for such iv_use is
> >>> waste of time).
> >>
> >> Yes, it can be improved by some early check and return.
> >> But it's still helpful to make it call with may_eliminate_iv.
> >> gcc.dg/no-strict-overflow-6.c is one example, with may_eliminate_iv
> >> consideration it exposes more opportunities for downstream optimization.
> > Hmm, I wasn't suggesting early check and return, on the contrary, we
> > can better integrate doloop/cost stuff in the overall model.  See more
> > in following comments.
>
> Sorry, I didn't claim it clearly, the previous comment is to claim the
> call with may_eliminate_iv is not completely "waste of time", and early
> return can make it save time.  :)
>
> And yes, it's not an issue any more with your proposed idea.
>
> >>
> >>> Also iv_use rewriting process is skipped for related
> >>> ivs preserved explicitly by preserve_ivs_for_use.
> >>> Note IVOPTs already adds candidate for original ivs.  So why not
> >>> detecting doloop iv_use, adjust its cost with the corresponding
> >>> original iv candidate, then let the general cost model make the
> >>> decision?  I believe this better fits existing infrastructure and
> >>> requires less change, for example, preserve_ivs_for_use won't be
> >>> necessary.
> >> I agree adjusting the cost of original iv candidate of the iv_use
> >> requires less change, but on one hand, I thought to remove interest
> >> cmp iv use or make it zero cost is close to the fact.  Since it's
> >> eliminated eventually in doloop optimization, it should not
> >> considered in cost modeling.  This way looks more exact.
> > Whether doloop transformation should be considered or be bypassed in
> > cost model isn't a problem.  Actually we can bind doloop iv_cand to
> > cmp iv_use in order to force the transformation. My concern is the
> > patch specially handles doloop by setting the special flag, then
> > checking it.  We generally avoid such special-case handling since it
> > hurts long time maintenance.  The pass was very unstable in the pass
> > because of such issues.
> >
>
> OK, I understand your concerns now. Thanks for explanation!
>
> >> One the other hand, I assumed your suggestion is to increase the
> >> cost for the pair (original iv cand, cmp iv use), the increase cost
> >> seems to be a heuristic value?  It seems it's possible to sacrifice
> > Decrease the cost so that the iv_cand is preferred?  The comment
> > wasn't made on top of implementing doloop in ivopts.  Anyway, you can
> > still bypass cost model by binding the "correct" iv_cand to cmp
> > iv_use.
> >
>
> To decrease the cost isn't workable for this case, it make the original
> iv cand is preferred more and over the other iv cand for memory iv use,
> then the desirable memory based iv cand won't be chosen.
> If increase the cost, one basic iv cand is chosen for cmp use, memory
> based iv cand is chosen for memory use, instead of original iv for both.
Sorry for the mistake, I meant to decrease use cost of whatever "correct"
iv_cand for cmp iv_use that could enable doloop optimization, it doesn't
has to the original iv_cand.

>
> Could you help to explain the "binding" more?  Does it mean cost modeling
> decision making can simply bypass the cmp iv_use (we have artificially
> assigned one "correct" cand iv to it) and also doesn't count the "correct"
> iv cand cost into total iv cost? Is my understanding correct?
For example, if the heuristic finds out the "correct" doloop iv_cand, we can
force choosing that iv_cand for cmp iv_use and bypass the candidate choosing
algorithm:
struct iv_group {
  //...
  struct iv_cand *bind_cand;
};
then set this bind_cand directly in struct iv_ca::cand_for_group.  As a result,
iv_use rewriting code takes care of everything, no special handling (such as
preserve_ivs_for_use) is needed.

Whether letting cost model decide the "correct" iv_cand or bind it by yourself
depends on how good your heuristic check is.  It's your call. :)

>
> >>> tuning;  2) the doloop decision can still be canceled by cost model if
> >>> it's really not beneficial.  With current patch, it can't be undo once
> >>> the decision is made (at very early stage in IVOPTs.).
> >>
> >> I can't really follow this.  If it's predicted to be transformed to doloop,
> >> I think it should not be undoed any more, since it's useless to consider
> >> this cmp iv use. Whatever IVOPTS does, the comp at loop closing should not
> 

Re: A bug in vrp_meet?

2019-05-05 Thread Eric Botcazou
> I have now applied this variant.

You backported it onto the 8 branch on Friday:

2019-05-03  Richard Biener  

Backport from mainline
[...]
2019-03-07  Richard Biener  

PR tree-optimization/89595
* tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
stmt iterator as reference, take boolean output parameter to
indicate whether the stmt was removed and thus the iterator
already advanced.
(dom_opt_dom_walker::before_dom_children): Re-iterate over
stmts created by folding.

and this introduced a regression for the attached Ada testcase at -O:

Program received signal SIGSEGV, Segmentation fault.
0x0102173c in set_value_range (
vr=0x17747a0 , t=VR_RANGE, min=0x76c3df78, max=, equiv=0x0)
at /home/eric/svn/gcc-8-branch/gcc/tree-vrp.c:298
298   vr->type = t;

on x86-64 at least.  Mainline and 9 branch are not affected.

-- 
Eric Botcazoupackage Opt78 is

   subtype Reasonable is Integer range 1..10;

   type UC (D: Reasonable := 2) is record
  S: String (1 .. D) := "Hi";
   end record;

   type AUC is access all UC;

   procedure Proc (P : UC; Msg : String);

end Opt78;
-- { dg-do compile }
-- { dg-options "-O" }

package body Opt78 is

   procedure Proc (P : UC; Msg : String) is
  Default : UC := (1, "!");
   begin
  if P = Default then
 raise Program_Error;
  else
 raise Constraint_Error;
  end if;
   end;

end Opt78;


New Swedish PO file for 'gcc' (version 9.1-b20190414)

2019-05-05 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-9.1-b20190414.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH, RFC, rs6000] PR80791 Consider doloop in ivopts

2019-05-05 Thread Segher Boessenkool
On Sun, May 05, 2019 at 12:04:00PM +0800, Bin.Cheng wrote:
> On Sun, May 5, 2019 at 11:23 AM Kewen.Lin  wrote:
> > I can't really follow this.  If it's predicted to be transformed to doloop,
> > I think it should not be undoed any more, since it's useless to consider
> > this cmp iv use. Whatever IVOPTS does, the comp at loop closing should not
> > be changed (although possible to use other iv), right?  Do I miss something?
> As mentioned, the previous comment wasn't made on top of implementing
> doloop in ivopts.  That would be nice but a different story.
> Before we can do that, it'd better be conservative and only makes
> (doloop) decision in ivopts when you are sure.  As you mentioned, it's
> hard to do the same checks at gimple as RTL, right?  In this case,
> making it a (conservative) heuristic capturing certain beneficial
> cases sounds better than capturing all cases but fail in later RTL
> passes.

But not *overly* conservative.  If some situation almost never happens,
it's better to have ivopts guess wrong some of the time than it is to
just optimise less aggressively.  If ivopts makes a non-optimal decision
you still end up with valid code :-)


Segher


[patch, committed] Fix gcc-7 regression in front-end optimization

2019-05-05 Thread Thomas Koenig

Hello world,

I have just committed the attached patch as obvious after 
regresson-testing - it fixed a rare beast, a gcc 7.4-only regression.


I have also committed the test case to trunk, to make sure that
this does not re-break.  No real need to commit to the other
branches, I think.

Regards

Thomas

2019-05-05  Thomas Koenig  

PR fortran/90344
* frontend-passes.c (create_var): Bring into sync with gcc 8.

2019-05-05  Thomas Koenig 

PR fortran/90344
* gfortran.dg/pr90344.f90: New test.
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 270881)
+++ frontend-passes.c	(Arbeitskopie)
@@ -701,6 +701,11 @@ create_var (gfc_expr * e, const char *vname)
   if (e->expr_type == EXPR_CONSTANT || is_fe_temp (e))
 return gfc_copy_expr (e);
 
+  /* Creation of an array of unknown size requires realloc on assignment.
+ If that is not possible, just return NULL.  */
+  if (flag_realloc_lhs == 0 && e->rank > 0 && e->shape == NULL)
+return NULL;
+
   ns = insert_block ();
 
   if (vname)
@@ -748,7 +753,7 @@ create_var (gfc_expr * e, const char *vname)
 }
 
   deferred = 0;
-  if (e->ts.type == BT_CHARACTER && e->rank == 0)
+  if (e->ts.type == BT_CHARACTER)
 {
   gfc_expr *length;
 
@@ -759,6 +764,8 @@ create_var (gfc_expr * e, const char *vname)
   else
 	{
 	  symbol->attr.allocatable = 1;
+	  symbol->ts.u.cl->length = NULL;
+	  symbol->ts.deferred = 1;
 	  deferred = 1;
 	}
 }
@@ -771,7 +778,7 @@ create_var (gfc_expr * e, const char *vname)
 
   result = gfc_get_expr ();
   result->expr_type = EXPR_VARIABLE;
-  result->ts = e->ts;
+  result->ts = symbol->ts;
   result->ts.deferred = deferred;
   result->rank = e->rank;
   result->shape = gfc_copy_shape (e->shape, e->rank);
! { dg-do compile }
! { dg-additional-options "-ffrontend-optimize" }
! PR 90344 - this used to ICE.
! Test case by Urban Jost.
module M_xterm
contains
   elemental function func1(ch) result(res)
  character,intent(in) :: ch
  logical  :: res
  res=.true.
   end function func1
   elemental function func2(ch) result(res)
  character,intent(in) :: ch
  logical  :: res
  res=.false.
   end function func2
   pure function s2a(string)  RESULT (array)
  character(len=*),intent(in) :: string
  character(len=1):: array(len(string))
  forall(i=1:len(string)) array(i) = string(i:i)
   end function s2a
   subroutine sub1()
  write(*,*)all(func1(s2a('ABCDEFG')).or.func2(s2a('ABCDEFG')))
   end subroutine sub1
end module M_xterm


Re: [PATCH] Fix a typo in two_value_replacement function

2019-05-05 Thread Jakub Jelinek
On Sun, May 05, 2019 at 01:31:12AM -0500, Li Jia He wrote:
> GCC revision 267634 implemented two_value_replacement function.
> However, a typo occurred during the parameter check, which caused
> us to miss some optimizations.

Thanks for catching this.

> The regression testing for the patch was done on GCC mainline on
> 
> powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regressions.  Is it OK for trunk and backport to gcc 9 ?

Ok for both, but see below.

> gcc/ChangeLog
> 
> 2019-05-05  Li Jia He  
> 
>   * tree-ssa-phiopt.c (two_value_replacement):
>   Fix a typo in parameter detection.

Don't break a line after :, only when you reach 80 columns.  So:
* tree-ssa-phiopt.c (two_value_replacement): Fix a typo in parameter
detection.

>   * gcc.dg/pr88676.c: Modify the include header file.
>   * gcc.dg/tree-ssa/pr37508.c: Add the no-ssa-phiopt option to
>   skip phi optimization.
>   * gcc.dg/tree-ssa/pr88676.c: Rename to ...
>   * gcc.dg/tree-ssa/pr88676-1.c: ... this new file.
>   * gcc.dg/tree-ssa/pr88676-2.c: New testcase.

Please don't rename tests unless really necessary.  Just keep pr88676.c
and add pr88676-2.c next to it.

> --- a/gcc/testsuite/gcc.dg/pr88676.c
> +++ b/gcc/testsuite/gcc.dg/pr88676.c
> @@ -2,7 +2,7 @@
>  /* { dg-do run } */
>  /* { dg-options "-O2" } */
>  
> -#include "tree-ssa/pr88676.c"
> +#include "tree-ssa/pr88676-1.c"
>  
>  __attribute__((noipa)) void
>  bar (int x, int y, int z)

Thus remove this hunk.

> rename from gcc/testsuite/gcc.dg/tree-ssa/pr88676.c
> rename to gcc/testsuite/gcc.dg/tree-ssa/pr88676-1.c

And this one.

Jakub


[PATCH] Fix a typo in two_value_replacement function

2019-05-05 Thread Li Jia He
Hi,

GCC revision 267634 implemented two_value_replacement function.
However, a typo occurred during the parameter check, which caused
us to miss some optimizations.

The intent of the code might be to check that the input parameters
are const int and their difference is one.  However, when I read
the code, I found that it is wrong to detect whether an input data
plus one is equal to itself.  This could be a typo.

The regression testing for the patch was done on GCC mainline on

powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.  Is it OK for trunk and backport to gcc 9 ?

Thanks,
Lijia He

gcc/ChangeLog

2019-05-05  Li Jia He  

* tree-ssa-phiopt.c (two_value_replacement):
Fix a typo in parameter detection.

gcc/testsuite/ChangeLog

2019-05-05  Li Jia He  

* gcc.dg/pr88676.c: Modify the include header file.
* gcc.dg/tree-ssa/pr37508.c: Add the no-ssa-phiopt option to
skip phi optimization.
* gcc.dg/tree-ssa/pr88676.c: Rename to ...
* gcc.dg/tree-ssa/pr88676-1.c: ... this new file.
* gcc.dg/tree-ssa/pr88676-2.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr88676.c|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr37508.c   |  6 ++--
 .../tree-ssa/{pr88676.c => pr88676-1.c}   |  0
 gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c | 30 +++
 gcc/tree-ssa-phiopt.c |  2 +-
 5 files changed, 35 insertions(+), 5 deletions(-)
 rename gcc/testsuite/gcc.dg/tree-ssa/{pr88676.c => pr88676-1.c} (100%)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c

diff --git a/gcc/testsuite/gcc.dg/pr88676.c b/gcc/testsuite/gcc.dg/pr88676.c
index b5fdd9342b8..719208083ae 100644
--- a/gcc/testsuite/gcc.dg/pr88676.c
+++ b/gcc/testsuite/gcc.dg/pr88676.c
@@ -2,7 +2,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 
-#include "tree-ssa/pr88676.c"
+#include "tree-ssa/pr88676-1.c"
 
 __attribute__((noipa)) void
 bar (int x, int y, int z)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
index 2ba09afe481..a6def045de4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr37508.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-fre -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fno-ssa-phiopt -fno-tree-fre -fdump-tree-vrp1" } */
 
 struct foo1 {
   int i:1;
@@ -22,7 +22,7 @@ int test2 (struct foo2 *x)
 {
   if (x->i == 0)
 return 1;
-  else if (x->i == -1) /* This test is already folded to false by ccp1.  */
+  else if (x->i == -1) /* This test is already optimized by ccp1 or phiopt1.  
*/
 return 1;
   return 0;
 }
@@ -31,7 +31,7 @@ int test3 (struct foo1 *x)
 {
   if (x->i == 0)
 return 1;
-  else if (x->i == 1) /* This test is already folded to false by fold.  */
+  else if (x->i == 1) /* This test is already optimized by ccp1 or phiopt1.  */
 return 1;
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88676.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-1.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/tree-ssa/pr88676.c
rename to gcc/testsuite/gcc.dg/tree-ssa/pr88676-1.c
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c
new file mode 100644
index 000..d377948e14d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88676-2.c
@@ -0,0 +1,30 @@
+/* PR tree-optimization/88676 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not " = PHI <" "optimized" } } */
+
+struct foo1 {
+  int i:1;
+};
+struct foo2 {
+  unsigned i:1;
+};
+
+int test1 (struct foo1 *x)
+{
+  if (x->i == 0)
+return 1;
+  else if (x->i == 1)
+return 1;
+  return 0;
+}
+
+int test2 (struct foo2 *x)
+{
+  if (x->i == 0)
+return 1;
+  else if (x->i == -1)
+return 1;
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 219791ea4ba..90674a2f3c4 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -602,7 +602,7 @@ two_value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   || TREE_CODE (arg1) != INTEGER_CST
   || (tree_int_cst_lt (arg0, arg1)
  ? wi::to_widest (arg0) + 1 != wi::to_widest (arg1)
- : wi::to_widest (arg1) + 1 != wi::to_widest (arg1)))
+ : wi::to_widest (arg1) + 1 != wi::to_widest (arg0)))
 return false;
 
   if (!empty_block_p (middle_bb))
-- 
2.17.1



[PING] [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref

2019-05-05 Thread Kewen.Lin
Hi,

I'd like to gentle ping for this patch:
https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00966.html

OK for trunk now?

Thanks!

on 2019/3/20 上午11:14, Kewen.Lin wrote:
> Hi,
> 
> Please refer to below link for previous threads.
> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html
> 
> Comparing to patch v2, I've moved up the vector operation target 
> check upward together with vector type target check.  Besides, I
> ran bootstrap and regtest on powerpc64-linux-gnu (BE), updated 
> testcases' requirements and options for robustness.
> 
> Is it OK for GCC10?
> 
> 
> gcc/ChangeLog
> 
> 2019-03-20  Kewen Lin  
> 
>   PR target/88497
>   * tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
>   GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
>   function undistribute_bitref_for_vector.
>   (undistribute_bitref_for_vector): New function.
>   (cleanup_vinfo_map): Likewise.
>   (unsigned_cmp): Likewise.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-03-20  Kewen Lin  
> 
>   * gcc.dg/tree-ssa/pr88497-1.c: New test.
>   * gcc.dg/tree-ssa/pr88497-2.c: Likewise.
>   * gcc.dg/tree-ssa/pr88497-3.c: Likewise.
>   * gcc.dg/tree-ssa/pr88497-4.c: Likewise.
>   * gcc.dg/tree-ssa/pr88497-5.c: Likewise.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c |  44 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c |  33 
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c |  33 
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c |  33 
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c |  33 
>  gcc/tree-ssa-reassoc.c| 306 
> +-
>  6 files changed, 477 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> new file mode 100644
> index 000..99c9af8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } 
> } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { 
> powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref summation.
> +
> +   arg1 and arg2 are two arrays whose elements of type vector double.
> +   Assuming:
> + A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
> + B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
> +
> +   Then:
> + V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
> +
> +   reassoc transforms
> +
> + accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
> +   + V3[0] + V3[1];
> +
> +   into:
> +
> + T = V0 + V1 + V2 + V3
> + accumulator += T[0] + T[1];
> +
> +   Fewer bit_field_refs, only two for 128 or more bits vector.  */
> +
> +typedef double v2df __attribute__ ((vector_size (16)));
> +double
> +test (double accumulator, v2df arg1[], v2df arg2[])
> +{
> +  v2df temp;
> +  temp = arg1[0] * arg2[0];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[1] * arg2[1];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[2] * arg2[2];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[3] * arg2[3];
> +  accumulator += temp[0] + temp[1];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { 
> powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
> new file mode 100644
> index 000..61ed0bf5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* 
> } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { 
> powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on multiplication.
> +
> +   v1, v2, v3, v4 of type vector float.
> +
> +   reassoc transforms
> +
> + accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
> +v2[0] * v2[1] * v2[2] * v2[3] *
> +v3[0] * v3[1] * v3[2] * v3[3] *
> +v4[0] * v4[1] * v4[2] * v4[3] ;
> +
> +   into:
> +
> + T = v1 * v2 * v3 * v4;
> + accumulator *= T[0] * T[1] * T[2] * T[3];
> +
> +   Fewer bit_field_refs, only four for 128 or more bits vector.  */
> +
> +typedef float v4si __attribute__((vector_size(16)));
> +float 

Re: [PATCH PR90240][RFC]Avoid scaling cost overflow by introducing scaling bound

2019-05-05 Thread bin.cheng
Hmm, mis-attached the old version patch.  Here is the updated one.

Thanks,
bin

--
Sender:bin.cheng 
Sent At:2019 May 5 (Sun.) 13:54
Recipient:Richard Biener 
Cc:GCC Patches 
Subject:Re: [PATCH PR90240][RFC]Avoid scaling cost overflow by introducing 
scaling bound


> --
> Sender:Richard Biener 
> Sent At:2019 Apr. 29 (Mon.) 20:01
> Recipient:bin.cheng 
> Cc:GCC Patches ; mliska 
> Subject:Re: [PATCH PR90240][RFC]Avoid scaling cost overflow by introducing 
> scaling bound
> 
> 
> On Sat, Apr 27, 2019 at 6:13 AM bin.cheng  wrote:
> >
> > > Hi,
> >
> > This is the draft patch avoiding scaling cost overflow by introducing a 
> > scaling bound
> > in IVOPTs.  For now the bound is 20, and scaling factor will be further 
> > scaled wrto
> > this bound.  For example, scaling factor like 1, 1000, 2000(max) would be 
> > scaled to
> > 1, 10, 20 correspondingly.
> >
> > HI Martin, I remember you introduced comp_cost/cost_scaling to improve one 
> > case
> > in spec2017.  Unfortunately I don't have access to the benchmark now, could 
> > you help
> > verify that if this patch has performance issue on it please?  Thanks
> >
> > Bootstrap and test on x86_64, and this is for further comment/discussion.
> 
> + float factor = 1.0f * bfreq / lfreq;
> + if (adjust_factor_p)
> +   {
> + factor *= COST_SCALING_FACTOR_BOUND;
> + factor = factor * lfreq / max_freq;
> +   }
> + body[i]->aux = (void *)(intptr_t)(int) factor;
> 
> float computations on the host shouldn't be done.
Fixed.
> 
> I wonder if changing comp_cost::cost to int64_t would help instead?  See also 
> my
> suggestion to use gmp.
Next patch for PR90078 changes the cost to int64_t.
> 
> Otherwise the approach looks sane to me - can we then even assert that
> no overflows
> happen and thus INFTY only appears if we manually set such cost?
Next patch for PR90078 changes to assert on cost overflow.

Attachment is the updated patch.  Bootstrap/test on x86_64 along with PR90078 
patch.
Is it OK?

Thanks,
bin
2019-05-05  Bin Cheng  

PR tree-optimization/90240
* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Scale cost
with respect to scaling factor pre-computed for each basic block.
(try_improve_iv_set): Return bool if best_cost equals to iv_ca cost.
(find_optimal_iv_set_1): Free iv_ca set if it has infinite_cost.
(COST_SCALING_FACTOR_BOUND, determine_scaling_factor): New.
(tree_ssa_iv_optimize_loop): Call determine_scaling_factor.  Extend
live range for array of loop's basic blocks.  Cleanup aux field of
loop's basic blocks.

2018-05-05  Bin Cheng  

PR tree-optimization/90240
* gfortran.dg/graphite/pr90240.f: New test.

0001-pr90240.patch
Description: Binary data


Re: [PATCH PR90078]Capping comp_cost computation in ivopts

2019-05-05 Thread bin.cheng
> --
> Sender:Jakub Jelinek 
> Sent At:2019 Apr. 17 (Wed.) 19:27
> Recipient:Bin.Cheng 
> Cc:bin.cheng ; GCC Patches 
> 
> Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> 
> 
> On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > As
> > > #define INFTY 1000
> > > what is the reason to keep the previous condition as well?
> > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > cost1.cost + cost2.cost >= INFTY too.
> > > Unless costs can go negative.
> > It's a bit complicated, but in general, costs can go negative.
> 
> Ok, no objections from me then (but as I don't know anything about it,
> not an ack either; you are ivopts maintainer, so you don't need one).

Hi,
The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is now
fixed by another patch.  This is the updated patch for PR90078.  It promotes 
type
of ivopts cost from int to int64_t, as well as change behavior of infinite_cost 
overflow
from saturation to assert.
Please note, implicit conversions are kept in cost computation as before without
introducing any narrowing.

Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?

Thanks,
bin
2019-05-05  Bin Cheng  

PR tree-optimization/90078
* tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
(INFTY): Increase the value for infinite cost.
(struct comp_cost): Promote type of members to int64_t.
(infinite_cost): Don't set complexity in initialization.
(comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
overflows to infinite_cost.
(adjust_setup_cost): Promote type of parameter and cost computation
to int64_t.
(struct ainc_cost_data, struct iv_ca): Promote type of member to
int64_t.
(get_scaled_computation_cost_at, determine_iv_cost): Promote type of
cost computation to int64_t.
(determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
int64_t's format specifier in dump.

2018-05-05  Bin Cheng  

PR tree-optimization/90078
* g++.dg/tree-ssa/pr90078.C: New test.

0002-Fix-pr90078-by-promoting-type-of-IVOPTs-cost-to-int6.patch
Description: Binary data


Re: [PATCH, RFC, rs6000] PR80791 Consider doloop in ivopts

2019-05-05 Thread Kewen.Lin
on 2019/5/5 下午12:04, Bin.Cheng wrote:
> On Sun, May 5, 2019 at 11:23 AM Kewen.Lin  wrote:
 +  /* Some compare iv_use is probably useless once the doloop optimization
 + performs.  */
 +  if (tailor_cmp_p)
 +tailor_cmp_uses (data);
>>> Function tailor_cmp_uses sets iv_use->zero_cost_p under some
>>> conditions.  Once the flag is set, though the iv_use participates cost
>>> computation in determine_group_iv_cost_cond, the result cost is
>>> hard-wired to ZERO (which means cost computation for such iv_use is
>>> waste of time).
>>
>> Yes, it can be improved by some early check and return.
>> But it's still helpful to make it call with may_eliminate_iv.
>> gcc.dg/no-strict-overflow-6.c is one example, with may_eliminate_iv
>> consideration it exposes more opportunities for downstream optimization.
> Hmm, I wasn't suggesting early check and return, on the contrary, we
> can better integrate doloop/cost stuff in the overall model.  See more
> in following comments.

Sorry, I didn't claim it clearly, the previous comment is to claim the 
call with may_eliminate_iv is not completely "waste of time", and early
return can make it save time.  :)

And yes, it's not an issue any more with your proposed idea.

>>
>>> Also iv_use rewriting process is skipped for related
>>> ivs preserved explicitly by preserve_ivs_for_use.
>>> Note IVOPTs already adds candidate for original ivs.  So why not
>>> detecting doloop iv_use, adjust its cost with the corresponding
>>> original iv candidate, then let the general cost model make the
>>> decision?  I believe this better fits existing infrastructure and
>>> requires less change, for example, preserve_ivs_for_use won't be
>>> necessary.
>> I agree adjusting the cost of original iv candidate of the iv_use
>> requires less change, but on one hand, I thought to remove interest
>> cmp iv use or make it zero cost is close to the fact.  Since it's
>> eliminated eventually in doloop optimization, it should not
>> considered in cost modeling.  This way looks more exact.
> Whether doloop transformation should be considered or be bypassed in
> cost model isn't a problem.  Actually we can bind doloop iv_cand to
> cmp iv_use in order to force the transformation. My concern is the
> patch specially handles doloop by setting the special flag, then
> checking it.  We generally avoid such special-case handling since it
> hurts long time maintenance.  The pass was very unstable in the pass
> because of such issues.
> 

OK, I understand your concerns now. Thanks for explanation!

>> One the other hand, I assumed your suggestion is to increase the
>> cost for the pair (original iv cand, cmp iv use), the increase cost
>> seems to be a heuristic value?  It seems it's possible to sacrifice
> Decrease the cost so that the iv_cand is preferred?  The comment
> wasn't made on top of implementing doloop in ivopts.  Anyway, you can
> still bypass cost model by binding the "correct" iv_cand to cmp
> iv_use.
> 

To decrease the cost isn't workable for this case, it make the original
iv cand is preferred more and over the other iv cand for memory iv use,
then the desirable memory based iv cand won't be chosen.
If increase the cost, one basic iv cand is chosen for cmp use, memory
based iv cand is chosen for memory use, instead of original iv for both.

Could you help to explain the "binding" more?  Does it mean cost modeling
decision making can simply bypass the cmp iv_use (we have artificially 
assigned one "correct" cand iv to it) and also doesn't count the "correct"
iv cand cost into total iv cost? Is my understanding correct?

>>> tuning;  2) the doloop decision can still be canceled by cost model if
>>> it's really not beneficial.  With current patch, it can't be undo once
>>> the decision is made (at very early stage in IVOPTs.).
>>
>> I can't really follow this.  If it's predicted to be transformed to doloop,
>> I think it should not be undoed any more, since it's useless to consider
>> this cmp iv use. Whatever IVOPTS does, the comp at loop closing should not
>> be changed (although possible to use other iv), right?  Do I miss something?
> As mentioned, the previous comment wasn't made on top of implementing
> doloop in ivopts.  That would be nice but a different story.
> Before we can do that, it'd better be conservative and only makes
> (doloop) decision in ivopts when you are sure.  As you mentioned, it's
> hard to do the same checks at gimple as RTL, right?  In this case,
> making it a (conservative) heuristic capturing certain beneficial
> cases sounds better than capturing all cases but fail in later RTL
> passes.
> 

Yes, I agree we should be conservative.  But it's hard to determine which is 
better in practice, even for capturing all cases, we are still trying our best
to be more conservative, excluding any suspicious factor which is possible to 
make it fail in later RTL checking, one example is that the patch won't predict
it can be doloop once finding switch statement.