[Bug tree-optimization/102131] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu

2021-08-31 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102131

--- Comment #4 from bin cheng  ---
(In reply to Jiu Fu Guo from comment #3)
> The issue may come from 'iv0 cmp iv1' transform:
> 
>if (c -->if (c>=b) in-loop
> -->if (b<=c) in-loop
> 
>   c: {4, +, 3}
>   b: {1, +, 1}
> 
>   if ({1, +, 1} <= {4, +, 3})
>   ==> if ({1,+,-2} <= {4,+,0})  here, error occur
>   ==> if ({1,+,-2} < {5,+,0}) le-->lt

So this duplicates to PR100740?  Thanks

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-25 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145

--- Comment #7 from bin cheng  ---
(In reply to Jiu Fu Guo from comment #5)
> (In reply to bin cheng from comment #4)
> > (In reply to Jiu Fu Guo from comment #3)
> > > Yes, while the code in adjust_cond_for_loop_until_wrap seems somehow 
> > > tricky:
> > > 
> > >   /* Only support simple cases for the moment.  */
> > >   if (TREE_CODE (iv0->base) != INTEGER_CST
> > >   || TREE_CODE (iv1->base) != INTEGER_CST)
> > > return false;
> > > 
> > > This code requires both sides are constant.
> > Actually it requires an IV with constant base.
> 
> I also feel that the intention of this function may only require one side
> constant for IV0 CODE IV1.
> As tests, for below loop, adjust_cond_for_loop_until_wrap return false:
> 
> foo (int *__restrict__ a, int *__restrict__ b, unsigned i)
> {
>   while (++i > 100)
> *a++ = *b++ + 1;
> }
> 
> For below code, adjust_cond_for_loop_until_wrap returns true:
>   i = UINT_MAX - 200;
>   while (++i > 100)
> *a++ = *b++ + 1;

Oh sorry for being misleading.  When I mentioned it requires something(...), I
was describing the current behavior, not that the conditions are necessary. 
Feel free to improve such cases.  Looking into niter analysis, these
cases(trade-offs) are not rare.

Thanks

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-24 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145

--- Comment #4 from bin cheng  ---
(In reply to Jiu Fu Guo from comment #3)
> Yes, while the code in adjust_cond_for_loop_until_wrap seems somehow tricky:
> 
>   /* Only support simple cases for the moment.  */
>   if (TREE_CODE (iv0->base) != INTEGER_CST
>   || TREE_CODE (iv1->base) != INTEGER_CST)
> return false;
> 
> This code requires both sides are constant.
Actually it requires an IV with constant base.

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-24 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145

--- Comment #2 from bin cheng  ---
(In reply to Richard Biener from comment #1)
> This comes up with a pending patch to split loops like
> 
> void
> foo (int *a, int *b, unsigned l, unsigned n)
> {
>   while (++l != n)
> a[l] = b[l] + 1;
> }
> 
> into
> 
>   while (++l > n)
> a[l] = b[l] + 1;
>   while (++l < n)
> a[l] = b[l] + 1;
> 
> since for the second loop (the "usual" case involving no wrapping of the IV)
> this results in affine IVs and thus analyzable data dependence.

Special case like "i++ > constant" are handled in function
adjust_cond_for_loop_until_wrap, however, it only handles constant invariant on
the other side right now.

Will see how to cover simple cases as reported here.

[Bug tree-optimization/101173] [9/10/11/12 Regression] wrong code at -O3 on x86_64-linux-gnu

2021-06-23 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101173

--- Comment #5 from bin cheng  ---
(In reply to Richard Biener from comment #3)
> So we're exchanging the inner two loops
> 
>   a[1][3] = 8;
>   for (int b = 1; b <= 5; b++)
> for (int d = 0; d <= 5; d++)
>   for (c = 0; c <= 5; c++)
> a[b][c] = a[b][c + 2] & 216;
> 
> to
> 
>   a[1][3] = 8;
>   for (int b = 1; b <= 5; b++)
> for (c = 0; c <= 5; c++)
>   for (int d = 0; d <= 5; d++)
> a[b][c] = a[b][c + 2] & 216;
> 
> but that looks wrong from a dependence analysis perspective.  We have
> 
> (compute_affine_dependence
>   ref_a: a[b_33][_1], stmt_a: _2 = a[b_33][_1];
>   ref_b: a[b_33][c.3_32], stmt_b: a[b_33][c.3_32] = _3;
> (analyze_overlapping_iterations
>   (chrec_a = {2, +, 1}_5)
>   (chrec_b = {0, +, 1}_5)
> (analyze_siv_subscript
> (analyze_subscript_affine_affine
>   (overlaps_a = [0 + 1 * x_1])
>   (overlaps_b = [2 + 1 * x_1]))
> )
>   (overlap_iterations_a = [0 + 1 * x_1])
>   (overlap_iterations_b = [2 + 1 * x_1]))
> (analyze_overlapping_iterations
>   (chrec_a = {1, +, 1}_1)
>   (chrec_b = {1, +, 1}_1)
>   (overlap_iterations_a = [0])
>   (overlap_iterations_b = [0]))
> (analyze_overlapping_iterations
>   (chrec_a = {0, +, 1}_5)
>   (chrec_b = {2, +, 1}_5)
> (analyze_siv_subscript
> (analyze_subscript_affine_affine
>   (overlaps_a = [2 + 1 * x_1])
>   (overlaps_b = [0 + 1 * x_1]))
> )
>   (overlap_iterations_a = [2 + 1 * x_1])
>   (overlap_iterations_b = [0 + 1 * x_1]))
> (analyze_overlapping_iterations
>   (chrec_a = {1, +, 1}_1)
>   (chrec_b = {1, +, 1}_1)
>   (overlap_iterations_a = [0])
>   (overlap_iterations_b = [0]))
> (build_classic_dist_vector
>   dist_vector = (  0   0   2
>   )
> )
> )
> 
> I don't see anything wrong with that at a first glance so the bug must be in
> tree_loop_interchange::valid_data_dependences it checks
> 
>   /* Be conservative, skip case if either direction at i_idx/o_idx
>  levels is not '=' or '<'.  */
>   if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0)
> return false;
> 
> dist_vect is [0 0 2], i_idx 2 and o_idx 1 but I think that dist_vect[o_idx]
> should exclude zero, thus
> 
> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> index f45b9364644..265e36c48d4 100644
> --- a/gcc/gimple-loop-interchange.cc
> +++ b/gcc/gimple-loop-interchange.cc
> @@ -1043,8 +1043,8 @@ tree_loop_interchange::valid_data_dependences
> (unsigned i_idx, unsigned o_idx,
> continue;
>  
>   /* Be conservative, skip case if either direction at i_idx/o_idx
> -levels is not '=' or '<'.  */
> - if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0)
> +levels is not '=' (for the inner loop) or '<'.  */
> + if (dist_vect[i_idx] < 0 || dist_vect[o_idx] <= 0)
> return false;
> }
>  }
> 
> Bin - does this analysis look sound?

Hi Richard,
Thanks very much for helping on this.  Sorry I would need a bit more time to
answer this question.  Thanks again.

[Bug tree-optimization/100740] [9/10/11/12 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r9-4145

2021-05-24 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100740

bin cheng  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |amker at gcc dot gnu.org

--- Comment #2 from bin cheng  ---
mine.  Sorry for the breakage.

[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options

2021-05-19 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499

--- Comment #19 from bin cheng  ---
(In reply to bin cheng from comment #18)
> Did some experiments, there are two fallouts after explicitly returning
> false for unsigned/wrapping types in MULT_EXPR/MINUS_EXPR/PLUS_EXPR.  One is
> the mentioned use of multiple_of_p in number_of_iterations_ne, the other is
> for alignment warning in stor-layout.c.  As pointed out, the latter case is
> known not overflow/wrap.  
> 
> So I am thinking to introduce an additional parameter indicating that caller
> knows "top" doesn't overfow/wrap, otherwise, try to get rid of the
> undocumented assumption.  we can always improve the accuracy using ranger or
> other tools.  Not sure if this is the right way to do.
> 
> As for MULT_NO_OVERFLOW/PLUS_NO_OVERFLOW, IMHO, it's not that simple?  For
> example, unsigned_num(multiple of 4, and larger than 0) + 0xfffc is
> multiple of 4, but it's overflow behavior on which we rely here.

Hmm, 4 is special and not a correct example.  Considering:
  n (unsigned, multiple of 3, and > 0) + 0xfffd
It's multiple of 3, but we need to rely on wrapping to get answer.

[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options

2021-05-19 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499

--- Comment #18 from bin cheng  ---
Did some experiments, there are two fallouts after explicitly returning false
for unsigned/wrapping types in MULT_EXPR/MINUS_EXPR/PLUS_EXPR.  One is the
mentioned use of multiple_of_p in number_of_iterations_ne, the other is for
alignment warning in stor-layout.c.  As pointed out, the latter case is known
not overflow/wrap.  

So I am thinking to introduce an additional parameter indicating that caller
knows "top" doesn't overfow/wrap, otherwise, try to get rid of the undocumented
assumption.  we can always improve the accuracy using ranger or other tools. 
Not sure if this is the right way to do.

As for MULT_NO_OVERFLOW/PLUS_NO_OVERFLOW, IMHO, it's not that simple?  For
example, unsigned_num(multiple of 4, and larger than 0) + 0xfffc is
multiple of 4, but it's overflow behavior on which we rely here.

[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options

2021-05-18 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499

--- Comment #14 from bin cheng  ---
(In reply to Richard Biener from comment #12)
> So in number_of_iterations_ne it looks like the step 's' is always constant
> which makes me wonder if we can somehow use ranger to tell multiple_of_p
> (type, c, s)
> or at least whether, if c is x * s, the multiplication could have overflowed?

Yeah, I am looking if "multiple of" can be feasibly checked in niter analysis,
with help of some basic information from multiple_of_p.

BTW, I am not following changes in "ranger", how should I used in analysis? or
similar to value range info?

Thanks

[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options

2021-05-18 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499

--- Comment #13 from bin cheng  ---
(In reply to Richard Biener from comment #10)
> (In reply to bin cheng from comment #9)
> > Seems we have a long standing bug in fold-const.c:multiple_of_p in case of
> > wrapping types.  Take unsigned int as an example:
> >   (0xfffc * 0x3) % 0x3 = 0x1
> > But multiple_of_p returns true here.
> > 
> > The same issue also stands for MINUS_EXPR and PLUS_EXPR.  Given
> > multiple_of_p is used elsewhere, the fix might break existing optimizations.
> > Especially, number of loop iterations is computed in unsigned types
> 
> multiple_of_p is mostly used in contexts where overflow "cannot happen"
> (in TYPE/DECL_SIZE computation context), and in niter analysis it seems to
> be guarded similarly.  This restriction of multiple_of_p seems undocumented,
Oh, I am not aware of this.  Actually my previous change to it seems broke this
assumption already.  Will see how to fix or revert the change.

> so fixing that might be good.
> 
> Now, you don't say what's the chain of events that lead to a multiple_of_p
> call
> eventually leading to the wrong answer, but I guess it's the code added
> under the
> 
> +  if (!niter->control.no_overflow
> +  && (integer_onep (s) || multiple_of_p (type, c, s)))
> 
> check as !niter->control.no_overflow seems to suggest that the multiple_of_p
> check is not properly guarded?

[Bug tree-optimization/90078] [9 Regression] ICE with deep templates caused by overflow

2021-05-17 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90078

--- Comment #19 from bin cheng  ---
I will check if the latter fix can be easily backported to GCC-9.

[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options

2021-05-16 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499

--- Comment #9 from bin cheng  ---
Seems we have a long standing bug in fold-const.c:multiple_of_p in case of
wrapping types.  Take unsigned int as an example:
  (0xfffc * 0x3) % 0x3 = 0x1
But multiple_of_p returns true here.

The same issue also stands for MINUS_EXPR and PLUS_EXPR.  Given multiple_of_p
is used elsewhere, the fix might break existing optimizations.  Especially,
number of loop iterations is computed in unsigned types

[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options

2021-05-11 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499

bin cheng  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |amker at gcc dot gnu.org

--- Comment #7 from bin cheng  ---
(In reply to Martin Liška from comment #4)
> (In reply to Martin Liška from comment #3)
> > But expected result is end g_2823 = 32768, right?
> > Clang returns the same result 32768.
> 
> Which regresses since r7-2373-g69b806f6a60efcf1.

Hmm, that was a fix long ago.  Will investigate this.  Sorry for the breakage.

[Bug tree-optimization/98736] [10/11 Regression] Wrong partition order generated in loop distribution pass since r10-619-g5879ab5fafedc8f6

2021-04-06 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736

--- Comment #6 from bin cheng  ---
Shall this be backported to 10/11 later? Thanks.

[Bug tree-optimization/95638] [10 Regression] Legit-looking code doesn't work with -O2

2021-03-26 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95638

bin cheng  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||10.2.0
 Resolution|--- |FIXED

--- Comment #15 from bin cheng  ---
Confirmed fixed for 10.2.0 also.  Closing.

[Bug tree-optimization/98736] [10/11 Regression] Wrong partition order generated in loop distribution pass since r10-619-g5879ab5fafedc8f6

2021-03-20 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736

--- Comment #4 from bin cheng  ---
(In reply to bin cheng from comment #3)
> hmm, seems topological order isn't enough for distributing a loop nest, we
> need topological order plus inner loop depth-first.

Well, not really.  In this case, problem is that rev-post order algorithm puts
"a[c] = d[3];" before the inner loop which violates the original program order. 

Seems that it can be fixed by inner loop depth-first order wrto how we
distribute inner loop, but I am not sure if this always preserves programming
order because loop has been reformed by various optimizers.

[Bug tree-optimization/99067] Missed optimization for induction variable elimination

2021-02-17 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067

--- Comment #3 from bin cheng  ---
Though not sure if the underlying root causes are the same, I think these are
two different issues, at least, they are handled by different parts of code in
IVOPTs.  
For the first one, it's a known issue in GCC and IV elimination is complicated
yet quite conservative for long time, while for the second one, we indeed don't
know whether "i*N+j" wraps or not.  Even though we might be able to improve
IVOPTs under condition of wrapping behavior.

[Bug tree-optimization/98736] [10/11 Regression] Wrong partition order generated in loop distribution pass since r10-619-g5879ab5fafedc8f6

2021-02-17 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736

--- Comment #3 from bin cheng  ---
hmm, seems topological order isn't enough for distributing a loop nest, we need
topological order plus inner loop depth-first.

[Bug tree-optimization/99067] Missed optimization for induction variable elimination

2021-02-16 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067

bin cheng  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |amker at gcc dot gnu.org

--- Comment #2 from bin cheng  ---
Mine, will have a look.  Thanks for reporting.

[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC

2021-01-25 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627

--- Comment #12 from bin cheng  ---
a. why the loop is considered as infinite
b. we need to skip fake exit edges in niter analysis?

[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC

2021-01-25 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627

--- Comment #11 from bin cheng  ---
(In reply to bin cheng from comment #10)
> hmm,
> For below basic block:
> 128 ;;   basic block 4, loop depth 2, maybe hot
> 129 ;;prev block 3, next block 9, flags: (NEW, VISITED)
> 130 ;;pred:   3 (FALLTHRU,EXECUTABLE)
> 131 ;;7 (FALLTHRU,DFS_BACK,EXECUTABLE)
> 132   # RANGE [0, 2147483647] NONZERO 2147483647
> 133   # c_5 = PHI <0(3), c_17(7)>
> 134   # .MEM_8 = PHI <.MEM_7(3), .MEM_9(7)>
> 135   if (_2 < c_5)
> 136 goto ; [INV]
> 137   else
> 138 goto ; [INV]
> 139 ;;succ:   8 (TRUE_VALUE,EXECUTABLE)
> 140 ;;9 (FALSE_VALUE,EXECUTABLE)
> 
> Code in :
> 4276 
> 4277   basic_block *body = get_loop_body (loop);
> 4278   exits = get_loop_exit_edges (loop, body);
> 4279   likely_exit = single_likely_exit (loop, exits);
> 4280   FOR_EACH_VEC_ELT (exits, i, ex)
> 4281 {
> 4282   if (ex == likely_exit)
> 4283 {
> 4284   gimple *stmt = last_stmt (ex->src);
> 4285   if (stmt != NULL)
> 4286 {
> 
> gets three exit edges, one of which is  bb1>, as a result, 0 niter is
> computed for this exit in function number_of_iterations_exit_assumptions. 
> This seems strange, is it a fake edge added for some reason?
> 
> Thanks

Right, it's added by connect_infinite_loops_to_exit.

[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC

2021-01-25 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627

--- Comment #10 from bin cheng  ---
hmm,
For below basic block:
128 ;;   basic block 4, loop depth 2, maybe hot
129 ;;prev block 3, next block 9, flags: (NEW, VISITED)
130 ;;pred:   3 (FALLTHRU,EXECUTABLE)
131 ;;7 (FALLTHRU,DFS_BACK,EXECUTABLE)
132   # RANGE [0, 2147483647] NONZERO 2147483647
133   # c_5 = PHI <0(3), c_17(7)>
134   # .MEM_8 = PHI <.MEM_7(3), .MEM_9(7)>
135   if (_2 < c_5)
136 goto ; [INV]
137   else
138 goto ; [INV]
139 ;;succ:   8 (TRUE_VALUE,EXECUTABLE)
140 ;;9 (FALSE_VALUE,EXECUTABLE)

Code in :
4276 
4277   basic_block *body = get_loop_body (loop);
4278   exits = get_loop_exit_edges (loop, body);
4279   likely_exit = single_likely_exit (loop, exits);
4280   FOR_EACH_VEC_ELT (exits, i, ex)
4281 {
4282   if (ex == likely_exit)
4283 {
4284   gimple *stmt = last_stmt (ex->src);
4285   if (stmt != NULL)
4286 {

gets three exit edges, one of which is  bb1>, as a result, 0 niter is
computed for this exit in function number_of_iterations_exit_assumptions.  This
seems strange, is it a fake edge added for some reason?

Thanks

[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC

2021-01-21 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627

--- Comment #9 from bin cheng  ---
(In reply to Jakub Jelinek from comment #8)
> Still broken on current 10 branch, as written works fine on the trunk due to
> the C++ FE loop changes.
> Bin, did you have time to look into this yet?

I am very sorry, seems I have two correctness PRs now? Will try to investigate
these on this WE.

[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops

2021-01-08 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598

--- Comment #4 from bin cheng  ---
Didn't go deep into the case.

For simple cases taken as examples, it's possible to interchange the two loops
thus enables loop invariant code motion.  Though loop interchange may fail
because of complicated data dependences, we may take some useful points from
it, for example, the cost model checking new loop invariants wrto the outer
loop.

[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC

2020-10-29 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627

--- Comment #5 from bin cheng  ---
(In reply to Jakub Jelinek from comment #3)
> Started with r9-4145-ga81e2c6240655f60a49c16e0d8bbfd2ba40bba51

Sorry for the breakage.  Will fix this.

[Bug tree-optimization/78427] missed optimization of loop condition

2020-09-27 Thread amker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78427

bin cheng  changed:

   What|Removed |Added

 CC||amker at gcc dot gnu.org

--- Comment #5 from bin cheng  ---
(In reply to Antony Polukhin from comment #4)
> Any progress?

Oh, I missed this one.  Will try to find time later.

Thanks