[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2021-08-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |12.0

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2021-07-16 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

Andrew Macleod  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Andrew Macleod  ---

Range-ops uses wi_fold (implemented by each opcode)  to individually 
fold subranges one at a time and then combines them. This patch first 
calls wi_fold_in_parts which checks if one of the subranges is small, 
and if so, further splits that subrange into constants.

Currently, if a subrange is 4 or less values, then we call it 
individually for each of the 4 values. 4 was chosen as a reasonable 
tradeoff between excess work vs benefit.  

this gets all 3 cases now.
fixed

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2021-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:704e8a825c78b9a8424c291509413bbb48e602c7

commit r12-2381-g704e8a825c78b9a8424c291509413bbb48e602c7
Author: Andrew MacLeod 
Date:   Fri Jul 16 11:42:14 2021 -0400

Add wi_fold_in_parts.

range-ops uses wi_fold to individually fold subranges one at a time and
then combined them.  This patch first calls wi_fold_in_parts which checks
if
one of the subranges is small, and if so, further splits that subrange
into constants.

gcc/
PR tree-optimization/96542
* range-op.cc (range_operator::wi_fold_in_parts): New.
(range_operator::fold_range): Call wi_fold_in_parts.
(operator_lshift::wi_fold): Fix broken lshift by [0,0].
* range-op.h (wi_fold_in_parts): Add prototype.

gcc/testsuite
* gcc.dg/pr96542.c: New.

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2020-08-10 Thread amacleod at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

--- Comment #6 from Andrew Macleod  ---
Likewise, for

unsigned
baz (unsigned int x)
{
  if (x >= 4) return 32;
  return (-1U >> x) * 16;
}


=== BB 2 
x_3(D)  unsigned int VARYING
_4  UNDEFINED
 :
if (x_3(D) > 3)
  goto ; [INV]
else
  goto ; [INV]

2->3  (T) x_3(D) :  unsigned int [4, +INF]
2->4  (F) x_3(D) :  unsigned int [0, 3]

=== BB 3 
_4  UNDEFINED
 :
// predicted unlikely by early return (on trees) predictor.
goto ; [INV]


=== BB 4 
x_3(D)  unsigned int [0, 3]
 :
_1 = 4294967295 >> x_3(D);
_4 = _1 * 16;
_1 : unsigned int [536870911, +INF]

0x >> [0,3] is currently producing [0x2FFF, 0x]

again making operator_rshift capable of generating multi-ranges,

0x >> [0,3] should produce
[0x2FFF, 0x2FFF][0x4FFF, 0x4FFF][0x8FFF,
0x8FFF][0x, 0x] 

and then _4 = _1 *16 should automatically produce: [0xFFF0, 0xFFF0]
I think multiply does that today, if its given the proper input.




=== BB 5 
_4  unsigned int VARYING
 :
# _2 = PHI <32(3), _4(4)>
return _2;

And then this PHI will have the constant propagated and become:
# _2 = PHI <32(3), 0xFFF0(4)>
return _2

ANd given that, presumably phi-opt or maybe even VRPs simplfication will turn
that into the desired conditional once the constant is calculated.

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2020-08-10 Thread amacleod at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

--- Comment #5 from Andrew Macleod  ---
I think this all goes away when multi-range is enabled.

The original testcase produces:
=== BB 2 
x_4(D)  unsigned int VARYING
 :
tmp_5 = x_4(D) != 0;
_1 = (int) tmp_5;
_2 = 255 >> _1;
_3 = (unsigned char) _2;
_6 = _3 * 2;
return _6;

_1 : int [0, 1]
_2 : int [127, 255]
_3 : unsigned char [127, +INF]


A number of the range-ops routines are simply inherited from the single range
codebase of value_range, and are not multi-range enabled yet.

In particular, rshift here.   with a tweak to operator_rshift, we can take

255 >> [0,1] and instead calculate 
_2 = [127,127][255,255]
which would make the results:

_1 : int [0, 1]
_2 : int [127, 127][255, 255]
_3 : unsigned char [127, 127] [255, 255]

Then when _6 is calculated as [127, 127][255, 255] * 2 , range-ops will
calculate the result to be [254, 254]


The whole thing will just fold away to return 254  (or -2 if you want to sign
it :-)

All we need to do is get the multi-range code enabled it's coming.

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2020-08-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

--- Comment #4 from Jakub Jelinek  ---
As for COND_EXPR, if we do it that way, it should be rather keyed on a range
with only two possible values in the range.

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2020-08-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

Jakub Jelinek  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org,
   ||amacleod at redhat dot com

--- Comment #3 from Jakub Jelinek  ---
(In reply to Marc Glisse from comment #2)
> > Or shall say VRP try harder if it sees [0, 1] ranges?
> 
> If a range has only 2 (or some other small number) values, try propagating
> each and see if some variables end up with the same value in both cases?

Maybe.  The question is if it should be done in forwprop, or say vrp or in the
ranger code (does that one do forward propagation)?
In any case, it should be limited to ranges with small number of values
(perhaps decided with a param) and also bound on how many immediate use stmts
it attempts to propagate it through.

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2020-08-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

--- Comment #2 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #1)
> In bar, this is optimized, because fold_binary_op_with_conditional_arg
> optimizes
> 255 >> (x ? 1 : 0) into x ? 127 : 255 and when multiplied by two in unsigned
> char this results in x ? 254 : 254.
> We don't have anything comparable in match.pd yet I believe (and should we?).

We have something for VEC_COND_EXPR to fold a op (b?c:d), but not for
COND_EXPR, which you would be unlikely to see in gimple (and the generator of
phiopt transforms from match.pd patterns hasn't appeared yet). Also, we only
have x!=0, and while fold_binary_op_with_conditional_arg tries to handle it
like x!=0?1:0, we indeed don't do anything like that for gimple. And it seems
possibly better suited to forward propagation than backward like match.pd.

> Or shall say VRP try harder if it sees [0, 1] ranges?

If a range has only 2 (or some other small number) values, try propagating each
and see if some variables end up with the same value in both cases? Or if
enough simplifications occur that it is worth introducing a conditional? I am
not sure it would be worth the trouble.

> Though, shouldn't we optimize e.g.
> unsigned
> baz (unsigned int x)
> {
>   if (x >= 4) return 32;
>   return (-1U >> x) * 16;
> }
> too to return x >= 4 ? 32U : -16U; ?
> Not sure where and how to generalize it though.
> Value range of -1U >> [0, 3] is not really useful here, nonzero bits either.
> And having a specialized (const1 >> x) * const2 optimizer based on x's value
> range would work, but not sure if it has a real-world benefit.

And here this is complicated by the fact that we do not narrow the operation,
so it is less obvious that the constant is -1.

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2020-08-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

Jakub Jelinek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||jakub at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-08-10

--- Comment #1 from Jakub Jelinek  ---
unsigned char
foo (unsigned int x)
{
  _Bool y = x;
  return (((unsigned char) ~0) >> y) * 2;
}

unsigned char
bar (unsigned int x)
{
  return (((unsigned char) ~0) >> (_Bool) x) * 2;
}

In bar, this is optimized, because fold_binary_op_with_conditional_arg
optimizes
255 >> (x ? 1 : 0) into x ? 127 : 255 and when multiplied by two in unsigned
char this results in x ? 254 : 254.
We don't have anything comparable in match.pd yet I believe (and should we?).
Or shall say VRP try harder if it sees [0, 1] ranges?
Though, shouldn't we optimize e.g.
unsigned
baz (unsigned int x)
{
  if (x >= 4) return 32;
  return (-1U >> x) * 16;
}
too to return x >= 4 ? 32U : -16U; ?
Not sure where and how to generalize it though.
Value range of -1U >> [0, 3] is not really useful here, nonzero bits either.
And having a specialized (const1 >> x) * const2 optimizer based on x's value
range would work, but not sure if it has a real-world benefit.