[Bug tree-optimization/94779] Bad optimization of simple switch

2021-08-16 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

Martin Liška  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|marxin at gcc dot gnu.org  |unassigned at gcc dot 
gnu.org

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-18 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #22 from Andrew Macleod  ---
(In reply to Martin Liška from comment #21)
> > See my comment above.  It isn't any integration of VRP, just asking the
> > ranger about the range, and it isn't useless because to be able to optimize
> > properly,
> > you need to figure out for each value one of the 3 possibilities (handled
> > explicitly by switch and well defined, handled by default and never
> > reachable or UB).
> 
> I've got the point that we have these 3 possibilities for each case.
> However, we also have it for "holes" which are removed and we can't
> distinguish
> them in between __builtin_unreachable and falling to the default case.
> 
> I believe a better place for Ranger is to improve:
> simplify_using_ranges::simplify_switch_using_ranges (gswitch *stmt)

Long term, my goal is to decentralize this sort of thing.  This routine a is
legacy approach from when VRP needed to be a mini-optimizer because it was the
only place reasonable range information was available. 

I would  to push virtually everything from VRP into more appropriate places.
folding, simplification, threading, switch rewriting, etc can all be done
either in a specialized pass of their own, or ongoing as other related things
are happening.  

Switch simplification/optimization in particular is a good candidate.  A self
contained pass that knows what best to do with a switch, and can utilize
accurate range info can take care of everything with a deep analysis. It could
be invoked in multiple places if required.  Then you don't need to worry about
a pass like VRP or some other pass messing up the structure of the switch.  

I would like to see the simplification routines completely moved into a switch
optimization  pass, and VRP would then just ignore switches.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-18 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #21 from Martin Liška  ---
> See my comment above.  It isn't any integration of VRP, just asking the
> ranger about the range, and it isn't useless because to be able to optimize
> properly,
> you need to figure out for each value one of the 3 possibilities (handled
> explicitly by switch and well defined, handled by default and never
> reachable or UB).

I've got the point that we have these 3 possibilities for each case.
However, we also have it for "holes" which are removed and we can't distinguish
them in between __builtin_unreachable and falling to the default case.

I believe a better place for Ranger is to improve:
simplify_using_ranges::simplify_switch_using_ranges (gswitch *stmt)

Note that even if CSWITCH does not do a transformation other passes can still
benefit from a more canonical gswitch form.

That said, I don't like the idea of using Ranger in cswitch because similar
should do each pass that somehow works with switches.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #20 from Jakub Jelinek  ---
(In reply to Martin Liška from comment #19)
> > 
> > One of the work items for the next release is to multi-range enable all
> > these consumers that can make use of the information.
> 
> I would really appreciate that. I'm don't like integrating VRP into the
> CSWITCH pass mainly because it's duplicate work and EVRP runs right before
> that pass. And I would prefer a more canonical form of switch statements:

See my comment above.  It isn't any integration of VRP, just asking the ranger
about the range, and it isn't useless because to be able to optimize properly,
you need to figure out for each value one of the 3 possibilities (handled
explicitly by switch and well defined, handled by default and never reachable
or UB).  It is expected that many further passes will just query the ranger
when they need some information in the future; currently they sometimes just
get_range_info to query the static single range and often not even that.
VRP and other passes can perhaps throw out labels which won't be reachable
because they are outside of the index range (they shouldn't throw out labels
that lead to __builtin_unreachable because then information would be lost for
the optimization), but still the optimization needs to categorize the values.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-18 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #19 from Martin Liška  ---
> 
> One of the work items for the next release is to multi-range enable all
> these consumers that can make use of the information.

I would really appreciate that. I'm don't like integrating VRP into the CSWITCH
pass mainly because it's duplicate work and EVRP runs right before that pass.
And I would prefer a more canonical form of switch statements:

f1(unsigned x) {
  if (x >= 3)
__builtin_unreachable();

  switch (x) {
case 0:
  return 1;
case 1:
  return 2;
case 2:
  return 3;
  }
}

right now we generate:

switch (x)
default:
  return 1;
case 1:
  return 2;
case 2:
  return 3;

I would rather prefer:
switch (x)
case 0:
  return 1;
case 1:
  return 2;
case 2:
  return 3;
default:
  __builtin_unreachable ();

Similarly:

int y;

f1(unsigned x) {
  if (x == 2)
__builtin_unreachable();

  switch (x) {
case 0:
  return 1;
case 1:
  return 2;
case 2:
  return 3;
default:
  y = 123;
  return 2;
  }
}

is transformed into:
 switch (x) {
case 0:
  return 1;
case 1:
  return 2;
default:
  y = 123;
  return 2;

but I would prefer:

 switch (x) {
case 0:
  return 1;
case 1:
  return 2;
case 2:
  __builtin_unreachable ();
default:
  y = 123;
  return 2;

having that, we could be able to do a CSWITCH transformation.
I'm planning to return to these situations in the next stage1.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-17 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #18 from Andrew Macleod  ---
(In reply to Jakub Jelinek from comment #17)
> (In reply to Martin Liška from comment #16)

> > 
> > EVRP knows the proper range:
> > 2->4  (F) x_6(D) :  unsigned int [0, 1][4, 4]
> 
> EVRP ATM invokes both the old code and ranger and only ranger knows the
> above.
> There is a param to adjust the behavior.
> Anyway, if something isn't able to work with the detailed ranges (up to 255
> subranges), type conversions will ensure that one gets a single summary
> range ([0, 4] in this case likely) and possibly the switch opts still do
> that.
>

Indeed.  Ranger knows, but at this point most of the client consumers such as
folding and simplification still only deal with value_range, which means they
will revert to using the best approximation using only a pair, which would be
[0,4] in this case as you observe.

One of the work items for the next release is to multi-range enable all these
consumers that can make use of the information.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #17 from Jakub Jelinek  ---
(In reply to Martin Liška from comment #16)
> > The old VRP/EVRP only tracks simple ranges and anti-ranges, so can't deal
> > with
> > what you have above, the new ranger code can deal with multiple subranges,
> > but the question is if all the interfaces deal with those.
> 
> EVRP knows the proper range:
> 2->4  (F) x_6(D) :unsigned int [0, 1][4, 4]

EVRP ATM invokes both the old code and ranger and only ranger knows the above.
There is a param to adjust the behavior.
Anyway, if something isn't able to work with the detailed ranges (up to 255
subranges), type conversions will ensure that one gets a single summary range
([0, 4] in this case likely) and possibly the switch opts still do that.

Anyway, whether EVRP or VRP or whatever other pass performs some switch
optimizations based on ranges or not, it is valuable to query the ranges in
switchconv too.  Because, what e.g. EVRP can only do with the switch is remove
unreachable cases, but for switchconv it should be valueable to know for each
possible value in the range of the switch index whether such value is possible
and there is a handler for it and that handler is not __builtin_unreachable
();, or if the case is possible but not explicitly handled (thus default:
unless default is __builtin_unreachable ()), or the never happen cases (or will
invoke UB immediately, that is the same thing).
Because for some things it is better to treat the last category as non-existing
and for others to treat it as existing but can do anything.
E.g. if you compute the full covered range, the invalid cases at the start of
the range or at the end of the range don't need to be counted, one can shorten
the range, but e.g. when deciding if it is contiguous or if one can use linear
expressions on the index, one can treat the invalid cases as present and
satisfying whatever we need.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-17 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #16 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #15)
> (In reply to Martin Liška from comment #14)
> > Ok, I've just taken look at what EVRP pass does before SWITCHCONV pass is
> > called.
> > I see that EVRP can properly prune dead cases of a switch, but it's not
> > perfect:
> > 
> > int f1(unsigned x)
> > {
> > if (x == 2 || x == 3 || x >= 5)
> >   __builtin_unreachable ();
> > switch (x)
> > {
> >   case 0: return 1;
> >   case 1: return 2;
> >   case 2: return 3;
> >   case 3 ... 8: return 4;
> > }
> > }
> 
> The old VRP/EVRP only tracks simple ranges and anti-ranges, so can't deal
> with
> what you have above, the new ranger code can deal with multiple subranges,
> but the question is if all the interfaces deal with those.

EVRP knows the proper range:
2->4  (F) x_6(D) :  unsigned int [0, 1][4, 4]

> 
> > This seems to me like a strange EVRP transformation :/
> 
> Why?  And, the user could have written it that way too.

Looking into it.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #15 from Jakub Jelinek  ---
(In reply to Martin Liška from comment #14)
> Ok, I've just taken look at what EVRP pass does before SWITCHCONV pass is
> called.
> I see that EVRP can properly prune dead cases of a switch, but it's not
> perfect:
> 
> int f1(unsigned x)
> {
> if (x == 2 || x == 3 || x >= 5)
>   __builtin_unreachable ();
> switch (x)
> {
>   case 0: return 1;
>   case 1: return 2;
>   case 2: return 3;
>   case 3 ... 8: return 4;
> }
> }

The old VRP/EVRP only tracks simple ranges and anti-ranges, so can't deal with
what you have above, the new ranger code can deal with multiple subranges, but
the question is if all the interfaces deal with those.

> This seems to me like a strange EVRP transformation :/

Why?  And, the user could have written it that way too.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-17 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #14 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #12)
> So, for start I think we should do:
> 1) query->range_of_expr (m_index_range, m_index_expr, swtch)
>where query is address of gimple_ranger passed down from the caller
>to figure out range of the index expression; case labels without
> CASE_HIGH for
>which m_index_range.contains_p (CASE_LOW (case)) is false can be ignored
>for the optimization (well, handled the way it is better for the
> optimization)
>For CASE_HIGH labels, similarly query if the range overlaps the index
> range.
>Note, I don't remember exactly in which way the types of the index
> expression
>and of the case labels can differ, I believe all case labels have to have
> the
>same type, so probably for the computation of the range if it has
> different
>type from the case label ones, it needs to call something that would
> emulate
>the effect of conversion to that type (or vice versa?)
>CCing Andrew for the range stuff

Ok, I've just taken look at what EVRP pass does before SWITCHCONV pass is
called.
I see that EVRP can properly prune dead cases of a switch, but it's not
perfect:

int f1(unsigned x)
{
if (x == 2 || x == 3 || x >= 5)
  __builtin_unreachable ();
switch (x)
{
  case 0: return 1;
  case 1: return 2;
  case 2: return 3;
  case 3 ... 8: return 4;
}
}

after VRP we end up with:
  switch (x_6(D))  [INV], case 1:  [INV], case 2:  [INV],
case 3 ... 4:  [INV]>

but case '3' is not pruned here. Similarly, I can imagine the following
reduction:

if (x >= 4 && x <= 9)
  __builtin_unreachable ();
switch (x)
{
  case 0: return 1;
  case 1: return 2;
  case 2: return 3;
  case 3 ... 10: return 4;
}

is not simplified to:
  switch (x_3(D))  [INV], case 0:  [INV], case 1:  [INV],
case 2:  [INV], case 3 ... 10:  [INV]>

but I would expect:
  switch (x_3(D))  [INV], case 0:  [INV], case 1:  [INV],
case 2:  [INV], case 3: , case 10:  [INV]>

Btw, EVRP knows the information:
4->8  x_3(D) :  unsigned int [11, +INF]
4->9  x_3(D) :  unsigned int [0, 0]
4->5  x_3(D) :  unsigned int [1, 1]
4->6  x_3(D) :  unsigned int [2, 2]
4->7  x_3(D) :  unsigned int [3, 3][10, 10]

The question is whether we want to make the transformation by EVRP?

> 2) similarly, cases that refer to blocks which have EDGE_COUNT (bb->succs)
> == 0
>&& gimple_seq_unreachable_p (bb_seq (bb)) should be treated again like
> cases
>one shouldn't need to care about

I've got a patch candidate for this.

> 3) to handle the #c0 testcase in C (C++ already adds __builtin_unreachable),
>handle also the case where case refers to block which has EDGE_COUNT
> (bb->succs) and ends in GIMPLE_RETURN without expression in a function that
>returns integral or pointer value (for simplicity) and has no statements
>other than debug stmts or clobber stmts before it.  Note, this case can't
> be
>treated completely like a UB case, there is no UB in C unless the caller
>checks the return value, but it could be handled conditionally, as long as
>the code we emit for that doesn't invoke UB in the function, we really
> don't 
>care about the value we return to the caller.  So e.g. if we are
> considering
>a linear function and other case labels return that linear value and some
>return unspecified value, just use the linear function to cover those too.

Likewise here, it's smart to return a random value for C when there's a missing
return value.

> 4) to be able to optimize the int f1(unsigned x) { if (x >= 3)
> __builtin_unreachable();
>switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; } }
>testcase, we should for consideration virtually undo the evrp optimization
>which removed the case 0: and replaced it with default: - here the handled
>cases (that should include the really handled ones and ones that and the
>UB ones (if the case is outside of range or __builtin_unreachable) cover
>everything but one case, assume the default label is that for that single
>case rather than handling anything else; similarly, if the whole range
>is covered, ignore the default label

This seems to me like a strange EVRP transformation :/

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-15 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #13 from Andrew Macleod  ---
(In reply to Jakub Jelinek from comment #12)
> So, for start I think we should do:
> 1) query->range_of_expr (m_index_range, m_index_expr, swtch)
>where query is address of gimple_ranger passed down from the caller
>to figure out range of the index expression; case labels without

For starters, you can probably simply create a gimple_ranger on the spot to do
the query.  At least until everything works the way you want.   Its low
overhead... we had reasonable success with that early on with some pass
conversions, and ultimately we hope by next release to make this more generally
accessible without passing a ranger around anyway. 


> CASE_HIGH for
>which m_index_range.contains_p (CASE_LOW (case)) is false can be ignored
>for the optimization (well, handled the way it is better for the
> optimization)
>For CASE_HIGH labels, similarly query if the range overlaps the index
> range.
>Note, I don't remember exactly in which way the types of the index
> expression
>and of the case labels can differ, I believe all case labels have to have
> the
>same type, so probably for the computation of the range if it has
> different
>type from the case label ones, it needs to call something that would
> emulate
>the effect of conversion to that type (or vice versa?)
>CCing Andrew for the range stuff

There are a couple of PRs related to this, I forget the current resolution, but
I originally opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87798 for it.

Ranger currently punts if the precision of the labels is different than the
precision of the switch expr in gimple-range-edge.cc :
outgoing_range::get_edge_range

In order to convert between types, range-op.h exports the routine
   extern void range_cast (irange &, tree type);

which does an in place conversion to type... supports full cast semantics.
there are uses of it sprinkled around.


Furthermore, IIRC there can be multiple cases sharing the same edge.. It seems
it might be useful to make more use of gimple-range-edge.h here.. (we may need
some tweaks in order to get things working with a ranger since this is a
private member to ranger)  so I would again start by simply creating my own
local outgoing_range instance and then you can ask

int_range_max range;
outgoing_range_instance.edge_range_p (range, switch_edge)

where this will give you the cumulative union of all the case-labels on that
switch edge.   (its unrelated to the index variable...  Ranger intersects this
range with the index variable range when you ask for the range of the index
variable on an edge) 

If this turns out to be useful, we can probably figure a way to get it exported
from ranger to avoid creating a local instance


The last useful bit of info that may not be obvious...  if you ask ranger for
the range of the index variable on the default edge, and if all the cases cover
the known range of the index variable, as in the first example, then ranger
will return a range of UNDEFINED ...  meaning it can never take that edge.

ie the function in the first example shows me:
=== BB 4 
x_2(D)  unsigned int [0, 2]
 :
switch (x_2(D))  [INV], case 0:  [INV], case 1: 
[INV], case 2:  [INV]>

4->7  x_2(D) :  UNDEFINED
4->8  x_2(D) :  unsigned int [0, 0]
4->5  x_2(D) :  unsigned int [1, 1]
4->6  x_2(D) :  unsigned int [2, 2]

where edge 4->7 is the default case...

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-12-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

Jakub Jelinek  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #12 from Jakub Jelinek  ---
So, for start I think we should do:
1) query->range_of_expr (m_index_range, m_index_expr, swtch)
   where query is address of gimple_ranger passed down from the caller
   to figure out range of the index expression; case labels without CASE_HIGH
for
   which m_index_range.contains_p (CASE_LOW (case)) is false can be ignored
   for the optimization (well, handled the way it is better for the
optimization)
   For CASE_HIGH labels, similarly query if the range overlaps the index range.
   Note, I don't remember exactly in which way the types of the index
expression
   and of the case labels can differ, I believe all case labels have to have
the
   same type, so probably for the computation of the range if it has different
   type from the case label ones, it needs to call something that would emulate
   the effect of conversion to that type (or vice versa?)
   CCing Andrew for the range stuff
2) similarly, cases that refer to blocks which have EDGE_COUNT (bb->succs) == 0
   && gimple_seq_unreachable_p (bb_seq (bb)) should be treated again like cases
   one shouldn't need to care about
3) to handle the #c0 testcase in C (C++ already adds __builtin_unreachable),
   handle also the case where case refers to block which has EDGE_COUNT
(bb->succs) and ends in GIMPLE_RETURN without expression in a function that
   returns integral or pointer value (for simplicity) and has no statements
   other than debug stmts or clobber stmts before it.  Note, this case can't be
   treated completely like a UB case, there is no UB in C unless the caller
   checks the return value, but it could be handled conditionally, as long as
   the code we emit for that doesn't invoke UB in the function, we really don't 
   care about the value we return to the caller.  So e.g. if we are considering
   a linear function and other case labels return that linear value and some
   return unspecified value, just use the linear function to cover those too.
4) to be able to optimize the int f1(unsigned x) { if (x >= 3)
__builtin_unreachable();
   switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; } }
   testcase, we should for consideration virtually undo the evrp optimization
   which removed the case 0: and replaced it with default: - here the handled
   cases (that should include the really handled ones and ones that and the
   UB ones (if the case is outside of range or __builtin_unreachable) cover
   everything but one case, assume the default label is that for that single
   case rather than handling anything else; similarly, if the whole range
   is covered, ignore the default label

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-05-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #11 from Jakub Jelinek  ---
Related PR is PR89059 though, while we can have a useful range info already in
the early opts from evrp, in many cases we can get much better info after
inlining.  So, if we during the first switchconv pass could just perform
analysis and stick on the gswitch info that e.g. the inlining cost computations
could use and only perform lowering later, or if we could perform the
conversion right away but in some way note for the second switchconv pass that
this and this came from switch conversion and that if the range info is
narrowed second switchconv pass should attempt to adjust it, it would be really
appreciated.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-05-06 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

Martin Liška  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #10 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #9)
> Martin, I think switchconv pass should use get_range_info and completely
> ignore labels with values outside of the range of the switch SSA_NAME range
> (including the default label if all values in the SSA_NAME range have
> explicit case labels).
> That wouldn't change everything in this report, e.g. the #c0 one I think
> doesn't even make switchconv as it is lowered earlier.

Good point, I'll work on that!

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-05-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
Martin, I think switchconv pass should use get_range_info and completely ignore
labels with values outside of the range of the switch SSA_NAME range (including
the default label if all values in the SSA_NAME range have explicit case
labels).
That wouldn't change everything in this report, e.g. the #c0 one I think
doesn't even make switchconv as it is lowered earlier.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #8 from Gabriel Ravier  ---
Also, this code : 

int f1(unsigned x)
{
if (x >= 3)
__builtin_unreachable();

switch (x)
{
case 0:
return 1;
case 1:
return 2;
case 2:
return 3;
}
}

doesn't optimize as well as 

int f1(unsigned x)
{
switch (x)
{
case 0:
return 1;
case 1:
return 2;
case 2:
return 3;
}
}

, despite being both equivalent and providing more information to the early
stages of optimization. Should I open a seperate bug report for this or is it
close enough to this that it would be considered a duplicate ?

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

Richard Biener  changed:

   What|Removed |Added

 Status|REOPENED|NEW

--- Comment #7 from Richard Biener  ---
Confirmed.  I think we have linear function detection in switch lowering which
for some reason doesn't trigger either.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #6 from Gabriel Ravier  ---
There is another thing I realised : This code :

int f1(unsigned x)
{
switch (x)
{
case 0:
return 1;
case 1:
return 2;
case 2:
return 3;
}
}

always gets optimized to `return x + 1;` with no problems. The switchconv pass
handles it. Maybe that pass should be doing the same optimisation on the code
from this issue, too. That seems like the simplest solution to this.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #5 from Gabriel Ravier  ---
Going to take a quick look at how it gets optimized in the tree passes.

This is the first case :

int f1(unsigned x)
{
if (x >= 2)
__builtin_unreachable();

switch (x)
{
case 0:
return 1;
case 1:
return 2;
}
}

This is the case which properly compiles down to `return x + 1;`, and should be
equivalent to the other code.

The ssa tree pass ends up, early in optimization, combining some expressions
into `# _1 = PHI <_4(5), _3(6)>`
Then, phiopt1 optimizes it to `_3 = x_2(D) + 1;`, at which point the
optimization we're talking about has been made

This is the second case : 

int f1(unsigned x)
{
switch (x)
{
case 0:
return 1;
case 1:
return 2;
}
}

The ssa tree pass still combines some expressions into `# _1 = PHI <1(2),
2(3)>`
However, phiopt1 doesn't optimize into `x + 1`.

For me, this seems to be because in the first case, the tree is then this : 

f1 (unsigned int x)
{
  int _1;

   :
  if (x_2(D) > 1)
goto ; [INV]
  else
goto ; [INV]

   :
  __builtin_unreachable ();

   :
  if (x_2(D) == 1)
goto ; [INV]
  else
goto ; [INV]

   :
:

   :
  # _1 = PHI <1(4), 2(5)>
:
  return _1;

}

And in the second case, it is instead this : 

f1 (unsigned int x)
{
  int _1;

   :
  switch (x_2(D))  [INV], case 0:  [INV], case 1:  [INV]>

   :
:
  goto ; [INV]

   :
:
  __builtin_unreachable ();

   :
  # _1 = PHI <1(2), 2(3)>
:
  return _1;

}

I believe phiopt1 is unable to optimize the PHI into `x + 1` in the second case
because the `switch` hasn't been optimized away into an if yet
It eventually gets optimized by switchlower1
phiopt4 is then, again, unable to optimize the PHI into `x + 1`. The tree is
then as such : 

f1 (unsigned int x)
{
  int _1;

   [local count: 1073741824]:
  if (x_2(D) == 0)
goto ; [65.00%]
  else
goto ; [35.00%]

   [local count: 1073741824]:
  if (x_2(D) == 1)
goto ; [100.00%]
  else
goto ; [0.00%]

   [count: 0]:
:
  __builtin_unreachable ();

   [local count: 1073741824]:
  # _1 = PHI <2(3), 1(2)>
:
  return _1;

}

And then there are no further tree passes that look like they should be
relevant to this

Assuming my analysis is correct (though considering I have little to no
experience with gcc's tree system, there is a pretty big chance that my
analysis is completely wrong), I believe that to solve this, there are two
possibilities : 
- Either there is a missing tree pass that would make phiopt4 capable of
optimizing the PHI into `x_2(D) + 1`
- Or there should there be some extra code added to phiopt4 to recognise the
idiom in this case

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

Martin Liška  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|RESOLVED|REOPENED
 Resolution|INVALID |---
 Ever confirmed|0   |1
   Last reconfirmed||2020-04-27

--- Comment #4 from Martin Liška  ---
Then it's a missed optimization.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #3 from Gabriel Ravier  ---
Just fyi : When I said "gcc fails to optimize this to `return x + 1`, instead
opting for some rather weird code generation (involving `sbb` on x86)" the
"weird code generation" I was referring to is the exact one you sent and called
"fully optimized". I highly doubt this is optimal code generation over simply
adding 1 to x and returning that.

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

--- Comment #2 from Gabriel Ravier  ---
It's fully optimized ? I don't see how. This is exactly what I was complaining
about : It could be further optimized to 

leal1(%rdi), %eax
ret

but it isn't

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779

Martin Liška  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 CC||marxin at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Martin Liška  ---
Note that each switch contains an implicit default label which is missing in
your code. So your function invokes an undefined behavior (when you use the
return value of the function) for values different from 0 and 1.
I recommend:

int f1(unsigned x)
{
switch (x)
{
case 0:
return 1;
case 1:
return 2;
default:
__builtin_unreachable ();
}
}

which is fully optimized to:

cmpl$1, %edi
movl$1, %eax
sbbl$-1, %eax
ret