[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2024-03-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #8 from Jan Hubicka  ---
The negative return value branch predictor is set to a 98% hit rate (measured
on SPEC2k17 some time ago).  --param predictable-branch-outcome is also set
to 2%, so we do indeed consider the branch well predictable by this
heuristic.

Reducing that --param should make the cmov happen.

With the profile_probability data type we could try something smarter when
guessing whether a given branch is predictable (such as ignoring guessed
values and letting the predictor optionally mark branches as
(un)predictable). But it is not quite clear to me what the desired behavior
would be...

Guessing the predictability of data branches is generally quite a hard
problem. Predictability of loop branches is easier, but we hardly ever apply
BRANCH_COST to the branch closing a loop, since those are not if-conversion
candidates.
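
The threshold test described above can be sketched as follows. This is only an illustration under stated assumptions, not GCC's actual code: the helper name `edge_is_predictable`, the use of whole percentages, and the inclusive comparison are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: treat a branch as "predictable" when its estimated
   taken-probability lies within threshold_pct of either 0% or 100%.
   threshold_pct stands in for --param predictable-branch-outcome.  */
static bool
edge_is_predictable (int taken_pct, int threshold_pct)
{
  return taken_pct <= threshold_pct
         || 100 - taken_pct <= threshold_pct;
}
```

Under this model, a branch guessed at 2% taken counts as predictable with a threshold of 2 and stops counting as predictable once the threshold is lowered to 1, which is the effect of reducing the --param.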

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #7 from Richard Biener  ---
I think a return of a negative value is predicted to be cold (aka "error"):

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  if (c == 14)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]
;;    succ:       3
;;                4

;;   basic block 3, loop depth 0
;;    pred:       2
  D.2771 = -9;
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #6 from Uroš Bizjak  ---
This is by design: CMOV should not be used in place of well-predicted jumps.

FYI, CMOV is quite problematic on x86; there are several PRs where conversion
to CMOV resulted in 2x slower execution. Please see e.g.:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309#c26

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #5 from Uroš Bizjak  ---
Digging a bit further:

if_info.max_seq_cost is calculated via targetm.max_noce_ifcvt_seq_cost, where
without params set we return:

  return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2);

with:

#define BRANCH_COST(speed_p, predictable_p) \
  (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)

So, the conversion is clearly not desirable for well predicted jumps.
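
To make the arithmetic concrete, here is a small standalone sketch of that computation. It assumes COSTS_N_INSNS (N) expands to (N) * 4, as in GCC's rtl.h, and uses a placeholder value of 3 for ix86_branch_cost, which in GCC depends on the selected tuning. The function name merely mirrors the hook and is not the real implementation.

```c
#include <stdbool.h>

/* Assumed to match rtl.h: one instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Placeholder for the tuning-dependent value; 3 is an assumption.  */
static const int ix86_branch_cost = 3;

#define BRANCH_COST(speed_p, predictable_p) \
  (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)

/* Sketch of the default max_seq_cost computation when no params are set.  */
static unsigned
max_noce_ifcvt_seq_cost (bool predictable_p)
{
  return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2);
}
```

For a predictable branch this yields 0 * 8 = 0, which matches the max_seq_cost of 0 seen in gdb; with the assumed branch cost of 3, an unpredictable branch would get a budget of 24.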

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #4 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Andrew Pinski from comment #2)
> 
> > Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> > on aarch64.  Note there are some new changes to ifcvt.cc in review which
> > might improve this, though I am not sure.
> 
> x86_64 targetm.noce_conversion_profitable_p returns false for:

Actually, the cost function goes to default_noce_conversion_profitable_p,
where:

(gdb) p cost
$1 = 16
(gdb) p if_info->original_cost 
$2 = 8
(gdb) p if_info->max_seq_cost 
$3 = 0

For some reason, max_seq_cost remains zero, while on aarch64:

(gdb) p cost
$2 = 12
(gdb) p if_info->original_cost
$3 = 8
(gdb) p if_info->max_seq_cost
$4 = 12

So, x86_64 returns false from the default cost function:

  /* When compiling for size, we can make a reasonably accurately guess
 at the size growth.  When compiling for speed, use the maximum.  */
  return speed_p && cost <= if_info->max_seq_cost;
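
Plugging the gdb values into that final comparison shows the difference between the two targets. The sketch below models only the quoted return statement; the function name is hypothetical and any earlier checks in the real function are left out.

```c
#include <stdbool.h>

/* Models only the final test: when optimizing for speed, the converted
   sequence must fit within the max_seq_cost budget.  */
static bool
conversion_profitable_p (bool speed_p, unsigned cost, unsigned max_seq_cost)
{
  return speed_p && cost <= max_seq_cost;
}
```

With the x86_64 numbers (cost 16, max_seq_cost 0) the test fails, while the aarch64 numbers (cost 12, max_seq_cost 12) pass.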

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #3 from Uroš Bizjak  ---
(In reply to Andrew Pinski from comment #2)

> Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> on aarch64.  Note there are some new changes to ifcvt.cc in review which
> might improve this, though I am not sure.

x86_64 targetm.noce_conversion_profitable_p returns false for:

(insn 20 0 19 (set (reg:SI 101)
        (const_int -9 [0xfffffffffffffff7])) 85 {*movsi_internal}
     (nil))

(insn 19 20 21 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 99 [ c ])
            (const_int 14 [0xe]))) 11 {*cmpsi_1}
     (nil))

(insn 21 19 0 (set (reg/v:SI 99 [ c ])
        (if_then_else:SI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg/v:SI 99 [ c ])
            (reg:SI 101))) 1438 {*movsicc_noc}
     (nil))
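
Read back into C, the three insns above select between c and -9 depending on whether c equals 14. The function below is a reconstruction from the RTL for illustration only; it is not the test case attached to the bug, and the name `pick` is hypothetical.

```c
/* Hypothetical reconstruction: two return statements, one of which returns
   a negative constant.  The candidate cmov sequence computes the same
   result as c != 14 ? c : -9.  */
int
pick (int c)
{
  if (c == 14)
    return -9;
  return c;
}
```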

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
      Known to fail|                            |13.1.0
      Known to work|                            |12.3.0
            Summary|missed optimization: cmove  |[13/14 Regression] missed
                   |not used with multiple      |optimization: cmove not
                   |returns                     |used with multiple returns
   Target Milestone|---                         |13.3

--- Comment #2 from Andrew Pinski  ---
The difference between GCC 12 and GCC 13 is:

GCC 13:
```
IF-THEN-JOIN block found, pass 1, test 2, then 3, join 4
```
GCC 12 and before:
```
IF-THEN-ELSE-JOIN block found, pass 1, test 2, then 3, else 4, join 5
```

Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works on
aarch64.  Note there are some new changes to ifcvt.cc in review which might
improve this, though I am not sure.