[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #8 from Jan Hubicka ---
The negative return value branch predictor is set to have a 98% hit rate (measured on SPEC2k17 some time ago). There is also --param predictable-branch-outcome, which is set to 2%, so this heuristic does indeed consider the branch well predictable. Reducing that --param should make the cmov happen.

With the profile_probability data type we could try something smarter for guessing whether a given branch is predictable (such as ignoring guessed values and letting the predictor optionally mark branches as (un)predictable). But it is not quite clear to me what the desired behavior would be... Guessing the predictability of data branches is generally quite a hard problem. Predictability of loop branches is easier, but we hardly ever apply BRANCH_COST to a loop-closing branch, since those are not if-conversion candidates.
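To make the threshold arithmetic concrete, here is a hedged sketch in C. branch_is_predictable is a hypothetical helper, not GCC's code; it assumes the test is symmetric (an edge predicted with at most threshold percent, or at least 100 minus threshold percent, counts as predictable), with threshold_pct standing in for --param predictable-branch-outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model (not GCC source) of classifying a branch
   as "well predictable".  THRESHOLD_PCT plays the role of
   --param predictable-branch-outcome (default 2).  The branch counts as
   predictable when one edge is predicted with at most THRESHOLD_PCT
   percent, i.e. the other edge with at least 100 - THRESHOLD_PCT.  */
static bool
branch_is_predictable (int edge_pct, int threshold_pct)
{
  assert (edge_pct >= 0 && edge_pct <= 100);
  return edge_pct <= threshold_pct || 100 - edge_pct <= threshold_pct;
}
```

With the 98% hit rate above, the cold edge gets 2%: branch_is_predictable (2, 2) holds under the default threshold, while branch_is_predictable (2, 1) does not, which is consistent with lowering the --param letting the cmov through.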
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #7 from Richard Biener ---
I think a return of a negative value is predicted to be cold (aka "error"):

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  if (c == 14)
    goto ; [INV]
  else
    goto ; [INV]
;;    succ:       3
;;                4

;;   basic block 3, loop depth 0
;;    pred:       2
  D.2771 = -9;
  // predicted unlikely by early return (on trees) predictor.
  goto ; [INV]
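The dump corresponds to a source function shaped roughly like the one below. This is a hypothetical reduction inferred from the GIMPLE above and the RTL in comment #3 (the original test case is not quoted in this thread), and the name foo is made up for illustration.

```c
/* Hypothetical reduction inferred from the dumps in this bug; not the
   reporter's actual test case.  The early return of a negative constant
   is the path the early-return predictor marks as unlikely ("error").  */
int
foo (int c)
{
  if (c == 14)
    return -9;  /* negative early return: predicted cold */
  return c;     /* hot path */
}
```

If-converting this to a cmov is what the bug asks for; the comments here explain why the cold-path prediction makes x86 decline.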
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #6 from Uroš Bizjak ---
This is by design: CMOV should not be used instead of well-predicted jumps. FYI, CMOV is quite problematic on x86; there are several PRs where conversion to CMOV resulted in 2x slower execution. Please see e.g.:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309#c26
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #5 from Uroš Bizjak ---
Digging a bit further: if_info.max_seq_cost is calculated via targetm.max_noce_ifcvt_seq_cost, where, without params set, we return:

  return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2);

with:

  #define BRANCH_COST(speed_p, predictable_p) \
    (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)

So, the conversion is clearly not desirable for well-predicted jumps.
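A small arithmetic check of that formula, as a sketch: COSTS_N_INSNS (N) expands to (N) * 4 in GCC's rtl.h, and ix86_branch_cost is passed in as a parameter here because its real value depends on the selected tuning. The helper names branch_cost and max_seq_cost are hypothetical stand-ins, not GCC functions.

```c
#include <stdbool.h>

/* Same definition as in GCC's rtl.h: one "instruction" costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Stand-alone mirror of the x86 BRANCH_COST macro quoted above.  The real
   ix86_branch_cost comes from the active tuning; here it is a parameter
   so the arithmetic can be checked in isolation.  */
static int
branch_cost (bool speed_p, bool predictable_p, int ix86_branch_cost)
{
  return !speed_p ? 2 : predictable_p ? 0 : ix86_branch_cost;
}

/* Hypothetical helper computing the default budget from the comment:
   BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2).  */
static int
max_seq_cost (bool predictable_p, int ix86_branch_cost)
{
  return branch_cost (true, predictable_p, ix86_branch_cost)
         * COSTS_N_INSNS (2);
}
```

For a well-predicted branch, max_seq_cost (true, n) is 0 for any n, which matches the max_seq_cost of 0 seen in gdb in comment #4; only an unpredictable branch gets a nonzero budget (for example 24 if the tuning's branch cost were 3).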
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #4 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Andrew Pinski from comment #2)
> >
> > Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> > on aarch64. Note there are some new changes to ifcvt.cc in review which
> > might improve this, though I am not sure.
>
> x86_64 targetm.noce_conversion_profitable_p returns false for:

Actually, the cost function goes to default_noce_conversion_profitable_p, where:

(gdb) p cost
$1 = 16
(gdb) p if_info->original_cost
$2 = 8
(gdb) p if_info->max_seq_cost
$3 = 0

For some reason, max_seq_cost remains zero, while on aarch64:

(gdb) p cost
$2 = 12
(gdb) p if_info->original_cost
$3 = 8
(gdb) p if_info->max_seq_cost
$4 = 12

So, x86_64 returns false from the default cost function:

  /* When compiling for size, we can make a reasonably accurately guess
     at the size growth.  When compiling for speed, use the maximum.  */
  return speed_p && cost <= if_info->max_seq_cost;
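The decision can be modeled in isolation with the numbers printed above. This is a simplified sketch: struct ifcvt_costs and noce_profitable_p are hypothetical names, and the early accept for a sequence no more expensive than the original is an assumption about the part of default_noce_conversion_profitable_p not quoted in this comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the if_info fields used by the check.  */
struct ifcvt_costs
{
  bool speed_p;               /* optimizing for speed?              */
  unsigned int original_cost; /* cost of the original branchy code  */
  unsigned int max_seq_cost;  /* budget for the converted sequence  */
};

/* Sketch of the profitability decision.  NEW_COST stands for the cost of
   the if-converted sequence.  Assumed: a sequence no more expensive than
   the original is always accepted; otherwise, when optimizing for speed,
   it must fit within max_seq_cost (the line quoted in the comment).  */
static bool
noce_profitable_p (unsigned int new_cost, const struct ifcvt_costs *c)
{
  if (new_cost <= c->original_cost)
    return true;
  return c->speed_p && new_cost <= c->max_seq_cost;
}
```

Plugging in the gdb values, x86_64 (cost 16, original 8, max 0) is rejected, while aarch64 (cost 12, original 8, max 12) is accepted.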
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #3 from Uroš Bizjak ---
(In reply to Andrew Pinski from comment #2)
> Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> on aarch64. Note there are some new changes to ifcvt.cc in review which
> might improve this, though I am not sure.

x86_64 targetm.noce_conversion_profitable_p returns false for:

(insn 20 0 19 (set (reg:SI 101)
        (const_int -9 [0xfff7])) 85 {*movsi_internal}
     (nil))
(insn 19 20 21 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 99 [ c ])
            (const_int 14 [0xe]))) 11 {*cmpsi_1}
     (nil))
(insn 21 19 0 (set (reg/v:SI 99 [ c ])
        (if_then_else:SI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg/v:SI 99 [ c ])
            (reg:SI 101))) 1438 {*movsicc_noc}
     (nil))
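Read as C, the three insns amount to a branch-free select. The function below is a hand-written sketch of that semantics (ifcvt_select is a hypothetical name), not code GCC emits.

```c
#include <stdbool.h>

/* Hand-written C rendering of the three insns above (a sketch, not
   compiler output).  The result is returned in place of writing back
   to c (reg 99).  */
static int
ifcvt_select (int c)
{
  int tmp = -9;           /* insn 20: (set (reg:SI 101) (const_int -9))   */
  bool ne = (c != 14);    /* insn 19: compare c with 14 into the flags    */
  return ne ? c : tmp;    /* insn 21: if_then_else on (ne flags 0), cmov  */
}
```

Both arms are computed unconditionally, which is exactly the extra work the cost model weighs against a branch it believes is well predicted.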
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
      Known to fail|                            |13.1.0
      Known to work|                            |12.3.0
            Summary|missed optimization: cmove  |[13/14 Regression] missed
                   |not used with multiple      |optimization: cmove not
                   |returns                     |used with multiple returns
   Target Milestone|---                         |13.3

--- Comment #2 from Andrew Pinski ---
The difference between GCC 12 and GCC 13 is:

GCC 13:
```
IF-THEN-JOIN block found, pass 1, test 2, then 3, join 4
```

GCC 12 and before:
```
IF-THEN-ELSE-JOIN block found, pass 1, test 2, then 3, else 4, join 5
```

Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works on aarch64. Note there are some new changes to ifcvt.cc in review which might improve this, though I am not sure.