[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-09 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #11 from Xi Ruoyao ---
> and tree_nonzero_bits fails to _21 is either 0 or 1, for some reason.
                                ^ report

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-09 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #10 from Xi Ruoyao ---
For the original test case we get:

  x16_27 = _26 & 1;
  data_28 = data_15 >> 1;
  _29 = crc_18 >> 1;
  _21 = (short unsigned int) x16_27;
  _13 = _21 * 40961;

and tree_nonzero_bits fails to _21 is either 0
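To make the dump above concrete: because x16_27 is the result of `& 1`, the value multiplied by 40961 (0xA001) can only be 0 or 1, so the multiplication is equivalent to a select or a mask. The following is a minimal sketch of that equivalence in C; the function names are hypothetical and are not taken from CoreMark or from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* bit is assumed to be 0 or 1, like x16_27 = _26 & 1 in the dump. */

/* Multiplication form, mirroring _13 = _21 * 40961. */
uint16_t poly_mul(uint16_t bit) { return (uint16_t)(bit * 40961u); }

/* Select form: the same value without a multiply. */
uint16_t poly_sel(uint16_t bit) { return bit ? 40961u : 0u; }

/* Mask form: -(unsigned)bit is all-ones when bit is 1, zero when bit is 0. */
uint16_t poly_mask(uint16_t bit) { return (uint16_t)(-(unsigned)bit & 40961u); }

/* Asserts that the three forms agree for the low bit of raw. */
void check_poly_forms(unsigned raw)
{
    uint16_t bit = (uint16_t)(raw & 1u);
    assert(poly_mul(bit) == poly_sel(bit));
    assert(poly_mask(bit) == poly_sel(bit));
}
```

The equivalence only holds when the compiler can prove the operand is 0 or 1, which is the fact the comment says tree_nonzero_bits does not provide here.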

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #9 from Xi Ruoyao ---
And in fact the optimal code for

int t(int x, _Bool y) { return x * y; }

should be

	maskeqz	$r4,$r4,$r5
	jr	$r1

like

int t(int x, _Bool y) { return y ? x : 0; }
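The two functions in the comment above compute the same result for every input, which is why a single conditional-mask instruction suffices for both. A small sketch that checks this follows; the names t_mul, t_sel and check_forms_agree are hypothetical, introduced only so both variants can live in one file.

```c
#include <assert.h>
#include <stdbool.h>

/* Multiplication form from the comment. */
int t_mul(int x, bool y) { return x * y; }

/* Select form from the comment. */
int t_sel(int x, bool y) { return y ? x : 0; }

/* Asserts that both forms agree for one x and both values of y. */
void check_forms_agree(int x)
{
    assert(t_mul(x, false) == t_sel(x, false));
    assert(t_mul(x, true) == t_sel(x, true));
}
```

Since a _Bool is always 0 or 1, x * y cannot overflow, so the rewrite is valid for every int x.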

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #8 from Xi Ruoyao ---
(In reply to Andrew Pinski from comment #7)
> (In reply to Xi Ruoyao from comment #5)
> >
> > so we still slightly penalize multiplication. To me we should code
> > COSTS_N_INSNS (1) + 1 into

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #7 from Andrew Pinski ---
(In reply to Xi Ruoyao from comment #5)
>
> so we still slightly penalize multiplication. To me we should code
> COSTS_N_INSNS (1) + 1 into loongarch_rtx_cost_optimize_size instead of
> special casing it

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #6 from Xi Ruoyao ---
On a LA664 it seems a mul.w instruction costs 4 times as much as a "simple" instruction like add.w/sub.w/and, and div.w costs 5 times as much.
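For readers unfamiliar with GCC's cost units: COSTS_N_INSNS (N) is defined in rtl.h as (N) * 4, so COSTS_N_INSNS (1) + 1 is slightly more than one simple instruction and well below two. The sketch below shows how the ratios measured on LA664 would look in those units. The struct, the function names and the chosen values are hypothetical illustrations, not the contents of any committed patch; only the macro definition and the field name int_mult_si come from GCC and the thread.

```c
#include <assert.h>

/* Same definition as in GCC's rtl.h. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, trimmed-down cost table for illustration only. */
struct sketch_costs {
    int simple;      /* add.w / sub.w / and */
    int int_mult_si; /* mul.w */
    int int_div_si;  /* div.w */
};

/* Costs matching the 1 : 4 : 5 ratio reported for LA664 in the thread. */
struct sketch_costs la664_sketch(void)
{
    struct sketch_costs c;
    c.simple = COSTS_N_INSNS(1);
    c.int_mult_si = COSTS_N_INSNS(4);
    c.int_div_si = COSTS_N_INSNS(5);
    return c;
}

/* The size-optimization cost discussed in the thread: one instruction,
   nudged up by 1 so a multiply loses ties against a simple instruction. */
int size_cost_mult(void)
{
    int cost = COSTS_N_INSNS(1) + 1;
    assert(cost > COSTS_N_INSNS(1) && cost < COSTS_N_INSNS(2));
    return cost;
}
```

With these units, the default int_mult_si (COSTS_N_INSNS (1)) quoted in the thread makes a multiply look exactly as cheap as an add, which is the mismatch the comments point out.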

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #5 from Xi Ruoyao ---
(In reply to Andrew Pinski from comment #4)
> /* Default RTX cost initializer. */
> ...
> int_mult_si (COSTS_N_INSNS (1)),
> int_mult_di (COSTS_N_INSNS (1)),
>
> That seems wrong.
> I suspect you

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935 --- Comment #4 from Andrew Pinski ---
/* Default RTX cost initializer. */
...
int_mult_si (COSTS_N_INSNS (1)),
int_mult_di (COSTS_N_INSNS (1)),

That seems wrong.
I suspect you will get other improvements when you touch this. E.g.

[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

2023-12-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |missed-optimization