https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935
Bug ID: 112935
Summary: [14 Regression] Performance regression in Coremarks crcu8 function
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: xry111 at gcc dot gnu.org
Target Milestone: ---

typedef __UINT8_TYPE__ ee_u8;
typedef __UINT16_TYPE__ ee_u16;

ee_u16 crcu8(ee_u8 data, ee_u16 crc)
{
  ee_u8 i = 0, x16 = 0, carry = 0;

  for (i = 0; i < 8; i++)
    {
      x16 = (ee_u8)((data & 1) ^ ((ee_u8)crc & 1));
      data >>= 1;

      if (x16 == 1)
        {
          crc ^= 0x4002;
          carry = 1;
        }
      else
        carry = 0;

      crc >>= 1;

      if (carry)
        crc |= 0x8000;
      else
        crc &= 0x7fff;
    }

  return crc;
}

With GCC 13.2.0 -O2, on LoongArch we get:

.L2:
	xor	$r12,$r4,$r14
	andi	$r12,$r12,1
	sub.w	$r12,$r0,$r12
	srli.w	$r4,$r4,1
	and	$r12,$r12,$r15
	addi.w	$r13,$r13,-1
	xor	$r12,$r12,$r4
	bstrpick.w	$r13,$r13,7,0
	srli.d	$r14,$r14,1
	bstrpick.w	$r4,$r12,15,0
	bnez	$r13,.L2

With GCC 14.0.0 -O2:

.L2:
	xor	$r12,$r4,$r14
	andi	$r12,$r12,1
	mul.w	$r12,$r12,$r15
	srli.w	$r4,$r4,1
	addi.w	$r13,$r13,-1
	bstrpick.w	$r13,$r13,7,0
	srli.d	$r14,$r14,1
	xor	$r12,$r12,$r4
	bstrpick.w	$r4,$r12,15,0
	bnez	$r13,.L2

mul.w is slower than sub.w + and.

For now I'm setting the component to tree-optimization, because the
difference already exists in 254t.optimized vs 263t.optimized. But maybe the
tree optimizer is doing things correctly and we should just add a
target-specific optimization.
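
For reference, the two loop bodies above appear to differ only in how the conditional XOR mask is formed: GCC 13 negates the low bit and ANDs it with a constant (sub.w + and), while GCC 14 multiplies the low bit by that constant (mul.w). The C sketch below restates both forms next to the original loop. It assumes the constant held in $r15 is 0xa001; that value is not shown in the dumps and is derived here from the C source, since for a 16-bit crc, ((crc ^ 0x4002) >> 1) | 0x8000 equals (crc >> 1) ^ 0xa001. The names crcu8_ref, crcu8_mask, crcu8_mul, POLY and check_all_inputs are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the loop-invariant constant in $r15 (derived from the
   C source, not taken from the assembly dumps). */
#define POLY 0xA001u

/* Reference: the loop from the report, written with fixed-width types. */
static uint16_t crcu8_ref(uint8_t data, uint16_t crc)
{
    uint8_t i, x16, carry;

    for (i = 0; i < 8; i++) {
        x16 = (uint8_t)((data & 1) ^ ((uint8_t)crc & 1));
        data >>= 1;
        if (x16 == 1) {
            crc ^= 0x4002;
            carry = 1;
        } else {
            carry = 0;
        }
        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }
    return crc;
}

/* GCC 13 shape: negate the bit to get an all-zeros or all-ones mask,
   then AND it with the constant (sub.w + and). */
static uint16_t crcu8_mask(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        unsigned bit = (unsigned)(data ^ crc) & 1u;
        crc = (uint16_t)((crc >> 1) ^ ((0u - bit) & POLY));
        data >>= 1;
    }
    return crc;
}

/* GCC 14 shape: multiply the 0/1 bit by the constant (mul.w). */
static uint16_t crcu8_mul(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        unsigned bit = (unsigned)(data ^ crc) & 1u;
        crc = (uint16_t)((crc >> 1) ^ (bit * POLY));
        data >>= 1;
    }
    return crc;
}

/* Exhaustively compare the three variants over every (data, crc) pair. */
static void check_all_inputs(void)
{
    for (unsigned d = 0; d < 256; d++) {
        for (unsigned c = 0; c < 65536; c++) {
            uint16_t r = crcu8_ref((uint8_t)d, (uint16_t)c);
            assert(crcu8_mask((uint8_t)d, (uint16_t)c) == r);
            assert(crcu8_mul((uint8_t)d, (uint16_t)c) == r);
        }
    }
}
```

Calling check_all_inputs() from a small test program should pass silently if the three forms agree. If so, both code shapes compute the same result, and the regression is only about which of the two equivalent forms gets emitted for this target.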