https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840
            Bug ID: 108840
           Summary: Aarch64 doesn't optimize away shift counter masking
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612214.html
aarch64 doesn't optimize away the and instructions that mask a shift count
when there is more than one shift with the same count.  Consider the
following at -O2 -fno-tree-vectorize:

int
foo (int x, int y)
{
  return x << (y & 31);
}

void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

void
baz (int x[3], int y)
{
  y &= 31;
  x[0] <<= y;
  x[1] <<= y;
  x[2] <<= y;
}

void corge (int, int, int);

void
qux (int x, int y, int z, int n)
{
  n &= 31;
  corge (x << n, y << n, z >> n);
}

foo is optimized correctly: combine matches the shift with embedded masking
of the count.  In the remaining cases, however, the desirable combination is
rejected on cost grounds.  A shift with embedded masking of the count should
have the same rtx_cost as a plain shift, because under the hood it is the
shift instruction itself: the AArch64 variable shift instructions use the
count modulo the operand width, so the masking costs nothing.
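
As a minimal sketch of the kind of cost adjustment the last paragraph asks
for (illustrative only, not the actual patch): in the TARGET_RTX_COSTS hook
(aarch64_rtx_costs in gcc/config/aarch64/aarch64.cc), a shift whose count
operand matches a predicate like the one below could be costed exactly like
a plain register shift.  The helper name is hypothetical; the accessors are
the usual GCC RTL macros.

/* Hypothetical helper, for illustration: return true if COUNT is
   (and X (const_int WIDTH-1)) for MODE, i.e. a masking operation the
   AArch64 variable shift instructions perform implicitly anyway
   (LSL Wd, Wn, Wm uses Wm modulo 32; the X-register forms use the
   count modulo 64).  */
static bool
masked_shift_count_p (rtx count, machine_mode mode)
{
  return (GET_CODE (count) == AND
          && CONST_INT_P (XEXP (count, 1))
          && known_eq (UINTVAL (XEXP (count, 1)),
                       GET_MODE_BITSIZE (mode) - 1));
}

With such a predicate, the ASHIFT/ASHIFTRT/LSHIFTRT cases in the cost hook
could skip charging for the embedded AND, so combine would no longer see the
merged shift-with-masked-count pattern as more expensive than the separate
and plus shifts.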