https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107445
            Bug ID: 107445
           Summary: Redundant moves for subregs move
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Here is the testcase:
https://godbolt.org/z/4bcfx5a86

coalesce:
        ld4d    {z24.d - z27.d}, p0/z, [x0]
        mov     z5.d, z24.d
        mov     z4.d, z25.d
        mov     z3.d, z26.d
        mov     z2.d, z27.d
        cmp     w1, 0
        ble     .L2
        mov     w0, 0
        ld1d    z1.d, p0/z, [x2]
        ld1d    z0.d, p0/z, [x3]
.L3:
        add     w0, w0, 1
        movprfx z5.d, p0/z, z5.d
        mad     z5.d, p0/m, z1.d, z0.d
        movprfx z4.d, p0/z, z4.d
        mad     z4.d, p0/m, z1.d, z0.d
        movprfx z3.d, p0/z, z3.d
        mad     z3.d, p0/m, z1.d, z0.d
        movprfx z2.d, p0/z, z2.d
        mad     z2.d, p0/m, z1.d, z0.d
        cmp     w1, w0
        bne     .L3

The four mov instructions after the ld4d are redundant: we should be able
to use z24 - z27 directly for the mad operations and remove the moves in
the prologue.

I tried to implement register coalescing in IRA to fix this, but I am not
sure whether that is a good way to fix it.
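
To make the coalescing idea concrete, here is a minimal toy sketch in C. It is not IRA code and does not use any GCC data structures: the names (struct copy, find_class, coalesce_copies, demo_remaining_moves) and the simplified interference matrix are all assumptions made for illustration. It merges the destination of each copy into its source when the two do not interfere, and counts the moves that would remain.

```c
#include <stdbool.h>

#define NREGS 32

/* Hypothetical toy model: one union-find slot per register number. */
static int parent[NREGS];

struct copy { int dst, src; };

/* Return the representative of the coalesced class containing r. */
int find_class(int r)
{
    while (parent[r] != r) {
        parent[r] = parent[parent[r]];   /* path halving */
        r = parent[r];
    }
    return r;
}

/* Coalesce every copy whose two classes do not interfere.
   Returns the number of moves that still have to be emitted.
   The interference matrix is updated in place as classes merge. */
int coalesce_copies(const struct copy *copies, int n,
                    bool interferes[NREGS][NREGS])
{
    int remaining = 0;

    for (int i = 0; i < NREGS; i++)
        parent[i] = i;

    for (int i = 0; i < n; i++) {
        int d = find_class(copies[i].dst);
        int s = find_class(copies[i].src);

        if (d == s)
            continue;                    /* already the same register */
        if (interferes[d][s]) {
            remaining++;                 /* both live at once: keep the move */
            continue;
        }
        parent[d] = s;                   /* merge dst class into src class */
        for (int r = 0; r < NREGS; r++) {
            interferes[s][r] = interferes[s][r] || interferes[d][r];
            interferes[r][s] = interferes[r][s] || interferes[r][d];
        }
    }
    return remaining;
}

/* Simplified model of the listing above: z24-z27 are live together as the
   ld4d tuple, z2-z5 are live together in the loop, and no copy destination
   interferes with its own source (an assumption of this sketch). */
int demo_remaining_moves(void)
{
    static const struct copy copies[] = {
        { 5, 24 }, { 4, 25 }, { 3, 26 }, { 2, 27 }
    };
    bool interferes[NREGS][NREGS] = { { false } };

    for (int a = 24; a <= 27; a++)
        for (int b = 24; b <= 27; b++)
            if (a != b)
                interferes[a][b] = true;
    for (int a = 2; a <= 5; a++)
        for (int b = 2; b <= 5; b++)
            if (a != b)
                interferes[a][b] = true;

    return coalesce_copies(copies, 4, interferes);
}
```

In this model all four copies coalesce, which corresponds to the mad instructions operating on z24 - z27 directly. A real allocator would also have to respect the constraint that the ld4d destination is a tuple of consecutive registers, which this sketch does not model.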