[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #5 from Jakub Jelinek ---
So I think we want to improve that

+  /* If more than a word remains, then make sure to keep the
+     starting point at least word aligned.  */
+  if (last_live - first_live > UNITS_PER_WORD)
+    *trim_head &= (UNITS_PER_WORD - 1);

Note, last_live is the start of the last live byte (so last_live + 1 is the end of that).

For the small sizes, I'd say we should consider both alignment and the exact head/tail trim values. A whole-word store is definitely more efficient than a 7-byte store at offset 1, ditto for head trims of 2 and 3; storing just the second half is ok.

So shall we e.g. call by_pieces_ninsns for the region before and after the expected trimming, and only trim if that doesn't increase the number of by-pieces store insns? It could also iterate on those.
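
The cost check proposed in comment #5 could be sketched roughly as follows. This is a standalone toy model, not GCC code: count_stores is a hypothetical stand-in for by_pieces_ninsns that assumes only naturally aligned power-of-two stores of up to WORD_SIZE bytes (a placeholder for UNITS_PER_WORD), and choose_head_trim is an invented name for the "only trim if it doesn't add store insns, iterating over candidates" idea.

```c
#include <stddef.h>

/* Assumed word size in bytes; stands in for UNITS_PER_WORD. */
#define WORD_SIZE 8

/* Hypothetical cost model: number of naturally aligned power-of-two
   stores needed to clear the byte range [start, end).  The real
   by_pieces_ninsns is target dependent and is not modelled here.  */
static unsigned count_stores(size_t start, size_t end)
{
    unsigned n = 0;
    while (start < end) {
        size_t size = WORD_SIZE;
        /* Shrink the store until it is aligned at START and fits.  */
        while (size > 1 && (start % size != 0 || size > end - start))
            size /= 2;
        start += size;
        n++;
    }
    return n;
}

/* Return the largest head trim not exceeding PROPOSED_TRIM that does not
   need more stores than clearing the whole object, or 0 if every
   candidate is more expensive.  */
static size_t choose_head_trim(size_t object_size, size_t proposed_trim)
{
    unsigned untrimmed = count_stores(0, object_size);
    for (size_t trim = proposed_trim; trim > 0; trim--)
        if (count_stores(trim, object_size) <= untrimmed)
            return trim;
    return 0;
}
```

Under this model an 8-byte object with a proposed head trim of 1, 2 or 3 keeps the full-word store (trim 0), while a proposed trim of 4 is accepted because storing only the second half still takes a single store.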
[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
I agree it would be weird to try to undo the DSE transformation

-  D.2365 = {};
+  MEM [(struct Data *) + 1B] = {};

in store_merging instead of adjusting the DSE optimization to take into account costs and the likely ways the clearing will be expanded. On the other hand, the user could have written it that way.

Regressed with r9-1663-g99e87c0eef2f6020a3ded2c785389939c07ac04e aka PR86010 fix.
[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #3 from Eric Botcazou ---
> We expand the first case from
>
>   MEM [(struct Data *) + 1B] = {};
>   c.0_1 = c;
>   D.2365.a = c.0_1;
>   return D.2365;

But why generate a 7-byte zeroing instead of an 8-byte one? I gather this is the cause of the regression.
[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
           Keywords|                            |missed-optimization
           Priority|P3                          |P2
          Component|rtl-optimization            |tree-optimization
   Last reconfirmed|                            |2021-01-04
     Ever confirmed|0                           |1
   Target Milestone|---                         |9.4
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener ---
We expand the first case from

  MEM [(struct Data *) + 1B] = {};
  c.0_1 = c;
  D.2365.a = c.0_1;
  return D.2365;

I guess store-merging could "merge" the stores as

  D.2365 = {};
  D.2365.a = c.0_1;

thus figure the partial unaligned zeroing is better done aligned (and redundant). Alternatively it could emit

  V_C_E = (unsigned) c.0_1;

The second testcase looks vectorization/ABI related, for which we have plenty of dups.
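
To make the equivalence behind comment #2 concrete, here is a small sketch at the C level. It does not reproduce the PR's testcase: the object is modelled as 8 raw bytes (an assumed size, inferred from the "+ 1B" offset and the 7-byte zeroing discussed above), and init_trimmed and init_whole are hypothetical names for the two store sequences.

```c
#include <string.h>

/* Assumed object size in bytes; the first byte plays the role of the
   `a` member, the remaining bytes are everything that gets zeroed.  */
#define OBJ_SIZE 8

/* Shape after DSE trimmed the head: clear bytes 1..7, then store `a`.  */
static void init_trimmed(unsigned char *obj, unsigned char c)
{
    memset(obj + 1, 0, OBJ_SIZE - 1);
    obj[0] = c;
}

/* Shape suggested for store-merging: clear the whole object with an
   aligned store (partly redundant), then store `a`.  */
static void init_whole(unsigned char *obj, unsigned char c)
{
    memset(obj, 0, OBJ_SIZE);
    obj[0] = c;
}
```

Both functions leave the object in the same final state; the difference is only in how the clearing is likely to be expanded, which is the cost question raised in the later comments.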