[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization

2021-01-21 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #5 from Jakub Jelinek  ---
So I think we want to improve that:
+  /* If more than a word remains, then make sure to keep the
+     starting point at least word aligned.  */
+  if (last_live - first_live > UNITS_PER_WORD)
+    *trim_head &= (UNITS_PER_WORD - 1);

Note that last_live is the start of the last live byte (so last_live + 1 is
the end of that byte).
For small sizes, I'd say we should consider both the alignment and the exact
head/tail trim values.
A whole-word store is definitely more efficient than a 7-byte store at
offset 1, and the same goes for head trims of 2 and 3; storing just the second
half is OK.
So shall we e.g. call by_pieces_ninsns for the store before and after the
expected trimming, and only trim if that doesn't increase the number of
by-pieces store insns?
It could also iterate on those.
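
The proposal above can be sketched in plain C. This is only an illustration under stated assumptions, not GCC code: WORD_BYTES, count_store_pieces and choose_head_trim are hypothetical names, the piece counter is a simplified stand-in for what by_pieces_ninsns computes, and it ignores alignment and overlapping stores.

```c
#include <stddef.h>

/* Hypothetical stand-in for the target word size in bytes.  */
#define WORD_BYTES 8

/* Count the stores needed to clear LEN bytes greedily with power-of-two
   pieces no wider than WORD_BYTES.  Simplified model only; not GCC's
   by_pieces_ninsns.  */
static unsigned
count_store_pieces (size_t len)
{
  unsigned n = 0;
  for (size_t piece = WORD_BYTES; piece >= 1; piece /= 2)
    {
      n += (unsigned) (len / piece);
      len %= piece;
    }
  return n;
}

/* Return the largest head trim not exceeding TRIM_HEAD that does not
   increase the number of stores compared with clearing all TOTAL_LEN
   bytes.  Returns 0 when no trim qualifies.  */
static size_t
choose_head_trim (size_t total_len, size_t trim_head)
{
  if (trim_head > total_len)
    trim_head = total_len;

  unsigned before = count_store_pieces (total_len);

  /* Iterate over smaller trims until one is no worse than not trimming.  */
  for (size_t t = trim_head; t > 0; t--)
    if (count_store_pieces (total_len - t) <= before)
      return t;
  return 0;
}
```

Under this model an 8-byte clear with a proposed head trim of 1, 2 or 3 keeps the full-word store, while a head trim of 4 is accepted because storing the second half still takes a single store.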

[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization

2021-01-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
I agree it would be weird to try to undo the
-  D.2365 = {};
+  MEM  [(struct Data *) + 1B] = {};
transformation done by DSE in store_merging instead of adjusting the DSE
optimization to take into account costs and the likely ways the clearing will
be expanded.
On the other hand, the user could have written it that way.

Regressed with r9-1663-g99e87c0eef2f6020a3ded2c785389939c07ac04e aka PR86010
fix.

[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization

2021-01-06 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #3 from Eric Botcazou  ---
> We expand the first case from
> 
>   MEM  [(struct Data *) + 1B] = {};
>   c.0_1 = c;
>   D.2365.a = c.0_1;
>   return D.2365;

But why generate a 7-byte zeroing instead of an 8-byte one?  I gather this is
the cause of the regression.
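
To make the question concrete, here is a minimal C sketch of the two shapes being compared. The struct layout is an assumption (the thread only shows a field `a` at offset 0), and init_trimmed and init_whole are hypothetical helper names. Both functions leave the object in the same state; they differ only in whether the clear covers the whole object or starts at offset 1.

```c
#include <string.h>

/* Hypothetical layout; assumed to be 8 bytes on typical ABIs.  */
struct Data { char a; int b; };

/* Shape after DSE trimmed the head: clear everything except byte 0,
   then store the first field.  */
static void
init_trimmed (struct Data *d, char c)
{
  memset ((char *) d + 1, 0, sizeof *d - 1);
  d->a = c;
}

/* Untrimmed shape: clear the whole object (byte 0 is cleared
   redundantly), then store the first field.  */
static void
init_whole (struct Data *d, char c)
{
  memset (d, 0, sizeof *d);
  d->a = c;
}
```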

[Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization

2021-01-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
           Keywords|                            |missed-optimization
           Priority|P3                          |P2
          Component|rtl-optimization            |tree-optimization
   Last reconfirmed|                            |2021-01-04
     Ever confirmed|0                           |1
   Target Milestone|---                         |9.4
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener  ---
We expand the first case from

  MEM  [(struct Data *) + 1B] = {};
  c.0_1 = c;
  D.2365.a = c.0_1;
  return D.2365;

I guess store-merging could "merge" the stores as

  D.2365 = {};
  D.2365.a = c.0_1;

i.e. figure out that the partial unaligned zeroing is better done aligned
(even though partly redundant).  Alternatively it could emit

  V_C_E = (unsigned) c.0_1;

The second testcase looks vectorization/ABI related, for which we have plenty
of duplicates.