[Bug tree-optimization/88709] Improve store-merging

2019-05-10 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

Ramana Radhakrishnan  changed:

   What|Removed |Added

 CC||ramana at gcc dot gnu.org

--- Comment #13 from Ramana Radhakrishnan  ---
(In reply to Richard Earnshaw from comment #9)
> (In reply to Jakub Jelinek from comment #7)
> > (In reply to Christophe Lyon from comment #6)
> > > I've noticed that the new test store_merging_29.c fails on
> > > arm-none-eabi --with-cpu cortex-a9
> > > FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New sequence
> > > of 3 stores to replace old one of 6 stores"
> > 
> > That is because target-supports.exp lies on arm, even for -mcpu=cortex-a9
> > STRICT_ALIGNMENT is 1 on arm (it is 1 unconditionally), but
> > check_effective_target_non_strict_align returns true on arm anyway.
> > I've already added hacks for it in r256783 in another testcase though, guess
> > I'll do something similar now, but I must say I'm not very excited about
> > that.
> 
> Support for misaligned accesses is a three(.5!)-valued problem:
> 
> 1) There's no support in the architecture at all
> 2) There's some support with a limited set of instructions
> 3) There's full support: any memory access can handle any alignment.
> 3.5) There's full support: but some accesses may be very slow
> 
> I would think that these days most CPU architectures actually fall into
> either 1 or 2.  Many architectures have limitations, for example on atomic
> accesses that are unaligned.
> 
> STRICT_ALIGNMENT only covers, in reality case 3.  I'm not even sure if it
> would be defined on a machine with case 3.5.
> 
> I think the real problem here is that it's not clear what question this
> target-supports macro is really asking - does the CPU have the capability to
> do (some) unaligned acceses?  or can it arbitrarily support casts from
> unaligned pointers to standard types?

I agree - it sounds like this should be put in a comment next to the
target-supports query.

[Bug tree-optimization/88709] Improve store-merging

2019-05-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #12 from Jakub Jelinek  ---
Author: jakub
Date: Fri May 10 07:53:23 2019
New Revision: 271056

URL: https://gcc.gnu.org/viewcvs?rev=271056=gcc=rev
Log:
PR tree-optimization/88709
PR tree-optimization/90271
* gcc.dg/store_merging_29.c: Allow 4 stores to replace 6 stores on
arm*-*-*.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/store_merging_29.c

[Bug tree-optimization/88709] Improve store-merging

2019-05-09 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #11 from Richard Earnshaw  ---
And in the testcase that prompted Ramana's original patch it clearly wanted to
ask something else.

We can't have it both ways.

[Bug tree-optimization/88709] Improve store-merging

2019-05-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #10 from Jakub Jelinek  ---
In this and many other testcases it wants to ask is STRICT_ALIGNMENT non-zero?

[Bug tree-optimization/88709] Improve store-merging

2019-05-09 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #9 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #7)
> (In reply to Christophe Lyon from comment #6)
> > I've noticed that the new test store_merging_29.c fails on
> > arm-none-eabi --with-cpu cortex-a9
> > FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New sequence
> > of 3 stores to replace old one of 6 stores"
> 
> That is because target-supports.exp lies on arm, even for -mcpu=cortex-a9
> STRICT_ALIGNMENT is 1 on arm (it is 1 unconditionally), but
> check_effective_target_non_strict_align returns true on arm anyway.
> I've already added hacks for it in r256783 in another testcase though, guess
> I'll do something similar now, but I must say I'm not very excited about
> that.

Support for misaligned accesses is a three(.5!)-valued problem:

1) There's no support in the architecture at all
2) There's some support with a limited set of instructions
3) There's full support: any memory access can handle any alignment.
3.5) There's full support: but some accesses may be very slow

I would think that these days most CPU architectures actually fall into either
1 or 2.  Many architectures have limitations, for example on atomic accesses
that are unaligned.

STRICT_ALIGNMENT only covers, in reality case 3.  I'm not even sure if it would
be defined on a machine with case 3.5.

I think the real problem here is that it's not clear what question this
target-supports macro is really asking - does the CPU have the capability to do
(some) unaligned acceses?  or can it arbitrarily support casts from unaligned
pointers to standard types?

[Bug tree-optimization/88709] Improve store-merging

2019-05-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #8 from Jakub Jelinek  ---
Created attachment 46327
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46327=edit
gcc10-pr88709-test.patch

Untested patch for the testsuite (well, I've tested it on x86_64-linux,
together with some tweaks to test the 3 variants - replaced the [34] with [4],
so that the second regex doesn't match for me on x86_64 and then testing
normally, with be and le swapped and with x86_64 instead of arm).

[Bug tree-optimization/88709] Improve store-merging

2019-05-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #7 from Jakub Jelinek  ---
(In reply to Christophe Lyon from comment #6)
> I've noticed that the new test store_merging_29.c fails on
> arm-none-eabi --with-cpu cortex-a9
> FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New sequence
> of 3 stores to replace old one of 6 stores"

That is because target-supports.exp lies on arm, even for -mcpu=cortex-a9
STRICT_ALIGNMENT is 1 on arm (it is 1 unconditionally), but
check_effective_target_non_strict_align returns true on arm anyway.
I've already added hacks for it in r256783 in another testcase though, guess
I'll do something similar now, but I must say I'm not very excited about that.

[Bug tree-optimization/88709] Improve store-merging

2019-05-07 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

Christophe Lyon  changed:

   What|Removed |Added

 CC||clyon at gcc dot gnu.org

--- Comment #6 from Christophe Lyon  ---
I've noticed that the new test store_merging_29.c fails on
arm-none-eabi --with-cpu cortex-a9
FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New sequence of 3
stores to replace old one of 6 stores"

[Bug tree-optimization/88709] Improve store-merging

2019-05-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #5 from Jakub Jelinek  ---
Author: jakub
Date: Mon May  6 21:50:14 2019
New Revision: 270924

URL: https://gcc.gnu.org/viewcvs?rev=270924=gcc=rev
Log:
PR tree-optimization/88709
PR tree-optimization/90271
* params.def (PARAM_STORE_MERGING_MAX_SIZE): New parameter.
* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Handle
non-clobber CONSTRUCTORs with no elts.  Remove useless tmp_int
variable.
(imm_store_chain_info::coalesce_immediate_stores): Punt if the size
of the store merging group is larger than
PARAM_STORE_MERGING_MAX_SIZE parameter.
(split_group): Add bzero_first argument.  If set, always emit first
the first store which must be = {} of the whole area and then for the
rest of the stores consider all zero bytes as paddings.
(imm_store_chain_info::output_merged_store): Check if first store
is = {} of the whole area and if yes, determine which setting of
bzero_first for split_group gives smaller number of stores.  Adjust
split_group callers.
(lhs_valid_for_store_merging_p): Allow decls.
(rhs_valid_for_store_merging_p): Allow non-clobber CONTRUCTORs with
no elts.
(pass_store_merging::process_store): Likewise.

* gcc.dg/store_merging_26.c: New test.
* gcc.dg/store_merging_27.c: New test.
* gcc.dg/store_merging_28.c: New test.
* gcc.dg/store_merging_29.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/store_merging_26.c
trunk/gcc/testsuite/gcc.dg/store_merging_27.c
trunk/gcc/testsuite/gcc.dg/store_merging_28.c
trunk/gcc/testsuite/gcc.dg/store_merging_29.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-ssa-store-merging.c
trunk/gcc/params.def
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/88709] Improve store-merging

2019-05-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Created attachment 46278
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46278=edit
gcc10-pr88709-wip.patch

Further progress on the patch.

[Bug tree-optimization/88709] Improve store-merging

2019-04-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #3 from Jakub Jelinek  ---
WIP:
--- gcc/gimple-ssa-store-merging.c.jj   2019-01-01 12:37:19.063943678 +0100
+++ gcc/gimple-ssa-store-merging.c  2019-04-29 19:02:55.992151104 +0200
@@ -1615,13 +1615,31 @@ encode_tree_to_bitpos (tree expr, unsign
   unsigned int total_bytes)
 {
   unsigned int first_byte = bitpos / BITS_PER_UNIT;
-  tree tmp_int = expr;
   bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
|| (bitpos % BITS_PER_UNIT)
|| !int_mode_for_size (bitlen, 0).exists ());
+  bool empty_ctor_p
+= (TREE_CODE (expr) == CONSTRUCTOR
+   && CONSTRUCTOR_NELTS (expr) == 0
+   && TYPE_SIZE_UNIT (TREE_TYPE (expr))
+  && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE
(expr;

   if (!sub_byte_op_p)
-return native_encode_expr (tmp_int, ptr + first_byte, total_bytes) != 0;
+{
+  if (first_byte >= total_bytes)
+   return false;
+  total_bytes -= first_byte;
+  if (empty_ctor_p)
+   {
+ unsigned HOST_WIDE_INT rhs_bytes
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
+ if (rhs_bytes > total_bytes)
+   return false;
+ memset (ptr + first_byte, '\0', rhs_bytes);
+ return true;
+   }
+  return native_encode_expr (expr, ptr + first_byte, total_bytes) != 0;
+}

   /* LITTLE-ENDIAN
  We are writing a non byte-sized quantity or at a position that is not
@@ -1667,14 +1685,29 @@ encode_tree_to_bitpos (tree expr, unsign

   /* We must be dealing with fixed-size data at this point, since the
  total size is also fixed.  */
-  fixed_size_mode mode = as_a  (TYPE_MODE (TREE_TYPE
(expr)));
+  unsigned int byte_size;
+  if (empty_ctor_p)
+{
+  unsigned HOST_WIDE_INT rhs_bytes
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
+  if (rhs_bytes > total_bytes)
+   return false;
+  byte_size = rhs_bytes;
+}
+  else
+{
+  fixed_size_mode mode
+   = as_a  (TYPE_MODE (TREE_TYPE (expr)));
+  byte_size = GET_MODE_SIZE (mode);
+}
   /* Allocate an extra byte so that we have space to shift into.  */
-  unsigned int byte_size = GET_MODE_SIZE (mode) + 1;
+  byte_size++;
   unsigned char *tmpbuf = XALLOCAVEC (unsigned char, byte_size);
   memset (tmpbuf, '\0', byte_size);
   /* The store detection code should only have allowed constants that are
- accepted by native_encode_expr.  */
-  if (native_encode_expr (expr, tmpbuf, byte_size - 1) == 0)
+ accepted by native_encode_expr or empty ctors.  */
+  if (!empty_ctor_p
+  && native_encode_expr (expr, tmpbuf, byte_size - 1) == 0)
 gcc_unreachable ();

   /* The native_encode_expr machinery uses TYPE_MODE to determine how many
@@ -4164,7 +4197,8 @@ lhs_valid_for_store_merging_p (tree lhs)
   tree_code code = TREE_CODE (lhs);

   if (code == ARRAY_REF || code == ARRAY_RANGE_REF || code == MEM_REF
-  || code == COMPONENT_REF || code == BIT_FIELD_REF)
+  || code == COMPONENT_REF || code == BIT_FIELD_REF
+  || DECL_P (lhs))
 return true;

   return false;
@@ -4178,6 +4212,11 @@ static bool
 rhs_valid_for_store_merging_p (tree rhs)
 {
   unsigned HOST_WIDE_INT size;
+  if (TREE_CODE (rhs) == CONSTRUCTOR
+  && CONSTRUCTOR_NELTS (rhs) == 0
+  && TYPE_SIZE_UNIT (TREE_TYPE (rhs))
+  && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (rhs
+return true;
   return (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (rhs))).is_constant ()
  && native_encode_expr (rhs, NULL, size) != 0);
 }

[Bug tree-optimization/88709] Improve store-merging

2019-01-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

--- Comment #2 from Jakub Jelinek  ---
Note, especially with large constructors, unlike other overlapping stores it
might not be feasible to merge the large clearing or memset with the other
stores, but still it might be possible to merge several later stores using info
on what we know has been stored in the gaps in between.
So, say:
struct S { char a[16]; };

void
foo (void)
{
  struct S s = {};
  s.a[3] = 1; // We don't want to bump this into a larger store unless we can
  // eliminate the whole 16 byte initialization through it (still, in this case
  // it is likely beneficial, but we should see what would that = {} be
expanded
  // to, depending on can_store_by_pieces etc.
  bar ();
}

struct T { char a[1024]; };

void
foo (void)
{
  struct T t = {};
  t.a[54] = 1;
  t.a[52] = 2;  // But we can using the info that t = {} cleared the whole
  // var merge these two stores, not using unnecessary masking etc. because
  // we know what the gaps will contain and can even extend the store
  // on either side or both to find optimal store size
  bar ();
}

[Bug tree-optimization/88709] Improve store-merging

2019-01-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-07
 Ever confirmed|0   |1

[Bug tree-optimization/88709] Improve store-merging

2019-01-05 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

Jakub Jelinek  changed:

   What|Removed |Added

 CC||redi at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
Compared to the first testcase, we do handle
struct S { char buf[8]; };
void bar (struct S *);

void
foo (void)
{
  struct S s;
  int a = 0;
  __builtin_memcpy ([4], , sizeof (int));
  s.buf[0] = 5;
  s.buf[1] = 2;
  s.buf[2] = 3;
  s.buf[3] = 2;
  s.buf[5] = 7;
  bar ();
}

though, because the store is in that case MEM[ + 4B] = {} and thus valid for
lhs.