Re: [PR 80293] Don't totally-sRA char arrays

2017-04-20 Thread Martin Jambor
Hi,

On Mon, Apr 17, 2017 at 09:35:18PM +0200, Martin Jambor wrote:
> On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor  
> > wrote:

...

> > >Therefore, I have adapted the patch to allow char arrays for const
> > >decls only and verified that it scalarizes a const-pool array of chars
> > >on Aarch64.  The (otherwise yet untested) result is below.
> > >
> > >What do you think?
> > 
> > Why special case char arrays?  I'd simply disallow total scalarization of 
> > non-const arrays completely.
> 
> Well, currently we will get element-wise copy propagation for things
> like "int rgb[3];" (possibly embeded in a small struct).  If I remove
> it, someone will complain.  Maybe the correct limit is SI mode size or
> BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
> I just wanted to be conservative at this point in the release cycle.
> 

To gather some data for this, I have looked at SRA statistics gathered
for SPEC 2006 benchmarks.  Complete tables are below, let me comment
on the diffs, though.  The most interesting column is probably the
third one (the second one with numbers).

This is the difference between my patch that special-cases char arrays
and disallowing any non-const-pool array total scalarization:

 | Benchmark - pass  | scalarized aggregates | Totally scalarized agregates 
| # scalar replacements | Modified exprs | Deleted stmts |
 
|---+---+--+---++---|
-| 403.gcc - esra|   138 |   12 
|   458 |   1274 |20 |
-| 403.gcc - sra |42 |   11 
|   141 |457 |13 |
+| 403.gcc - esra|   132 |4 
|   370 |   1274 |13 |
+| 403.gcc - sra |43 |8 
|   140 |459 |12 |
-| 447.dealII - esra | 12899 | 7826 
| 16347 |  29046 |  1840 |
-| 447.dealII - sra  |  1083 |  567 
|  2298 |   5479 |   392 |
+| 447.dealII - esra | 12897 | 7790 
| 16329 |  29046 |  1838 |
+| 447.dealII - sra  |  1079 |  553 
|  2262 |   5479 |   388 |
-| 450.soplex - sra  |42 |2 
|87 |263 | 2 |
+| 450.soplex - sra  |42 |0 
|81 |263 | 2 |
-| 453.povray - esra |   265 |   10 
|   812 |   3422 |12 |
-| 453.povray - sra  |76 |6 
|   217 |786 |57 |
+| 453.povray - esra |   264 |8 
|   806 |   3422 |11 |
+| 453.povray - sra  |76 |5 
|   215 |786 |57 |
-| 465.tonto - esra  |  4029 |5 
|  8269 |  21351 | 5 |
+| 465.tonto - esra  |  4024 |0 
|  8233 |  21351 | 0 |

For reference, this is the difference between (2 week old) trunk and
my proposed patch:

 | Benchmark - pass  | scalarized aggregates | Totally scalarized agregates 
| # scalar replacements | Modified exprs | Deleted stmts |
 
|---+---+--+---++---|
-| 401.bzip2 - sra   | 3 |2 
|22 | 62 | 0 |
+| 401.bzip2 - sra   | 3 |0 
|22 | 62 | 0 |
-| 403.gcc - esra|   138 |   13 
|   458 |   1274 |20 |
-| 403.gcc - sra |42 |   12 
|   141 |457 |13 |
+| 403.gcc - esra|   138 |   12 
|   458 |   1274 |20 |
+| 403.gcc - sra |42 |   

Re: [PR 80293] Don't totally-sRA char arrays

2017-04-20 Thread Richard Biener
On Mon, 17 Apr 2017, Martin Jambor wrote:

> Hi,
> 
> On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor  
> > wrote:
> > >Hi,
> > >
> > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> > >> On Wed, 12 Apr 2017, Martin Jambor wrote:
> > >> 
> > >> > Hi,
> > >> > 
> > >> > the patch below is an attempt to deal with PR 80293 as
> > >non-invasively
> > >> > as possible.  Basically, it switches off total SRA scalarization of
> > >> > any local aggregates which contains an array of elements that have
> > >one
> > >> > byte (or less).
> > >> > 
> > >> > The logic behind this is that accessing such arrays element-wise
> > >> > usually results in poor code and that such char arrays are often
> > >used
> > >> > for non-statically-typed content anyway, and we do not want to copy
> > >> > that byte per byte.
> > >> > 
> > >> > Alan, do you think this could impact your constant pool
> > >scalarization
> > >> > too severely?
> > >> 
> > >> Hmm, isn't one of the issues that we have
> > >> 
> > >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> > >>   {
> > >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > >> <= max_scalarization_size)
> > >>   {
> > >> create_total_scalarization_access (var);
> > >> 
> > >> which limits the size of scalarizable vars but not the number
> > >> of accesses we create for total scalarization?
> > >
> > >Well, yes, but at least I understood from your comment #4 in the bug
> > >that you did not want to limit the number of replacements for gcc 7?
> > >
> > >> 
> > >> Is scalarizable_type_p only used in contexts where we have no hint
> > >> of the actual accesses?
> > >
> > >There are should_scalarize_away_bitmap and
> > >cannot_scalarize_away_bitmap hints.
> > >
> > >Total scalarization is a hack process where we chop up small
> > >aggregates according to their types - as opposed to actual accesses,
> > >meaning it is an alien process to the rest of SRA - knowing that they
> > >will go completely away afterwards (that is ensured by
> > >cannot_scalarize_away_bitmap) and that this will facilitate copy
> > >propagation (this is indicated by should_scalarize_away_bitmap, which
> > >has a bit set if there is a non-scalar assignment _from_ (a part of)
> > >the aggregate).
> > 
> > OK, but for the copy x = y where x would go away it still depends on uses 
> > of x what pieces we actually want?  Or is total scalarization only done for 
> > x  = y; z = x;?
> > Thus no further accesses to x?
> 
> Total scalarization adds artificial accesses only to y, but
> (in both cases of total and "natural" scalarization) for all aggregate
> assignments between SRA candidates, SRA attempts to recreate all
> accesses covering RHS to LHS.  Transitively.  So the artificial
> accesses created for y will then get created for x and z even if they
> would not be candidates for total scalarization.  So e.g. if z cannot
> go away because it is passed to a function, it will be loaded
> piece-wise from y.

So what I was trying to figure out is whether there is anything that
fores us to totally scalarize using "natural" elements rather than
using larger accesses?  For

  x = y;
  z = x;

the best pieces to chop x to depends on the write accesses to y
and the read accesses from z (we eventually want to easily match them
up for CSE).

I see we first create total scalarization accesses and only then
doing propagate_all_subaccesses.

> > 
> > >> That is, for the constant pool case we
> > >> usually have
> > >> 
> > >>   x = .LC0;
> > >>   .. = x[2];
> > >> 
> > >> so we have a "hint" that accesses on x are those we'd want to
> > >> optimize to accesses to .LC0.
> > >
> > >You don't need total scalarization for this, just the existence of
> > >read from x[2] would be sufficient but thanks for pointing me to the
> > >right direction.  For constant pool decl scalarization, it is not
> > >important to introduce artificial accesses for x but for .LC0.
> > >Therefore, I have adapted the patch to allow char arrays for const
> > >decls only and verified that it scalarizes a const-pool array of chars
> > >on Aarch64.  The (otherwise yet untested) result is below.
> > >
> > >What do you think?
> > 
> > Why special case char arrays?  I'd simply disallow total scalarization of 
> > non-const arrays completely.
> 
> Well, currently we will get element-wise copy propagation for things
> like "int rgb[3];" (possibly embeded in a small struct).  If I remove
> it, someone will complain.  Maybe the correct limit is SI mode size or
> BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
> I just wanted to be conservative at this point in the release cycle.

Sure but this total scalarization was added to be able to defer
ctor "lowering" at gimplification time to SRA.  So it would be ok
(IMHO) to limit it to constant initializers.

But agreed 

Re: [PR 80293] Don't totally-sRA char arrays

2017-04-17 Thread Martin Jambor
Hi,

On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor  wrote:
> >Hi,
> >
> >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> >> On Wed, 12 Apr 2017, Martin Jambor wrote:
> >> 
> >> > Hi,
> >> > 
> >> > the patch below is an attempt to deal with PR 80293 as
> >non-invasively
> >> > as possible.  Basically, it switches off total SRA scalarization of
> >> > any local aggregates which contains an array of elements that have
> >one
> >> > byte (or less).
> >> > 
> >> > The logic behind this is that accessing such arrays element-wise
> >> > usually results in poor code and that such char arrays are often
> >used
> >> > for non-statically-typed content anyway, and we do not want to copy
> >> > that byte per byte.
> >> > 
> >> > Alan, do you think this could impact your constant pool
> >scalarization
> >> > too severely?
> >> 
> >> Hmm, isn't one of the issues that we have
> >> 
> >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> >>   {
> >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> >> <= max_scalarization_size)
> >>   {
> >> create_total_scalarization_access (var);
> >> 
> >> which limits the size of scalarizable vars but not the number
> >> of accesses we create for total scalarization?
> >
> >Well, yes, but at least I understood from your comment #4 in the bug
> >that you did not want to limit the number of replacements for gcc 7?
> >
> >> 
> >> Is scalarizable_type_p only used in contexts where we have no hint
> >> of the actual accesses?
> >
> >There are should_scalarize_away_bitmap and
> >cannot_scalarize_away_bitmap hints.
> >
> >Total scalarization is a hack process where we chop up small
> >aggregates according to their types - as opposed to actual accesses,
> >meaning it is an alien process to the rest of SRA - knowing that they
> >will go completely away afterwards (that is ensured by
> >cannot_scalarize_away_bitmap) and that this will facilitate copy
> >propagation (this is indicated by should_scalarize_away_bitmap, which
> >has a bit set if there is a non-scalar assignment _from_ (a part of)
> >the aggregate).
> 
> OK, but for the copy x = y where x would go away it still depends on uses of 
> x what pieces we actually want?  Or is total scalarization only done for x  = 
> y; z = x;?
> Thus no further accesses to x?

Total scalarization adds artificial accesses only to y, but
(in both cases of total and "natural" scalarization) for all aggregate
assignments between SRA candidates, SRA attempts to recreate all
accesses covering RHS to LHS.  Transitively.  So the artificial
accesses created for y will then get created for x and z even if they
would not be candidates for total scalarization.  So e.g. if z cannot
go away because it is passed to a function, it will be loaded
piece-wise from y.

> 
> >> That is, for the constant pool case we
> >> usually have
> >> 
> >>   x = .LC0;
> >>   .. = x[2];
> >> 
> >> so we have a "hint" that accesses on x are those we'd want to
> >> optimize to accesses to .LC0.
> >
> >You don't need total scalarization for this, just the existence of
> >read from x[2] would be sufficient but thanks for pointing me to the
> >right direction.  For constant pool decl scalarization, it is not
> >important to introduce artificial accesses for x but for .LC0.
> >Therefore, I have adapted the patch to allow char arrays for const
> >decls only and verified that it scalarizes a const-pool array of chars
> >on Aarch64.  The (otherwise yet untested) result is below.
> >
> >What do you think?
> 
> Why special case char arrays?  I'd simply disallow total scalarization of 
> non-const arrays completely.

Well, currently we will get element-wise copy propagation for things
like "int rgb[3];" (possibly embeded in a small struct).  If I remove
it, someone will complain.  Maybe the correct limit is SI mode size or
BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
I just wanted to be conservative at this point in the release cycle.

Martin

> 
> >Martin
> >
> >
> >2017-04-13  Martin Jambor  
> >
> > * tree-sra.c (scalarizable_type_p): New parameter const_decl, make
> > char arrays not totally scalarizable if it is false.
> > (analyze_all_variable_accesses): Pass correct value in the new
> > parameter.
> >
> >testsuite/
> > * g++.dg/tree-ssa/pr80293.C: New test.
> >---
> >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
> >+
> > gcc/tree-sra.c  | 21 ++-
> > 2 files changed, 60 insertions(+), 6 deletions(-)
> > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> >
> >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> >new file mode 100644
> >index 000..7faf35ae983
> >--- /dev/null
> >+++ 

Re: [PR 80293] Don't totally-sRA char arrays

2017-04-13 Thread Richard Biener
On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor  wrote:
>Hi,
>
>On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
>> On Wed, 12 Apr 2017, Martin Jambor wrote:
>> 
>> > Hi,
>> > 
>> > the patch below is an attempt to deal with PR 80293 as
>non-invasively
>> > as possible.  Basically, it switches off total SRA scalarization of
>> > any local aggregates which contains an array of elements that have
>one
>> > byte (or less).
>> > 
>> > The logic behind this is that accessing such arrays element-wise
>> > usually results in poor code and that such char arrays are often
>used
>> > for non-statically-typed content anyway, and we do not want to copy
>> > that byte per byte.
>> > 
>> > Alan, do you think this could impact your constant pool
>scalarization
>> > too severely?
>> 
>> Hmm, isn't one of the issues that we have
>> 
>> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>>   {
>> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>> <= max_scalarization_size)
>>   {
>> create_total_scalarization_access (var);
>> 
>> which limits the size of scalarizable vars but not the number
>> of accesses we create for total scalarization?
>
>Well, yes, but at least I understood from your comment #4 in the bug
>that you did not want to limit the number of replacements for gcc 7?
>
>> 
>> Is scalarizable_type_p only used in contexts where we have no hint
>> of the actual accesses?
>
>There are should_scalarize_away_bitmap and
>cannot_scalarize_away_bitmap hints.
>
>Total scalarization is a hack process where we chop up small
>aggregates according to their types - as opposed to actual accesses,
>meaning it is an alien process to the rest of SRA - knowing that they
>will go completely away afterwards (that is ensured by
>cannot_scalarize_away_bitmap) and that this will facilitate copy
>propagation (this is indicated by should_scalarize_away_bitmap, which
>has a bit set if there is a non-scalar assignment _from_ (a part of)
>the aggregate).

OK, but for the copy x = y where x would go away it still depends on uses of x 
what pieces we actually want?  Or is total scalarization only done for x  = y; 
z = x;?
Thus no further accesses to x?

>> That is, for the constant pool case we
>> usually have
>> 
>>   x = .LC0;
>>   .. = x[2];
>> 
>> so we have a "hint" that accesses on x are those we'd want to
>> optimize to accesses to .LC0.
>
>You don't need total scalarization for this, just the existence of
>read from x[2] would be sufficient but thanks for pointing me to the
>right direction.  For constant pool decl scalarization, it is not
>important to introduce artificial accesses for x but for .LC0.
>Therefore, I have adapted the patch to allow char arrays for const
>decls only and verified that it scalarizes a const-pool array of chars
>on Aarch64.  The (otherwise yet untested) result is below.
>
>What do you think?

Why special case char arrays?  I'd simply disallow total scalarization of 
non-const arrays completely.

>Martin
>
>
>2017-04-13  Martin Jambor  
>
>   * tree-sra.c (scalarizable_type_p): New parameter const_decl, make
>   char arrays not totally scalarizable if it is false.
>   (analyze_all_variable_accesses): Pass correct value in the new
>   parameter.
>
>testsuite/
>   * g++.dg/tree-ssa/pr80293.C: New test.
>---
>gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
>+
> gcc/tree-sra.c  | 21 ++-
> 2 files changed, 60 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>new file mode 100644
>index 000..7faf35ae983
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>@@ -0,0 +1,45 @@
>+// { dg-do compile }
>+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
>+
>+#include 
>+
>+// Return a copy of the underlying memory of an arbitrary value.
>+template <
>+typename T,
>+typename = typename
>std::enable_if::type
>+>
>+auto getMem(
>+T const & value
>+) -> std::array {
>+auto ret = std::array{};
>+__builtin_memcpy(ret.data(), , sizeof(T));
>+return ret;
>+}
>+
>+template <
>+typename T,
>+typename = typename
>std::enable_if::type
>+>
>+auto fromMem(
>+std::array const & buf
>+) -> T {
>+auto ret = T{};
>+__builtin_memcpy(, buf.data(), sizeof(T));
>+return ret;
>+}
>+
>+double foo1(std::uint64_t arg) {
>+return fromMem(getMem(arg));
>+}
>+
>+double foo2(std::uint64_t arg) {
>+return *reinterpret_cast();
>+}
>+
>+double foo3(std::uint64_t arg) {
>+double ret;
>+__builtin_memcpy(, , sizeof(arg));
>+return ret;
>+}
>+
>+// { dg-final { scan-tree-dump-not 

Re: [PR 80293] Don't totally-sRA char arrays

2017-04-13 Thread Martin Jambor
Hi,

On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> On Wed, 12 Apr 2017, Martin Jambor wrote:
> 
> > Hi,
> > 
> > the patch below is an attempt to deal with PR 80293 as non-invasively
> > as possible.  Basically, it switches off total SRA scalarization of
> > any local aggregates which contains an array of elements that have one
> > byte (or less).
> > 
> > The logic behind this is that accessing such arrays element-wise
> > usually results in poor code and that such char arrays are often used
> > for non-statically-typed content anyway, and we do not want to copy
> > that byte per byte.
> > 
> > Alan, do you think this could impact your constant pool scalarization
> > too severely?
> 
> Hmm, isn't one of the issues that we have
> 
> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>   {
> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> <= max_scalarization_size)
>   {
> create_total_scalarization_access (var);
> 
> which limits the size of scalarizable vars but not the number
> of accesses we create for total scalarization?

Well, yes, but at least I understood from your comment #4 in the bug
that you did not want to limit the number of replacements for gcc 7?

> 
> Is scalarizable_type_p only used in contexts where we have no hint
> of the actual accesses?

There are should_scalarize_away_bitmap and
cannot_scalarize_away_bitmap hints.

Total scalarization is a hack process where we chop up small
aggregates according to their types - as opposed to actual accesses,
meaning it is an alien process to the rest of SRA - knowing that they
will go completely away afterwards (that is ensured by
cannot_scalarize_away_bitmap) and that this will facilitate copy
propagation (this is indicated by should_scalarize_away_bitmap, which
has a bit set if there is a non-scalar assignment _from_ (a part of)
the aggregate).

> That is, for the constant pool case we
> usually have
> 
>   x = .LC0;
>   .. = x[2];
> 
> so we have a "hint" that accesses on x are those we'd want to
> optimize to accesses to .LC0.

You don't need total scalarization for this, just the existence of
read from x[2] would be sufficient but thanks for pointing me to the
right direction.  For constant pool decl scalarization, it is not
important to introduce artificial accesses for x but for .LC0.
Therefore, I have adapted the patch to allow char arrays for const
decls only and verified that it scalarizes a const-pool array of chars
on Aarch64.  The (otherwise yet untested) result is below.

What do you think?

Martin


2017-04-13  Martin Jambor  

* tree-sra.c (scalarizable_type_p): New parameter const_decl, make
char arrays not totally scalarizable if it is false.
(analyze_all_variable_accesses): Pass correct value in the new
parameter.

testsuite/
* g++.dg/tree-ssa/pr80293.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 +
 gcc/tree-sra.c  | 21 ++-
 2 files changed, 60 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
new file mode 100644
index 000..7faf35ae983
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
@@ -0,0 +1,45 @@
+// { dg-do compile }
+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
+
+#include 
+
+// Return a copy of the underlying memory of an arbitrary value.
+template <
+typename T,
+typename = typename 
std::enable_if::type
+>
+auto getMem(
+T const & value
+) -> std::array {
+auto ret = std::array{};
+__builtin_memcpy(ret.data(), , sizeof(T));
+return ret;
+}
+
+template <
+typename T,
+typename = typename 
std::enable_if::type
+>
+auto fromMem(
+std::array const & buf
+) -> T {
+auto ret = T{};
+__builtin_memcpy(, buf.data(), sizeof(T));
+return ret;
+}
+
+double foo1(std::uint64_t arg) {
+return fromMem(getMem(arg));
+}
+
+double foo2(std::uint64_t arg) {
+return *reinterpret_cast();
+}
+
+double foo3(std::uint64_t arg) {
+double ret;
+__builtin_memcpy(, , sizeof(arg));
+return ret;
+}
+
+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 02453d3ed9a..ab06be66131 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool write)
 
 /* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
ARRAY_TYPE with fields that are either of gimple register types (excluding
-   bit-fields) or (recursively) scalarizable types.  */
+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be 

Re: [PR 80293] Don't totally-sRA char arrays

2017-04-12 Thread Richard Biener
On Wed, 12 Apr 2017, Martin Jambor wrote:

> Hi,
> 
> the patch below is an attempt to deal with PR 80293 as non-invasively
> as possible.  Basically, it switches off total SRA scalarization of
> any local aggregates which contains an array of elements that have one
> byte (or less).
> 
> The logic behind this is that accessing such arrays element-wise
> usually results in poor code and that such char arrays are often used
> for non-statically-typed content anyway, and we do not want to copy
> that byte per byte.
> 
> Alan, do you think this could impact your constant pool scalarization
> too severely?

Hmm, isn't one of the issues that we have

if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
  {
if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
<= max_scalarization_size)
  {
create_total_scalarization_access (var);

which limits the size of scalarizable vars but not the number
of accesses we create for total scalarization?

Is scalarizable_type_p only used in contexts where we have no hint
of the actual accesses?  That is, for the constant pool case we
usually have

  x = .LC0;
  .. = x[2];

so we have a "hint" that accesses on x are those we'd want to
optimize to accesses to .LC0.  If we have no accesses on x
then we can as well scalarize using word_mode for example?

> Richi, if you or Alan does not object in a few days, I'd like to
> commit this in time for gcc7.  It has passed bootstrap and testing on
> x86_64-linux (but the constant pool SRA work was aimed primarily at
> ARM).

Maybe we can -- if this is the case here -- not completely scalarize
in case we don't know how the destination of the aggregate copy
is used?

> Thanks,
> 
> Martin
> 
> 
> 2017-04-10  Martin Jambor  
> 
>   * tree-sra.c (scalarizable_type_p): Make char arrays not totally
>   scalarizable.
> 
> testsuite/
>   * g++.dg/tree-ssa/pr80293.C: New test.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 
> +
>  gcc/tree-sra.c  |  2 +-
>  2 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> new file mode 100644
> index 000..7faf35ae983
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> @@ -0,0 +1,45 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
> +
> +#include 
> +
> +// Return a copy of the underlying memory of an arbitrary value.
> +template <
> +typename T,
> +typename = typename 
> std::enable_if::type
> +>
> +auto getMem(
> +T const & value
> +) -> std::array {
> +auto ret = std::array{};
> +__builtin_memcpy(ret.data(), , sizeof(T));
> +return ret;
> +}
> +
> +template <
> +typename T,
> +typename = typename 
> std::enable_if::type
> +>
> +auto fromMem(
> +std::array const & buf
> +) -> T {
> +auto ret = T{};
> +__builtin_memcpy(, buf.data(), sizeof(T));
> +return ret;
> +}
> +
> +double foo1(std::uint64_t arg) {
> +return fromMem(getMem(arg));
> +}
> +
> +double foo2(std::uint64_t arg) {
> +return *reinterpret_cast();
> +}
> +
> +double foo3(std::uint64_t arg) {
> +double ret;
> +__builtin_memcpy(, , sizeof(arg));
> +return ret;
> +}
> +
> +// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 02453d3ed9a..cbe9e862a2f 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -981,7 +981,7 @@ scalarizable_type_p (tree type)
>if (TYPE_DOMAIN (type) == NULL_TREE
> || !tree_fits_shwi_p (TYPE_SIZE (type))
> || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
> -   || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
> +   || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= BITS_PER_UNIT)
> || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type
>   return false;
>if (tree_to_shwi (TYPE_SIZE (type)) == 0
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PR 80293] Don't totally-sRA char arrays

2017-04-12 Thread Martin Jambor
Hi,

the patch below is an attempt to deal with PR 80293 as non-invasively
as possible.  Basically, it switches off total SRA scalarization of
any local aggregates which contains an array of elements that have one
byte (or less).

The logic behind this is that accessing such arrays element-wise
usually results in poor code and that such char arrays are often used
for non-statically-typed content anyway, and we do not want to copy
that byte per byte.

Alan, do you think this could impact your constant pool scalarization
too severely?

Richi, if you or Alan does not object in a few days, I'd like to
commit this in time for gcc7.  It has passed bootstrap and testing on
x86_64-linux (but the constant pool SRA work was aimed primarily at
ARM).

Thanks,

Martin


2017-04-10  Martin Jambor  

* tree-sra.c (scalarizable_type_p): Make char arrays not totally
scalarizable.

testsuite/
* g++.dg/tree-ssa/pr80293.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 +
 gcc/tree-sra.c  |  2 +-
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
new file mode 100644
index 000..7faf35ae983
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
@@ -0,0 +1,45 @@
+// { dg-do compile }
+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
+
+#include 
+
+// Return a copy of the underlying memory of an arbitrary value.
+template <
+typename T,
+typename = typename 
std::enable_if::type
+>
+auto getMem(
+T const & value
+) -> std::array {
+auto ret = std::array{};
+__builtin_memcpy(ret.data(), , sizeof(T));
+return ret;
+}
+
+template <
+typename T,
+typename = typename 
std::enable_if::type
+>
+auto fromMem(
+std::array const & buf
+) -> T {
+auto ret = T{};
+__builtin_memcpy(, buf.data(), sizeof(T));
+return ret;
+}
+
+double foo1(std::uint64_t arg) {
+return fromMem(getMem(arg));
+}
+
+double foo2(std::uint64_t arg) {
+return *reinterpret_cast();
+}
+
+double foo3(std::uint64_t arg) {
+double ret;
+__builtin_memcpy(, , sizeof(arg));
+return ret;
+}
+
+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 02453d3ed9a..cbe9e862a2f 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -981,7 +981,7 @@ scalarizable_type_p (tree type)
   if (TYPE_DOMAIN (type) == NULL_TREE
  || !tree_fits_shwi_p (TYPE_SIZE (type))
  || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
- || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
+ || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= BITS_PER_UNIT)
  || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type
return false;
   if (tree_to_shwi (TYPE_SIZE (type)) == 0
-- 
2.12.0