Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE - v3

2017-01-11 Thread Jeff Law

On 01/04/2017 06:23 AM, Richard Biener wrote:

On Wed, Jan 4, 2017 at 2:22 PM, Richard Biener
 wrote:

On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law  wrote:

This is the first of the 4 part patchkit to address deficiencies in our DSE
implementation.

This patch addresses the P2 regression 33562 which has been a low priority
regression since gcc-4.3.  To summarize, DSE no longer has the ability to
detect an aggregate store as dead if subsequent stores are done in a
piecemeal fashion.

I originally tackled this by changing how we lower complex objects. That was
sufficient to address 33562, but was reasonably rejected.

This version attacks the problem by improving DSE to track stores to memory
at a byte level.  That allows us to determine if a series of stores
completely covers an earlier store (thus making the earlier store dead).

A useful side effect of this is we can detect when parts of a store are dead
and potentially rewrite the store.  This patch implements that for complex
object initializations.  While not strictly part of 33562, it's so closely
related that I felt it belongs as part of this patch.

This originally limited the size of the tracked memory space to 64 bytes.  I
bumped the limit after working through the CONSTRUCTOR and mem* trimming
patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
sleep if we throttled back to 64 or 128 bytes.

Later patches in the kit will build upon this patch.  So if pieces look like
skeleton code, that's because it is.

The changes since the V2 patch are:

1. Using sbitmaps rather than bitmaps.
2. Returning a tri-state from dse_classify_store (renamed from
dse_possible_dead_store_p)
3. More efficient trim computation
4. Moving trimming code out of dse_classify_store
5. Refactoring code to delete dead calls/assignments
6. dse_optimize_stmt moves into the dse_dom_walker class

Not surprisingly, this patch has most of the changes based on prior feedback
as it includes the raw infrastructure.

Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?


New functions in sbitmap.c lack function comments.

bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
is not important, but non-GCC host compilers are).  See bitmap.c for a
fallback.

Both bitmap_clear_range and bitmap_set_range look rather inefficient...
(it's not likely anybody will clean this up after you)

I'd say split out the sbitmap.[ch] changes.

+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+"dse-max-object-size",
+"Maximum size (in bytes) of objects tracked by dead store elimination.",
+256, 0, 0)

the docs suggest that DSE doesn't handle larger stores but it does (just in
the original limited way).  Maybe "tracked bytewise" is better.


Oh, and new --params need documenting in invoke.texi.

Fixed.

jeff



Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE - v3

2017-01-11 Thread Jeff Law

On 01/04/2017 06:22 AM, Richard Biener wrote:



Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?


New functions in sbitmap.c lack function comments.

Bah.  Sophomoric on my part.  Fixed.



bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
is not important, but non-GCC host compilers are).  See bitmap.c for a
fallback.
Mistake on my part.  I keep thinking we support starting the bootstrap 
process with the most recently released GCC, but we support 3.4 as well 
as other C++98/C++03 compilers.  Fixed in the next update (and tested by 
forcing the fallback method).




Both bitmap_clear_range and bitmap_set_range look rather inefficient...
(it's not likely anybody will clean this up after you)
They were, but not anymore.  Now they build a mask to deal with any 
partial clearing/setting in the first word, then do a single memset for 
any whole words in the middle, then another masking operation on 
residuals in the last word.  I verified the behavior by keeping two 
bitmaps, one with the old slow approach and one with the faster 
implementation, and checking for equality.  Obviously those verification 
bits won't be in the final patch.



I'd say split out the sbitmap.[ch] changes.

Sure.  That's easy enough.



+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+"dse-max-object-size",
+"Maximum size (in bytes) of objects tracked by dead store elimination.",
+256, 0, 0)

the docs suggest that DSE doesn't handle larger stores but it does (just in
the original limited way).  Maybe "tracked bytewise" is better.

Agreed and fixed.




+static bool
+valid_ao_ref_for_dse (ao_ref *ref)
+{
+  return (ao_ref_base (ref)
+          && ref->max_size != -1
+          && (ref->offset % BITS_PER_UNIT) == 0
+          && (ref->size % BITS_PER_UNIT) == 0
+          && (ref->size / BITS_PER_UNIT) > 0);

I think the last test is better written as ref->size != -1.

Seems reasonable.  Fixed.




Btw, seeing you discount non-byte size/offset stores this somehow asks
for store-merging being done before the last DSE (it currently runs after).
Sth to keep in mind for GCC 8.
Yea, probably.  Of course it may also be the case that DSE enables store 
merging.  Worth some experimentation.





+/* Delete a dead store STMT, which is mem* call of some kind.  */

call STMT

Fixed.



+static void
+delete_dead_call (gimple *stmt)
+{
+
excess vertical space

Likewise.


..
+  if (lhs)
+{
+  tree ptr = gimple_call_arg (stmt, 0);
+  gimple *new_stmt = gimple_build_assign (lhs, ptr);
+  unlink_stmt_vdef (stmt);
+  if (gsi_replace (&gsi, new_stmt, true))
+bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);

  release_ssa_name (gimple_vdef (stmt));


+  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
(PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }

formatting.

Yea. Fixed.



The DSE parts look good to me with the nits above fixed.  Just re-spin
the sbitmap.[ch] part please.

Will repost the sbitmap.c bits after retesting the series.

jeff




Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE - v3

2017-01-04 Thread Richard Biener
On Wed, Jan 4, 2017 at 2:22 PM, Richard Biener
 wrote:
> On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law  wrote:
>> This is the first of the 4 part patchkit to address deficiencies in our DSE
>> implementation.
>>
>> This patch addresses the P2 regression 33562 which has been a low priority
>> regression since gcc-4.3.  To summarize, DSE no longer has the ability to
>> detect an aggregate store as dead if subsequent stores are done in a
>> piecemeal fashion.
>>
>> I originally tackled this by changing how we lower complex objects. That was
>> sufficient to address 33562, but was reasonably rejected.
>>
>> This version attacks the problem by improving DSE to track stores to memory
>> at a byte level.  That allows us to determine if a series of stores
>> completely covers an earlier store (thus making the earlier store dead).
>>
>> A useful side effect of this is we can detect when parts of a store are dead
>> and potentially rewrite the store.  This patch implements that for complex
>> object initializations.  While not strictly part of 33562, it's so closely
>> related that I felt it belongs as part of this patch.
>>
>> This originally limited the size of the tracked memory space to 64 bytes.  I
>> bumped the limit after working through the CONSTRUCTOR and mem* trimming
>> patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
>> sleep if we throttled back to 64 or 128 bytes.
>>
>> Later patches in the kit will build upon this patch.  So if pieces look like
>> skeleton code, that's because it is.
>>
>> The changes since the V2 patch are:
>>
>> 1. Using sbitmaps rather than bitmaps.
>> 2. Returning a tri-state from dse_classify_store (renamed from
>> dse_possible_dead_store_p)
>> 3. More efficient trim computation
>> 4. Moving trimming code out of dse_classify_store
>> 5. Refactoring code to delete dead calls/assignments
>> 6. dse_optimize_stmt moves into the dse_dom_walker class
>>
>> Not surprisingly, this patch has most of the changes based on prior feedback
>> as it includes the raw infrastructure.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
>
> New functions in sbitmap.c lack function comments.
>
> bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
> is not important, but non-GCC host compilers are).  See bitmap.c for a
> fallback.
>
> Both bitmap_clear_range and bitmap_set_range look rather inefficient...
> (it's not likely anybody will clean this up after you)
>
> I'd say split out the sbitmap.[ch] changes.
>
> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
> +"dse-max-object-size",
> +"Maximum size (in bytes) of objects tracked by dead store elimination.",
> +256, 0, 0)
>
> the docs suggest that DSE doesn't handle larger stores but it does (just in
> the original limited way).  Maybe "tracked bytewise" is better.

Oh, and new --params need documenting in invoke.texi.

Richard.

> +static bool
> +valid_ao_ref_for_dse (ao_ref *ref)
> +{
> +  return (ao_ref_base (ref)
> + && ref->max_size != -1
> + && (ref->offset % BITS_PER_UNIT) == 0
> + && (ref->size % BITS_PER_UNIT) == 0
> + && (ref->size / BITS_PER_UNIT) > 0);
>
> I think the last test is better written as ref->size != -1.
>
> Btw, seeing you discount non-byte size/offset stores this somehow asks
> for store-merging being done before the last DSE (it currently runs after).
> Sth to keep in mind for GCC 8.
>
> +/* Delete a dead store STMT, which is mem* call of some kind.  */
>
> call STMT
>
> +static void
> +delete_dead_call (gimple *stmt)
> +{
> +
> excess vertical space
> ..
> +  if (lhs)
> +{
> +  tree ptr = gimple_call_arg (stmt, 0);
> +  gimple *new_stmt = gimple_build_assign (lhs, ptr);
> +  unlink_stmt_vdef (stmt);
> +  if (gsi_replace (&gsi, new_stmt, true))
> +bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>
>   release_ssa_name (gimple_vdef (stmt));
>
>
> +  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
> (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
>
> formatting.
>
> The DSE parts look good to me with the nits above fixed.  Just re-spin
> the sbitmap.[ch] part please.
>
> Thanks,
> Richard.
>
>>
>> PR tree-optimization/33562
>> * params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
>> * sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
>> functions.
>> (bitmap_count_bits): Likewise.
>> * sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
>> (bitmap_count_bits): Likewise.
>> * tree-ssa-dse.c: Include params.h.
>> (dse_store_status): New enum.
>> (initialize_ao_ref_for_dse): New, partially extracted from
>> dse_optimize_stmt.
>> (valid_ao_ref_for_dse, normalize_ref): New.
>> (setup_live_bytes_from_ref, compute_trims): Likewise.
>> (clear_bytes_written_by, trim_complex_store): 

Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE - v3

2017-01-04 Thread Richard Biener
On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law  wrote:
> This is the first of the 4 part patchkit to address deficiencies in our DSE
> implementation.
>
> This patch addresses the P2 regression 33562 which has been a low priority
> regression since gcc-4.3.  To summarize, DSE no longer has the ability to
> detect an aggregate store as dead if subsequent stores are done in a
> piecemeal fashion.
>
> I originally tackled this by changing how we lower complex objects. That was
> sufficient to address 33562, but was reasonably rejected.
>
> This version attacks the problem by improving DSE to track stores to memory
> at a byte level.  That allows us to determine if a series of stores
> completely covers an earlier store (thus making the earlier store dead).
>
> A useful side effect of this is we can detect when parts of a store are dead
> and potentially rewrite the store.  This patch implements that for complex
> object initializations.  While not strictly part of 33562, it's so closely
> related that I felt it belongs as part of this patch.
>
> This originally limited the size of the tracked memory space to 64 bytes.  I
> bumped the limit after working through the CONSTRUCTOR and mem* trimming
> patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
> sleep if we throttled back to 64 or 128 bytes.
>
> Later patches in the kit will build upon this patch.  So if pieces look like
> skeleton code, that's because it is.
>
> The changes since the V2 patch are:
>
> 1. Using sbitmaps rather than bitmaps.
> 2. Returning a tri-state from dse_classify_store (renamed from
> dse_possible_dead_store_p)
> 3. More efficient trim computation
> 4. Moving trimming code out of dse_classify_store
> 5. Refactoring code to delete dead calls/assignments
> 6. dse_optimize_stmt moves into the dse_dom_walker class
>
> Not surprisingly, this patch has most of the changes based on prior feedback
> as it includes the raw infrastructure.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

New functions in sbitmap.c lack function comments.

bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
is not important, but non-GCC host compilers are).  See bitmap.c for a
fallback.

Both bitmap_clear_range and bitmap_set_range look rather inefficient...
(it's not likely anybody will clean this up after you)

I'd say split out the sbitmap.[ch] changes.

+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+"dse-max-object-size",
+"Maximum size (in bytes) of objects tracked by dead store elimination.",
+256, 0, 0)

the docs suggest that DSE doesn't handle larger stores but it does (just in
the original limited way).  Maybe "tracked bytewise" is better.

+static bool
+valid_ao_ref_for_dse (ao_ref *ref)
+{
+  return (ao_ref_base (ref)
+          && ref->max_size != -1
+          && (ref->offset % BITS_PER_UNIT) == 0
+          && (ref->size % BITS_PER_UNIT) == 0
+          && (ref->size / BITS_PER_UNIT) > 0);

I think the last test is better written as ref->size != -1.

Btw, seeing you discount non-byte size/offset stores this somehow asks
for store-merging being done before the last DSE (it currently runs after).
Sth to keep in mind for GCC 8.

+/* Delete a dead store STMT, which is mem* call of some kind.  */

call STMT

+static void
+delete_dead_call (gimple *stmt)
+{
+
excess vertical space
..
+  if (lhs)
+{
+  tree ptr = gimple_call_arg (stmt, 0);
+  gimple *new_stmt = gimple_build_assign (lhs, ptr);
+  unlink_stmt_vdef (stmt);
+  if (gsi_replace (&gsi, new_stmt, true))
+bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);

  release_ssa_name (gimple_vdef (stmt));


+  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
(PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }

formatting.

The DSE parts look good to me with the nits above fixed.  Just re-spin
the sbitmap.[ch] part please.

Thanks,
Richard.

>
> PR tree-optimization/33562
> * params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
> * sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
> functions.
> (bitmap_count_bits): Likewise.
> * sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
> (bitmap_count_bits): Likewise.
> * tree-ssa-dse.c: Include params.h.
> (dse_store_status): New enum.
> (initialize_ao_ref_for_dse): New, partially extracted from
> dse_optimize_stmt.
> (valid_ao_ref_for_dse, normalize_ref): New.
> (setup_live_bytes_from_ref, compute_trims): Likewise.
> (clear_bytes_written_by, trim_complex_store): Likewise.
> (maybe_trim_partially_dead_store): Likewise.
> (maybe_trim_complex_store): Likewise.
> (dse_classify_store): Renamed from dse_possibly_dead_store_p.
> Track what bytes live from the original store.  Return tri-state
> for dead, partially dead or live.
> 

[RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE - v3

2016-12-21 Thread Jeff Law
This is the first of the 4 part patchkit to address deficiencies in our 
DSE implementation.


This patch addresses the P2 regression 33562 which has been a low 
priority regression since gcc-4.3.  To summarize, DSE no longer has the 
ability to detect an aggregate store as dead if subsequent stores are 
done in a piecemeal fashion.


I originally tackled this by changing how we lower complex objects. That 
was sufficient to address 33562, but was reasonably rejected.


This version attacks the problem by improving DSE to track stores to 
memory at a byte level.  That allows us to determine if a series of 
stores completely covers an earlier store (thus making the earlier store 
dead).


A useful side effect of this is we can detect when parts of a store are 
dead and potentially rewrite the store.  This patch implements that for 
complex object initializations.  While not strictly part of 33562, it's 
so closely related that I felt it belongs as part of this patch.


This originally limited the size of the tracked memory space to 64 
bytes.  I bumped the limit after working through the CONSTRUCTOR and 
mem* trimming patches.  The 256 byte limit is still fairly arbitrary and 
I wouldn't lose sleep if we throttled back to 64 or 128 bytes.


Later patches in the kit will build upon this patch.  So if pieces look 
like skeleton code, that's because it is.


The changes since the V2 patch are:

1. Using sbitmaps rather than bitmaps.
2. Returning a tri-state from dse_classify_store (renamed from 
dse_possible_dead_store_p)

3. More efficient trim computation
4. Moving trimming code out of dse_classify_store
5. Refactoring code to delete dead calls/assignments
6. dse_optimize_stmt moves into the dse_dom_walker class

Not surprisingly, this patch has most of the changes based on prior 
feedback as it includes the raw infrastructure.


Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

PR tree-optimization/33562
* params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
* sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
functions.
(bitmap_count_bits): Likewise.
* sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
(bitmap_count_bits): Likewise.
* tree-ssa-dse.c: Include params.h.
(dse_store_status): New enum.
(initialize_ao_ref_for_dse): New, partially extracted from
dse_optimize_stmt.
(valid_ao_ref_for_dse, normalize_ref): New.
(setup_live_bytes_from_ref, compute_trims): Likewise.
(clear_bytes_written_by, trim_complex_store): Likewise.
(maybe_trim_partially_dead_store): Likewise.
(maybe_trim_complex_store): Likewise.
(dse_classify_store): Renamed from dse_possibly_dead_store_p.
Track what bytes live from the original store.  Return tri-state
for dead, partially dead or live.
(dse_dom_walker): Add constructor, destructor and new private members.
(delete_dead_call, delete_dead_assignment): New extracted from
dse_optimize_stmt.
(dse_optimize_stmt): Make a member of dse_dom_walker.
Use initialize_ao_ref_for_dse.


* gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
* gcc.dg/tree-ssa/complex-5.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-18.c: New test.
* gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.

diff --git a/gcc/params.def b/gcc/params.def
index 50f75a7..f367c1d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
 "Average number of iterations of a loop.",
 10, 1, 0)
 
+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+"dse-max-object-size",
+"Maximum size (in bytes) of objects tracked by dead store elimination.",
+256, 0, 0)
+
 DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
 "scev-max-expr-size",
 "Bound on size of expressions used in the scalar evolutions analyzer.",
diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index 10b4347..2b66a6c 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -202,6 +202,39 @@ bitmap_empty_p (const_sbitmap bmap)
   return true;
 }
 
+void
+bitmap_clear_range (sbitmap bmap, unsigned int start, unsigned int count)
+{
+  for (unsigned int i = start; i < start + count; i++)
+bitmap_clear_bit (bmap, i);
+}
+
+void
+bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
+{
+  for (unsigned int i = start; i < start + count; i++)
+bitmap_set_bit (bmap, i);
+}
+
+
+unsigned int
+bitmap_count_bits (const_sbitmap bmap)
+{
+  unsigned int count = 0;
+  for (unsigned int i = 0; i < bmap->size; i++)
+if (bmap->elms[i])
+  {
+# if HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONG
+   count += __builtin_popcountl (bmap->elms[i]);
+# elif HOST_BITS_PER_WIDEST_FAST_INT == 

Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-21 Thread Jeff Law

On 12/16/2016 06:57 AM, Richard Biener wrote:


Apart from what Trevor says about using sbitmaps (try to avoid the initial
zeroing please) and the missed freeing (you can use auto_[s]bitmap?)
some comments below.

New version uses sbitmaps and avoids zero-ing when we can.


+static void
+trim_complex_store (bitmap orig, bitmap live, gimple *stmt)
+{
+  bitmap dead = BITMAP_ALLOC (NULL);
+  bitmap_and_compl (dead, orig, live);
+
+  /* So we have to verify that either the real or imag part as a whole
+ is dead.  They are always the same size.  Thus for one to be dead
+ the number of live bytes would have to be the same as the number of
+ dead bytes.  */
+  if (bitmap_count_bits (live) == bitmap_count_bits (dead))


popcount is expensive, so is this really a short-cut?
Note this can all be done more efficiently.  Using sbitmaps forces us to 
normalize the offset/size, and I've dropped the original bitmap as the 
ao_ref carries the data we need.  Those two changes make other 
simplifications fall out, and what we get in compute_trims is a 
last_set_bit and first_set_bit on the live bitmap -- no other bitmap 
traversals/searches/counts.


That makes compute_trims more efficient than the code in 
trim_complex_store.  So I've made trim_complex_store re-use the more 
general trimming code, and it's still agnostic of the underlying sizes, 
it just cares that the underlying components are the same size.




The actual transform in dse_possible_dead_store_p looks a bit misplaced.
I see it's somehow convenient but then maybe returning a enum from this
function might be cleaner.  Well, I'm not too torn about this, so maybe
just rename the function a bit (no good suggestion either).

The rest of the patch (the infrastructure) looks reasonable.
I think I mentioned it earlier, but moving towards a single allocated 
bitmap for the entire pass makes it much easier to have 
dse_possible_dead_store_p return a tri-state and live information. 
You'll see that change in the upcoming revision to the patchkit.


Jeff




Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-21 Thread Jeff Law

On 12/16/2016 12:29 AM, Trevor Saunders wrote:

On Thu, Dec 15, 2016 at 06:54:43PM -0700, Jeff Law wrote:

   unsigned cnt = 0;
+  bitmap live_bytes = NULL;
+  bitmap orig_live_bytes = NULL;

   *use_stmt = NULL;

+  /* REF is a memory write.  Go ahead and get its base, size, extent
+ information and encode the bytes written into LIVE_BYTES.  We can
+ handle any case where we have a known base and maximum size.
+
+ However, experimentation has shown that bit level tracking is not
+ useful in practice, so we only track at the byte level.
+
+ Furthermore, experimentation has shown that 99% of the cases
+ that require byte tracking are 64 bytes or less.  */
+  if (valid_ao_ref_for_dse (ref)
+  && (ref->max_size / BITS_PER_UNIT
+ <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+{
+  live_bytes = BITMAP_ALLOC (NULL);
+  orig_live_bytes = BITMAP_ALLOC (NULL);
+  bitmap_set_range (live_bytes,
+   ref->offset / BITS_PER_UNIT,
+   ref->max_size / BITS_PER_UNIT);
+  bitmap_copy (orig_live_bytes, live_bytes);


would it maybe make sense to use sbitmaps since the length is known to
be short, and doesn't change after allocation?

New version will use auto_sbitmap and sbitmaps.

Jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-21 Thread Jeff Law

On 12/21/2016 06:43 AM, Trevor Saunders wrote:

So a few interesting things have to be dealt with if we want to make this change.
I already mentioned the need to bias based on ref->offset so that the range
of bytes we're tracking is represented as 0..size.

While we know the length of the potential dead store we don't know the
length of the subsequent stores that we're hoping make the original a dead
store.  Thus when we start clearing LIVE_BYTES based on those subsequent
stores, we have to normalize those against the ref->offset of the original
store.

What's even more fun is sizing.  Those subsequent stores may be considerably
larger.  Which means that a bitmap_clear_range call has to be a hell of a
lot more careful when working with sbitmaps (we'd just happily stomp all over
memory in that case), whereas with a bitmap the right things will "just happen".

On the positive side, since we've normalized the potential dead store's byte
range to 0..size, computing trims is easier because we inherently
know how many bits were originally set.  So compute_trims becomes trivial
and we can simplify trim_complex_store a bit as well.

And, of course, we don't have a bitmap_{clear,set}_range or a
bitmap_count_bits implementation for sbitmaps.


It's all a bit of "ugh", but should be manageable.


yeah, but that seems like enough work that you could reasonably stick
with bitmap instead.
Well, the conversion is done :-)  It was largely as I expected with the 
devil being in the normalization details which are well isolated.




p.s. sorry I've been falling behind lately.
It happens.  You might want to peek briefly at Aldy's auto_bitmap class 
as part of the 71691 patches.  It looks like a fairly straightforward 
conversion to auto_*.  I'm sure there's all kinds of places we could use it.


Jeff



Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-21 Thread Trevor Saunders
On Sat, Dec 17, 2016 at 01:19:41AM -0700, Jeff Law wrote:
> On 12/16/2016 12:29 AM, Trevor Saunders wrote:
> > On Thu, Dec 15, 2016 at 06:54:43PM -0700, Jeff Law wrote:
> > >unsigned cnt = 0;
> > > +  bitmap live_bytes = NULL;
> > > +  bitmap orig_live_bytes = NULL;
> > > 
> > >*use_stmt = NULL;
> > > 
> > > +  /* REF is a memory write.  Go ahead and get its base, size, extent
> > > + information and encode the bytes written into LIVE_BYTES.  We can
> > > + handle any case where we have a known base and maximum size.
> > > +
> > > + However, experimentation has shown that bit level tracking is not
> > > + useful in practice, so we only track at the byte level.
> > > +
> > > + Furthermore, experimentation has shown that 99% of the cases
> > > + that require byte tracking are 64 bytes or less.  */
> > > +  if (valid_ao_ref_for_dse (ref)
> > > +  && (ref->max_size / BITS_PER_UNIT
> > > +   <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
> > > +{
> > > +  live_bytes = BITMAP_ALLOC (NULL);
> > > +  orig_live_bytes = BITMAP_ALLOC (NULL);
> > > +  bitmap_set_range (live_bytes,
> > > + ref->offset / BITS_PER_UNIT,
> > > + ref->max_size / BITS_PER_UNIT);
> > > +  bitmap_copy (orig_live_bytes, live_bytes);
> > 
> > would it maybe make sense to use sbitmaps since the length is known to
> > be short, and doesn't change after allocation?
> So a few interesting things have to be dealt with if we want to make this change.
> I already mentioned the need to bias based on ref->offset so that the range
> of bytes we're tracking is represented as 0..size.
> 
> While we know the length of the potential dead store we don't know the
> length of the subsequent stores that we're hoping make the original a dead
> store.  Thus when we start clearing LIVE_BYTES based on those subsequent
> stores, we have to normalize those against the ref->offset of the original
> store.
> 
> What's even more fun is sizing.  Those subsequent stores may be considerably
> larger.  Which means that a bitmap_clear_range call has to be a hell of a
> lot more careful when working with sbitmaps (we'd just happily stomp all over
> memory in that case), whereas with a bitmap the right things will "just happen".
> 
> On the positive side, since we've normalized the potential dead store's byte
> range to 0..size, computing trims is easier because we inherently
> know how many bits were originally set.  So compute_trims becomes trivial
> and we can simplify trim_complex_store a bit as well.
> 
> And, of course, we don't have a bitmap_{clear,set}_range or a
> bitmap_count_bits implementation for sbitmaps.
> 
> 
> It's all a bit of "ugh", but should be manageable.

yeah, but that seems like enough work that you could reasonably stick
with bitmap instead.

Trev

p.s. sorry I've been falling behind lately.

> 
> Jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-17 Thread Richard Sandiford
Jeff Law  writes:
> This is the first of the 4 part patchkit to address deficiencies in our 
> DSE implementation.
>
> This patch addresses the P2 regression 33562 which has been a low 
> priority regression since gcc-4.3.  To summarize, DSE no longer has the 
> ability to detect an aggregate store as dead if subsequent stores are 
> done in a piecemeal fashion.
>
> I originally tackled this by changing how we lower complex objects. 
> That was sufficient to address 33562, but was reasonably rejected.
>
> This version attacks the problem by improving DSE to track stores to 
> memory at a byte level.  That allows us to determine if a series of 
> stores completely covers an earlier store (thus making the earlier store 
> dead).
>
> A useful side effect of this is we can detect when parts of a store are 
> dead and potentially rewrite the store.  This patch implements that for 
> complex object initializations.  While not strictly part of 33562, it's 
> so closely related that I felt it belongs as part of this patch.
>
> This originally limited the size of the tracked memory space to 64 
> bytes.  I bumped the limit after working through the CONSTRUCTOR and 
> mem* trimming patches.  The 256 byte limit is still fairly arbitrary and 
> I wouldn't lose sleep if we throttled back to 64 or 128 bytes.

FWIW (and shouldn't affect whether the patch goes in)...

If SVE support is accepted for GCC 8 then we have the additional problem
that the sizes of useful aggregates can be runtime invariants.  For example,
in the SVE equivalent of float64x2x2_t, it would be good to be able to
detect that an assignment to the full structure is dead if we later
assign to both of the individual vectors.  Probably the most convenient
way of doing that would be to track ranges rather than individual bytes,
like the local part of the RTL DSE pass does.  (It was relatively easy
to convert local RTL DSE to variable-length modes, but the global part
also uses byte bitmaps.)

I guess bitmaps are going to be more efficient for small structures or
for accesses that occur in an arbitrary order.  But tracking ranges
means adding at most one fixed-size range record per update, so I suppose
the throttle would be on the number of records (= number of gaps + 1)
rather than the size of the structure.

Thanks,
Richard


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-17 Thread Richard Biener
On December 16, 2016 7:43:22 PM GMT+01:00, Jakub Jelinek  
wrote:
>On Fri, Dec 16, 2016 at 06:35:58PM +, Joseph Myers wrote:
>> On Thu, 15 Dec 2016, Jeff Law wrote:
>> 
>> > This version attacks the problem by improving DSE to track stores to
>> > memory at a byte level.  That allows us to determine if a series of
>> > stores completely covers an earlier store (thus making the earlier
>> > store dead).
>> 
>> Question: suppose you have an assignment for a struct with padding, then
>> stores for all the elements.  Does it detect that the original assignment
>> is dead (because there is no need to copy padding on struct assignment)?

In GIMPLE, struct assignment is equivalent to memcpy, so that info is lost 
after leaving the frontend realm.

>We fold memcpy into struct assignment early, does the same apply also to
>memcpy?  Or shall we fold memcpy into struct assignment only when there is
>no padding (or find a different IL representation of that)?

If we want to change that we can use a char[] type for copying on GIMPLE.

Richard.

>   Jakub




Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-17 Thread Jeff Law

On 12/16/2016 12:29 AM, Trevor Saunders wrote:

On Thu, Dec 15, 2016 at 06:54:43PM -0700, Jeff Law wrote:

   unsigned cnt = 0;
+  bitmap live_bytes = NULL;
+  bitmap orig_live_bytes = NULL;

   *use_stmt = NULL;

+  /* REF is a memory write.  Go ahead and get its base, size, extent
+ information and encode the bytes written into LIVE_BYTES.  We can
+ handle any case where we have a known base and maximum size.
+
+ However, experimentation has shown that bit level tracking is not
+ useful in practice, so we only track at the byte level.
+
+ Furthermore, experimentation has shown that 99% of the cases
+ that require byte tracking are 64 bytes or less.  */
+  if (valid_ao_ref_for_dse (ref)
+  && (ref->max_size / BITS_PER_UNIT
+ <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+{
+  live_bytes = BITMAP_ALLOC (NULL);
+  orig_live_bytes = BITMAP_ALLOC (NULL);
+  bitmap_set_range (live_bytes,
+   ref->offset / BITS_PER_UNIT,
+   ref->max_size / BITS_PER_UNIT);
+  bitmap_copy (orig_live_bytes, live_bytes);


would it maybe make sense to use sbitmaps since the length is known to
be short, and doesn't change after allocation?
So a few interesting things have to be dealt if we want to make this 
change.  I already mentioned the need to bias based on ref->offset so 
that the range of bytes we're tracking is represented 0..size.


While we know the length of the potential dead store we don't know the 
length of the subsequent stores that we're hoping make the original a 
dead store.  Thus when we start clearing LIVE_BYTES based on those 
subsequent stores, we have to normalize those against the ref->offset of 
the original store.


What's even more fun is sizing.  Those subsequent stores may be 
considerably larger.  Which means that a bitmap_clear_range call has to 
be a hell of a lot more careful when working with sbitmaps (we would 
just happily stomp all over memory in that case), whereas with a bitmap 
the right things will "just happen".


On the positive side, since we've normalized the potential dead store's 
byte range to 0..size, computing trims is easier because we inherently 
know how many bits were originally set.  So compute_trims 
becomes trivial and we can simplify trim_complex_store a bit as well.


And, of course, we don't have a bitmap_{clear,set}_range or a 
bitmap_count_bits implementation for sbitmaps.



It's all a bit of "ugh", but should be manageable.

Jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Jeff Law

On 12/16/2016 06:57 AM, Richard Biener wrote:

On Fri, Dec 16, 2016 at 2:54 AM, Jeff Law  wrote:

+  /* REF is a memory write.  Go ahead and get its base, size, extent
+ information and encode the bytes written into LIVE_BYTES.  We can
+ handle any case where we have a known base and maximum size.
+
+ However, experimentation has shown that bit level tracking is not
+ useful in practice, so we only track at the byte level.
+
+ Furthermore, experimentation has shown that 99% of the cases
+ that require byte tracking are 64 bytes or less.  */
+  if (valid_ao_ref_for_dse (ref)
+  && (ref->max_size / BITS_PER_UNIT
+ <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+{
+  live_bytes = BITMAP_ALLOC (NULL);
+  orig_live_bytes = BITMAP_ALLOC (NULL);
+  bitmap_set_range (live_bytes,
+   ref->offset / BITS_PER_UNIT,
+   ref->max_size / BITS_PER_UNIT);
+  bitmap_copy (orig_live_bytes, live_bytes);


So I'd use a once-per-pass allocated sbitmap here.  I don't see why you need
the orig_live_bytes bitmap though (just keep that implicitly by the known
range?)
So if we use a once-per-pass allocated bitmap, that actually facilitates 
returning a tri-state from dse_possible_dead_store_p and moving the 
trimming into dse_optimize_stmt.


ORIG_LIVE_BYTES was more convenience than anything -- we want it so that 
we can compute the dead bytes for trimming.  But we can certainly 
compute it on-demand at that time.


Jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Jeff Law

On 12/16/2016 06:57 AM, Richard Biener wrote:

On Fri, Dec 16, 2016 at 2:54 AM, Jeff Law  wrote:

+   {
+ /* STMT might be partially dead and we may be able to reduce
+how many memory locations it stores into.  */
+ if (live_bytes
+ && !bitmap_equal_p (live_bytes, orig_live_bytes)
+ && !gimple_clobber_p (stmt))
+   trim_partially_dead_store (orig_live_bytes, live_bytes, stmt);


The actual transform in dse_possible_dead_store_p looks a bit misplaced.
I see it's somehow convenient but then maybe returning an enum from this
function might be cleaner.  Well, I'm not too torn about this, so maybe
just rename the function a bit (no good suggestion either).
So the problem with returning a tri-state is that we'd also need to 
return the bitmaps or trimming data so that our caller would know how to 
trim in the partially dead case.  Certainly possible, though marginally 
more awkward from an implementation standpoint.  Renaming seems 
appropriate since dse_possible_dead_store_p does more than just detect a 
dead store with these changes.


Jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Joseph Myers
On Fri, 16 Dec 2016, Jakub Jelinek wrote:

> On Fri, Dec 16, 2016 at 06:35:58PM +, Joseph Myers wrote:
> > On Thu, 15 Dec 2016, Jeff Law wrote:
> > 
> > > This version attacks the problem by improving DSE to track stores to 
> > > memory at
> > > a byte level.  That allows us to determine if a series of stores 
> > > completely
> > > covers an earlier store (thus making the earlier store dead).
> > 
> > Question: suppose you have an assignment for a struct with padding, then 
> > stores for all the elements.  Does it detect that the original assignment 
> > is dead (because there is no need to copy padding on struct assignment)?
> 
> We fold memcpy into struct assignment early, does the same apply also to
> memcpy?  Or shall we fold memcpy into struct assignment only when there is
> no padding (or find different IL representation of that)?

There is at least an arguable case that memset needs to set padding to 0 
(unlike initializers), and memcpy needs to copy it (unlike struct 
assignment).  Atomic compare-exchange definitely needs to work on the 
whole representation including padding.

However, when you store in a member of a structure (or union), that 
results in all the padding becoming undefined.  So in the cases of memcpy 
/ memset where this optimization is interesting (element stores follow the 
memcpy / memset), you can still optimize.

(Note the desire for an option to avoid such padding-related 
optimizations, by treating padding for optimization purposes as if there 
were dummy fields there that can all be cleared by a {} initializer or 
memset: PR 77992.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Jeff Law

On 12/16/2016 11:43 AM, Jakub Jelinek wrote:

On Fri, Dec 16, 2016 at 06:35:58PM +, Joseph Myers wrote:

On Thu, 15 Dec 2016, Jeff Law wrote:


This version attacks the problem by improving DSE to track stores to memory at
a byte level.  That allows us to determine if a series of stores completely
covers an earlier store (thus making the earlier store dead).


Question: suppose you have an assignment for a struct with padding, then
stores for all the elements.  Does it detect that the original assignment
is dead (because there is no need to copy padding on struct assignment)?


We fold memcpy into struct assignment early, does the same apply also to
memcpy?  Or shall we fold memcpy into struct assignment only when there is
no padding (or find different IL representation of that)?

So a memcpy would presumably cover the entire object, padding included.

Subsequent component stores would not cover the padding.  So we'd trim 
the head/tail of the memcpy.



A memcpy folded to a structure assignment almost certainly covers the 
whole assignment when represented in an ao_ref.  So it'd be no different 
-- except that I haven't written code to trim structure assignments 
(unless they're represented by a CONSTRUCTOR assignment).  Regardless I 
would not suggest changing if/how/when we fold memcpy into a structure 
assignment.


Jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Jeff Law

On 12/16/2016 11:35 AM, Joseph Myers wrote:

On Thu, 15 Dec 2016, Jeff Law wrote:


This version attacks the problem by improving DSE to track stores to memory at
a byte level.  That allows us to determine if a series of stores completely
covers an earlier store (thus making the earlier store dead).


Question: suppose you have an assignment for a struct with padding, then
stores for all the elements.  Does it detect that the original assignment
is dead (because there is no need to copy padding on struct assignment)?
It would not detect the original assignment as dead in that case as it 
does not know anything special about the padding bytes.


So we'd likely end up just trimming the original assignment, 
particularly if it was done via a CONSTRUCTOR assignment or memset.


This situation could be addressed by handling the padding bytes 
specially -- ie, just clear them from LIVE when LIVE is initialized. 
It'd likely "just work" then.


Seems like it might be a reasonable follow-up as it'd hopefully make 
more of the original assignments fully dead rather than just trimmable.


jeff




Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Jakub Jelinek
On Fri, Dec 16, 2016 at 06:35:58PM +, Joseph Myers wrote:
> On Thu, 15 Dec 2016, Jeff Law wrote:
> 
> > This version attacks the problem by improving DSE to track stores to memory 
> > at
> > a byte level.  That allows us to determine if a series of stores completely
> > covers an earlier store (thus making the earlier store dead).
> 
> Question: suppose you have an assignment for a struct with padding, then 
> stores for all the elements.  Does it detect that the original assignment 
> is dead (because there is no need to copy padding on struct assignment)?

We fold memcpy into struct assignment early, does the same apply also to
memcpy?  Or shall we fold memcpy into struct assignment only when there is
no padding (or find different IL representation of that)?

Jakub


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Joseph Myers
On Thu, 15 Dec 2016, Jeff Law wrote:

> This version attacks the problem by improving DSE to track stores to memory at
> a byte level.  That allows us to determine if a series of stores completely
> covers an earlier store (thus making the earlier store dead).

Question: suppose you have an assignment for a struct with padding, then 
stores for all the elements.  Does it detect that the original assignment 
is dead (because there is no need to copy padding on struct assignment)?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Jeff Law

On 12/16/2016 06:57 AM, Richard Biener wrote:


Apart from what Trevor says about using sbitmaps (try to avoid the initial
zeroing please) and the missed freeing (you can use auto_[s]bitmap?)
some comments below.
In progress.  We'll need a few routines for sbitmaps that don't 
currently exist, but nothing that should be hard to implement correctly.



+
+static void
+trim_complex_store (bitmap orig, bitmap live, gimple *stmt)
+{
+  bitmap dead = BITMAP_ALLOC (NULL);
+  bitmap_and_compl (dead, orig, live);
+
+  /* So we have to verify that either the real or imag part as a whole
+ is dead.  They are always the same size.  Thus for one to be dead
+ the number of live bytes would have to be the same as the number of
+ dead bytes.  */
+  if (bitmap_count_bits (live) == bitmap_count_bits (dead))


popcount is expensive, so is this really a short-cut?
It's less about being a short cut and more about realizing both halves 
are the same size.  Thus to be trimmable the number of live bytes must 
exactly equal the number of dead bytes.  This works regardless of the 
size of the underlying components of the complex type.






+{
+  /* Hopefully we have live bits that look like 0xff00 or 0x00ff (assuming
+8 bytes for the underlying real/imag part).  If so we can optimize
+this case.  */
+  if (bitmap_first_set_bit (live) == 0
+ && bitmap_first_set_bit (dead) == bitmap_count_bits (live))


Hmm, but does that properly handle byte-wise reads from it?  Like
reading the realpart plus the last byte from the imagpart?
It should.  We've already verified that exactly half of the object is 
live and half is dead.  So we just need to make sure that the live part 
is contiguous at the start or end of the object.


Again, written this way we don't have to care about the size of the 
components of the COMPLEX_EXPR.


You're right in that if we keep the test in this form we should avoid 
the redundant popcount calls.



+
+/* STMT is a memory write where one or more bytes written are dead
+   stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
+   bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that it writes fewer memory locations.  Right
+   now we only support trimming at the start or end of the memory region.
+   It's not clear how much there is to be gained by trimming from the
middle
+   of the region.  */
+
+static void
+trim_partially_dead_store (bitmap orig, bitmap live, gimple *stmt)
+{
+  if (is_gimple_assign (stmt))
+{
+  switch (gimple_assign_rhs_code (stmt))
+   {
+   case COMPLEX_CST:
+ trim_complex_store (orig, live, stmt);


so patch #1 only handles stores from COMPLEX_CST?  Ok...
Patch #1 handles detecting of any aggregate store that is made fully 
dead by subsequent partial stores.   It also handles trimming of a 
partially dead complex constant initialization.



+  if (valid_ao_ref_for_dse (ref)
+  && (ref->max_size / BITS_PER_UNIT
+ <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+{
+  live_bytes = BITMAP_ALLOC (NULL);
+  orig_live_bytes = BITMAP_ALLOC (NULL);
+  bitmap_set_range (live_bytes,
+   ref->offset / BITS_PER_UNIT,
+   ref->max_size / BITS_PER_UNIT);
+  bitmap_copy (orig_live_bytes, live_bytes);


So I'd use a once-per-pass allocated sbitmap here.  I don't see why you need
the orig_live_bytes bitmap though (just keep that implicitly by the known
range?)
That would require resizing the bitmap or biasing.  While we have a 
consistent maximum size, ref->offset varies.  And that's probably the 
one case that sparse bitmaps handle better than simple bitmaps.


So I guess we'll have to bias everything.  It's not particularly hard, 
but it can be error prone.





+}
+
   /* Find the first dominated statement that clobbers (part of) the
  memory stmt stores to with no intermediate statement that may use
  part of the memory stmt stores.  That is, find a store that may
@@ -164,7 +341,15 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
gimple **use_stmt)
}

   if (fail)
-   return false;
+   {
+ /* STMT might be partially dead and we may be able to reduce
+how many memory locations it stores into.  */
+ if (live_bytes
+ && !bitmap_equal_p (live_bytes, orig_live_bytes)
+ && !gimple_clobber_p (stmt))
+   trim_partially_dead_store (orig_live_bytes, live_bytes, stmt);


The actual transform in dse_possible_dead_store_p looks a bit misplaced.
I see it's somehow convenient but then maybe returning an enum from this
function might be cleaner.  Well, I'm not too torn about this, so maybe
just rename the function a bit (no good suggestion either).
I pondered that as well.  I don't have a strong opinion either way. 
I'll look into returning a tri-state (live, partially dead, dead).



Jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Jeff Law

On 12/16/2016 12:29 AM, Trevor Saunders wrote:

On Thu, Dec 15, 2016 at 06:54:43PM -0700, Jeff Law wrote:

   unsigned cnt = 0;
+  bitmap live_bytes = NULL;
+  bitmap orig_live_bytes = NULL;

   *use_stmt = NULL;

+  /* REF is a memory write.  Go ahead and get its base, size, extent
+ information and encode the bytes written into LIVE_BYTES.  We can
+ handle any case where we have a known base and maximum size.
+
+ However, experimentation has shown that bit level tracking is not
+ useful in practice, so we only track at the byte level.
+
+ Furthermore, experimentation has shown that 99% of the cases
+ that require byte tracking are 64 bytes or less.  */
+  if (valid_ao_ref_for_dse (ref)
+  && (ref->max_size / BITS_PER_UNIT
+ <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+{
+  live_bytes = BITMAP_ALLOC (NULL);
+  orig_live_bytes = BITMAP_ALLOC (NULL);
+  bitmap_set_range (live_bytes,
+   ref->offset / BITS_PER_UNIT,
+   ref->max_size / BITS_PER_UNIT);
+  bitmap_copy (orig_live_bytes, live_bytes);


would it maybe make sense to use sbitmaps since the length is known to
be short, and doesn't change after allocation?
It would.  I tend to default to bitmaps these days (odd since I was 
probably the largest user of sbitmaps for a long time).



@@ -164,7 +341,15 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, 
gimple **use_stmt)
}

   if (fail)
-   return false;
+   {
+ /* STMT might be partially dead and we may be able to reduce
+how many memory locations it stores into.  */
+ if (live_bytes
+ && !bitmap_equal_p (live_bytes, orig_live_bytes)
+ && !gimple_clobber_p (stmt))
+   trim_partially_dead_store (orig_live_bytes, live_bytes, stmt);
+ return false;


shouldn't you free the bitmaps here?

Yes.  I think switching to an auto-sbitmap would resolve this as well.


jeff


Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-16 Thread Richard Biener
On Fri, Dec 16, 2016 at 2:54 AM, Jeff Law  wrote:
>
> This is the first of the 4 part patchkit to address deficiencies in our DSE
> implementation.
>
>
> This patch addresses the P2 regression 33562 which has been a low priority
> regression since gcc-4.3.  To summarize, DSE no longer has the ability to
> detect an aggregate store as dead if subsequent stores are done in a
> piecemeal fashion.
>
> I originally tackled this by changing how we lower complex objects. That was
> sufficient to address 33562, but was reasonably rejected.
>
> This version attacks the problem by improving DSE to track stores to memory
> at a byte level.  That allows us to determine if a series of stores
> completely covers an earlier store (thus making the earlier store dead).
>
> A useful side effect of this is we can detect when parts of a store are dead
> and potentially rewrite the store.  This patch implements that for complex
> object initializations.  While not strictly part of 33562, it's so closely
> related that I felt it belongs as part of this patch.
>
> This originally limited the size of the tracked memory space to 64 bytes.  I
> bumped the limit after working through the CONSTRUCTOR and mem* trimming
> patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
> sleep if we throttled back to 64 or 128 bytes.
>
> Later patches in the kit will build upon this patch.  So if pieces look like
> skeleton code, that's because it is.
>
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

Apart from what Trevor says about using sbitmaps (try to avoid the initial
zeroing please) and the missed freeing (you can use auto_[s]bitmap?)
some comments below.

> PR tree-optimization/33562
> * params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
> * tree-ssa-dse.c: Include params.h.
> (initialize_ao_ref_for_dse): New, partially extracted from
> dse_optimize_stmt.
> (valid_io_ref_for_dse): New.
> (clear_bytes_written_by, trim_complex_store): Likewise.
> (trim_partially_dead_store): Likewise.
> (dse_partially_dead_store_p): Track what bytes were originally
> stored
> into memory by the statement as well as the subset of bytes that
> are still live.   If we "fail", but have identified the store as
> partially dead, try to rewrite it to store fewer bytes of data.
> Exit the main loop if we find a full kill as a single statement
> or via group of statements.
> (dse_optimize_stmt): Use initialize_ao_ref_for_dse.
>
>
> * gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
> * gcc.dg/tree-ssa/complex-5.c: Likewise.
> * gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
> * gcc.dg/tree-ssa/ssa-dse-18.c: New test.
> * gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
> * gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
> * gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.
>
> diff --git a/gcc/params.def b/gcc/params.def
> index 50f75a7..ddc3d65 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
>  "Average number of iterations of a loop.",
>  10, 1, 0)
>
> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
> +"dse-max-object-size",
> +"Maximum size (in bytes) of objects tracked by dead store elimination.",
> +256, 0, 0)
> +
>  DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
>  "scev-max-expr-size",
>  "Bound on size of expressions used in the scalar evolutions
> analyzer.",
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> index 87a2638..3155741 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> @@ -10,4 +10,4 @@ int f(void)
>return g(&t);
>  }
>
> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail
> *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> index e2cd403..e6d027f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> @@ -8,4 +8,4 @@ int f(void)
>   __imag__ t = 2;
>  }
>
> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail
> *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
> new file mode 100644
> index 000..92b2df8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +int g(_Complex int*);
> +int f(void)
> +{
> +  _Complex int t = 0;
> +  int i, j;
> + __imag__ t += 2;
> +  return g(&t);
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "__complex__" 

Re: [RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-15 Thread Trevor Saunders
On Thu, Dec 15, 2016 at 06:54:43PM -0700, Jeff Law wrote:
>unsigned cnt = 0;
> +  bitmap live_bytes = NULL;
> +  bitmap orig_live_bytes = NULL;
>  
>*use_stmt = NULL;
>  
> +  /* REF is a memory write.  Go ahead and get its base, size, extent
> + information and encode the bytes written into LIVE_BYTES.  We can
> + handle any case where we have a known base and maximum size.
> +
> + However, experimentation has shown that bit level tracking is not
> + useful in practice, so we only track at the byte level.
> +
> + Furthermore, experimentation has shown that 99% of the cases
> + that require byte tracking are 64 bytes or less.  */
> +  if (valid_ao_ref_for_dse (ref)
> +  && (ref->max_size / BITS_PER_UNIT
> +   <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
> +{
> +  live_bytes = BITMAP_ALLOC (NULL);
> +  orig_live_bytes = BITMAP_ALLOC (NULL);
> +  bitmap_set_range (live_bytes,
> + ref->offset / BITS_PER_UNIT,
> + ref->max_size / BITS_PER_UNIT);
> +  bitmap_copy (orig_live_bytes, live_bytes);

would it maybe make sense to use sbitmaps since the length is known to
be short, and doesn't change after allocation?

> @@ -164,7 +341,15 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, 
> gimple **use_stmt)
>   }
>  
>if (fail)
> - return false;
> + {
> +   /* STMT might be partially dead and we may be able to reduce
> +  how many memory locations it stores into.  */
> +   if (live_bytes
> +   && !bitmap_equal_p (live_bytes, orig_live_bytes)
> +   && !gimple_clobber_p (stmt))
> + trim_partially_dead_store (orig_live_bytes, live_bytes, stmt);
> +   return false;

shouldn't you free the bitmaps here?

Trev



[RFA] [PR tree-optimization/33562] [PATCH 1/4] Byte tracking in DSE

2016-12-15 Thread Jeff Law


This is the first of the 4 part patchkit to address deficiencies in our 
DSE implementation.



This patch addresses the P2 regression 33562 which has been a low 
priority regression since gcc-4.3.  To summarize, DSE no longer has the 
ability to detect an aggregate store as dead if subsequent stores are 
done in a piecemeal fashion.


I originally tackled this by changing how we lower complex objects. 
That was sufficient to address 33562, but was reasonably rejected.


This version attacks the problem by improving DSE to track stores to 
memory at a byte level.  That allows us to determine if a series of 
stores completely covers an earlier store (thus making the earlier store 
dead).


A useful side effect of this is we can detect when parts of a store are 
dead and potentially rewrite the store.  This patch implements that for 
complex object initializations.  While not strictly part of 33562, it's 
so closely related that I felt it belongs as part of this patch.


This originally limited the size of the tracked memory space to 64 
bytes.  I bumped the limit after working through the CONSTRUCTOR and 
mem* trimming patches.  The 256 byte limit is still fairly arbitrary and 
I wouldn't lose sleep if we throttled back to 64 or 128 bytes.


Later patches in the kit will build upon this patch.  So if pieces look 
like skeleton code, that's because it is.



Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
PR tree-optimization/33562
* params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
* tree-ssa-dse.c: Include params.h.
(initialize_ao_ref_for_dse): New, partially extracted from
dse_optimize_stmt.
(valid_io_ref_for_dse): New.
(clear_bytes_written_by, trim_complex_store): Likewise.
(trim_partially_dead_store): Likewise.
(dse_partially_dead_store_p): Track what bytes were originally stored
into memory by the statement as well as the subset of bytes that
are still live.   If we "fail", but have identified the store as
partially dead, try to rewrite it to store fewer bytes of data.
Exit the main loop if we find a full kill as a single statement
or via group of statements.
(dse_optimize_stmt): Use initialize_ao_ref_for_dse.


* gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
* gcc.dg/tree-ssa/complex-5.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-18.c: New test.
* gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.

diff --git a/gcc/params.def b/gcc/params.def
index 50f75a7..ddc3d65 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
 "Average number of iterations of a loop.",
 10, 1, 0)
 
+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+"dse-max-object-size",
+"Maximum size (in bytes) of objects tracked by dead store elimination.",
+256, 0, 0)
+
 DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
 "scev-max-expr-size",
 "Bound on size of expressions used in the scalar evolutions analyzer.",
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
index 87a2638..3155741 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
@@ -10,4 +10,4 @@ int f(void)
   return g(&t);
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* 
} } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
index e2cd403..e6d027f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
@@ -8,4 +8,4 @@ int f(void)
  __imag__ t = 2;
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* 
} } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
new file mode 100644
index 000..92b2df8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+ __imag__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
new file mode 100644
index 000..718b746
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/*