Re: [PATCH] tree-optimization/114557 - reduce ehcleanup peak memory use

2024-04-02 Thread Jakub Jelinek
On Tue, Apr 02, 2024 at 02:06:27PM +0200, Richard Biener wrote:
> On Tue, 2 Apr 2024, Richard Biener wrote:
> 
> > The following reduces peak memory use for the PR114480 testcase at -O1
> > which is almost exclusively spent by the ehcleanup pass in allocating
> > PHI nodes.  The free_phinodes cache we maintain isn't very effective
> > since it has effectively two slots, one for 4 and one for 9 argument
> > PHIs and it is only ever used for allocations up to 9 arguments but
> > we put all larger PHIs in the 9 argument bucket.  This proves
> > uneffective resulting in much garbage to be kept when incrementally
> > growing PHI nodes by edge redirection.
> > 
> > The mitigation is to rely on the GC freelist for larger sizes and
> > thus immediately return all larger bucket sized PHIs to it via ggc_free.
> > 
> > This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
> > from 359s to 168s.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > OK for trunk?  I'll leave more surgery for stage1.
> 
> Testing revealed on other use-after-free.  Revised patch as follows.

LGTM if there aren't similar further issues.

> This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
> from 359s to 168s.
> 
>   PR tree-optimization/114557
>   PR tree-optimization/114480
>   * tree-phinodes.cc (release_phi_node): Return PHIs from
>   allocation buckets not covered by free_phinodes to GC.
>   (remove_phi_node): Release the PHI LHS before freeing the
>   PHI node.
>   * tree-vect-loop.cc (vectorizable_live_operation): Get PHI lhs
>   before releasing it.

Jakub



Re: [PATCH] tree-optimization/114557 - reduce ehcleanup peak memory use

2024-04-02 Thread Richard Biener
On Tue, 2 Apr 2024, Richard Biener wrote:

> The following reduces peak memory use for the PR114480 testcase at -O1
> which is almost exclusively spent by the ehcleanup pass in allocating
> PHI nodes.  The free_phinodes cache we maintain isn't very effective
> since it has effectively two slots, one for 4 and one for 9 argument
> PHIs and it is only ever used for allocations up to 9 arguments but
> we put all larger PHIs in the 9 argument bucket.  This proves
> uneffective resulting in much garbage to be kept when incrementally
> growing PHI nodes by edge redirection.
> 
> The mitigation is to rely on the GC freelist for larger sizes and
> thus immediately return all larger bucket sized PHIs to it via ggc_free.
> 
> This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
> from 359s to 168s.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> OK for trunk?  I'll leave more surgery for stage1.

Testing revealed on other use-after-free.  Revised patch as follows.

Richard.

>From 3507c14d05994eba5396492f08a919847b9e54ab Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Tue, 2 Apr 2024 12:31:04 +0200
Subject: [PATCH] tree-optimization/114557 - reduce ehcleanup peak memory use
To: gcc-patches@gcc.gnu.org

The following reduces peak memory use for the PR114480 testcase at -O1
which is almost exclusively spent by the ehcleanup pass in allocating
PHI nodes.  The free_phinodes cache we maintain isn't very effective
since it has effectively two slots, one for 4 and one for 9 argument
PHIs and it is only ever used for allocations up to 9 arguments but
we put all larger PHIs in the 9 argument bucket.  This proves
uneffective resulting in much garbage to be kept when incrementally
growing PHI nodes by edge redirection.

The mitigation is to rely on the GC freelist for larger sizes and
thus immediately return all larger bucket sized PHIs to it via ggc_free.

This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
from 359s to 168s.

PR tree-optimization/114557
PR tree-optimization/114480
* tree-phinodes.cc (release_phi_node): Return PHIs from
allocation buckets not covered by free_phinodes to GC.
(remove_phi_node): Release the PHI LHS before freeing the
PHI node.
* tree-vect-loop.cc (vectorizable_live_operation): Get PHI lhs
before releasing it.
---
 gcc/ggc-page.cc   |  6 ++
 gcc/tree-phinodes.cc  | 10 +-
 gcc/tree-vect-loop.cc |  2 +-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-phinodes.cc b/gcc/tree-phinodes.cc
index ddd731323e1..5a7e4a94e57 100644
--- a/gcc/tree-phinodes.cc
+++ b/gcc/tree-phinodes.cc
@@ -223,6 +223,14 @@ release_phi_node (gimple *phi)
   delink_imm_use (imm);
 }
 
+  /* Immediately return the memory to the allocator when we would
+ only ever re-use it for a smaller size allocation.  */
+  if (len - 2 >= NUM_BUCKETS - 2)
+{
+  ggc_free (phi);
+  return;
+}
+
   bucket = len > NUM_BUCKETS - 1 ? NUM_BUCKETS - 1 : len;
   bucket -= 2;
   vec_safe_push (free_phinodes[bucket], phi);
@@ -445,9 +453,9 @@ remove_phi_node (gimple_stmt_iterator *gsi, bool 
release_lhs_p)
 
   /* If we are deleting the PHI node, then we should release the
  SSA_NAME node so that it can be reused.  */
-  release_phi_node (phi);
   if (release_lhs_p)
 release_ssa_name (gimple_phi_result (phi));
+  release_phi_node (phi);
 }
 
 /* Remove all the phi nodes from BB.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f33629e9b04..984636edbc5 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10962,8 +10962,8 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
 lhs_type, _gsi);
 
  auto gsi = gsi_for_stmt (use_stmt);
- remove_phi_node (, false);
  tree lhs_phi = gimple_phi_result (use_stmt);
+ remove_phi_node (, false);
  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
  gsi_insert_before (_gsi, copy, GSI_SAME_STMT);
  break;
-- 
2.35.3



[PATCH] tree-optimization/114557 - reduce ehcleanup peak memory use

2024-04-02 Thread Richard Biener
The following reduces peak memory use for the PR114480 testcase at -O1
which is almost exclusively spent by the ehcleanup pass in allocating
PHI nodes.  The free_phinodes cache we maintain isn't very effective
since it has effectively two slots, one for 4 and one for 9 argument
PHIs and it is only ever used for allocations up to 9 arguments but
we put all larger PHIs in the 9 argument bucket.  This proves
uneffective resulting in much garbage to be kept when incrementally
growing PHI nodes by edge redirection.

The mitigation is to rely on the GC freelist for larger sizes and
thus immediately return all larger bucket sized PHIs to it via ggc_free.

This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
from 359s to 168s.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK for trunk?  I'll leave more surgery for stage1.

Thanks,
Richard.

PR tree-optimization/114557
PR tree-optimization/114480
* tree-phinodes.cc (release_phi_node): Return PHIs from
allocation buckets not covered by free_phinodes to GC.
(remove_phi_node): Release the PHI LHS before freeing the
PHI node.
---
 gcc/tree-phinodes.cc | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-phinodes.cc b/gcc/tree-phinodes.cc
index ddd731323e1..5a7e4a94e57 100644
--- a/gcc/tree-phinodes.cc
+++ b/gcc/tree-phinodes.cc
@@ -223,6 +223,14 @@ release_phi_node (gimple *phi)
   delink_imm_use (imm);
 }
 
+  /* Immediately return the memory to the allocator when we would
+ only ever re-use it for a smaller size allocation.  */
+  if (len - 2 >= NUM_BUCKETS - 2)
+{
+  ggc_free (phi);
+  return;
+}
+
   bucket = len > NUM_BUCKETS - 1 ? NUM_BUCKETS - 1 : len;
   bucket -= 2;
   vec_safe_push (free_phinodes[bucket], phi);
@@ -445,9 +453,9 @@ remove_phi_node (gimple_stmt_iterator *gsi, bool 
release_lhs_p)
 
   /* If we are deleting the PHI node, then we should release the
  SSA_NAME node so that it can be reused.  */
-  release_phi_node (phi);
   if (release_lhs_p)
 release_ssa_name (gimple_phi_result (phi));
+  release_phi_node (phi);
 }
 
 /* Remove all the phi nodes from BB.  */
-- 
2.35.3