Re: [COMMITTED 3/5] Add sbr_lazy_vector and adjust (e)vrp sparse cache

2023-04-26 Thread Martin Liška
On 4/26/23 21:26, Andrew MacLeod via Gcc-patches wrote:
> This implements a sparse vector class for rangers cache and uses it bey 
> default except when the CFG is very small, in qhich case the original full 
> vectors are faster.  It works like a normal vector cache (in fact it inherits 
> from it), but uses a sparse bitmap to determine whether a vector element is 
> set or not.  This provide better performance for clearing the vector, as well 
> as during initialization.
> 
> A new param is added for this transition "vrp_vector_threshold" which 
> defaults to 250.  Anything function with fewer than 250 basic blocks will use 
> the simple vectors.  Various timing runs have indicated this is about the 
> sweet spot where using the sparse bitmap overtakes the time required to clear 
> the vector initially. Should we make ranger live across functions in the 
> future, we'll probably want to lower this value again as clearing is 
> significantly cheaper.
> 
> This patch also rename the "evrp_*" params to "vrp_*" as there really is not 
> a serperate EVRP pass any more, its all one vrp pass.   Eventually we'll 
> probably want to change it to vrp1, vrp2 and vrp3 rather than evrp, vrp1  and 
> vrp2.    But thats a task for later, perhaps when we reconsider pass 
> orderings..
> 
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.
> 
> Andrew

Hello.

Please adjust also the documentation for the params in gcc/doc/invoke.texi.

Thanks,
Martin


[COMMITTED 3/5] Add sbr_lazy_vector and adjust (e)vrp sparse cache

2023-04-26 Thread Andrew MacLeod via Gcc-patches
This implements a sparse vector class for rangers cache and uses it bey 
default except when the CFG is very small, in qhich case the original 
full vectors are faster.  It works like a normal vector cache (in fact 
it inherits from it), but uses a sparse bitmap to determine whether a 
vector element is set or not.  This provide better performance for 
clearing the vector, as well as during initialization.


A new param is added for this transition "vrp_vector_threshold" which 
defaults to 250.  Anything function with fewer than 250 basic blocks 
will use the simple vectors.  Various timing runs have indicated this is 
about the sweet spot where using the sparse bitmap overtakes the time 
required to clear the vector initially. Should we make ranger live 
across functions in the future, we'll probably want to lower this value 
again as clearing is significantly cheaper.


This patch also rename the "evrp_*" params to "vrp_*" as there really is 
not a serperate EVRP pass any more, its all one vrp pass.   Eventually 
we'll probably want to change it to vrp1, vrp2 and vrp3 rather than 
evrp, vrp1  and vrp2.    But thats a task for later, perhaps when we 
reconsider pass orderings..


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 6a3babfbd9a2b18b9e86d3d3a91564fcb9b8f9d7 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 13 Apr 2023 14:47:47 -0400
Subject: [PATCH 3/5] Add sbr_lazy_vector and adjust (e)vrp sparse cache

Add a sparse vector class for cache and use if by default.
Rename the evrp_* params to vrp_*, and add a param for small CFGS which use
just the original basic vector.

	* gimple-range-cache.cc (sbr_vector::sbr_vector): Add parameter
	and local to optionally zero memory.
	(br_vector::grow): Only zero memory if flag is set.
	(class sbr_lazy_vector): New.
	(sbr_lazy_vector::sbr_lazy_vector): New.
	(sbr_lazy_vector::set_bb_range): New.
	(sbr_lazy_vector::get_bb_range): New.
	(sbr_lazy_vector::bb_range_p): New.
	(block_range_cache::set_bb_range): Check flags and Use sbr_lazy_vector.
	* gimple-range-gori.cc (gori_map::calculate_gori): Use
	param_vrp_switch_limit.
	(gori_compute::gori_compute): Use param_vrp_switch_limit.
	* params.opt (vrp_sparse_threshold): Rename from evrp_sparse_threshold.
	(vrp_switch_limit): Rename from evrp_switch_limit.
	(vrp_vector_threshold): New.
---
 gcc/gimple-range-cache.cc | 72 ++-
 gcc/gimple-range-gori.cc  |  4 +--
 gcc/params.opt| 20 ++-
 3 files changed, 78 insertions(+), 18 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 2314478d558..868d2dda424 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -79,7 +79,7 @@ ssa_block_ranges::dump (FILE *f)
 class sbr_vector : public ssa_block_ranges
 {
 public:
-  sbr_vector (tree t, vrange_allocator *allocator);
+  sbr_vector (tree t, vrange_allocator *allocator, bool zero_p = true);
 
   virtual bool set_bb_range (const_basic_block bb, const vrange &r) override;
   virtual bool get_bb_range (vrange &r, const_basic_block bb) override;
@@ -91,22 +91,25 @@ protected:
   vrange *m_undefined;
   tree m_type;
   vrange_allocator *m_range_allocator;
+  bool m_zero_p;
   void grow ();
 };
 
 
 // Initialize a block cache for an ssa_name of type T.
 
-sbr_vector::sbr_vector (tree t, vrange_allocator *allocator)
+sbr_vector::sbr_vector (tree t, vrange_allocator *allocator, bool zero_p)
   : ssa_block_ranges (t)
 {
   gcc_checking_assert (TYPE_P (t));
   m_type = t;
+  m_zero_p = zero_p;
   m_range_allocator = allocator;
   m_tab_size = last_basic_block_for_fn (cfun) + 1;
   m_tab = static_cast 
 (allocator->alloc (m_tab_size * sizeof (vrange *)));
-  memset (m_tab, 0, m_tab_size * sizeof (vrange *));
+  if (zero_p)
+memset (m_tab, 0, m_tab_size * sizeof (vrange *));
 
   // Create the cached type range.
   m_varying = m_range_allocator->alloc_vrange (t);
@@ -132,7 +135,8 @@ sbr_vector::grow ()
   vrange **t = static_cast 
 (m_range_allocator->alloc (new_size * sizeof (vrange *)));
   memcpy (t, m_tab, m_tab_size * sizeof (vrange *));
-  memset (t + m_tab_size, 0, (new_size - m_tab_size) * sizeof (vrange *));
+  if (m_zero_p)
+memset (t + m_tab_size, 0, (new_size - m_tab_size) * sizeof (vrange *));
 
   m_tab = t;
   m_tab_size = new_size;
@@ -183,6 +187,50 @@ sbr_vector::bb_range_p (const_basic_block bb)
   return false;
 }
 
+// Like an sbr_vector, except it uses a bitmap to manage whetehr  vale is set
+// or not rather than cleared memory.
+
+class sbr_lazy_vector : public sbr_vector
+{
+public:
+  sbr_lazy_vector (tree t, vrange_allocator *allocator, bitmap_obstack *bm);
+
+  virtual bool set_bb_range (const_basic_block bb, const vrange &r) override;
+  virtual bool get_bb_range (vrange &r, const_basic_block bb) override;
+  virtual bool bb_range_p (const_basic_block bb) override;
+protected:
+  bitmap m_has_value;
+};
+
+sbr_lazy_vector::sbr_lazy_vector (tree t, v