Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-28 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 4:15 PM Martin Liška  wrote:
>
> On 9/25/20 3:45 PM, Richard Biener wrote:
> > On Fri, Sep 25, 2020 at 3:32 PM Martin Liška  wrote:
> >>
> >> On 9/25/20 3:18 PM, Richard Biener wrote:
> >>> On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:
> 
>  Hello.
> 
>  All right, I come up with a rapid speed up that can allow us to remove
>  the introduced parameter. It contains 2 parts:
>  - BIT TEST: we allow at maximum a range that is smaller GET_MODE_BITSIZE
>  - JT: we spent quite some time in density calculation, we can guess it 
>  first
>   and it leads to a fast bail out.
> 
>  Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
>  Ready to be installed?
> >>>
> >>> Err
> >>>
> >>> +  auto_vec dest_bbs;
> >>> -  auto_bitmap dest_bbs;
> >>>
> >>> -  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
> >>> +  if (!dest_bbs.contains (sc->m_case_bb->index))
> >>> +   {
> >>> + dest_bbs.safe_push (sc->m_case_bb->index);
> >>> + if (dest_bbs.length () > m_max_case_bit_tests)
> >>> +   return false;
> >>> +   }
> >>
> >> That's intentional as m_max_case_bit_tests is a very small number (3) and
> >> I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
> >> is a constant operation.
> >
> > You're storing bb->index and formerly set bb->index bit, what's the 
> > difference?
> >
> > For max 3 elements a vector is OK, of course but there should be a comment
> > that says this ;)  The static const is 'int' so it can in principle
> > hold up to two billion ;)
>
> Sure, comment is needed.
>
> >
> >>>
> >>> vec::contains is linear search so no.  Was this for the length check?
> >>> Just do
> >>>
> >>>if (bitmap_set_bit (...))
> >>> {
> >>>   length++;
> >>>   if (length > ...)
> >>
> >> I would need here bitmap_count_bits. Do you prefer it?
> >
> > bitmap_set_bit returns false if the bit was already set so you can
> > count as you add bits, see the length++ above.
>
> Ah, got it!
>
> >
> > For three elements the vec will be faster though.  May I suggest
> > to use
> >
> >   auto_vec dest_bbs;
> >
> > then and quick_push rather than safe_push (need to guard the
> > push with the max_case_bit_test).
>
> Yes.
>
> Is the patch fine with that (and Jakub's comment)?

Yes.

Thanks,
Richard.

> Martin
>
> >
> > Richard.
> >
> >
> >
> >> Martin
> >>
> >>>
>  Thanks,
>  Martin
> >>
>


Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-25 Thread Martin Liška

On 9/25/20 3:45 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 3:32 PM Martin Liška  wrote:


On 9/25/20 3:18 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:


Hello.

All right, I come up with a rapid speed up that can allow us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller GET_MODE_BITSIZE
- JT: we spent quite some time in density calculation, we can guess it first
 and it leads to a fast bail out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Err

+  auto_vec dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }


That's intentional as m_max_case_bit_tests is a very small number (3) and
I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
is a constant operation.


You're storing bb->index and formerly set bb->index bit, what's the difference?

For max 3 elements a vector is OK, of course but there should be a comment
that says this ;)  The static const is 'int' so it can in principle
hold up to two billion ;)


Sure, comment is needed.





vec::contains is linear search so no.  Was this for the length check?
Just do

   if (bitmap_set_bit (...))
{
  length++;
  if (length > ...)


I would need here bitmap_count_bits. Do you prefer it?


bitmap_set_bit returns false if the bit was already set so you can
count as you add bits, see the length++ above.


Ah, got it!



For three elements the vec will be faster though.  May I suggest
to use

  auto_vec dest_bbs;

then and quick_push rather than safe_push (need to guard the
push with the max_case_bit_test).


Yes.

Is the patch fine with that (and Jakub's comment)?

Martin



Richard.




Martin




Thanks,
Martin






Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-25 Thread Martin Liška

On 9/25/20 3:52 PM, Jakub Jelinek wrote:

On Fri, Sep 25, 2020 at 11:13:06AM +0200, Martin Liška wrote:

--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec 
,
if (range == 0)
  return false;
  
+  unsigned HOST_WIDE_INT lhs = 100 * range;

+  if (lhs < range)
+return false;


If this test is meant to detect when 100 * range has overflowed,
then I think it is insufficient.
Perhaps do
   if (range > HOST_WIDE_INT_M1U / 100)
 return false;

   unsigned HOST_WIDE_INT lhs = 100 * range;
instead?


Yes, I'll add the check.

Thanks,
Martin



Jakub





Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-25 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 25, 2020 at 11:13:06AM +0200, Martin Liška wrote:
> --- a/gcc/tree-switch-conversion.c
> +++ b/gcc/tree-switch-conversion.c
> @@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec *> ,
>if (range == 0)
>  return false;
>  
> +  unsigned HOST_WIDE_INT lhs = 100 * range;
> +  if (lhs < range)
> +return false;

If this test is meant to detect when 100 * range has overflowed,
then I think it is insufficient.
Perhaps do
  if (range > HOST_WIDE_INT_M1U / 100)
return false;

  unsigned HOST_WIDE_INT lhs = 100 * range;
instead?

Jakub



Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 3:32 PM Martin Liška  wrote:
>
> On 9/25/20 3:18 PM, Richard Biener wrote:
> > On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> All right, I come up with a rapid speed up that can allow us to remove
> >> the introduced parameter. It contains 2 parts:
> >> - BIT TEST: we allow at maximum a range that is smaller GET_MODE_BITSIZE
> >> - JT: we spent quite some time in density calculation, we can guess it 
> >> first
> >> and it leads to a fast bail out.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > Err
> >
> > +  auto_vec dest_bbs;
> > -  auto_bitmap dest_bbs;
> >
> > -  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
> > +  if (!dest_bbs.contains (sc->m_case_bb->index))
> > +   {
> > + dest_bbs.safe_push (sc->m_case_bb->index);
> > + if (dest_bbs.length () > m_max_case_bit_tests)
> > +   return false;
> > +   }
>
> That's intentional as m_max_case_bit_tests is a very small number (3) and
> I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
> is a constant operation.

You're storing bb->index and formerly set bb->index bit, what's the difference?

For max 3 elements a vector is OK, of course but there should be a comment
that says this ;)  The static const is 'int' so it can in principle
hold up to two billion ;)

> >
> > vec::contains is linear search so no.  Was this for the length check?
> > Just do
> >
> >   if (bitmap_set_bit (...))
> >{
> >  length++;
> >  if (length > ...)
>
> I would need here bitmap_count_bits. Do you prefer it?

bitmap_set_bit returns false if the bit was already set so you can
count as you add bits, see the length++ above.

For three elements the vec will be faster though.  May I suggest
to use

 auto_vec dest_bbs;

then and quick_push rather than safe_push (need to guard the
push with the max_case_bit_test).

Richard.



> Martin
>
> >
> >> Thanks,
> >> Martin
>


Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-25 Thread Martin Liška

On 9/25/20 3:18 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:


Hello.

All right, I come up with a rapid speed up that can allow us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller GET_MODE_BITSIZE
- JT: we spent quite some time in density calculation, we can guess it first
and it leads to a fast bail out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Err

+  auto_vec dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }


That's intentional as m_max_case_bit_tests is a very small number (3) and
I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
is a constant operation.



vec::contains is linear search so no.  Was this for the length check?
Just do

  if (bitmap_set_bit (...))
   {
 length++;
 if (length > ...)


I would need here bitmap_count_bits. Do you prefer it?

Martin




Thanks,
Martin




Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:
>
> Hello.
>
> All right, I come up with a rapid speed up that can allow us to remove
> the introduced parameter. It contains 2 parts:
> - BIT TEST: we allow at maximum a range that is smaller GET_MODE_BITSIZE
> - JT: we spent quite some time in density calculation, we can guess it first
>and it leads to a fast bail out.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Err

+  auto_vec dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }

vec::contains is linear search so no.  Was this for the length check?
Just do

 if (bitmap_set_bit (...))
  {
length++;
if (length > ...)

> Thanks,
> Martin


Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-25 Thread Martin Liška

Hello.

All right, I come up with a rapid speed up that can allow us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller GET_MODE_BITSIZE
- JT: we spent quite some time in density calculation, we can guess it first
  and it leads to a fast bail out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From dc4c1d129a50c7f51d28235506479f29d51dae07 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 24 Sep 2020 13:34:13 +0200
Subject: [PATCH 2/2] switch conversion: make a rapid speed up

gcc/ChangeLog:

	PR tree-optimization/96979
	* tree-switch-conversion.c (jump_table_cluster::can_be_handled):
	Make a fast bail out.
	(bit_test_cluster::can_be_handled): Likewise here.
	* tree-switch-conversion.h (get_range): Use wi::to_wide instead
	of a folding.

gcc/testsuite/ChangeLog:

	PR tree-optimization/96979
	* g++.dg/tree-ssa/pr96979.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr96979.C | 48 +
 gcc/tree-switch-conversion.c| 32 -
 gcc/tree-switch-conversion.h|  7 ++--
 3 files changed, 74 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr96979.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr96979.C b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
new file mode 100644
index 000..ec0f57a8548
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
@@ -0,0 +1,48 @@
+/* PR tree-optimization/96979 */
+/* { dg-do compile } */
+/* { dg-options "-std=c++17 -O2" } */
+
+using u64 = unsigned long long;
+
+constexpr inline u64
+foo (const char *str) noexcept
+{
+  u64 value = 0xcbf29ce484222325ULL;
+  for (u64 i = 0; str[i]; i++)
+value = (value ^ u64(str[i])) * 0x10001b3ULL;
+  return value;
+}
+
+struct V
+{
+  enum W
+  {
+#define A(n) n,
+#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
+#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
+#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
+#define E D(foo1) D(foo2) D(foo3)
+E
+last
+  };
+
+  constexpr static W
+  bar (const u64 h) noexcept
+  {
+switch (h)
+  {
+#undef A
+#define F(n) #n
+#define A(n) case foo (F(n)): return n;
+E
+  }
+return last;
+  }
+};
+
+int
+baz (const char *s)
+{
+  const u64 h = foo (s);
+  return V::bar (h);
+}
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 186411ff3c4..3212e964b84 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec ,
   if (range == 0)
 return false;
 
+  unsigned HOST_WIDE_INT lhs = 100 * range;
+  if (lhs < range)
+return false;
+
+  /* First make quick guess as each cluster
+ can add at maximum 2 to the comparison_count.  */
+  if (lhs > 2 * max_ratio * (end - start + 1))
+return false;
+
   unsigned HOST_WIDE_INT comparison_count = 0;
   for (unsigned i = start; i <= end; i++)
 {
@@ -1275,10 +1284,6 @@ jump_table_cluster::can_be_handled (const vec ,
   comparison_count += sc->m_range_p ? 2 : 1;
 }
 
-  unsigned HOST_WIDE_INT lhs = 100 * range;
-  if (lhs < range)
-return false;
-
   return lhs <= max_ratio * comparison_count;
 }
 
@@ -1364,12 +1369,12 @@ bit_test_cluster::can_be_handled (unsigned HOST_WIDE_INT range,
 {
   /* Check overflow.  */
   if (range == 0)
-return 0;
+return false;
 
   if (range >= GET_MODE_BITSIZE (word_mode))
 return false;
 
-  return uniq <= 3;
+  return uniq <= m_max_case_bit_tests;
 }
 
 /* Return true when cluster starting at START and ending at END (inclusive)
@@ -1379,6 +1384,7 @@ bool
 bit_test_cluster::can_be_handled (const vec ,
   unsigned start, unsigned end)
 {
+  auto_vec dest_bbs;
   /* For algorithm correctness, bit test for a single case must return
  true.  We bail out in is_beneficial if it's called just for
  a single case.  */
@@ -1387,15 +1393,23 @@ bit_test_cluster::can_be_handled (const vec ,
 
   unsigned HOST_WIDE_INT range = get_range (clusters[start]->get_low (),
 	clusters[end]->get_high ());
-  auto_bitmap dest_bbs;
+
+  /* Make a guess first.  */
+  if (!can_be_handled (range, m_max_case_bit_tests))
+return false;
 
   for (unsigned i = start; i <= end; i++)
 {
   simple_cluster *sc = static_cast (clusters[i]);
-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+	{
+	  dest_bbs.safe_push (sc->m_case_bb->index);
+	  if (dest_bbs.length () > m_max_case_bit_tests)
+	return false;
+	}
 }
 
-  return can_be_handled (range, bitmap_count_bits (dest_bbs));
+  return true;
 }
 
 /* Return true when COUNT of cases of UNIQ labels is beneficial for bit test
diff --git a/gcc/tree-switch-conversion.h 

Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 22, 2020 at 02:19:12PM +0200, Richard Biener via Gcc-patches wrote:
> On September 22, 2020 1:22:12 PM GMT+02:00, "Martin Liška"  
> wrote:
> >Hi.
> >
> >The patch is about a bail out limit that needs to be added to switch
> >lowering.
> >Currently the algorithm is quadratic and needs some bail out. I've
> >tested value
> >of 100K which corresponds to about 0.2s in the problematic test-case
> >before
> >it's reached.
> >
> >Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> >Ready to be installed?
> 
> OK. Though the default limit looks high? 

This (with the new very low limit of 1) broke bootstrap on i686-linux
PATH=~/hbin:$PATH i386 ../configure 
--enable-languages=default,obj-c++,lto,go,brig,d --enable-checking=yes,rtl,extra
(where ~/hbin contains as/ld/gcc/g++ wrappers that ensure 32-bit
compilation).
When compiling insn-extract.c which contains a single large function with
a huge switch in it, we punt on the switch optimization and run out of
memory during DF (sure, latent problem).
I get
cc1plus: out of memory allocating 65536 bytes after a total of 2854985728 bytes
even with --param=max-switch-clustering-attempts=10 though, or even with
=100.  It works fine with =1000 (and compiles pretty quickly, 1m50sec).
Surprisingly, with =500 it takes huge amounts of time (in the switchconv
code - btw, couldn't it use wide_ints rather than const_binop to save
memory/compile time?, over 6 minutes, before it dies in DF).
What I see is that the function has slightly less than 3500 switch clusters,
so for O(n^2) that is definitely more than the 1 (which is for 100
switch clusters).

With =1000, I see that it finishes in find_jump_tables very quickly with
attempts equal to 5997916 when it exits successfully the loop.

Now, I wonder if it is really a good idea to measure attempts as the number
of the can_be_handled calls or something similar.

And
  static unsigned HOST_WIDE_INT get_range (tree low, tree high)
  {
tree r = fold_build2 (MINUS_EXPR, TREE_TYPE (low), high, low);
if (!tree_fits_uhwi_p (r))
  return 0;

return tree_to_uhwi (r) + 1;
  }
is a big waste of time and memory.
If all labels are already checked to be INTEGER_CSTs of the same type
(fold_build2 wouldn't work well if they have different types), then
  static unsigned HOST_WIDE_INT get_range (tree low, tree high)
  {
wide_int w = wi::to_wide (high) - wi::to_wide (low);
if (wi::neg_p (w, TYPE_SIGN (TREE_TYPE (low))) || !wi::fits_uhwi_p (w))
  return 0;
return wi::to_uhwi (w) + 1;
  }
should do it without having to look up new and new INTEGER_CSTs in the hash
tables and/or create them?

As for the DF problem, seems it is the MIR DF problem used by REE that runs
out of memory, perhaps REE should punt on certain kinds of cfgs?

Jakub



Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-22 Thread Martin Liška

On 9/22/20 2:19 PM, Richard Biener wrote:

OK. Though the default limit looks high?


Yep, I'm going to install it with the param default value
equal to 1.

Thanks,
Martin


Re: [PATCH] switch lowering: limit number of cluster attemps

2020-09-22 Thread Richard Biener via Gcc-patches
On September 22, 2020 1:22:12 PM GMT+02:00, "Martin Liška"  
wrote:
>Hi.
>
>The patch is about a bail out limit that needs to be added to switch
>lowering.
>Currently the algorithm is quadratic and needs some bail out. I've
>tested value
>of 100K which corresponds to about 0.2s in the problematic test-case
>before
>it's reached.
>
>Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
>Ready to be installed?

OK. Though the default limit looks high? 

Richard. 

>Thanks,
>Martin
>
>gcc/ChangeLog:
>
>   PR tree-optimization/96979
>   * doc/invoke.texi: Document new param max-switch-clustering-attempts.
>   * params.opt: Add new parameter.
>   * tree-switch-conversion.c (jump_table_cluster::find_jump_tables):
>   Limit number of attempts.
>   (bit_test_cluster::find_bit_tests): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/96979
>   * g++.dg/tree-ssa/pr96979.C: New test.
>---
>  gcc/doc/invoke.texi |  4 ++
>  gcc/params.opt  |  4 ++
> gcc/testsuite/g++.dg/tree-ssa/pr96979.C | 50 +
>  gcc/tree-switch-conversion.c| 17 +
>  4 files changed, 75 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index 665c0ffc4a1..6a7833b1d75 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -13452,6 +13452,10 @@ The smallest number of different values for
>which it is best to use a
> jump-table instead of a tree of conditional branches.  If the value is
>  0, use the default for the machine.
>  
>+@item max-switch-clustering-attempts
>+The maximum number of clustering attempts used
>+in bit-test and jump-table switch expansion.
>+
>  @item jump-table-max-growth-ratio-for-size
>  The maximum code size growth ratio when expanding
>  into a jump table (in percent).  The parameter is used when
>diff --git a/gcc/params.opt b/gcc/params.opt
>index 1d864047ad2..f4dcb5426c7 100644
>--- a/gcc/params.opt
>+++ b/gcc/params.opt
>@@ -82,6 +82,10 @@ The maximum length of a constant string for a
>builtin string cmp call eligible f
>Common Joined UInteger Var(param_case_values_threshold) Param
>Optimization
>The smallest number of different values for which it is best to use a
>jump-table instead of a tree of conditional branches, if 0, use the
>default for the machine.
>  
>+-param=max-switch-clustering-attempts=
>+Common Joined UInteger Var(param_max_switch_clustering_attempts) Param
>Optimization Init(10)
>+The maximum number of clustering attempts used in bit-test and
>jump-table switch expansion.
>+
>  -param=comdat-sharing-probability=
>Common Joined UInteger Var(param_comdat_sharing_probability) Init(20)
>Param Optimization
>Probability that COMDAT function will be shared with different
>compilation unit.
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>new file mode 100644
>index 000..85c703a140d
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
>@@ -0,0 +1,50 @@
>+/* PR tree-optimization/96979 */
>+/* { dg-do compile } */
>+/* { dg-options "-std=c++17 -O2 -fdump-tree-switchlower1" } */
>+
>+using u64 = unsigned long long;
>+
>+constexpr inline u64
>+foo (const char *str) noexcept
>+{
>+  u64 value = 0xcbf29ce484222325ULL;
>+  for (u64 i = 0; str[i]; i++)
>+value = (value ^ u64(str[i])) * 0x10001b3ULL;
>+  return value;
>+}
>+
>+struct V
>+{
>+  enum W
>+  {
>+#define A(n) n,
>+#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6)
>A(n##7) A(n##8) A(n##9)
>+#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6)
>B(n##7) B(n##8) B(n##9)
>+#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6)
>C(n##7) C(n##8) C(n##9)
>+#define E D(foo1) D(foo2) D(foo3)
>+E
>+last
>+  };
>+
>+  constexpr static W
>+  bar (const u64 h) noexcept
>+  {
>+switch (h)
>+  {
>+#undef A
>+#define F(n) #n
>+#define A(n) case foo (F(n)): return n;
>+E
>+  }
>+return last;
>+  }
>+};
>+
>+int
>+baz (const char *s)
>+{
>+  const u64 h = foo (s);
>+  return V::bar (h);
>+}
>+
>+/* { dg-final { scan-tree-dump-times ";; Bail out:
>--param=max-switch-clustering-attempts reached" 2 "switchlower1" } } */
>diff --git a/gcc/tree-switch-conversion.c
>b/gcc/tree-switch-conversion.c
>index 186411ff3c4..e6a2c7a6a84 100644
>--- a/gcc/tree-switch-conversion.c
>+++ b/gcc/tree-switch-conversion.c
>@@ -1183,6 +1183,7 @@ jump_table_cluster::find_jump_tables (vec*> )
>  
>min.quick_push (min_cluster_item (0, 0, 0));
>  
>+  HOST_WIDE_INT attempts = 0;
>for (unsigned i = 1; i <= l; i++)
>  {
>/* Set minimal # of clusters with i-th item to infinite.  */
>@@ -1194,6 +1195,14 @@ jump_table_cluster::find_jump_tables
>(vec )
> if (i - j < case_values_threshold ())
>   s += i - j;
>  
>+if (attempts++ == param_max_switch_clustering_attempts)

[PATCH] switch lowering: limit number of cluster attemps

2020-09-22 Thread Martin Liška

Hi.

The patch is about a bail out limit that needs to be added to switch lowering.
Currently the algorithm is quadratic and needs some bail out. I've tested value
of 100K which corresponds to about 0.2s in the problematic test-case before
it's reached.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

PR tree-optimization/96979
* doc/invoke.texi: Document new param max-switch-clustering-attempts.
* params.opt: Add new parameter.
* tree-switch-conversion.c (jump_table_cluster::find_jump_tables):
Limit number of attempts.
(bit_test_cluster::find_bit_tests): Likewise.

gcc/testsuite/ChangeLog:

PR tree-optimization/96979
* g++.dg/tree-ssa/pr96979.C: New test.
---
 gcc/doc/invoke.texi |  4 ++
 gcc/params.opt  |  4 ++
 gcc/testsuite/g++.dg/tree-ssa/pr96979.C | 50 +
 gcc/tree-switch-conversion.c| 17 +
 4 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr96979.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 665c0ffc4a1..6a7833b1d75 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13452,6 +13452,10 @@ The smallest number of different values for which it 
is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
 0, use the default for the machine.
 
+@item max-switch-clustering-attempts

+The maximum number of clustering attempts used
+in bit-test and jump-table switch expansion.
+
 @item jump-table-max-growth-ratio-for-size
 The maximum code size growth ratio when expanding
 into a jump table (in percent).  The parameter is used when
diff --git a/gcc/params.opt b/gcc/params.opt
index 1d864047ad2..f4dcb5426c7 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -82,6 +82,10 @@ The maximum length of a constant string for a builtin string 
cmp call eligible f
 Common Joined UInteger Var(param_case_values_threshold) Param Optimization
 The smallest number of different values for which it is best to use a 
jump-table instead of a tree of conditional branches, if 0, use the default for 
the machine.
 
+-param=max-switch-clustering-attempts=

+Common Joined UInteger Var(param_max_switch_clustering_attempts) Param 
Optimization Init(10)
+The maximum number of clustering attempts used in bit-test and jump-table 
switch expansion.
+
 -param=comdat-sharing-probability=
 Common Joined UInteger Var(param_comdat_sharing_probability) Init(20) Param 
Optimization
 Probability that COMDAT function will be shared with different compilation 
unit.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr96979.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
new file mode 100644
index 000..85c703a140d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
@@ -0,0 +1,50 @@
+/* PR tree-optimization/96979 */
+/* { dg-do compile } */
+/* { dg-options "-std=c++17 -O2 -fdump-tree-switchlower1" } */
+
+using u64 = unsigned long long;
+
+constexpr inline u64
+foo (const char *str) noexcept
+{
+  u64 value = 0xcbf29ce484222325ULL;
+  for (u64 i = 0; str[i]; i++)
+value = (value ^ u64(str[i])) * 0x10001b3ULL;
+  return value;
+}
+
+struct V
+{
+  enum W
+  {
+#define A(n) n,
+#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) 
A(n##8) A(n##9)
+#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) 
B(n##8) B(n##9)
+#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) 
C(n##8) C(n##9)
+#define E D(foo1) D(foo2) D(foo3)
+E
+last
+  };
+
+  constexpr static W
+  bar (const u64 h) noexcept
+  {
+switch (h)
+  {
+#undef A
+#define F(n) #n
+#define A(n) case foo (F(n)): return n;
+E
+  }
+return last;
+  }
+};
+
+int
+baz (const char *s)
+{
+  const u64 h = foo (s);
+  return V::bar (h);
+}
+
+/* { dg-final { scan-tree-dump-times ";; Bail out: --param=max-switch-clustering-attempts 
reached" 2 "switchlower1" } } */
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 186411ff3c4..e6a2c7a6a84 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1183,6 +1183,7 @@ jump_table_cluster::find_jump_tables (vec 
)
 
   min.quick_push (min_cluster_item (0, 0, 0));
 
+  HOST_WIDE_INT attempts = 0;

   for (unsigned i = 1; i <= l; i++)
 {
   /* Set minimal # of clusters with i-th item to infinite.  */
@@ -1194,6 +1195,14 @@ jump_table_cluster::find_jump_tables (vec 
)
  if (i - j < case_values_threshold ())
s += i - j;
 
+	  if (attempts++ == param_max_switch_clustering_attempts)

+   {
+ if (dump_file)
+   fprintf (dump_file, ";; Bail out: "
+"--param=max-switch-clustering-attempts reached\n");
+ return clusters.copy ();
+   }
+
  /* Prefer clusters with smaller