[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-30 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

JuzheZhong  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #34 from JuzheZhong  ---
Fixed.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #33 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:9dd10de15b183f7b662905e1383fdc3a08755f2e

commit r14-8639-g9dd10de15b183f7b662905e1383fdc3a08755f2e
Author: Juzhe-Zhong 
Date:   Mon Jan 29 19:32:02 2024 +0800

RISC-V: Fix VSETVL PASS compile-time issue

The compile time issue was discovered in SPEC 2017 wrf:

Using time and -ftime-report to analyze the profile data of the SPEC 2017 wrf
compilation:

Before this patch (Lazy vsetvl):

scheduling         : 121.89 ( 15%)   0.53 ( 11%) 122.72 ( 15%)    13M (  1%)
machine dep reorg  : 424.61 ( 53%)   1.84 ( 37%) 427.44 ( 53%)  5290k (  0%)
real    13m27.074s
user    13m19.539s
sys     0m5.180s

Simple vsetvl:

machine dep reorg  :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  4138k (  0%)
real    6m5.780s
user    6m2.396s
sys     0m2.373s

The machine dep reorg entry is the compile time of the VSETVL PASS (424
seconds), which accounts for 53% of the compilation time, much more than
scheduling.

After investigation, the critical part of the VSETVL pass is
compute_lcm_local_properties, which is called on every iteration of phase 2
(earliest fusion) and phase 3 (global LCM).

This patch optimizes the code of compute_lcm_local_properties to reduce the
compilation time.

After this patch:

scheduling         : 117.51 ( 27%)   0.21 (  6%) 118.04 ( 27%)    13M (  1%)
machine dep reorg  :  80.13 ( 18%)   0.91 ( 26%)  81.26 ( 18%)  5290k (  0%)
real    7m25.374s
user    7m20.116s
sys     0m3.795s

The improvement from this patch is very clear: the lazy VSETVL PASS drops from
424s (53%) to 80s (18%) and now spends less time than scheduling.

Tested on both RV32 and RV64 with no regressions.  Ok for trunk?

PR target/113495

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
(pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue.
(pre_vsetvl::compute_transparent): New function.
(pre_vsetvl::compute_lcm_local_properties): Fix compile time issue.
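
To illustrate why hoisting the local-property computation out of the iteration helps, here is a small standalone sketch (not GCC code; the block and expression counts are taken from comment #7 below, and the iteration count of the fusion/LCM loop is a made-up assumption):

#include <cstdint>
#include <cstdio>

int main ()
{
  /* Sizes reported for the reduced wrf test case: ~35k basic blocks and
     ~1 million expressions; the iteration count is hypothetical.  */
  const uint64_t num_bbs = 35000;
  const uint64_t num_exprs = 1000000;
  const uint64_t iterations = 10;

  /* One sweep of a compute_lcm_local_properties-style function touches on
     the order of num_bbs * num_exprs bits.  */
  const uint64_t one_sweep = num_bbs * num_exprs;

  std::printf ("recomputed every iteration: %llu bit operations\n",
               (unsigned long long) (one_sweep * iterations));
  std::printf ("computed once up front:     %llu bit operations\n",
               (unsigned long long) one_sweep);
  return 0;
}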

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #32 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:3132d2d36b4705bb762e61b1c8ca4da7c78a8321

commit r14-8378-g3132d2d36b4705bb762e61b1c8ca4da7c78a8321
Author: Juzhe-Zhong 
Date:   Tue Jan 23 18:12:49 2024 +0800

RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL
PASS: the pass consumes over 33 GB of memory, which makes it impossible to
compile SPEC 2017 wrf on a laptop.

The root cause is these memory-wasting allocations:

unsigned num_exprs = num_bbs * num_regs;
sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);

I found that compute_avl_def_data can be implemented on top of the RTL_SSA
framework, so this patch replaces the implementation with one based on RTL_SSA.

After this patch, the memory-hog issue is fixed.

simple vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes
--massif-out-file=massif.out) is 1.673 GB.

lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes
--massif-out-file=massif.out) is 2.441 GB.

Tested on both RV32 and RV64, no regression.

gcc/ChangeLog:

PR target/113495
* config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
(get_regno): Ditto.
(get_bb_index): Ditto.
(pre_vsetvl::compute_avl_def_data): Ditto.
(pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.

gcc/testsuite/ChangeLog:

PR target/113495
* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #31 from JuzheZhong  ---
machine dep reorg  : 403.69 ( 56%)  23.48 ( 93%) 427.17 ( 57%)  5290k (  0%)

Confirmed: with RTL DF checking removed, LICM is no longer a compile-time hog.

The VSETVL PASS accounts for 56% of the compile time.

Even though I can't see the memory hog in the GGC column of -ftime-report, I
can see 33 GB of memory usage in htop.

Confirmed: both the compile-time hog and the memory hog are VSETVL PASS issues.

I will work on optimizing both the compile time and the memory usage of the
VSETVL PASS.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #30 from JuzheZhong  ---
Ok. I believe m_avl_def_in and m_avl_def_out can be removed with a better
algorithm.

Then the memory hog should be fixed soon.

I am going to rewrite avl_vl_unmodified_between_p and trigger full coverage
testing, since it's going to be a big change there.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #29 from Richard Biener  ---
(In reply to rguent...@suse.de from comment #26)
> On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> > 
> > --- Comment #22 from JuzheZhong  ---
> > (In reply to Richard Biener from comment #21)
> > > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > > blocks involved but failed.
> > 
> > I am considering whether we should disable LICM for RISC-V by default if 
> > vector
> > is enabled ?
> > Since the compile time explode 10 times is really horrible.
> 
> I think that's a bad idea.  It only explodes for some degenerate cases.
> The best would be to fix invariant motion to keep DF up-to-date so
> it can stop using df_analyze_loop and instead analyze the whole function.
> Or maybe change it to use the rtl-ssa framework instead.
> 
> There's already param_loop_invariant_max_bbs_in_loop:
> 
>   /* Process the loops, innermost first.  */
>   for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
> {
>   curr_loop = loop;
>   /* move_single_loop_invariants for very large loops is time consuming
>      and might need a lot of memory.  For -O1 only do loop invariant
>      motion for very small loops.  */
>   unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
>   if (optimize < 2)
> max_bbs /= 10;
>   if (loop->num_nodes <= max_bbs)
> move_single_loop_invariants (loop);
> }
> 
> it might be possible to restrict invariant motion to innermost loops
> when the overall number of loops is too large (with a new param
> for that).  And when the number of innermost loops also exceeds
> the limit avoid even that?  The above also misses a
> optimize_loop_for_speed_p (loop) check (probably doesn't make
> a difference, but you could try).

Ah, sorry - I was conflating LICM with RTL invariant motion above; still,
invariant motion is the biggest offender (which might be due to DF checking,
if you enabled that).

As for sbitmap vs. bitmap, it's a difficult call.  When there are big
profile hits on individual bit operations (bitmap_bit_p, bitmap_set_bit)
it might pay off to use bitmap, but with tree view.  There's also
sparseset, but that requires even more memory.
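
As a minimal sketch of the bitmap-with-tree-view pattern mentioned above (assuming GCC's internal bitmap.h API: BITMAP_ALLOC, bitmap_tree_view, bitmap_list_view; this is illustrative compiler-internal code, not part of any posted patch):

  /* Allocate a sparse bitmap and switch it to tree view while doing many
     random-access set/test operations; bitmap_set_bit and bitmap_bit_p are
     logarithmic in tree view instead of linear list walks.  */
  unsigned int regno = 42;              /* example register number */
  bitmap avl_regs = BITMAP_ALLOC (NULL);
  bitmap_tree_view (avl_regs);

  bitmap_set_bit (avl_regs, regno);     /* record the register */
  if (bitmap_bit_p (avl_regs, regno))   /* cheap membership test */
    {
      /* ... */
    }

  /* Switch back to list view before iterating over the members, since
     tree-view bitmaps cannot be enumerated directly.  */
  bitmap_list_view (avl_regs);
  BITMAP_FREE (avl_regs);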

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #28 from JuzheZhong  ---
(In reply to Robin Dapp from comment #27)
> Following up on this:
> 
> I'm seeing the same thing Patrick does.  We create a lot of large non-sparse
> sbitmaps that amount to around 33G in total.
> 
> I did local experiments replacing all sbitmaps that are not needed for LCM
> by regular bitmaps.  Apart from output differences vs the original version
> the testsuite is unchanged.
> 
> As expected, wrf now takes longer to compile, 8 mins vs 4ish mins before
> and we still use 2.7G of RAM for this single file (likely because of the
> remaining sbitmaps) compared to a max of 1.2ish G that the rest of the
> compilation uses.
> 
> One possibility to get the best of both worlds would be to threshold based
> on num_bbs * num_exprs.  Once we exceed it switch to the bitmap pass,
> otherwise keep sbitmaps for performance. 
> 
> Messaging with Juzhe offline, his best guess for the LICM time is that he
> enabled checking for dataflow which slows down this particular compilation
> by a lot.  Therefore it doesn't look like a generic problem.

Thanks. I don't think replacing sbitmap is the best solution.
Let me first disable the DF check and reproduce the 33 GB memory consumption
on my local machine.

I think the best way to reduce the memory consumption is to optimize the
VSETVL PASS algorithm and code. I have an idea for the optimization and am
going to work on it.

Thanks for reporting.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #27 from Robin Dapp  ---
Following up on this:

I'm seeing the same thing Patrick does.  We create a lot of large non-sparse
sbitmaps that amount to around 33G in total.

I did local experiments replacing all sbitmaps that are not needed for LCM by
regular bitmaps.  Apart from output differences vs the original version the
testsuite is unchanged.

As expected, wrf now takes longer to compile, 8 mins vs 4ish mins before, and
we still use 2.7G of RAM for this single file (likely because of the remaining
sbitmaps) compared to a max of 1.2ish G that the rest of the compilation uses.

One possibility to get the best of both worlds would be to threshold based on
num_bbs * num_exprs: once we exceed it, switch to the bitmap-based variant,
otherwise keep sbitmaps for performance.
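
A minimal sketch of that thresholding idea (the function name and cutoff value are hypothetical, not proposed GCC code):

  #include <cstdint>

  /* Decide whether the dense sbitmap representation would be too large and
     the sparse-bitmap fallback should be used instead.  */
  static bool
  use_sparse_bitmaps_p (uint64_t num_bbs, uint64_t num_exprs)
  {
    /* Hypothetical cutoff: fall back to sparse bitmaps once the dense
       per-BB vectors would need more than ~2^30 bits in total.  */
    const uint64_t threshold = uint64_t (1) << 30;
    return num_bbs * num_exprs > threshold;
  }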

Messaging with Juzhe offline, his best guess for the LICM time is that he
enabled dataflow checking, which slows down this particular compilation by a
lot.  Therefore it doesn't look like a generic problem.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #26 from rguenther at suse dot de  ---
On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> 
> --- Comment #22 from JuzheZhong  ---
> (In reply to Richard Biener from comment #21)
> > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > blocks involved but failed.
> 
> I am considering whether we should disable LICM for RISC-V by default if 
> vector
> is enabled ?
> Since the compile time explode 10 times is really horrible.

I think that's a bad idea.  It only explodes for some degenerate cases.
The best would be to fix invariant motion to keep DF up-to-date so
it can stop using df_analyze_loop and instead analyze the whole function.
Or maybe change it to use the rtl-ssa framework instead.

There's already param_loop_invariant_max_bbs_in_loop:

  /* Process the loops, innermost first.  */
  for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
{
  curr_loop = loop;
      /* move_single_loop_invariants for very large loops is time consuming
         and might need a lot of memory.  For -O1 only do loop invariant
         motion for very small loops.  */
  unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
  if (optimize < 2)
max_bbs /= 10;
  if (loop->num_nodes <= max_bbs)
move_single_loop_invariants (loop);
}

it might be possible to restrict invariant motion to innermost loops
when the overall number of loops is too large (with a new param
for that).  And when the number of innermost loops also exceeds
the limit, avoid even that?  The above also misses an
optimize_loop_for_speed_p (loop) check (probably doesn't make
a difference, but you could try).

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #25 from JuzheZhong  ---
The RISC-V backend memory-hog issue is fixed, but the compile-time hog in LICM
is still there, so keep this PR open.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #24 from GCC Commits  ---
The master branch has been updated by Robin Dapp :

https://gcc.gnu.org/g:01260a823073675e13dd1fc85cf2657a5396adf2

commit r14-8282-g01260a823073675e13dd1fc85cf2657a5396adf2
Author: Juzhe-Zhong 
Date:   Fri Jan 19 16:34:25 2024 +0800

RISC-V: Fix RVV_VLMAX

This patch fixes a memory hog found in the SPEC2017 wrf benchmark that is
caused by RVV_VLMAX: RVV_VLMAX generates a brand new rtx via
gen_rtx_REG (Pmode, X0_REGNUM) every time it is called, that is, we are always
generating garbage and redundant (reg:DI 0 zero) rtxes.

After this patch fix, the memory hog is gone.

Time variable                                   usr           sys          wall           GGC
 machine dep reorg  :   1.99 (  9%)   0.35 ( 56%)   2.33 ( 10%)   939M ( 80%) [Before this patch]
 machine dep reorg  :   1.71 (  6%)   0.16 ( 27%)   3.77 (  6%)   659k (  0%) [After this patch]

Time variable                                   usr           sys          wall           GGC
 machine dep reorg  :  75.93 ( 18%)  14.23 ( 88%)  90.15 ( 21%) 33383M ( 95%) [Before this patch]
 machine dep reorg  :  56.00 ( 14%)   7.92 ( 77%)  63.93 ( 15%)  4361k (  0%) [After this patch]

Testing is running. Ok for trunk if the tests pass with no regressions?

PR target/113495

gcc/ChangeLog:

* config/riscv/riscv-protos.h (RVV_VLMAX): Change to
regno_reg_rtx[X0_REGNUM].
(RVV_VUNDEF): Ditto.
* config/riscv/riscv-vsetvl.cc: Add timevar.
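
Based on the ChangeLog entry above, the shape of the RVV_VLMAX change is roughly the following (a hedged sketch, not the exact riscv-protos.h text):

/* Before: every use of RVV_VLMAX allocated a fresh (reg:DI 0 zero) rtx.  */
#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)

/* After: reuse the canonical, pre-allocated rtx for register x0, so no
   garbage rtxes are created on each use.  */
#define RVV_VLMAX (regno_reg_rtx[X0_REGNUM])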

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #23 from Kito Cheng  ---
> I am considering whether we should disable LICM for RISC-V by default if 
> vector is enabled ?

That will cause regressions for other programs, and it may also hurt programs
that are not vectorized but benefit from LICM.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #22 from JuzheZhong  ---
(In reply to Richard Biener from comment #21)
> I once tried to avoid df_reorganize_refs and/or optimize this with the
> blocks involved but failed.

I am considering whether we should disable LICM for RISC-V by default when
vector is enabled, since a 10x compile-time explosion is really horrible.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #21 from Richard Biener  ---
I once tried to avoid df_reorganize_refs and/or optimize this with the blocks
involved but failed.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111241,
   ||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #20 from Richard Biener  ---
IIRC there's a duplicate for this.  It's df_analyze_loop calling
df_reorganize_refs_*, which does O(function-size) work for each loop.

With -O3 and vectorization the number of loops tends to blow up, making the
issue worse.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #19 from JuzheZhong  ---
(In reply to JuzheZhong from comment #18)
> Hi, Robin.
> 
> I have fixed patch for memory-hog:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html
> 
> I will commit it after the testing.
> 
> But compile-time hog still exists which is loop invariant motion PASS.
> 
> with -fno-move-loop-invariants, we become quite faster.
> 
> Could you take a look at it ?

Note that with the default -march=rv64gcv_zvl256b -O3:
real    63m18.771s
user    60m19.036s
sys     2m59.787s

But with -march=rv64gcv_zvl256b -O3 -fno-move-loop-invariants:
real    6m52.984s
user    6m42.473s
sys     0m10.375s

10 times faster without loop invariant motion.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #18 from JuzheZhong  ---
Hi, Robin.

I have a fix patch for the memory hog:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html

I will commit it after testing.

But the compile-time hog still exists, and it is the loop invariant motion pass.

With -fno-move-loop-invariants, compilation becomes much faster.

Could you take a look at it?

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #17 from JuzheZhong  ---
Ok. Confirmed: the original test goes from 33383M to 4796k now.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #16 from JuzheZhong  ---
(In reply to Andrew Pinski from comment #15)
> (In reply to JuzheZhong from comment #14)
> > Oh. I know the reason now.
> > 
> > The issue is not RISC-V backend VSETVL PASS.
> > 
> > It's a memory bug in rtx_equal_p, I think.
> 
> 
> It is not rtx_equal_p but rather RVV_VLMAX which is defined as:
> riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)
> 
> Seems like you could cache that somewhere ...

Oh, that makes sense to me. Thank you so much.
I think the memory-hog issue will be fixed soon.

But the compile-time hog in the loop invariant motion pass is still not fixed.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #15 from Andrew Pinski  ---
(In reply to JuzheZhong from comment #14)
> Oh. I know the reason now.
> 
> The issue is not RISC-V backend VSETVL PASS.
> 
> It's a memory bug in rtx_equal_p, I think.


It is not rtx_equal_p but rather RVV_VLMAX which is defined as:
riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)

Seems like you could cache that somewhere ...

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #14 from JuzheZhong  ---
Oh. I know the reason now.

The issue is not RISC-V backend VSETVL PASS.

It's a memory bug in rtx_equal_p, I think.

We are calling rtx_equal_p which is very costly.

For example, has_nonvlmax_reg_avl is calling rtx_equal_p.

So I keep all the code unchanged and only replace the comparison as follows:

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 93a1238a5ab..1c85c8ee3c6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4988,7 +4988,7 @@ nonvlmax_avl_type_p (rtx_insn *rinsn)
 bool
 vlmax_avl_p (rtx x)
 {
-  return x && rtx_equal_p (x, RVV_VLMAX);
+  return x && REG_P (x) && REGNO (x) == X0_REGNUM /*rtx_equal_p (x, RVV_VLMAX)*/;
 }

Use REGNO (x) == X0_REGNUM instead of rtx_equal_p.

Memory-hog issue is gone:

939M -> 725k.

So I am going to send a patch to work around the rtx_equal_p issue that causes
the memory hog.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #13 from JuzheZhong  ---
So I think we should investigate why calling has_nonvlmax_reg_avl costs so
much memory.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #12 from JuzheZhong  ---
Ok. Here is a simple fix which gives some hints:


diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 2067073185f..ede818140dc 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2719,10 +2719,11 @@ pre_vsetvl::compute_lcm_local_properties ()
      for (int i = 0; i < num_exprs; i += 1)
        {
          const vsetvl_info &info = *m_exprs[i];
-         if (!info.has_nonvlmax_reg_avl () && !info.has_vl ())
+         bool has_nonvlmax_reg_avl_p = info.has_nonvlmax_reg_avl ();
+         if (!has_nonvlmax_reg_avl_p && !info.has_vl ())
           continue;

-         if (info.has_nonvlmax_reg_avl ())
+         if (has_nonvlmax_reg_avl_p)
           {
             unsigned int regno;
             sbitmap_iterator sbi;
@@ -3556,7 +3557,7 @@ const pass_data pass_data_vsetvl = {
   RTL_PASS, /* type */
   "vsetvl", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE,  /* tv_id */
+  TV_MACH_DEP,  /* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */


Memory usage goes from 931M to 781M, a significant reduction.

Note that I didn't change every has_nonvlmax_reg_avl call; we have so many
places calling has_nonvlmax_reg_avl...

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #11 from JuzheZhong  ---
It should be compute_lcm_local_properties. The memory usage drops by 50% after
I remove this function. I am still investigating.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #10 from JuzheZhong  ---
No, it's not caused here. I removed the whole function compute_avl_def_data.

The memory usage doesn't change.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #9 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #8)
> How sparse will this bitmap be?  bitmap instead of sbitmap should be used
> if the bitmap is going to be sparse. sbitmap is a fixed size based on the
> bitmap size, while bitmap is better for sparse bitmaps as it is implemented
> as a linked list.

Also, it seems like DF already has def_in/def_out info; how much of this
duplicates information from there?

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #8 from Andrew Pinski  ---
(In reply to Patrick O'Neill from comment #7)
> I believe the memory hog is caused by this:
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;
> h=2067073185f8c0f398908b164a99b592948e6d2d;
> hb=565935f93a7da629da89b05812a3e8c43287598f#l2427
> 
> In the slightly reduced test program I was using to debug there were ~35k
> bb's leading to num_expr being roughly 1 million. vsetvl then makes 35k
> bitmaps of ~1 million bits.

How sparse will this bitmap be?  bitmap instead of sbitmap should be used if
the bitmap is going to be sparse. sbitmap is a fixed size based on the bitmap
size, while bitmap is better for sparse bitmaps as it is implemented as a
linked list.

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Patrick O'Neill  changed:

   What|Removed |Added

 CC||patrick at rivosinc dot com

--- Comment #7 from Patrick O'Neill  ---
I believe the memory hog is caused by this:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427

In the slightly reduced test program I was using to debug, there were ~35k BBs,
leading to num_exprs being roughly 1 million. vsetvl then makes 35k bitmaps of
~1 million bits each.
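
A rough back-of-the-envelope check (assuming those figures): one sbitmap vector of 35,000 rows with ~1,000,000 bits each is about 3.5e10 bits, i.e. roughly 4.4 GB, and the pass allocates several such vectors (avl_def_loc, m_kill, m_avl_def_in, m_avl_def_out, plus the LCM sets), which puts the total in the same ballpark as the ~33 GB peak reported above.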

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #6 from JuzheZhong  ---
(In reply to Andrew Pinski from comment #5)
> Note "loop invariant motion" is the RTL based loop invariant motion pass.

So you mean it should still be a RISC-V issue, right?

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Andrew Pinski  changed:

   What|Removed |Added

  Component|tree-optimization   |rtl-optimization

--- Comment #5 from Andrew Pinski  ---
Note "loop invariant motion" is the RTL based loop invariant motion pass.