[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

JuzheZhong changed:

           What    |Removed     |Added
 ----------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|---         |FIXED

--- Comment #34 from JuzheZhong ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #33 from GCC Commits ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:9dd10de15b183f7b662905e1383fdc3a08755f2e

commit r14-8639-g9dd10de15b183f7b662905e1383fdc3a08755f2e
Author: Juzhe-Zhong
Date:   Mon Jan 29 19:32:02 2024 +0800

    RISC-V: Fix VSETVL PASS compile-time issue

    The compile-time issue was discovered in SPEC 2017 wrf, using time and
    -ftime-report to analyze the profile data of the SPEC 2017 wrf
    compilation.

    Before this patch (lazy vsetvl):

      scheduling        : 121.89 ( 15%)   0.53 ( 11%) 122.72 ( 15%)    13M (  1%)
      machine dep reorg : 424.61 ( 53%)   1.84 ( 37%) 427.44 ( 53%)  5290k (  0%)

      real    13m27.074s
      user    13m19.539s
      sys     0m5.180s

    Simple vsetvl:

      machine dep reorg :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  4138k (  0%)

      real    6m5.780s
      user    6m2.396s
      sys     0m2.373s

    "machine dep reorg" is the compile time of the VSETVL pass (424
    seconds), which accounts for 53% of the compilation time and spends
    much more time than scheduling.

    After investigation, the critical part of the VSETVL pass is
    compute_lcm_local_properties, which is called on every iteration of
    phase 2 (earliest fusion) and phase 3 (global LCM). This patch
    optimizes the code of compute_lcm_local_properties to reduce the
    compilation time.

    After this patch:

      scheduling        : 117.51 ( 27%)   0.21 (  6%) 118.04 ( 27%)    13M (  1%)
      machine dep reorg :  80.13 ( 18%)   0.91 ( 26%)  81.26 ( 18%)  5290k (  0%)

      real    7m25.374s
      user    7m20.116s
      sys     0m3.795s

    The improvement from this patch is very obvious for the lazy VSETVL
    pass: 424s (53%) -> 80s (18%), which now spends less time than
    scheduling.

    Tested on both RV32 and RV64, no regression. Ok for trunk ?

            PR target/113495

    gcc/ChangeLog:

            * config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
            (pre_vsetvl::compute_vsetvl_def_data): Fix compile-time issue.
            (pre_vsetvl::compute_transparent): New function.
            (pre_vsetvl::compute_lcm_local_properties): Fix compile-time
            issue.
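The hoisting pattern this commit applies — computing the per-block local properties once, instead of recomputing them inside every earliest-fusion/LCM iteration — can be sketched with a toy model. All names below are hypothetical illustrations, not GCC's actual code:

```cpp
#include <cassert>
#include <vector>

// Counts how many per-block summaries we compute (for the comparison below).
static long g_recomputes = 0;

// Hypothetical stand-in for compute_lcm_local_properties: summarize each
// "block" (a list of ints standing in for insns) into one local property.
static std::vector<int>
compute_local_props (const std::vector<std::vector<int>> &blocks)
{
  std::vector<int> props;
  props.reserve (blocks.size ());
  for (const auto &insns : blocks)
    {
      ++g_recomputes;
      int p = 0;
      for (int insn : insns)
        p ^= insn;          // pretend this summarizes the block
      props.push_back (p);
    }
  return props;
}

// Slow shape: local properties recomputed inside every fixpoint iteration,
// i.e. O(blocks) extra work per iteration.
static int
iterate_slow (const std::vector<std::vector<int>> &blocks, int iters)
{
  int acc = 0;
  for (int i = 0; i < iters; ++i)
    {
      auto props = compute_local_props (blocks);
      for (int p : props)
        acc ^= p;
    }
  return acc;
}

// Fast shape: the local properties are loop-invariant, so hoist them out
// and compute them exactly once.
static int
iterate_fast (const std::vector<std::vector<int>> &blocks, int iters)
{
  auto props = compute_local_props (blocks);
  int acc = 0;
  for (int i = 0; i < iters; ++i)
    for (int p : props)
      acc ^= p;
  return acc;
}
```

Both shapes produce the same fixpoint result; only the number of per-block summaries changes, which is the essence of the compile-time fix.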
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #32 from GCC Commits ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:3132d2d36b4705bb762e61b1c8ca4da7c78a8321

commit r14-8378-g3132d2d36b4705bb762e61b1c8ca4da7c78a8321
Author: Juzhe-Zhong
Date:   Tue Jan 23 18:12:49 2024 +0800

    RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

    The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the
    VSETVL pass: it consumes over 33 GB of memory, which makes it
    impossible to compile SPEC 2017 wrf on a laptop.

    The root cause is these memory-wasting variables:

      unsigned num_exprs = num_bbs * num_regs;
      sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
      sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
      m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
      m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);

    I find that compute_avl_def_data can be achieved with the RTL_SSA
    framework, so this patch replaces the implementation with one based on
    RTL_SSA. After this patch, the memory-hog issue is fixed.

    Simple vsetvl memory usage (valgrind --tool=massif
    --pages-as-heap=yes --massif-out-file=massif.out) is 1.673 GB.

    Lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes
    --massif-out-file=massif.out) is 2.441 GB.

    Tested on both RV32 and RV64, no regression.

    gcc/ChangeLog:

            PR target/113495
            * config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
            (get_regno): Ditto.
            (get_bb_index): Ditto.
            (pre_vsetvl::compute_avl_def_data): Ditto.
            (pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
            (pre_vsetvl::pre_global_vsetvl_info): Ditto.

    gcc/testsuite/ChangeLog:

            PR target/113495
            * gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.
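The scale of those four num_bbs x num_exprs sbitmap vectors can be checked with back-of-the-envelope arithmetic. A sketch (illustration only; it ignores sbitmap's per-row headers and word rounding):

```cpp
#include <cassert>
#include <cstdint>

// Rough size of one sbitmap vector as allocated by the old code:
// num_bbs rows, each num_exprs bits wide. This is an approximation of the
// dominant term, not the allocator's exact accounting.
static std::uint64_t
sbitmap_vector_bytes (std::uint64_t num_bbs, std::uint64_t num_exprs)
{
  return num_bbs * num_exprs / 8;   // bits -> bytes
}
```

With the numbers reported later in the thread (comment #7: ~35k basic blocks, num_exprs roughly 1 million, since num_exprs = num_bbs * num_regs), a single vector is already on the order of 4 GB, and the pass allocated four of them — consistent with the 33 GB observation.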
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #31 from JuzheZhong ---
machine dep reorg : 403.69 ( 56%)  23.48 ( 93%) 427.17 ( 57%)  5290k (  0%)

Confirmed: with RTL DF checking removed, LICM is no longer a compile-time
hog. The VSETVL pass accounts for 56% of compile time. Even though I can't
see the memory hog in the GGC column of -ftime-report, I can see 33G of
memory usage in htop.

Confirmed: both the compile-time hog and the memory hog are VSETVL pass
issues. I will work on optimizing the compile time as well as the memory
usage of the VSETVL pass.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #30 from JuzheZhong ---
Ok. I believe m_avl_def_in and m_avl_def_out can be removed with a better
algorithm; then the memory hog should be fixed soon.

I am going to rewrite avl_vl_unmodified_between_p and trigger full coverage
testing, since it's going to be a big change there.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #29 from Richard Biener ---
(In reply to rguent...@suse.de from comment #26)
> On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> >
> > --- Comment #22 from JuzheZhong ---
> > (In reply to Richard Biener from comment #21)
> > > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > > blocks involved but failed.
> >
> > I am considering whether we should disable LICM for RISC-V by default if
> > vector is enabled ?
> > Since the compile time explode 10 times is really horrible.
>
> I think that's a bad idea.  It only explodes for some degenerate cases.
> The best would be to fix invariant motion to keep DF up-to-date so
> it can stop using df_analyze_loop and instead analyze the whole function.
> Or maybe change it to use the rtl-ssa framework instead.
>
> There's already param_loop_invariant_max_bbs_in_loop:
>
>   /* Process the loops, innermost first.  */
>   for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
>     {
>       curr_loop = loop;
>       /* move_single_loop_invariants for very large loops is time consuming
>          and might need a lot of memory.  For -O1 only do loop invariant
>          motion for very small loops.  */
>       unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
>       if (optimize < 2)
>         max_bbs /= 10;
>       if (loop->num_nodes <= max_bbs)
>         move_single_loop_invariants (loop);
>     }
>
> it might be possible to restrict invariant motion to innermost loops
> when the overall number of loops is too large (with a new param
> for that).  And when the number of innermost loops also exceeds
> the limit avoid even that?  The above also misses an
> optimize_loop_for_speed_p (loop) check (probably doesn't make
> a difference, but you could try).

Ah, sorry - I was mis-matching LICM to invariant motion above; still,
invariant motion is the biggest offender (might be due to DF checking, if
you enabled that).

As for sbitmap vs. bitmap, it's a difficult call. When there are big
profile hits on individual bit operations (bitmap_bit_p, bitmap_set_bit)
it may pay off to use bitmap, but with tree view. There's also sparseset,
but that requires even more memory.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #28 from JuzheZhong ---
(In reply to Robin Dapp from comment #27)
> Following up on this:
>
> I'm seeing the same thing Patrick does. We create a lot of large
> non-sparse sbitmaps that amount to around 33G in total.
>
> I did local experiments replacing all sbitmaps that are not needed for
> LCM by regular bitmaps. Apart from output differences vs the original
> version the testsuite is unchanged.
>
> As expected, wrf now takes longer to compile, 8 mins vs 4-ish mins
> before, and we still use 2.7G of RAM for this single file (likely
> because of the remaining sbitmaps) compared to a max of 1.2-ish G that
> the rest of the compilation uses.
>
> One possibility to get the best of both worlds would be to threshold
> based on num_bbs * num_exprs. Once we exceed it, switch to bitmaps;
> otherwise keep sbitmaps for performance.
>
> Messaging with Juzhe offline, his best guess for the LICM time is that
> he enabled checking for dataflow, which slows down this particular
> compilation by a lot. Therefore it doesn't look like a generic problem.

Thanks. I don't think replacing sbitmap is the best solution. Let me first
disable the DF check and reproduce the 33G memory consumption on my local
machine.

I think the best way to optimize the memory consumption is to optimize the
VSETVL pass algorithm and code. I have an idea for the optimization and am
going to work on it. Thanks for reporting.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #27 from Robin Dapp ---
Following up on this:

I'm seeing the same thing Patrick does. We create a lot of large
non-sparse sbitmaps that amount to around 33G in total.

I did local experiments replacing all sbitmaps that are not needed for LCM
by regular bitmaps. Apart from output differences vs the original version
the testsuite is unchanged.

As expected, wrf now takes longer to compile, 8 mins vs 4-ish mins before,
and we still use 2.7G of RAM for this single file (likely because of the
remaining sbitmaps) compared to a max of 1.2-ish G that the rest of the
compilation uses.

One possibility to get the best of both worlds would be to threshold based
on num_bbs * num_exprs. Once we exceed it, switch to bitmaps; otherwise
keep sbitmaps for performance.

Messaging with Juzhe offline, his best guess for the LICM time is that he
enabled checking for dataflow, which slows down this particular
compilation by a lot. Therefore it doesn't look like a generic problem.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #26 from rguenther at suse dot de ---
On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
>
> --- Comment #22 from JuzheZhong ---
> (In reply to Richard Biener from comment #21)
> > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > blocks involved but failed.
>
> I am considering whether we should disable LICM for RISC-V by default if
> vector is enabled ?
> Since the compile time explode 10 times is really horrible.

I think that's a bad idea.  It only explodes for some degenerate cases.
The best would be to fix invariant motion to keep DF up-to-date so
it can stop using df_analyze_loop and instead analyze the whole function.
Or maybe change it to use the rtl-ssa framework instead.

There's already param_loop_invariant_max_bbs_in_loop:

  /* Process the loops, innermost first.  */
  for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
    {
      curr_loop = loop;
      /* move_single_loop_invariants for very large loops is time consuming
         and might need a lot of memory.  For -O1 only do loop invariant
         motion for very small loops.  */
      unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
      if (optimize < 2)
        max_bbs /= 10;
      if (loop->num_nodes <= max_bbs)
        move_single_loop_invariants (loop);
    }

it might be possible to restrict invariant motion to innermost loops
when the overall number of loops is too large (with a new param
for that).  And when the number of innermost loops also exceeds
the limit avoid even that?  The above also misses an
optimize_loop_for_speed_p (loop) check (probably doesn't make
a difference, but you could try).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #25 from JuzheZhong ---
The RISC-V backend memory-hog issue is fixed, but the compile-time hog in
LICM is still there, so keep this PR open.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #24 from GCC Commits ---
The master branch has been updated by Robin Dapp :

https://gcc.gnu.org/g:01260a823073675e13dd1fc85cf2657a5396adf2

commit r14-8282-g01260a823073675e13dd1fc85cf2657a5396adf2
Author: Juzhe-Zhong
Date:   Fri Jan 19 16:34:25 2024 +0800

    RISC-V: Fix RVV_VLMAX

    This patch fixes the memory hog found in the SPEC 2017 wrf benchmark,
    which was caused by RVV_VLMAX: RVV_VLMAX generated a brand-new rtx via
    gen_rtx_REG (Pmode, X0_REGNUM) every time we called RVV_VLMAX, that
    is, we were always generating garbage and redundant (reg:DI 0 zero)
    rtx. After this fix, the memory hog is gone.

     Time variable                  usr          sys          wall          GGC
     machine dep reorg :  1.99 (  9%)  0.35 ( 56%)  2.33 ( 10%)   939M ( 80%) [Before this patch]
     machine dep reorg :  1.71 (  6%)  0.16 ( 27%)  3.77 (  6%)   659k (  0%) [After this patch]

     Time variable                  usr          sys          wall          GGC
     machine dep reorg : 75.93 ( 18%) 14.23 ( 88%) 90.15 ( 21%) 33383M ( 95%) [Before this patch]
     machine dep reorg : 56.00 ( 14%)  7.92 ( 77%) 63.93 ( 15%)  4361k (  0%) [After this patch]

    Test is running. Ok for trunk if I passed the test with no regression ?

            PR target/113495

    gcc/ChangeLog:

            * config/riscv/riscv-protos.h (RVV_VLMAX): Change to
            regno_reg_rtx[X0_REGNUM].
            (RVV_VUNDEF): Ditto.
            * config/riscv/riscv-vsetvl.cc: Add timevar.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #23 from Kito Cheng ---
> I am considering whether we should disable LICM for RISC-V by default if
> vector is enabled ?

That will cause regressions for other programs; it may also hurt programs
that are not vectorized but benefit from LICM.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #22 from JuzheZhong ---
(In reply to Richard Biener from comment #21)
> I once tried to avoid df_reorganize_refs and/or optimize this with the
> blocks involved but failed.

I am considering whether we should disable LICM for RISC-V by default if
vector is enabled, since a 10x compile-time explosion is really horrible.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #21 from Richard Biener ---
I once tried to avoid df_reorganize_refs and/or optimize this with the
blocks involved but failed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Richard Biener changed:

           What    |Removed |Added
 ----------------------------------------
           See Also|        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111241,
                   |        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #20 from Richard Biener ---
IIRC there's a duplicate for this. It's df_analyze_loop calling
df_reorganize_refs_*, which does O(function-size) work for each loop. With
-O3 and vectorization the number of loops tends to blow up, making the
issue worse.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #19 from JuzheZhong ---
(In reply to JuzheZhong from comment #18)
> Hi, Robin.
>
> I have a fixed patch for the memory hog:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html
>
> I will commit it after the testing.
>
> But the compile-time hog still exists, which is the loop invariant
> motion pass.
>
> With -fno-move-loop-invariants, we become quite a bit faster.
>
> Could you take a look at it ?

Note that with the default -march=rv64gcv_zvl256b -O3:

real    63m18.771s
user    60m19.036s
sys     2m59.787s

But with -march=rv64gcv_zvl256b -O3 -fno-move-loop-invariants:

real    6m52.984s
user    6m42.473s
sys     0m10.375s

10 times faster without loop invariant motion.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #18 from JuzheZhong ---
Hi, Robin.

I have a fixed patch for the memory hog:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html

I will commit it after the testing.

But the compile-time hog still exists, which is the loop invariant motion
pass.

With -fno-move-loop-invariants, we become quite a bit faster.

Could you take a look at it ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #17 from JuzheZhong ---
Ok. Confirmed: the original test goes from 33383M -> 4796k now.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #16 from JuzheZhong ---
(In reply to Andrew Pinski from comment #15)
> (In reply to JuzheZhong from comment #14)
> > Oh. I know the reason now.
> >
> > The issue is not the RISC-V backend VSETVL pass.
> >
> > I think it's a memory bug around rtx_equal_p.
>
> It is not rtx_equal_p but rather RVV_VLMAX, which is defined as:
> riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)
>
> Seems like you could cache that somewhere ...

Oh. Makes sense to me. Thank you so much. I think the memory-hog issue
will be fixed soon. But the compile-time hog in loop invariant motion is
still not fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #15 from Andrew Pinski ---
(In reply to JuzheZhong from comment #14)
> Oh. I know the reason now.
>
> The issue is not the RISC-V backend VSETVL pass.
>
> I think it's a memory bug around rtx_equal_p.

It is not rtx_equal_p but rather RVV_VLMAX, which is defined as:
riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)

Seems like you could cache that somewhere ...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #14 from JuzheZhong ---
Oh. I know the reason now.

The issue is not the RISC-V backend VSETVL pass.

I think it's a memory bug around rtx_equal_p. We are calling rtx_equal_p,
which is very costly. For example, has_nonvlmax_reg_avl calls rtx_equal_p.

So I kept all the code unchanged and replaced the comparison as follows:

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 93a1238a5ab..1c85c8ee3c6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4988,7 +4988,7 @@ nonvlmax_avl_type_p (rtx_insn *rinsn)
 bool
 vlmax_avl_p (rtx x)
 {
-  return x && rtx_equal_p (x, RVV_VLMAX);
+  return x && REG_P (x) && REGNO (x) == X0_REGNUM/*rtx_equal_p (x, RVV_VLMAX)*/;
 }

Using REGNO (x) == X0_REGNUM instead of rtx_equal_p, the memory-hog issue
is gone: 939M -> 725k.

So I am going to send a patch to work around the rtx_equal_p issues that
cause the memory hog.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #13 from JuzheZhong ---
So I think we should investigate why calling has_nonvlmax_reg_avl costs so
much memory.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #12 from JuzheZhong ---
Ok. Here is a simple fix which gives some hints:

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 2067073185f..ede818140dc 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2719,10 +2719,11 @@ pre_vsetvl::compute_lcm_local_properties ()
   for (int i = 0; i < num_exprs; i += 1)
     {
       const vsetvl_info &info = *m_exprs[i];
-      if (!info.has_nonvlmax_reg_avl () && !info.has_vl ())
+      bool has_nonvlmax_reg_avl_p = info.has_nonvlmax_reg_avl ();
+      if (!has_nonvlmax_reg_avl_p && !info.has_vl ())
 	continue;

-      if (info.has_nonvlmax_reg_avl ())
+      if (has_nonvlmax_reg_avl_p)
 	{
 	  unsigned int regno;
 	  sbitmap_iterator sbi;
@@ -3556,7 +3557,7 @@ const pass_data pass_data_vsetvl = {
   RTL_PASS,      /* type */
   "vsetvl",      /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE,       /* tv_id */
+  TV_MACH_DEP,   /* tv_id */
   0,             /* properties_required */
   0,             /* properties_provided */
   0,             /* properties_destroyed */

Memory usage goes from 931M -> 781M, a significant reduction.

Note that I didn't change all has_nonvlmax_reg_avl calls; we have so many
places calling has_nonvlmax_reg_avl...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #11 from JuzheZhong ---
It should be compute_lcm_local_properties: memory usage drops by 50% after
I remove this function. I am still investigating.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #10 from JuzheZhong ---
No, it's not caused here. I removed the whole compute_avl_def_data
function and the memory usage doesn't change.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #9 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #8)
> How sparse will this bitmap be? bitmap instead of sbitmap should be used
> if the bitmap is going to be sparse. sbitmap is a fixed size based on
> the bitmap size, while bitmap is better for sparse bitmaps as it is
> implemented as a linked list.

Also, it seems like DF already has def_in/def_out info; how much of this
is duplicated information from there?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #8 from Andrew Pinski ---
(In reply to Patrick O'Neill from comment #7)
> I believe the memory hog is caused by this:
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427
>
> In the slightly reduced test program I was using to debug there were
> ~35k bb's, leading to num_expr being roughly 1 million. vsetvl then
> makes 35k bitmaps of ~1 million bits.

How sparse will this bitmap be? bitmap instead of sbitmap should be used
if the bitmap is going to be sparse. sbitmap is a fixed size based on the
bitmap size, while bitmap is better for sparse bitmaps as it is
implemented as a linked list.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Patrick O'Neill changed:

           What    |Removed |Added
 ----------------------------------------
                 CC|        |patrick at rivosinc dot com

--- Comment #7 from Patrick O'Neill ---
I believe the memory hog is caused by this:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427

In the slightly reduced test program I was using to debug there were ~35k
bb's, leading to num_expr being roughly 1 million. vsetvl then makes 35k
bitmaps of ~1 million bits.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #6 from JuzheZhong ---
(In reply to Andrew Pinski from comment #5)
> Note "loop invariant motion" is the RTL based loop invariant motion pass.

So you mean it should still be a RISC-V issue, right?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Andrew Pinski changed:

           What    |Removed           |Added
 ----------------------------------------
          Component|tree-optimization |rtl-optimization

--- Comment #5 from Andrew Pinski ---
Note "loop invariant motion" is the RTL based loop invariant motion pass.