Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
On Fri, 1 Sep 2023, Filip Kastl wrote: > > That's interesting. Your placement at > > > > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > > NEXT_PASS (pass_phiopt, true /* early_p */); > > + NEXT_PASS (pass_sccp); > > > > and > > > >NEXT_PASS (pass_tsan); > >NEXT_PASS (pass_dse, true /* use DR analysis */); > >NEXT_PASS (pass_dce); > > + NEXT_PASS (pass_sccp); > > > > isn't immediately after the "best" existing pass we have to > > remove dead PHIs which is pass_cd_dce. phiopt might leave > > dead PHIs around and the second instance runs long after the > > last CD-DCE. > > > > So I wonder if your pass just detects unnecessary PHIs we'd have > > removed by other means and what survives until RTL expansion is > > what we should count? > > > > Can you adjust your original early placement to right after > > the cd-dce pass and for the late placement turn the dce pass > > before it into cd-dce and re-do your measurements? > > So I did this > > NEXT_PASS (pass_dse); > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > NEXT_PASS (pass_sccp); > NEXT_PASS (pass_phiopt, true /* early_p */); > NEXT_PASS (pass_tail_recursion); > > and this > > NEXT_PASS (pass_dse, true /* use DR analysis */); > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > NEXT_PASS (pass_sccp); > /* Pass group that runs when 1) enabled, 2) there are loops > > and got these results: > > 500.perlbench_r > Started with (1) 30318 > Ended with (1) 26219 > Removed PHI % (1) 13.52002110957187149600 > Started with (2) 39043 > Ended with (2) 38941 > Removed PHI % (2) .26125041620777092000 > > 502.gcc_r > Started with (1) 148361 > Ended with (1) 140464 > Removed PHI % (1) 5.32282742769326170700 > Started with (2) 216209 > Ended with (2) 215367 > Removed PHI % (2) .38943799749316633500 > > 505.mcf_r > Started with (1) 342 > Ended with (1) 304 > Removed PHI % (1) 11.1200 > Started with (2) 437 > Ended with (2) 433 > Removed PHI % (2) .91533180778032036700 > > 523.xalancbmk_r > Started with (1) 62995 > Ended with (1) 58289 > Removed PHI % (1) 7.47043416144138423700 > Started with (2) 134026 > Ended with (2) 133193 > Removed PHI % (2) .62152119737961291100 > > 531.deepsjeng_r > Started with (1) 1402 > Ended with (1) 1264 > Removed PHI % (1) 9.84308131241084165500 > Started with (2) 1928 > Ended with (2) 1920 > Removed PHI % (2) .41493775933609958600 > > 541.leela_r > Started with (1) 3398 > Ended with (1) 3060 > Removed PHI % (1) 9.94702766333137139500 > Started with (2) 4473 > Ended with (2) 4453 > Removed PHI % (2) .44712720769058797300 > > 557.xz_r > Started with (1) 47 > Ended with (1) 44 > Removed PHI % (1) 6.38297872340425532000 > Started with (2) 43 > Ended with (2) 43 > Removed PHI % (2) 0 > > These measurements don't differ very much from the previous. It seems to me > that phiopt does output some redundant PHIs but the vast majority of the > eliminated PHIs are generated in earlier passes and cd_dce isn't able to get > rid of them. > > A noteworthy information might be that most of the eliminated PHIs are > actually > trivial PHIs. I consider a PHI to be trivial if it only references itself or > one other SSA name. Ah. The early pass numbers are certainly intresting - can you elaborate on the last bit? We have for example loop-closed PHI nodes like _1 = PHI <_2> and there are non-trivial degenerate PHIs like _1 = PHI <_2, _2> those are generally removed by value-numbering (FRE, DOM and PRE) and SSA propagation (CCP and copyprop), they are not "dead" so CD-DCE doesn't remove them. But we do have passes removing these kind of PHIs. The issue with the early pass is likely that we have NEXT_PASS (pass_fre, true /* may_iterate */); ^^ would elimimate these kind of PHIs NEXT_PASS (pass_early_vrp); ^^ rewrites into loop-closed SSA, adding many such PHIs NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_dse); NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); and until here there's no pass eliding the LC SSA PHIs. You could add a pass_copy_prop after early_vrp, the later sccp pass shouldn't run into this issue I think so it must be other passes adding such kind of PHIs. Maybe you can count single-argument PHIs, degenerate multi-arg PHIs and "other" PHIs separately as you remove them? > Here is a comparison of the newest measurements (sccp after cd_dce) with the > previous ones (sccp after phiopt and dce): > > 500.perlbench_r > > Started with (1-PREV) 30287 > Started with (1-NEW) 30318 > > Ended with (1-PREV) 26188 > Ended with (1-NEW) 26219 > > Removed PHI % (1-PREV) 13.53385941162875161000 > Removed PHI % (1-NEW)
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
> That's interesting. Your placement at > > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > NEXT_PASS (pass_phiopt, true /* early_p */); > + NEXT_PASS (pass_sccp); > > and > >NEXT_PASS (pass_tsan); >NEXT_PASS (pass_dse, true /* use DR analysis */); >NEXT_PASS (pass_dce); > + NEXT_PASS (pass_sccp); > > isn't immediately after the "best" existing pass we have to > remove dead PHIs which is pass_cd_dce. phiopt might leave > dead PHIs around and the second instance runs long after the > last CD-DCE. > > So I wonder if your pass just detects unnecessary PHIs we'd have > removed by other means and what survives until RTL expansion is > what we should count? > > Can you adjust your original early placement to right after > the cd-dce pass and for the late placement turn the dce pass > before it into cd-dce and re-do your measurements? So I did this NEXT_PASS (pass_dse); NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); NEXT_PASS (pass_sccp); NEXT_PASS (pass_phiopt, true /* early_p */); NEXT_PASS (pass_tail_recursion); and this NEXT_PASS (pass_dse, true /* use DR analysis */); NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); NEXT_PASS (pass_sccp); /* Pass group that runs when 1) enabled, 2) there are loops and got these results: 500.perlbench_r Started with (1) 30318 Ended with (1) 26219 Removed PHI % (1) 13.52002110957187149600 Started with (2) 39043 Ended with (2) 38941 Removed PHI % (2) .26125041620777092000 502.gcc_r Started with (1) 148361 Ended with (1) 140464 Removed PHI % (1) 5.32282742769326170700 Started with (2) 216209 Ended with (2) 215367 Removed PHI % (2) .38943799749316633500 505.mcf_r Started with (1) 342 Ended with (1) 304 Removed PHI % (1) 11.1200 Started with (2) 437 Ended with (2) 433 Removed PHI % (2) .91533180778032036700 523.xalancbmk_r Started with (1) 62995 Ended with (1) 58289 Removed PHI % (1) 7.47043416144138423700 Started with (2) 134026 Ended with (2) 133193 Removed PHI % (2) .62152119737961291100 531.deepsjeng_r Started with (1) 1402 Ended with (1) 1264 Removed PHI % (1) 9.84308131241084165500 Started with (2) 1928 Ended with (2) 1920 Removed PHI % (2) .41493775933609958600 541.leela_r Started with (1) 3398 Ended with (1) 3060 Removed PHI % (1) 9.94702766333137139500 Started with (2) 4473 Ended with (2) 4453 Removed PHI % (2) .44712720769058797300 557.xz_r Started with (1) 47 Ended with (1) 44 Removed PHI % (1) 6.38297872340425532000 Started with (2) 43 Ended with (2) 43 Removed PHI % (2) 0 These measurements don't differ very much from the previous. It seems to me that phiopt does output some redundant PHIs but the vast majority of the eliminated PHIs are generated in earlier passes and cd_dce isn't able to get rid of them. A noteworthy information might be that most of the eliminated PHIs are actually trivial PHIs. I consider a PHI to be trivial if it only references itself or one other SSA name. Here is a comparison of the newest measurements (sccp after cd_dce) with the previous ones (sccp after phiopt and dce): 500.perlbench_r Started with (1-PREV) 30287 Started with (1-NEW) 30318 Ended with (1-PREV) 26188 Ended with (1-NEW) 26219 Removed PHI % (1-PREV) 13.53385941162875161000 Removed PHI % (1-NEW) 13.52002110957187149600 Started with (2-PREV) 38005 Started with (2-NEW) 39043 Ended with (2-PREV) 37897 Ended with (2-NEW) 38941 Removed PHI % (2-PREV) .28417313511380081600 Removed PHI % (2-NEW) .26125041620777092000 502.gcc_r Started with (1-PREV) 148187 Started with (1-NEW) 148361 Ended with (1-PREV) 140292 Ended with (1-NEW) 140464 Removed PHI % (1-PREV) 5.32772780338356266100 Removed PHI % (1-NEW) 5.32282742769326170700 Started with (2-PREV) 211479 Started with (2-NEW) 216209 Ended with (2-PREV) 210635 Ended with (2-NEW) 215367 Removed PHI % (2-PREV) .39909399987705635100 Removed PHI % (2-NEW) .38943799749316633500 Filip K P.S. I made a small mistake and didn't compute the benchmark speedup percentages right in the previous email. Here are the corrected results. The correct percentages are a little bit smaller but very similar. There is still a ~2% speedup with 505.mcf_r and 541.leela_r. 500.perlbench_r Without SCCP: 244.151807s With SCCP: 242.448438s -0.6976679881791663% 502.gcc_r Without SCCP: 211.029606s With SCCP: 211.614523s +0.27717295742853737% 505.mcf_r Without SCCP: 298.782621s With SCCP: 291.671468s -2.380042378703145% 523.xalancbmk_r Without SCCP: 189.940639s With SCCP: 189.876261s -0.03389374719330334% 531.deepsjeng_r Without SCCP: 250.63648s With SCCP: 250.988624s +0.14049989849840747% 541.leela_r Without SCCP: 346.066278s With SCCP: 339.692987s
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
On Thu, 31 Aug 2023, Andrew Pinski wrote: > On Thu, Aug 31, 2023 at 5:15?AM Richard Biener via Gcc-patches > wrote: > > > > On Thu, 31 Aug 2023, Filip Kastl wrote: > > > > > > The most obvious places would be right after SSA construction and > > > > before RTL expansion. > > > > Can you provide measurements for those positions? > > > > > > The algorithm should only remove PHIs that break SSA form minimality. > > > Since > > > GCC's SSA construction already produces minimal SSA form, the algorithm > > > isn't > > > expected to remove any PHIs if run right after the construction. I even > > > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, > > > where the > > > algorithm managed to remove exactly 1 PHI, which is weird). > > > > > > I tried putting the pass before pass_expand. There isn't a lot of PHIs to > > > remove at that point, but there still are some. > > > > That's interesting. Your placement at > > > > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > > NEXT_PASS (pass_phiopt, true /* early_p */); > > + NEXT_PASS (pass_sccp); > > > > and > > > >NEXT_PASS (pass_tsan); > >NEXT_PASS (pass_dse, true /* use DR analysis */); > >NEXT_PASS (pass_dce); > > + NEXT_PASS (pass_sccp); > > > > isn't immediately after the "best" existing pass we have to > > remove dead PHIs which is pass_cd_dce. phiopt might leave > > dead PHIs around and the second instance runs long after the > > last CD-DCE. > > Actually the last phiopt is run before last pass_cd_dce: I meant the second instance of pass_sccp, not phiopt. Richard.
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
On Thu, Aug 31, 2023 at 5:15 AM Richard Biener via Gcc-patches wrote: > > On Thu, 31 Aug 2023, Filip Kastl wrote: > > > > The most obvious places would be right after SSA construction and before > > > RTL expansion. > > > Can you provide measurements for those positions? > > > > The algorithm should only remove PHIs that break SSA form minimality. Since > > GCC's SSA construction already produces minimal SSA form, the algorithm > > isn't > > expected to remove any PHIs if run right after the construction. I even > > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where > > the > > algorithm managed to remove exactly 1 PHI, which is weird). > > > > I tried putting the pass before pass_expand. There isn't a lot of PHIs to > > remove at that point, but there still are some. > > That's interesting. Your placement at > > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > NEXT_PASS (pass_phiopt, true /* early_p */); > + NEXT_PASS (pass_sccp); > > and > >NEXT_PASS (pass_tsan); >NEXT_PASS (pass_dse, true /* use DR analysis */); >NEXT_PASS (pass_dce); > + NEXT_PASS (pass_sccp); > > isn't immediately after the "best" existing pass we have to > remove dead PHIs which is pass_cd_dce. phiopt might leave > dead PHIs around and the second instance runs long after the > last CD-DCE. Actually the last phiopt is run before last pass_cd_dce: NEXT_PASS (pass_dce, true /* update_address_taken_p */); /* After late DCE we rewrite no longer addressed locals into SSA form if possible. */ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_sink_code, true /* unsplit edges */); NEXT_PASS (pass_phiopt, false /* early_p */); NEXT_PASS (pass_fold_builtins); NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_store_merging); /* If DCE is not run before checking for uninitialized uses, we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c). However, this also causes us to misdiagnose cases that should be real warnings (e.g., testsuite/gcc.dg/pr18501.c). */ NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); Thanks, Andrew Pinski > > So I wonder if your pass just detects unnecessary PHIs we'd have > removed by other means and what survives until RTL expansion is > what we should count? > > Can you adjust your original early placement to right after > the cd-dce pass and for the late placement turn the dce pass > before it into cd-dce and re-do your measurements? > > > 500.perlbench_r > > Started with 43111 > > Ended with 42942 > > Removed PHI % .39201131961680313700 > > > > 502.gcc_r > > Started with 141392 > > Ended with 140455 > > Removed PHI % .66269661649881181400 > > > > 505.mcf_r > > Started with 482 > > Ended with 478 > > Removed PHI % .82987551867219917100 > > > > 523.xalancbmk_r > > Started with 136040 > > Ended with 135629 > > Removed PHI % .30211702440458688700 > > > > 531.deepsjeng_r > > Started with 2150 > > Ended with 2148 > > Removed PHI % .09302325581395348900 > > > > 541.leela_r > > Started with 4664 > > Ended with 4650 > > Removed PHI % .30017152658662092700 > > > > 557.xz_r > > Started with 43 > > Ended with 43 > > Removed PHI % 0 > > > > > Can the pass somehow be used as part of propagations like during value > > > numbering? > > > > I don't think that the pass could be used as a part of different > > optimizations > > since it works on the whole CFG (except for copy propagation as I noted in > > the > > RFC). I'm adding Honza into Cc. He'll have more insight into this. > > > > > Could the new file be called gimple-ssa-sccp.cc or something similar? > > > > Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more > > appropriate? > > > > I'm thinking about naming the pass 'scc-copy' and the file > > 'tree-ssa-scc-copy.cc'. > > > > > Removing some PHIs is nice, but it would be also interesting to know what > > > are the effects on generated code size and/or performance. > > > And also if it has any effects on debug information coverage. > > > > Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with > > and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r. > > Here are the full results. What do you think? Should I run more benchmarks? > > Or > > benchmark multiple times? Or run the benchmarks on different machines?* > > > > 500.perlbench_r > > Without SCCP: 244.151807s > > With SCCP: 242.448438s > > -0.7025695913124297% > > > > 502.gcc_r > > Without SCCP: 211.029606s > > With SCCP: 211.614523s > > +0.27640683243653763% > > > > 505.mcf_r > > Without SCCP: 298.782621s > > With SCCP: 291.671468s > > -2.438069465197046% > > > > 523.xalancbmk_r > > Without SCCP: 189.940639s > > With SCCP: 189.876261s > > -0.03390523894928332% > > > > 531.deepsjeng_r > > Without SCCP: 250.63648s > > With SCCP: 250.988624s > > +0.1403027732444051% > > > > 541.leela_r > >
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
On Thu, 31 Aug 2023, Filip Kastl wrote: > > The most obvious places would be right after SSA construction and before > > RTL expansion. > > Can you provide measurements for those positions? > > The algorithm should only remove PHIs that break SSA form minimality. Since > GCC's SSA construction already produces minimal SSA form, the algorithm isn't > expected to remove any PHIs if run right after the construction. I even > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the > algorithm managed to remove exactly 1 PHI, which is weird). > > I tried putting the pass before pass_expand. There isn't a lot of PHIs to > remove at that point, but there still are some. That's interesting. Your placement at NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); NEXT_PASS (pass_phiopt, true /* early_p */); + NEXT_PASS (pass_sccp); and NEXT_PASS (pass_tsan); NEXT_PASS (pass_dse, true /* use DR analysis */); NEXT_PASS (pass_dce); + NEXT_PASS (pass_sccp); isn't immediately after the "best" existing pass we have to remove dead PHIs which is pass_cd_dce. phiopt might leave dead PHIs around and the second instance runs long after the last CD-DCE. So I wonder if your pass just detects unnecessary PHIs we'd have removed by other means and what survives until RTL expansion is what we should count? Can you adjust your original early placement to right after the cd-dce pass and for the late placement turn the dce pass before it into cd-dce and re-do your measurements? > 500.perlbench_r > Started with 43111 > Ended with 42942 > Removed PHI % .39201131961680313700 > > 502.gcc_r > Started with 141392 > Ended with 140455 > Removed PHI % .66269661649881181400 > > 505.mcf_r > Started with 482 > Ended with 478 > Removed PHI % .82987551867219917100 > > 523.xalancbmk_r > Started with 136040 > Ended with 135629 > Removed PHI % .30211702440458688700 > > 531.deepsjeng_r > Started with 2150 > Ended with 2148 > Removed PHI % .09302325581395348900 > > 541.leela_r > Started with 4664 > Ended with 4650 > Removed PHI % .30017152658662092700 > > 557.xz_r > Started with 43 > Ended with 43 > Removed PHI % 0 > > > Can the pass somehow be used as part of propagations like during value > > numbering? > > I don't think that the pass could be used as a part of different optimizations > since it works on the whole CFG (except for copy propagation as I noted in the > RFC). I'm adding Honza into Cc. He'll have more insight into this. > > > Could the new file be called gimple-ssa-sccp.cc or something similar? > > Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more > appropriate? > > I'm thinking about naming the pass 'scc-copy' and the file > 'tree-ssa-scc-copy.cc'. > > > Removing some PHIs is nice, but it would be also interesting to know what > > are the effects on generated code size and/or performance. > > And also if it has any effects on debug information coverage. > > Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with > and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r. > Here are the full results. What do you think? Should I run more benchmarks? Or > benchmark multiple times? Or run the benchmarks on different machines?* > > 500.perlbench_r > Without SCCP: 244.151807s > With SCCP: 242.448438s > -0.7025695913124297% > > 502.gcc_r > Without SCCP: 211.029606s > With SCCP: 211.614523s > +0.27640683243653763% > > 505.mcf_r > Without SCCP: 298.782621s > With SCCP: 291.671468s > -2.438069465197046% > > 523.xalancbmk_r > Without SCCP: 189.940639s > With SCCP: 189.876261s > -0.03390523894928332% > > 531.deepsjeng_r > Without SCCP: 250.63648s > With SCCP: 250.988624s > +0.1403027732444051% > > 541.leela_r > Without SCCP: 346.066278s > With SCCP: 339.692987s > -1.8761915152519792% > > Regarding size: The pass doesn't seem to significantly reduce or increase the > size of the result binary. The differences were at most ~0.1%. > > Regarding debug info coverage: I didn't notice any additional guality > testcases > failing after I applied the patch. *Is there any other way how I should check > debug info coverage?* > > > Filip K > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
On Thu, Aug 31, 2023 at 01:26:37PM +0200, Filip Kastl wrote: > Regarding debug info coverage: I didn't notice any additional guality > testcases > failing after I applied the patch. *Is there any other way how I should check > debug info coverage?* I'm usually using https://github.com/pmachata/dwlocstat for that, usually on cc1plus from the last stage of gcc bootstrap. Though of course, if one tree is unpatched and one patched, that results in different code not just because the optimization did something, but because it is different source. So, for such purposes, I usually after one of the 2 bootstraps apply resp. revert the patch, make a copy of the cc1plus binary and do make -jN cc1plus in the last stage directory (only there, not attempt to rebootstrap). Jakub
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
> The most obvious places would be right after SSA construction and before RTL > expansion. > Can you provide measurements for those positions? The algorithm should only remove PHIs that break SSA form minimality. Since GCC's SSA construction already produces minimal SSA form, the algorithm isn't expected to remove any PHIs if run right after the construction. I even measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the algorithm managed to remove exactly 1 PHI, which is weird). I tried putting the pass before pass_expand. There isn't a lot of PHIs to remove at that point, but there still are some. 500.perlbench_r Started with 43111 Ended with 42942 Removed PHI % .39201131961680313700 502.gcc_r Started with 141392 Ended with 140455 Removed PHI % .66269661649881181400 505.mcf_r Started with 482 Ended with 478 Removed PHI % .82987551867219917100 523.xalancbmk_r Started with 136040 Ended with 135629 Removed PHI % .30211702440458688700 531.deepsjeng_r Started with 2150 Ended with 2148 Removed PHI % .09302325581395348900 541.leela_r Started with 4664 Ended with 4650 Removed PHI % .30017152658662092700 557.xz_r Started with 43 Ended with 43 Removed PHI % 0 > Can the pass somehow be used as part of propagations like during value > numbering? I don't think that the pass could be used as a part of different optimizations since it works on the whole CFG (except for copy propagation as I noted in the RFC). I'm adding Honza into Cc. He'll have more insight into this. > Could the new file be called gimple-ssa-sccp.cc or something similar? Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more appropriate? I'm thinking about naming the pass 'scc-copy' and the file 'tree-ssa-scc-copy.cc'. > Removing some PHIs is nice, but it would be also interesting to know what > are the effects on generated code size and/or performance. > And also if it has any effects on debug information coverage. Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r. Here are the full results. What do you think? Should I run more benchmarks? Or benchmark multiple times? Or run the benchmarks on different machines?* 500.perlbench_r Without SCCP: 244.151807s With SCCP: 242.448438s -0.7025695913124297% 502.gcc_r Without SCCP: 211.029606s With SCCP: 211.614523s +0.27640683243653763% 505.mcf_r Without SCCP: 298.782621s With SCCP: 291.671468s -2.438069465197046% 523.xalancbmk_r Without SCCP: 189.940639s With SCCP: 189.876261s -0.03390523894928332% 531.deepsjeng_r Without SCCP: 250.63648s With SCCP: 250.988624s +0.1403027732444051% 541.leela_r Without SCCP: 346.066278s With SCCP: 339.692987s -1.8761915152519792% Regarding size: The pass doesn't seem to significantly reduce or increase the size of the result binary. The differences were at most ~0.1%. Regarding debug info coverage: I didn't notice any additional guality testcases failing after I applied the patch. *Is there any other way how I should check debug info coverage?* Filip K
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
On Thu, Aug 24, 2023 at 05:47:09PM +0200, Richard Biener via Gcc-patches wrote: > > Do you think that the pass is worthy of inclusion into upstream GCC? What > > are > > some things that I should change? Should I try to put the pass in different > > places in passes.def? > > The most obvious places would be right after SSA construction and before RTL > expansion. > Can you provide measurements for those positions? > Can the pass somehow be used as part of propagations like during value > numbering? Could the new file be called gimple-ssa-sccp.cc or something similar? Removing some PHIs is nice, but it would be also interesting to know what are the effects on generated code size and/or performance. And also if it has any effects on debug information coverage. Jakub
Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
> Am 24.08.2023 um 17:07 schrieb Filip Kastl : > > Hi, > > As a part of my bachelor thesis under the supervision of Honza (Jan Hubicka), > I > implemented a new PHI elimination algorithm into GCC. The algorithm is > described in the following article: > > Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A. > (2013). Simple and Efficient Construction of Static Single Assignment > Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC > 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin, > Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6 > > In the article the PHI elimination algorithm is only considered a part of > another algorithm. However, with Honza we tried running the algorithm as a > standalone pass and found that there is a surprisingly big number of PHI > functions it is able to remove -- sometimes even ~13% of PHI functions or > more. > This email contains a patch with the pass and with the placement in passes.def > we used to measure this. > > Do you think that the pass is worthy of inclusion into upstream GCC? What are > some things that I should change? Should I try to put the pass in different > places in passes.def? The most obvious places would be right after SSA construction and before RTL expansion. Can you provide measurements for those positions? Can the pass somehow be used as part of propagations like during value numbering? Richard > Things I already know I'd like to change: > - Split the patch into two (one for sccp, one for the utility functions) > - Change the name SCCP to something else since there already is a pass with > that name (any suggestions?) > - Add a comment into sccp.cc explaining the algorithm > > I successfully bootstrapped and tested GCC with the patch applied (with the > commit 3b691e0190c6e7291f8a52e1e14d8293a28ff4ce checked out). > > Here are my measurements. I measured the number of PHIs before the PHI > elimination algorithm was run and after it was run. I measured on the standard > 2017 benchmarks with -O3. Since the pass is present in passes.def twice, > results of the first run are marked (1) and results of the second are marked > (2). Honza also did measurements with profile feedback and got even bigger > percentages. > > 500.perlbench_r > Started with (1) 30287 > Ended with (1) 26188 > Removed PHI % (1) 13.53385941162875161000 > Started with (2) 38005 > Ended with (2) 37897 > Removed PHI % (2) .28417313511380081600 > > 502.gcc_r > Started with (1) 148187 > Ended with (1) 140292 > Removed PHI % (1) 5.32772780338356266100 > Started with (2) 211479 > Ended with (2) 210635 > Removed PHI % (2) .39909399987705635100 > > 505.mcf_r > Started with (1) 341 > Ended with (1) 303 > Removed PHI % (1) 11.14369501466275659900 > Started with (2) 430 > Ended with (2) 426 > Removed PHI % (2) .93023255813953488400 > > 523.xalancbmk_r > Started with (1) 62514 > Ended with (1) 57785 > Removed PHI % (1) 7.5647055059346800 > Started with (2) 132561 > Ended with (2) 131726 > Removed PHI % (2) .62989868815111533600 > > 531.deepsjeng_r > Started with (1) 1388 > Ended with (1) 1250 > Removed PHI % (1) 9.94236311239193083600 > Started with (2) 1887 > Ended with (2) 1879 > Removed PHI % (2) .42395336512983571900 > > 541.leela_r > Started with (1) 3332 > Ended with (1) 2994 > Removed PHI % (1) 10.14405762304921968800 > Started with (2) 4372 > Ended with (2) 4352 > Removed PHI % (2) .45745654162854528900 > > Here is the patch: > > -- >8 -- > > This patch introduces two things: > - A new PHI elimination pass (major) > - New utility functions for passes that replace one ssa name with a > different one (minor) > > Strongly-connected copy propagation (SCCP) pass > > The PHI elimination pass is a lightweight optimization pass based on > strongly-connected components. Some set of PHIs may be redundant because > the PHIs only refer to each other or to a single value from outside the > set. This pass finds and eliminates these sets. As a bonus the pass also > does some basic copy propagation because it considers a copy statement > to be a PHI with a single argument. > > SCCP uses an algorithm from this article: > Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A. > (2013). Simple and Efficient Construction of Static Single Assignment > Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC > 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin, > Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6 > > cleanup_after_replace and cleanup_after_all_replaces_done > > Whenever you replace all uses of an ssa name by a different ssa name, > some GCC internal structures have to be updated. To streamline this > process, the patch adds the cleanup_after_replace function that should > be called after an ssa name is replaced by a different one and the > cleanup_after_all_replaces_done that should be called before a pass that > replaced one or more ssa names