Re: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
OK I guess change register_operand into REG_P should fix those ICEs. I will have a try and send a patch. Thanks. juzhe.zh...@rivai.ai From: Andrew Pinski Date: 2024-01-22 15:14 To: juzhe.zh...@rivai.ai CC: gcc-patches; Kito.cheng; kito.cheng; jeffreyalaw; vineetg; Robin Dapp; palmer; Patrick O'Neill Subject: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation. On Sun, Jan 21, 2024 at 11:06 PM juzhe.zh...@rivai.ai wrote: > > Hi, this patch causes these regressions (all ICEs) in RISC-V backend: Note this is related to the fix for PR 109092. But right now register_operand still accepts `(SUBREG (MEM...))` before reload ... So basically there is a check against register_operand to then reg_or_subregno is called but since register_operand allows `(SUBREG (MEM())` but reg_or_subregno asserts on only having `REG` or `SUBREG(REG)`, things go downhill. What a mess. I really wish `(SUBREG (MEM...))` handling would go away ... Thanks, Andrew Pinski > > FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -Os (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -Os (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O1 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O1 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O2 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O2 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O3 -g (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O3 -g (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -Os (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -Os (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O1 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O1 (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O2 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O2 (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops > -fpeel-loops -ftracer -finline-functions (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops > -fpeel-loops -ftracer -finline-functions (te
Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
On Sun, Jan 21, 2024 at 11:06 PM juzhe.zh...@rivai.ai wrote: > > Hi, this patch causes these regressions (all ICEs) in RISC-V backend: Note this is related to the fix for PR 109092. But right now register_operand still accepts `(SUBREG (MEM...))` before reload ... So basically there is a check against register_operand to then reg_or_subregno is called but since register_operand allows `(SUBREG (MEM())` but reg_or_subregno asserts on only having `REG` or `SUBREG(REG)`, things go downhill. What a mess. I really wish `(SUBREG (MEM...))` handling would go away ... Thanks, Andrew Pinski > > FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (test for excess errors) > FAIL: gcc.dg/torture/float32-tg-2.c -Os (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg-2.c -Os (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O1 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O1 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O2 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O2 (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -O3 -g (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -O3 -g (test for excess errors) > FAIL: gcc.dg/torture/float32-tg.c -Os (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/float32-tg.c -Os (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O1 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O1 (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O2 (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O2 (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at > jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops > -fpeel-loops -ftracer -finline-functions (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops > -fpeel-loops -ftracer -finline-functions (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -O3 -g (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -O3 -g (test for excess errors) > FAIL: gcc.dg/torture/pr48124-4.c -Os (internal compiler error: in > reg_or_subregno, at jump.cc:1895) > FAIL: gcc.dg/torture/pr48124-4.c -Os (test for excess errors) > > > juzhe.zh...@rivai.ai
Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
On Tue, Jan 16, 2024 at 2:13 AM Roger Sayle wrote: > > > This patch resolves PR rtl-optimization/111267 by improving RTL-level > forward propagation. This x86_64 code quality regression was caused > (exposed) by my changes to improve how x86's (TImode) argument passing > is represented at the RTL-level (reducing the use of SUBREGs to catch > more optimization opportunities in combine). The pitfall is that the > more complex RTL representations expose a limitation in RTL's fwprop > pass. > > At the heart of fwprop, in try_fwprop_subst_pattern, the logic can > be summarized as three steps. Step 1 is a heuristic that rejects the > propagation attempt if the expression is too complex, step 2 calls > the backend's recog to see if the propagated/simplified instruction > is recognizable/valid, and step 3 then calls src_cost to compare > the rtx costs of the replacement vs. the original, and accepts the > transformation if the final cost is the same of better. > > The logic error (or missed optimization opportunity) is that the > step 1 heuristic that attempts to predict (second guess) the > process is flawed. Ultimately the decision on whether to fwprop > or not should depend solely on actual improvement, as measured > by RTX costs. Hence the prototype fix in the bugzilla PR removes > the heuristic of calling prop.profitable_p entirely, relying > entirely on the cost comparison in step 3. > > Unfortunately, things are a tiny bit more complicated. The cost > comparison in fwprop uses the older set_src_cost API and not the > newer (preffered) insn_cost API as currently used in combine. > This means that the cost improvement comparisons are only done > for single_set instructions (more complex PARALLELs etc. aren't > supported). Hence we can only rely on skipping step 1 for that > subset of instructions actually evaluated by step 3. > > The other subtlety is that to avoid potential infinite loops > in fwprop we should only reply purely on rtx costs when the > transformation is obviously an improvement. If the replacement > has the same cost as the original, we can use the prop.profitable_p > test to preserve the current behavior. > > Finally, to answer Richard Biener's remaining question about this > approach: yes, there is an asymmetry between how patterns are > handled and how REG_EQUAL notes are handled. For example, at > the moment propagation into notes doesn't use rtx costs at all, > and ultimately when fwprop is updated to use insn_cost, this > (and recog) obviously isn't applicable to notes. There's no reason > the logic need be identical between patterns and notes, and during > stage4 we only need update propagation into patterns to fix this > P1 regression (notes and use of cost_insn can be done for GCC 15). > > For Jakub's reduced testcase: > > struct S { float a, b, c, d; }; > int bar (struct S x, struct S y) { > return x.b <= y.d && x.c >= y.a; > } > > On x86_64-pc-linux-gnu with -O2 gcc currently generates: > > bar:movq%xmm2, %rdx > movq%xmm3, %rax > movq%xmm0, %rsi > xchgq %rdx, %rax > movq%rsi, %rcx > movq%rax, %rsi > movq%rdx, %rax > shrq$32, %rcx > shrq$32, %rax > movd%ecx, %xmm4 > movd%eax, %xmm0 > comiss %xmm4, %xmm0 > jb .L6 > movd%esi, %xmm0 > xorl%eax, %eax > comiss %xmm0, %xmm1 > setnb %al > ret > .L6:xorl%eax, %eax > ret > > with this simple patch to fwprop, we now generate: > > bar:shufps $85, %xmm0, %xmm0 > shufps $85, %xmm3, %xmm3 > comiss %xmm0, %xmm3 > jb .L6 > xorl%eax, %eax > comiss %xmm2, %xmm1 > setnb %al > ret > .L6:xorl%eax, %eax > ret > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Additionally, it also resolves the FAIL for > gcc.target/i386/pr82580.c. Ok for mainline? Thanks for the detailed explanation. I think this deserves a comment about the later costing before - if (!prop.profitable_p ()) + if (!prop.profitable_p () + && (prop.changed_mem_p () + || use_insn->is_asm () + || !single_set (use_rtl))) (and generally the profitable_p method should better be called sth like likely_profitable_p?) I'd say OK with the additional comment added but please give Richard S. a chance to comment. Thanks, Richard. > > 2024-01-16 Roger Sayle > > gcc/ChangeLog > PR rtl-optimization/111267 > * fwprop.cc (try_fwprop_subst_pattern): Only bail-out early when > !prop.profitable_p for instructions that are not single sets. > When comparing costs, bail-out if the cost is unchanged and > !prop.profitable_p. > > gcc/testsuite/ChangeLog > PR rtl-optimization/111267 > * gcc.target/