Re: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-21 Thread juzhe.zh...@rivai.ai
OK I guess change register_operand into REG_P should fix those ICEs.

I will have a try and send a patch. 

Thanks.



juzhe.zh...@rivai.ai
 
From: Andrew Pinski
Date: 2024-01-22 15:14
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; kito.cheng; jeffreyalaw; vineetg; Robin Dapp; 
palmer; Patrick O'Neill
Subject: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
On Sun, Jan 21, 2024 at 11:06 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, this patch causes these regressions (all ICEs) in RISC-V backend:
 
Note this is related to the fix for PR 109092.
 
But right now register_operand still accepts `(SUBREG (MEM...))`
before reload ...
So basically there is a check against register_operand to then
reg_or_subregno is called
but since register_operand allows `(SUBREG (MEM())` but
reg_or_subregno  asserts on only having `REG` or `SUBREG(REG)`, things
go downhill.
 
What a mess.  I really wish `(SUBREG (MEM...))` handling would go away ...
 
Thanks,
Andrew Pinski
 
>
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (test fo

Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-21 Thread Andrew Pinski
On Sun, Jan 21, 2024 at 11:06 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, this patch causes these regressions (all ICEs) in RISC-V backend:

Note this is related to the fix for PR 109092.

But right now register_operand still accepts `(SUBREG (MEM...))`
before reload ...
So basically there is a check against register_operand to then
reg_or_subregno is called
but since register_operand allows `(SUBREG (MEM())` but
reg_or_subregno  asserts on only having `REG` or `SUBREG(REG)`, things
go downhill.

What a mess.  I really wish `(SUBREG (MEM...))` handling would go away ...

Thanks,
Andrew Pinski

>
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at 
> jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in 
> reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors)
>
> 
> juzhe.zh...@rivai.ai


Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-16 Thread Richard Biener
On Tue, Jan 16, 2024 at 2:13 AM Roger Sayle  wrote:
>
>
> This patch resolves PR rtl-optimization/111267 by improving RTL-level
> forward propagation.  This x86_64 code quality regression was caused
> (exposed) by my changes to improve how x86's (TImode) argument passing
> is represented at the RTL-level (reducing the use of SUBREGs to catch
> more optimization opportunities in combine).  The pitfall is that the
> more complex RTL representations expose a limitation in RTL's fwprop
> pass.
>
> At the heart of fwprop, in try_fwprop_subst_pattern, the logic can
> be summarized as three steps.  Step 1 is a heuristic that rejects the
> propagation attempt if the expression is too complex, step 2 calls
> the backend's recog to see if the propagated/simplified instruction
> is recognizable/valid, and step 3 then calls src_cost to compare
> the rtx costs of the replacement vs. the original, and accepts the
> transformation if the final cost is the same of better.
>
> The logic error (or missed optimization opportunity) is that the
> step 1 heuristic that attempts to predict (second guess) the
> process is flawed.  Ultimately the decision on whether to fwprop
> or not should depend solely on actual improvement, as measured
> by RTX costs.  Hence the prototype fix in the bugzilla PR removes
> the heuristic of calling prop.profitable_p entirely, relying
> entirely on the cost comparison in step 3.
>
> Unfortunately, things are a tiny bit more complicated.  The cost
> comparison in fwprop uses the older set_src_cost API and not the
> newer (preffered) insn_cost API as currently used in combine.
> This means that the cost improvement comparisons are only done
> for single_set instructions (more complex PARALLELs etc. aren't
> supported).  Hence we can only rely on skipping step 1 for that
> subset of instructions actually evaluated by step 3.
>
> The other subtlety is that to avoid potential infinite loops
> in fwprop we should only reply purely on rtx costs when the
> transformation is obviously an improvement.  If the replacement
> has the same cost as the original, we can use the prop.profitable_p
> test to preserve the current behavior.
>
> Finally, to answer Richard Biener's remaining question about this
> approach: yes, there is an asymmetry between how patterns are
> handled and how REG_EQUAL notes are handled.  For example, at
> the moment propagation into notes doesn't use rtx costs at all,
> and ultimately when fwprop is updated to use insn_cost, this
> (and recog) obviously isn't applicable to notes.  There's no reason
> the logic need be identical between patterns and notes, and during
> stage4 we only need update propagation into patterns to fix this
> P1 regression (notes and use of cost_insn can be done for GCC 15).
>
> For Jakub's reduced testcase:
>
> struct S { float a, b, c, d; };
> int bar (struct S x, struct S y) {
>   return x.b <= y.d && x.c >= y.a;
> }
>
> On x86_64-pc-linux-gnu with -O2 gcc currently generates:
>
> bar:movq%xmm2, %rdx
> movq%xmm3, %rax
> movq%xmm0, %rsi
> xchgq   %rdx, %rax
> movq%rsi, %rcx
> movq%rax, %rsi
> movq%rdx, %rax
> shrq$32, %rcx
> shrq$32, %rax
> movd%ecx, %xmm4
> movd%eax, %xmm0
> comiss  %xmm4, %xmm0
> jb  .L6
> movd%esi, %xmm0
> xorl%eax, %eax
> comiss  %xmm0, %xmm1
> setnb   %al
> ret
> .L6:xorl%eax, %eax
> ret
>
> with this simple patch to fwprop, we now generate:
>
> bar:shufps  $85, %xmm0, %xmm0
> shufps  $85, %xmm3, %xmm3
> comiss  %xmm0, %xmm3
> jb  .L6
> xorl%eax, %eax
> comiss  %xmm2, %xmm1
> setnb   %al
> ret
> .L6:xorl%eax, %eax
> ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Additionally, it also resolves the FAIL for
> gcc.target/i386/pr82580.c.  Ok for mainline?

Thanks for the detailed explanation.  I think this deserves a comment
about the later costing before

-  if (!prop.profitable_p ())
+  if (!prop.profitable_p ()
+  && (prop.changed_mem_p ()
+ || use_insn->is_asm ()
+ || !single_set (use_rtl)))

(and generally the profitable_p method should better be called
sth like likely_profitable_p?)

I'd say OK with the additional comment added but please give Richard S.
a chance to comment.

Thanks,
Richard.

>
> 2024-01-16  Roger Sayle  
>
> gcc/ChangeLog
> PR rtl-optimization/111267
> * fwprop.cc (try_fwprop_subst_pattern): Only bail-out early when
> !prop.profitable_p for instructions that are not single sets.
> When comparing costs, bail-out if the cost is unchanged and
> !prop.profitable_p.
>
> gcc/testsuite/ChangeLog
> PR rtl-optimization/111267
> *