[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Target||i686 --- Comment #29 from Uroš Bizjak --- Fixed.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #28 from GCC Commits --- The master branch has been updated by Uros Bizjak : https://gcc.gnu.org/g:74e8cc28eda9b1d75588fcd4017a735911b9d2b4 commit r14-9346-g74e8cc28eda9b1d75588fcd4017a735911b9d2b4 Author: Uros Bizjak Date: Wed Mar 6 20:53:50 2024 +0100 i386: Fix and improve insn constraint for V2QI arithmetic/shift insns optimize_function_for_size_p predicate is not stable during optab selection, because it also depends on node->count/node->frequency of the current function, which are updated during IPA, so they may change between early opts and late opts. Use optimize_size instead - optimize_size implies optimize_function_for_size_p (cfun), so if a named pattern uses "&& optimize_size" and the insn it splits into uses optimize_function_for_size_p (cfun), it shouldn't fail. PR target/114232 gcc/ChangeLog: * config/i386/mmx.md (negv2qi2): Enable for optimize_size instead of optimize_function_for_size_p. Explictily enable for TARGET_SSE2. (negv2qi SSE reg splitter): Enable for TARGET_SSE2 only. (v2qi3): Enable for optimize_size instead of optimize_function_for_size_p. Explictily enable for TARGET_SSE2. (v2qi SSE reg splitter): Enable for TARGET_SSE2 only. (v2qi3): Enable for optimize_size instead of optimize_function_for_size_p.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 Uroš Bizjak changed: What|Removed |Added Attachment #57614|0 |1 is obsolete|| --- Comment #27 from Uroš Bizjak --- Created attachment 57626 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57626=edit Proposed v2 patch New version in testing: - uses optimize_size to stabilize optab discovery - explicitly enables insn pattern for TARGET_SSE2
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #26 from Jan Hubicka --- > I think optimize_function_for_size_p (cfun) isn't always true if > optimize_size is since it looks at the function-specific setting > of that flag, so you'd have to use opt_for_fn (cfun, optimize_size). When we initialize obtabs, we do it for a givn optimization_node setting, so opt_for_fn (cfun, optimize_size) should be equal to optimize_size. Honza
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #25 from Uroš Bizjak --- Created attachment 57614 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57614=edit Proposed patch Proposed patch that changes optimize_function_for_size_p (cfun) to optimize_size.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #24 from Jakub Jelinek --- (In reply to rguent...@suse.de from comment #22) > I think optimize_function_for_size_p (cfun) isn't always true if > optimize_size is since it looks at the function-specific setting > of that flag, so you'd have to use opt_for_fn (cfun, optimize_size). I think the insn-flags.h macros and actual insn conditions are checked when the corresponding function is current, and at that point opt_for_fn (cfun, optimize_size) should be the same as optimize_size. Even the set_cfun hook first sets the option and then tries to initialize optab and compares it against the global one.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #23 from Uroš Bizjak --- (In reply to Jan Hubicka from comment #21) > Looking at the prototype patch, why need to change also the splitters? Purely for implementation reasons, we check for general resp. SSE register in the operand predicates, so there is no need to use additional insn constraints.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #22 from rguenther at suse dot de --- On Tue, 5 Mar 2024, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 > > --- Comment #17 from Jakub Jelinek --- > Either change those too, or the splitter needs some variant what to do if > there > is a mismatch. > Though, optimize_size implies optimize_function_for_size_p (cfun), so if a > named pattern uses && optimize_size and the insn it splits into uses > optimize_function_for_size_p (cfun), it shouldn't fail. The other direction > is > not always true, optimize_function_for_size_p (cfun) can be true just because > the function > is cold, not just because it is -Os. I think optimize_function_for_size_p (cfun) isn't always true if optimize_size is since it looks at the function-specific setting of that flag, so you'd have to use opt_for_fn (cfun, optimize_size).
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #21 from Jan Hubicka --- Looking at the prototype patch, why need to change also the splitters? My original goal was to use splitters to expand to faster code sequences while having patterns necessary for both variants. This makes it possible to use optimize_insn_for_size/speed and make decisions using BB profile, since we will not ICE if the hotness of BB changes later.
Re: [Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
Looking at the prototype patch, why need to change also the splitters? My original goal was to use splitters to expand to faster code sequences while having patterns necessary for both variants. This makes it possible to use optimize_insn_for_size/speed and make decisions using BB profile, since we will not ICE if the hotness of BB changes later.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #20 from Jakub Jelinek --- (In reply to Uroš Bizjak from comment #19) > (In reply to Jan Hubicka from comment #18) > > But the problem here is more that optab initializations happens only at > > the optimization_node changes and not if we switch from hot function to > > cold? > > I think solving optab init problem is a better solution than the target > patch from comment #10. Using optimize_function_for_size_p in named pattern > predicate would avoid using the non-optimal pattern also in cold functions, > and would be preferrable to using optimize_size. It would be very costly IMHO, because on every set_cfun we'd need to compute optimize_function_for_size_p for both the old and new function and find out where to attach the additional optab tables. set_cfun can be called millions of times.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #19 from Uroš Bizjak --- (In reply to Jan Hubicka from comment #18) > But the problem here is more that optab initializations happens only at > the optimization_node changes and not if we switch from hot function to > cold? I think solving optab init problem is a better solution than the target patch from comment #10. Using optimize_function_for_size_p in named pattern predicate would avoid using the non-optimal pattern also in cold functions, and would be preferrable to using optimize_size.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #18 from Jan Hubicka --- optimize_function_for_size_p is not really affected by LTO or non-LTO. It does take into account node->count and node->frequency, which is updated during IPA, so it may change between early opts and late opts. I do not think we change it between vectorizer and RTL (or during RTL) and we can add this to a verifier if it is practical to rely on it (I can see that vectorizer makes instruction selection choices). But the problem here is more that optab initializations happens only at the optimization_node changes and not if we switch from hot function to cold?
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #17 from Jakub Jelinek --- Either change those too, or the splitter needs some variant what to do if there is a mismatch. Though, optimize_size implies optimize_function_for_size_p (cfun), so if a named pattern uses && optimize_size and the insn it splits into uses optimize_function_for_size_p (cfun), it shouldn't fail. The other direction is not always true, optimize_function_for_size_p (cfun) can be true just because the function is cold, not just because it is -Os.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #16 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #15) > Seems various backends use e.g. optimize_size or !optimize_size or optimize > > 0 etc. in > insn-flags.h, so perhaps change optimize_function_for_size_p (cfun) to > optimize_size? Won't this cause a mismatch with insn patterns we split to? As said in Comment #13, these have insn predicates that include optimize_function_for_size_p.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #15 from Jakub Jelinek --- Yeah, indeed, the optabs enable flags are cached in the optimization node, so it is ok to check the optimization flags in there, or target flags as well, but optimize_function_for_*_p is not, because it depends also on node->count/node->frequency of the current function but multiple functions can have the same optimization node/target node combo. Seems i386 is the only backend which does something like that. I believe this has been fixed in the backend in the past already, see e.g. PR92791 While if (opts != optimization_default_node) { init_tree_optimization_optabs (opts); if (TREE_OPTIMIZATION_OPTABS (opts)) this_fn_optabs = (struct target_optabs *) TREE_OPTIMIZATION_OPTABS (opts); } in invoke_set_current_function_hook maybe (hopefully) makes stuff work right for the different options, that certainly doesn't account for node->function/node->count. Seems various backends use e.g. optimize_size or !optimize_size or optimize > 0 etc. in insn-flags.h, so perhaps change optimize_function_for_size_p (cfun) to optimize_size?
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #14 from rguenther at suse dot de --- On Tue, 5 Mar 2024, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 > > Jakub Jelinek changed: > >What|Removed |Added > > CC||jakub at gcc dot gnu.org > > --- Comment #12 from Jakub Jelinek --- > Still, it would be nice to understand what changed > optimize_function_for_size_p > (cfun) > after IPA. Is something adjusting node->count or node->frequency? > Otherwise it should just depend on the optimize_size flag which should not > change... Maybe we share the TREE optimization node (it gets re-materialized during LTO streaming) between hot and cold functions and the "first" one getting in set_cfun will overwrite TREE_OPTIMIZATION_OPTABS in init_tree_optimization_optabs (though it seems to overwrite things). In any case I think the tree node sharing only looks at options, not at function frequency so having TREE_OPTIMIZATION_OPTABS part of the optimization node looks a bit fragile in the light of sharing.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #13 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #12) > Still, it would be nice to understand what changed > optimize_function_for_size_p (cfun) > after IPA. Is something adjusting node->count or node->frequency? > Otherwise it should just depend on the optimize_size flag which should not > change... The target-dependent issue is with insn patterns we split to. These are enabled with "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)", as they are intended to be combined from movstrict insn pattern (which can FAIL). So, the condition for V2QI insn is now either !TARGET_PARTIAL_REG_STALL or TARGET_SSE2 (where we disable alternative 0 for TARGET_PARTIAL_REG_STALL targets, but we still emit SSE instruction). Please note that while -march=i686 enables TARGET_PARTIAL_REG_STALL, it enables SSE2, so the proposed solution won't have much impact. Also, V2QImode is not much used, so I guess the proposed solution is the right compromise.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #12 from Jakub Jelinek --- Still, it would be nice to understand what changed optimize_function_for_size_p (cfun) after IPA. Is something adjusting node->count or node->frequency? Otherwise it should just depend on the optimize_size flag which should not change...
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #11 from Richard Biener --- (In reply to Uroš Bizjak from comment #10) > Created attachment 57612 [details] > Prototype patch > > Let's try this approach. Yeah, I guess !TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun) is best elided (or also avoid the pattern when optimizing for size).
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 Uroš Bizjak changed: What|Removed |Added Last reconfirmed||2024-03-05 Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 --- Comment #10 from Uroš Bizjak --- Created attachment 57612 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57612=edit Prototype patch Let's try this approach.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #9 from Uroš Bizjak --- (In reply to Richard Biener from comment #8) > > grep optimize_ insn-flags.h | wc -l > 14 > > so it's not very many standard patterns that would be affected. I'd say > using these kind of flags on standard patterns is at least fragile? You are right, only recently added neg, add, sub, ashl, lshr, ashr V2QImode standard patterns have optimize_function_for_size_p in their condition. Any suggestion on how to fix them?
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #8 from Richard Biener --- > grep optimize_ insn-flags.h | wc -l 14 so it's not very many standard patterns that would be affected. I'd say using these kind of flags on standard patterns is at least fragile?
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0 CC||hubicka at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #7 from Richard Biener --- I think the question is more whether it's stable between optab queries the vectorizer or veclower does and RTL expansion. There whether it's LTO or not shouldn't play a role (it might for the actual testcase of course).
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #6 from Richard Biener --- It's possibly on a cold path (yes, optimize_function_for_size_p should be stable). Note though that optimize_function_for_size_p might in theory change between vectorization and RTL expansion, so maybe optab queries in the vectorizer are broken by this if we ever re-check that condition after optab initialization.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #5 from Uroš Bizjak --- Huh, it looks that optimize_function_for_size_p (cfun) is not stable during LTO?! Using: --cut here-- diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 2856ae6ffef..80114494b0b 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -2975,7 +2975,7 @@ (define_insn "v2qi3" (match_operand:V2QI 1 "register_operand" "0,0,Yw") (match_operand:V2QI 2 "register_operand" "Q,x,Yw"))) (clobber (reg:CC FLAGS_REG))] - "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)" + "!TARGET_PARTIAL_REG_STALL" "#" [(set_attr "isa" "*,sse2_noavx,avx") (set_attr "type" "multi,sseadd,sseadd") @@ -2987,7 +2987,7 @@ (define_split (match_operand:V2QI 1 "general_reg_operand") (match_operand:V2QI 2 "general_reg_operand"))) (clobber (reg:CC FLAGS_REG))] - "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)) + "!TARGET_PARTIAL_REG_STALL && reload_completed" [(parallel [(set (strict_low_part (match_dup 0)) @@ -3021,7 +3021,7 @@ (define_split (match_operand:V2QI 1 "sse_reg_operand") (match_operand:V2QI 2 "sse_reg_operand"))) (clobber (reg:CC FLAGS_REG))] - "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)) + "!TARGET_PARTIAL_REG_STALL && TARGET_SSE2 && reload_completed" [(set (match_dup 0) (plusminus:V16QI (match_dup 1) (match_dup 2)))] --cut here-- So, removing optimize-size bypass successfully compiles the testcase.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #4 from Uroš Bizjak --- (In reply to Sam James from comment #0) > (insn 160 159 161 26 (parallel [ > (set (reg:V2QI 250 [ vect_patt_207.470_183 ]) > (minus:V2QI (reg:V2QI 251) > (reg:V2QI 249 [ vect__4.468_451 ]))) > (clobber (reg:CC 17 flags)) > ]) This is the definition of the offending pattern in mmx.md: (define_insn "v2qi3" [(set (match_operand:V2QI 0 "register_operand" "=?Q,x,Yw") (plusminus:V2QI (match_operand:V2QI 1 "register_operand" "0,0,Yw") (match_operand:V2QI 2 "register_operand" "Q,x,Yw"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)" "#" [(set_attr "isa" "*,sse2_noavx,avx") (set_attr "type" "multi,sseadd,sseadd") (set_attr "mode" "QI,TI,TI")]) where -march=i686 (aka pentiumpro) implies TARGET_PARTIAL_REG_STALL, so it should be disabled unless optimizing for size. I wonder if optimize_function_for_size_p is stable during the LTO compilation, but we have plenty of usages like the above throughout x86 .md files and there were no problems reported. Another possibility is that the instruction RTX is emitted without checking of the pattern condition, the testcase is compiled with -O3, so optimize_function_for_size_p should also be false. I don't see anything wrong with the above pattern. The failure also happens very early into the RTL part of the compilation (vregs pass is the first pass that tries to recognize the pattern), so my bet is on middle-end emitting insn pattern without checking for pattern availability.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #3 from Sam James --- I am reducing it now but it's slow going.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #2 from Sam James --- Created attachment 57610 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57610=edit TraceStream.cc.ii.xz
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #1 from Sam James --- Created attachment 57609 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57609=edit Task.cc.ii.xz