[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

Uroš Bizjak  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
 Target||i686

--- Comment #29 from Uroš Bizjak  ---
Fixed.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #28 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:74e8cc28eda9b1d75588fcd4017a735911b9d2b4

commit r14-9346-g74e8cc28eda9b1d75588fcd4017a735911b9d2b4
Author: Uros Bizjak 
Date:   Wed Mar 6 20:53:50 2024 +0100

i386: Fix and improve insn constraint for V2QI arithmetic/shift insns

optimize_function_for_size_p predicate is not stable during optab
selection,
because it also depends on node->count/node->frequency of the current
function,
which are updated during IPA, so they may change between early opts and
late opts.  Use optimize_size instead - optimize_size implies
optimize_function_for_size_p (cfun), so if a named pattern uses
"&& optimize_size" and the insn it splits into uses
optimize_function_for_size_p (cfun), it shouldn't fail.

PR target/114232

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi2): Enable for optimize_size instead
of optimize_function_for_size_p.  Explictily enable for
TARGET_SSE2.
(negv2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.  Explictily enable for
TARGET_SSE2.
(v2qi SSE reg splitter): Enable for TARGET_SSE2
only.
(v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

Uroš Bizjak  changed:

   What|Removed |Added

  Attachment #57614|0   |1
is obsolete||

--- Comment #27 from Uroš Bizjak  ---
Created attachment 57626
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57626=edit
Proposed v2 patch

New version in testing:
  - uses optimize_size to stabilize optab discovery
  - explicitly enables insn pattern for TARGET_SSE2

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #26 from Jan Hubicka  ---
> I think optimize_function_for_size_p (cfun) isn't always true if
> optimize_size is since it looks at the function-specific setting
> of that flag, so you'd have to use opt_for_fn (cfun, optimize_size).

When we initialize obtabs, we do it for a givn optimization_node
setting, so opt_for_fn (cfun, optimize_size) should be equal to
optimize_size.

Honza

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #25 from Uroš Bizjak  ---
Created attachment 57614
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57614=edit
Proposed patch

Proposed patch that changes optimize_function_for_size_p (cfun) to
optimize_size.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #24 from Jakub Jelinek  ---
(In reply to rguent...@suse.de from comment #22)
> I think optimize_function_for_size_p (cfun) isn't always true if
> optimize_size is since it looks at the function-specific setting
> of that flag, so you'd have to use opt_for_fn (cfun, optimize_size).

I think the insn-flags.h macros and actual insn conditions are checked when the
corresponding function is current, and at that point opt_for_fn (cfun,
optimize_size) should be the same as optimize_size.  Even the set_cfun hook
first sets the option and then tries to initialize optab and compares it
against the global one.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #23 from Uroš Bizjak  ---
(In reply to Jan Hubicka from comment #21)
> Looking at the prototype patch, why need to change also the splitters?

Purely for implementation reasons, we check for general resp. SSE register in
the operand predicates, so there is no need to use additional insn constraints.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #22 from rguenther at suse dot de  ---
On Tue, 5 Mar 2024, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232
> 
> --- Comment #17 from Jakub Jelinek  ---
> Either change those too, or the splitter needs some variant what to do if 
> there
> is a mismatch.
> Though, optimize_size implies optimize_function_for_size_p (cfun), so if a
> named pattern uses && optimize_size and the insn it splits into uses
> optimize_function_for_size_p (cfun), it shouldn't fail.  The other direction 
> is
> not always true, optimize_function_for_size_p (cfun) can be true just because
> the function
> is cold, not just because it is -Os.

I think optimize_function_for_size_p (cfun) isn't always true if
optimize_size is since it looks at the function-specific setting
of that flag, so you'd have to use opt_for_fn (cfun, optimize_size).

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #21 from Jan Hubicka  ---
Looking at the prototype patch, why need to change also the splitters?

My original goal was to use splitters to expand to faster code sequences
while having patterns necessary for both variants.  This makes it
possible to use optimize_insn_for_size/speed and make decisions using BB
profile, since we will not ICE if the hotness of BB changes later.

Re: [Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread Jan Hubicka via Gcc-bugs
Looking at the prototype patch, why need to change also the splitters?

My original goal was to use splitters to expand to faster code sequences
while having patterns necessary for both variants.  This makes it
possible to use optimize_insn_for_size/speed and make decisions using BB
profile, since we will not ICE if the hotness of BB changes later.


[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #20 from Jakub Jelinek  ---
(In reply to Uroš Bizjak from comment #19)
> (In reply to Jan Hubicka from comment #18)
> > But the problem here is more that optab initializations happens only at
> > the optimization_node changes and not if we switch from hot function to
> > cold?
> 
> I think solving optab init problem is a better solution than the target
> patch from comment #10. Using optimize_function_for_size_p in named pattern
> predicate would avoid using the non-optimal pattern also in cold functions,
> and would be preferrable to using optimize_size.

It would be very costly IMHO, because on every set_cfun we'd need to compute
optimize_function_for_size_p for both the old and new function and find out
where to attach the additional optab tables. set_cfun can be called millions of
times.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #19 from Uroš Bizjak  ---
(In reply to Jan Hubicka from comment #18)
> But the problem here is more that optab initializations happens only at
> the optimization_node changes and not if we switch from hot function to
> cold?

I think solving optab init problem is a better solution than the target patch
from comment #10. Using optimize_function_for_size_p in named pattern predicate
would avoid using the non-optimal pattern also in cold functions, and would be
preferrable to using optimize_size.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #18 from Jan Hubicka  ---
optimize_function_for_size_p is not really affected by LTO or non-LTO. 
It does take into account node->count and node->frequency, which is
updated during IPA, so it may change between early opts and late opts.

I do not think we change it between vectorizer and RTL (or during RTL)
and we can add this to a verifier if it is practical to rely on it (I
can see that vectorizer makes instruction selection choices).

But the problem here is more that optab initializations happens only at
the optimization_node changes and not if we switch from hot function to
cold?

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #17 from Jakub Jelinek  ---
Either change those too, or the splitter needs some variant what to do if there
is a mismatch.
Though, optimize_size implies optimize_function_for_size_p (cfun), so if a
named pattern uses && optimize_size and the insn it splits into uses
optimize_function_for_size_p (cfun), it shouldn't fail.  The other direction is
not always true, optimize_function_for_size_p (cfun) can be true just because
the function
is cold, not just because it is -Os.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #16 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #15)

> Seems various backends use e.g. optimize_size or !optimize_size or optimize
> > 0 etc. in
> insn-flags.h, so perhaps change optimize_function_for_size_p (cfun) to
> optimize_size?

Won't this cause a mismatch with insn patterns we split to? As said in Comment
#13, these have insn predicates that include optimize_function_for_size_p.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #15 from Jakub Jelinek  ---
Yeah, indeed, the optabs enable flags are cached in the optimization node, so
it is ok
to check the optimization flags in there, or target flags as well, but
optimize_function_for_*_p is not, because it depends also on
node->count/node->frequency of the current function but multiple functions can
have the same optimization node/target node combo.
Seems i386 is the only backend which does something like that.
I believe this has been fixed in the backend in the past already, see e.g.
PR92791
While
  if (opts != optimization_default_node)
{
  init_tree_optimization_optabs (opts);
  if (TREE_OPTIMIZATION_OPTABS (opts))
this_fn_optabs = (struct target_optabs *)
  TREE_OPTIMIZATION_OPTABS (opts);
}
in invoke_set_current_function_hook maybe (hopefully) makes stuff work right
for the different options, that certainly doesn't account for
node->function/node->count.

Seems various backends use e.g. optimize_size or !optimize_size or optimize > 0
etc. in
insn-flags.h, so perhaps change optimize_function_for_size_p (cfun) to
optimize_size?

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #14 from rguenther at suse dot de  ---
On Tue, 5 Mar 2024, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232
> 
> Jakub Jelinek  changed:
> 
>What|Removed |Added
> 
>  CC||jakub at gcc dot gnu.org
> 
> --- Comment #12 from Jakub Jelinek  ---
> Still, it would be nice to understand what changed 
> optimize_function_for_size_p
> (cfun)
> after IPA.  Is something adjusting node->count or node->frequency?
> Otherwise it should just depend on the optimize_size flag which should not
> change...

Maybe we share the TREE optimization node (it gets re-materialized
during LTO streaming) between hot and cold functions and the "first"
one getting in set_cfun will overwrite TREE_OPTIMIZATION_OPTABS in
init_tree_optimization_optabs (though it seems to overwrite things).

In any case I think the tree node sharing only looks at options, not
at function frequency so having TREE_OPTIMIZATION_OPTABS part of
the optimization node looks a bit fragile in the light of sharing.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #13 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #12)
> Still, it would be nice to understand what changed
> optimize_function_for_size_p (cfun)
> after IPA.  Is something adjusting node->count or node->frequency?
> Otherwise it should just depend on the optimize_size flag which should not
> change...

The target-dependent issue is with insn patterns we split to. These are enabled
with "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)", as
they are intended to be combined from movstrict insn pattern (which can FAIL).

So, the condition for V2QI insn is now either !TARGET_PARTIAL_REG_STALL or
TARGET_SSE2 (where we disable alternative 0 for TARGET_PARTIAL_REG_STALL
targets, but we still emit SSE instruction).

Please note that while -march=i686 enables TARGET_PARTIAL_REG_STALL, it enables
SSE2, so the proposed solution won't have much impact. Also, V2QImode is not
much used, so I guess the proposed solution is the right compromise.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #12 from Jakub Jelinek  ---
Still, it would be nice to understand what changed optimize_function_for_size_p
(cfun)
after IPA.  Is something adjusting node->count or node->frequency?
Otherwise it should just depend on the optimize_size flag which should not
change...

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #11 from Richard Biener  ---
(In reply to Uroš Bizjak from comment #10)
> Created attachment 57612 [details]
> Prototype patch
> 
> Let's try this approach.

Yeah, I guess !TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)
is best elided (or also avoid the pattern when optimizing for size).

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

Uroš Bizjak  changed:

   What|Removed |Added

   Last reconfirmed||2024-03-05
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1

--- Comment #10 from Uroš Bizjak  ---
Created attachment 57612
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57612=edit
Prototype patch

Let's try this approach.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #9 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #8)
> > grep optimize_ insn-flags.h | wc -l
> 14
> 
> so it's not very many standard patterns that would be affected.  I'd say
> using these kind of flags on standard patterns is at least fragile?

You are right, only recently added neg, add, sub, ashl, lshr, ashr V2QImode
standard patterns have optimize_function_for_size_p in their condition.

Any suggestion on how to fix them?

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #8 from Richard Biener  ---
> grep optimize_ insn-flags.h | wc -l
14

so it's not very many standard patterns that would be affected.  I'd say
using these kind of flags on standard patterns is at least fragile?

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 CC||hubicka at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener  ---
I think the question is more whether it's stable between optab queries the
vectorizer or veclower does and RTL expansion.  There whether it's LTO or not
shouldn't play a role (it might for the actual testcase of course).

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #6 from Richard Biener  ---
It's possibly on a cold path (yes, optimize_function_for_size_p should be
stable).  Note though that optimize_function_for_size_p might in theory
change between vectorization and RTL expansion, so maybe optab queries
in the vectorizer are broken by this if we ever re-check that condition
after optab initialization.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #5 from Uroš Bizjak  ---
Huh, it looks that optimize_function_for_size_p (cfun) is not stable during
LTO?!

Using:

--cut here--
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2856ae6ffef..80114494b0b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2975,7 +2975,7 @@ (define_insn "v2qi3"
  (match_operand:V2QI 1 "register_operand" "0,0,Yw")
  (match_operand:V2QI 2 "register_operand" "Q,x,Yw")))
(clobber (reg:CC FLAGS_REG))]
-  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "!TARGET_PARTIAL_REG_STALL"
   "#"
   [(set_attr "isa" "*,sse2_noavx,avx")
(set_attr "type" "multi,sseadd,sseadd")
@@ -2987,7 +2987,7 @@ (define_split
  (match_operand:V2QI 1 "general_reg_operand")
  (match_operand:V2QI 2 "general_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+  "!TARGET_PARTIAL_REG_STALL
&& reload_completed"
   [(parallel
  [(set (strict_low_part (match_dup 0))
@@ -3021,7 +3021,7 @@ (define_split
  (match_operand:V2QI 1 "sse_reg_operand")
  (match_operand:V2QI 2 "sse_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+  "!TARGET_PARTIAL_REG_STALL
&& TARGET_SSE2 && reload_completed"
   [(set (match_dup 0)
 (plusminus:V16QI (match_dup 1) (match_dup 2)))]
--cut here--

So, removing optimize-size bypass successfully compiles the testcase.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #4 from Uroš Bizjak  ---
(In reply to Sam James from comment #0)

> (insn 160 159 161 26 (parallel [
> (set (reg:V2QI 250 [ vect_patt_207.470_183 ])
> (minus:V2QI (reg:V2QI 251)
> (reg:V2QI 249 [ vect__4.468_451 ])))
> (clobber (reg:CC 17 flags))
> ])

This is the definition of the offending pattern in mmx.md:

(define_insn "v2qi3"
  [(set (match_operand:V2QI 0 "register_operand" "=?Q,x,Yw")
(plusminus:V2QI
  (match_operand:V2QI 1 "register_operand" "0,0,Yw")
  (match_operand:V2QI 2 "register_operand" "Q,x,Yw")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
  "#"
  [(set_attr "isa" "*,sse2_noavx,avx")
   (set_attr "type" "multi,sseadd,sseadd")
   (set_attr "mode" "QI,TI,TI")])

where -march=i686 (aka pentiumpro) implies TARGET_PARTIAL_REG_STALL, so it
should be disabled unless optimizing for size. I wonder if
optimize_function_for_size_p is stable during the LTO compilation, but we have
plenty of usages like the above throughout x86  .md files and there were no
problems reported.

Another possibility is that the instruction RTX is emitted without checking of
the pattern condition, the testcase is compiled with -O3, so
optimize_function_for_size_p should also be false.

I don't see anything wrong with the above pattern. The failure also happens
very early into the RTL part of the compilation (vregs pass is the first pass
that tries to recognize the pattern), so my bet is on middle-end emitting insn
pattern without checking for pattern availability.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #3 from Sam James  ---
I am reducing it now but it's slow going.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #2 from Sam James  ---
Created attachment 57610
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57610=edit
TraceStream.cc.ii.xz

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #1 from Sam James  ---
Created attachment 57609
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57609=edit
Task.cc.ii.xz