[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #33 from Segher Boessenkool  ---
(In reply to Aldy Hernandez from comment #29)
> A minor rant, but why can't all this be set up automatically in the compile
> farm machines?

We have everything installed with the default for whatever distro (or similar
for non-Linux) is used.  There are newer tools etc. in /opt/cfarm/ sometimes.

On https://gcc.gnu.org/install/ there are installation instructions, for
configuring and building GCC, generic as well as per-configuration stuff.  There
is nothing specific about the cfarm here.  There is some info about Solaris, in
the GCC documentation.

> Keeping track of minor nuances of each architecture is
> distracting.  They should all be set up, whether by setting default paths in
> /etc/profile or whatever, or by having the relevant patches in GCC's source
> base, such that they work with src/configure && make.

A lovely utopian worldview, if you subscribe to the "everything should always
be the same" worldview, anyway.

[Bug driver/80182] accidently invoked `gcc -lm -o file.c` which deletes file.c

2024-05-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80182

Segher Boessenkool  changed:

   What|Removed |Added

 CC||mkuvyrkov at gcc dot gnu.org

--- Comment #7 from Segher Boessenkool  ---
I bet Maxim can help you with this.  Maxim?

[Bug rtl-optimization/115092] [14/15 Regression] wrong code at -O1 with "-fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" on x86_64-linux-gnu since r14-4810

2024-05-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115092

--- Comment #11 from Segher Boessenkool  ---
Still okay :-)

[Bug rtl-optimization/115092] [14/15 Regression] wrong code at -O1 with "-fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" on x86_64-linux-gnu since r14-4810

2024-05-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115092

--- Comment #9 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #8)
> > Yeah, that looks like it is missing some test.
> 
> I'd go with
> --- gcc/combine.cc.jj 2024-05-07 18:10:10.415874636 +0200
> +++ gcc/combine.cc 2024-05-15 13:33:26.555081215 +0200
> @@ -11852,8 +11852,10 @@ simplify_compare_const (enum rtx_code co
>   `and'ed with that bit), we can replace this with a comparison
>   with zero.  */
>if (const_op
> -  && (code == EQ || code == NE || code == GE || code == GEU
> -   || code == LT || code == LTU)
> +  && (code == EQ || code == NE || code == GEU || code == LTU
> +   /* This optimization is incorrect for signed >= INT_MIN or
> +  < INT_MIN, those are always true or always false.  */
> +   || ((code == GE || code == LT) && const_op > 0))
>&& is_a <scalar_int_mode> (mode, &int_mode)
>&& GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
>&& pow2p_hwi (const_op & GET_MODE_MASK (int_mode))

Pre-approved.  Thanks!
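
To see why the signed case needs the extra const_op > 0 test, here is a minimal
sketch in plain C, not the combine code itself.  It assumes an 8-bit mode and
models "only that single bit can be nonzero" by trying the two possible values
0 and the bit.  The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interpret an 8-bit pattern as a signed two's complement value.  */
static int as_signed8 (uint8_t v)
{
  return v >= 0x80 ? (int) v - 256 : (int) v;
}

/* Does signed "x >= bit" agree with "x != 0" for every value x can take
   when only the single bit BIT may be nonzero in x?  */
bool signed_ge_matches_ne_zero (uint8_t bit)
{
  assert (bit != 0 && (bit & (bit - 1)) == 0);  /* must be a power of two */
  const uint8_t values[2] = { 0, bit };
  for (int i = 0; i < 2; i++)
    {
      bool original = as_signed8 (values[i]) >= as_signed8 (bit);
      bool rewritten = values[i] != 0;
      if (original != rewritten)
        return false;
    }
  return true;
}

/* Same question for the unsigned comparison (GEU).  */
bool unsigned_geu_matches_ne_zero (uint8_t bit)
{
  assert (bit != 0 && (bit & (bit - 1)) == 0);
  const uint8_t values[2] = { 0, bit };
  for (int i = 0; i < 2; i++)
    {
      bool original = values[i] >= bit;
      bool rewritten = values[i] != 0;
      if (original != rewritten)
        return false;
    }
  return true;
}
```

With the sign bit (0x80, i.e. -128 as a signed value) the signed check fails,
because x >= -128 holds even for x == 0, while the unsigned check still passes.
For every other single bit both checks pass.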

> Seems there is no canonical way to return this is always true or this is
> always false,
> sure, we could make up something like NE 1 0 or EQ 1 0 or similar, but it
> wouldn't likely match and the question is if it would simplify.

Later code will likely pick this up.  More likely than with the GE anyway :-)

> The const_op == -1 handling below this looks correct to me.

Yup.

> > That needs to be fixed of course, but independent of that, this should 
> > really
> > have been completely folded away earlier already?
> 
> It would if one wouldn't carefully disable tons of optimizations (say -O1,
> so no (significant) VRP, dom* disabled, fre disabled).

Ha :-)

[Bug rtl-optimization/115092] [14/15 Regression] wrong code at -O1 with "-fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" on x86_64-linux-gnu since r14-4810

2024-05-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115092

--- Comment #7 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #5)
> I think the bug is in simplify_comparison.
> We have there
> GE (sign_extract:SI (reg/v:SI 101 [ g ]) (const_int 1 [0x1]) (const_int 0
> [0])) (const_int -1 [0x])
> That is first changed into
> GE (ashiftrt:SI (ashift:SI (reg/v:SI 101 [ g ]) (const_int 31 [0x1f]))
> (const_int 31  [0x1f])) (const_int -1 [0x])
> Both are always true.
> But then the
>   /* FALLTHROUGH */
> case LSHIFTRT:
>   /* If we have (compare (xshiftrt FOO N) (const_int C)) and
>  the low order N bits of FOO are known to be zero, we can do this
>  by comparing FOO with C shifted left N bits so long as no
>  overflow occurs.  Even if the low order N bits of FOO aren't
> known
>  to be zero, if the comparison is >= or < we can use the same
>  optimization and for > or <= by setting all the low
>  order N bits in the comparison constant.  */
> optimization triggers and optimizes it into
> GE (ashift:SI (reg/v:SI 101 [ g ]) (const_int 31 [0x1f])) (const_int
> -2147483648 [0x8000])
> I think that is ok too.
> But then
> code = simplify_compare_const (code, raw_mode, &op0, &op1);
> simplifies that to NE and I think that step is wrong, because GE of anything
> >= INT_MIN
> is true.
> 
> So, I think
>   /* If we are comparing against a constant power of two and the value
>  being compared can only have that single bit nonzero (e.g., it was
>  `and'ed with that bit), we can replace this with a comparison
>  with zero.  */
>   if (const_op
>   && (code == EQ || code == NE || code == GE || code == GEU
>   || code == LT || code == LTU)
>   && is_a <scalar_int_mode> (mode, &int_mode)
>   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
>   && pow2p_hwi (const_op & GET_MODE_MASK (int_mode))
>   && (nonzero_bits (op0, int_mode)
>   == (unsigned HOST_WIDE_INT) (const_op & GET_MODE_MASK (int_mode))))
> {
>   code = (code == EQ || code == GE || code == GEU ? NE : EQ);
>   const_op = 0;
> }
> in simplify_compare_const is wrong if const_op is the most significant bit
> of int_mode.

Yeah, that looks like it is missing some test.

That needs to be fixed of course, but independent of that, this should really
have been completely folded away earlier already?

[Bug rtl-optimization/115092] [14/15 Regression] wrong code at -O1 with "-fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" on x86_64-linux-gnu since r14-4810

2024-05-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115092

--- Comment #6 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #4)
> Indeed, combine_simplify_rtx on
> (set (reg:CCGC 17 flags)
> (compare:CCGC (sign_extract:SI (reg/v:SI 101 [ g ])
> (const_int 1 [0x1])
> (const_int 0 [0]))
> (const_int -1 [0x])))
> with VOIDmode, false, false remaining arguments is optimizing it to
> (set (reg:CCZ 17 flags)
> (compare:CCZ (zero_extract:SI (reg/v:SI 101 [ g ])
> (const_int 1 [0x1])
> (const_int 0 [0]))
> (const_int 0 [0])))
> which is ok if it would be used solely in equality/non-equality comparisons,
> but is not ok when it is used in other comparisons. 1-bit sign_extract has
> range [-1,0] and
> [-1,0] < -1 is always false.

It is some target code that decided what to do with the CCGC thing.  It decided
to use CCZ instead, which of course is wrong if other conditions are used (and
should ICE if you try to use it for non-equality actually).
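
A small sketch of the point about equality versus other conditions, in plain C
rather than RTL.  It assumes the 1-bit field is bit 0 of an unsigned value; the
helper names are hypothetical.  The sign-extracting and zero-extracting forms
agree on "is it zero", but an ordered comparison against -1 on the signed form
is a constant, and no zero test on the unsigned form gives a constant.

```c
#include <stdbool.h>

/* 1-bit sign_extract of bit 0: the result is 0 or -1.  */
int sign_extract_bit0 (unsigned g)
{
  return -(int) (g & 1u);
}

/* 1-bit zero_extract of bit 0: the result is 0 or 1.  */
int zero_extract_bit0 (unsigned g)
{
  return (int) (g & 1u);
}

/* Equality with zero gives the same answer for both forms.  */
bool zero_tests_agree (unsigned g)
{
  return (sign_extract_bit0 (g) == 0) == (zero_extract_bit0 (g) == 0);
}

/* The original signed condition: always true for a value in [-1, 0].  */
bool signed_ge_minus_one (unsigned g)
{
  return sign_extract_bit0 (g) >= -1;
}

/* A zero test on the zero_extract form: depends on the bit.  */
bool zero_extract_ne_zero (unsigned g)
{
  return zero_extract_bit0 (g) != 0;
}
```

So replacing the compare is fine as long as every user only asks "zero or
not"; a user that asks ">= -1" gets a different answer for an even g.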

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2024-05-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #22 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #21)
> I am not sure if powerpc vsx
> has &~ though.

VMX has vandc (since 1999), and VSX has xxlandc (since 2010).

In general, PowerPC has a full complement of logical ops, everywhere.  In some
cases it has the full truth table of the operation as part of the binary opcode
;-)
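
For readers who do not have the operations in their head, a scalar sketch in C.
andc is "and with complement", and a bitwise select can be written with it.
The eval3 helper and its table bit order are an illustration made up here, not
the encoding of any actual instruction; it only shows what "the truth table is
part of the opcode" means.

```c
#include <stdint.h>

/* and-with-complement: x & ~y.  */
uint32_t andc (uint32_t x, uint32_t y)
{
  return x & ~y;
}

/* Bitwise select: take the bit from B where M is 1, else from A.  */
uint32_t bit_select (uint32_t a, uint32_t b, uint32_t m)
{
  return andc (a, m) | (b & m);
}

/* Generic three-input logical op driven by an 8-entry truth table.
   Hypothetical convention: table bit number (a<<2 | b<<1 | c) holds
   the result for input bits a, b, c.  */
uint32_t eval3 (uint32_t a, uint32_t b, uint32_t c, uint8_t table)
{
  uint32_t result = 0;
  for (int idx = 0; idx < 8; idx++)
    if ((table >> idx) & 1)
      {
        uint32_t ma = (idx & 4) ? a : ~a;
        uint32_t mb = (idx & 2) ? b : ~b;
        uint32_t mc = (idx & 1) ? c : ~c;
        result |= ma & mb & mc;
      }
  return result;
}

/* Under the convention above, "c ? b : a" is table 0xD8.  */
enum { SELECT_TABLE = 0xD8 };
```

Any sequence that computes the same function as bit_select, such as
a ^ ((a ^ b) & m), is a candidate for being turned into a single select.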

[Bug rtl-optimization/114902] [14/15 Regression] wrong code at -O3 with "-fno-tree-vrp -fno-expensive-optimizations -fno-tree-dominator-opts" on x86_64-linux-gnu

2024-05-14 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #11 from Segher Boessenkool  ---
So, is there a simplified testcase that *actually* shows any *actual* problem?

[Bug analyzer/110014] -Wanalyzer-allocation-size mishandles realloc (..., .... * sizeof (object))

2024-05-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110014

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Status|RESOLVED|REOPENED
 CC||segher at gcc dot gnu.org

--- Comment #7 from Segher Boessenkool  ---
Reopened, then.

[Bug analyzer/109577] -Wanalyzer-allocation-size mishandles __builtin_mul_overflow

2024-05-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109577

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org
 Resolution|FIXED   |---
 Status|RESOLVED|REOPENED

--- Comment #10 from Segher Boessenkool  ---
Reopened, then.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #15 from Segher Boessenkool  ---
(In reply to Aldy Hernandez from comment #11)
> I have reverted the prange enabling patch until the IPA pass is fixed.

Thank you!

[Bug rtl-optimization/114902] [14/15 Regression] wrong code at -O3 with "-fno-tree-vrp -fno-expensive-optimizations -fno-tree-dominator-opts" on x86_64-linux-gnu

2024-05-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #10 from Segher Boessenkool  ---
(_extract, btw.)

[Bug rtl-optimization/114902] [14/15 Regression] wrong code at -O3 with "-fno-tree-vrp -fno-expensive-optimizations -fno-tree-dominator-opts" on x86_64-linux-gnu

2024-05-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #9 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #2)
> We go from CCGC with a sign_extend to a zero_extend with CCZ. that can't be
> right.

Why not?  We prefer zero_extend whenever it has the same result.

[Bug rtl-optimization/114996] [15 Regression] [RISC-V] 2->2 combination no longer occurring

2024-05-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114996

--- Comment #1 from Segher Boessenkool  ---
This is not a 2->2 combination.  It is a 1->1 combination, which we never have
done, and still don't.  We incorrectly "combined" another instruction, which in
fact we left in place, it isn't combined at all!

[Bug target/113652] [14/15 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-05-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #26 from Segher Boessenkool  ---
(In reply to Michael Meissner from comment #23)
> 1) Ignore it and say to the users don't do that.
> 
> 2) Prevent the IEEE 128-bit libgcc bits from being built on a BE or 32-bit
> LE system unless some configure switch is used.  Or just kick the can down
> the road, and don't provide a configure option in GCC 14, and if people are
> interested do it in GCC 15.
> 
> 3) Only build the IEEE 128-bit libgcc bits if the user configured the
> compiler with --with-cpu=power7, --with-cpu=power8, --with-cpu=power9,
> --with-cpu=power10 (and in the future --with-cpu=power11 or
> --with-cpu=future).  This could be code that if __VSX__ is not defined, the
> libgcc support functions won't get built.  We would then remove the -mvsx
> option from the library support functions.

4) Build those pieces of libgcc with the flags needed for that piece of code.
Like, just as is required for all other code.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-05-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #66 from Segher Boessenkool  ---
(In reply to rguent...@suse.de from comment #64)
> As promised I'm going to revert the revert after 14.1 is released 
> (hopefully tomorrow).

Thank you!  beer++

> As for distros I have decided to include my
> hack posted in 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648725.html
> for SUSE based distros in GCC 13 and 14 as that seems to improve
> the problematical memory uses in our build farm.

I think this patch may well show some actual regressions :-(  We'll see.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-05-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #63 from Segher Boessenkool  ---
(In reply to Sarah Julia Kriesch from comment #62)
> (In reply to Segher Boessenkool from comment #61)
> > (In reply to Sarah Julia Kriesch from comment #60)
> > > I have to agree with Richard. This problem has been serious for a long 
> > > time
> > > but has been ignored by IBM based on distribution choices.
> > 
> > What?  What does IBM have to do with this?  Yes, they are my employer, but
> > what I decide is best for combine to do is not influenced by them *at all*
> > (except some times they want me to spend time doing paid work, distracting
> > me from things that really matter, like combine!)
> > 
> Then, tell other reasons why my requests in the openSUSE bug report have
> been rejected in the past, and this bug report has been open for 3 years.
> Perhaps it is helpful to know that IBM has fixed memory issues in PostgreSQL
> (for openSUSE/upstream) with higher quality via my request with the support
> for Red Hat (and faster).

Once again, I have no idea what you are talking about.  It sounds like some
conspiracy theory?  Exciting!

I really have no idea what you are talking about.  I recognise some of the
words, but not enough to give me a handle on what you are on about.

> > > Anyway, we want to live within the open source community without any Linux
> > > distribution priorities (especially in upstream projects like here).
> > 
> > No idea what that means either.
> > 
> There is a reason for founding the Linux Distributions Working Group at the
> Open Mainframe Project (equality for all Linux Distributions on s390x).
> SUSE, Red Hat and Canonical have been supporting this idea also (especially
> based on my own work experience at IBM and the priorities inside).

And here I don't have any context either.

> > > Segher, can you specify the failed test cases? Then, it should be possible
> > > to reproduce and improve that all. In such a collaborative way, we can 
> > > also
> > > achieve a solution.
> > 
> > What failed test cases?  You completely lost me.
> > 
> This one:
> (In reply to Segher Boessenkool from comment #57)
> > (In reply to Richard Biener from comment #56)
> > PR101523 is a very serious problem, way way way more "P1" than any of the
> > "my target was inconvenienced by some bad testcases failing now" "P1"s there
> > are now.  Please undo this!

They are in this PR.  "See Also", top right corner in the headings.

> (In reply to Segher Boessenkool from comment #61)
> > We used to do the wrong thing in combine.  Now that my fix was reverted, we
> > still do.  This should be undone soonish, so that I can commit an actual
> > UNCSE
> > implementation, which fixes all "regressions" (quotes, because they are 
> > not!)
> > caused by my previous patch, and does a lot more too.  It also will allow us
> > to remove a bunch of other code from combine, speeding up things a lot more
> > (things that keep a copy of a set if the dest is used more than once).  
> > There
> > has been talk of doing an UNCSE for over twenty years now, so annoying me
> > enough to get this done is a good result of this whole thing :-)
> Your fixes should also work with upstream code and the used gcc versions in
> our/all Linux distributions. I recommend applying tests and merging your
> fixes to at least one gcc version.

Lol.  No.  Distributions have to sort out their own problems.  I don't have
a copy of an old version of most distros even; I haven't *heard* about the
*existence* of most distros!

I don't use a Linux distro on any of my own machines.  And I care about some
other OSes at least as much, btw.  And not just because my employer cares about
some of those.

> If you want to watch something about our reasons for creating a
> collaboration between Linux distributions (and upstream projects), you
> should watch my first presentation "Collaboration instead of Competition":
> https://av.tib.eu/media/57010
> 
> Hint: The IBM statement came from my former IBM Manager (now your CPO).

CPO?  What is a CPO?  I don't think I have any?  I do have an R2 somewhere,
does that help?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-05-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #61 from Segher Boessenkool  ---
(In reply to Sarah Julia Kriesch from comment #60)
> I have to agree with Richard. This problem has been serious for a long time
> but has been ignored by IBM based on distribution choices.

What?  What does IBM have to do with this?  Yes, they are my employer, but
what I decide is best for combine to do is not influenced by them *at all*
(except sometimes they want me to spend time doing paid work, distracting
me from things that really matter, like combine!)

> Anyway, we want to live within the open source community without any Linux
> distribution priorities (especially in upstream projects like here).

No idea what that means either.

> Segher, can you specify the failed test cases? Then, it should be possible
> to reproduce and improve that all. In such a collaborative way, we can also
> achieve a solution.

What failed test cases?  You completely lost me.

We used to do the wrong thing in combine.  Now that my fix was reverted, we
still do.  This should be undone soonish, so that I can commit an actual UNCSE
implementation, which fixes all "regressions" (quotes, because they are not!)
caused by my previous patch, and does a lot more too.  It also will allow us
to remove a bunch of other code from combine, speeding up things a lot more
(things that keep a copy of a set if the dest is used more than once).  There
has been talk of doing an UNCSE for over twenty years now, so annoying me
enough to get this done is a good result of this whole thing :-)

[Bug rtl-optimization/114902] [14/15 Regression] wrong code at -O3 with "-fno-tree-vrp -fno-expensive-optimizations -fno-tree-dominator-opts" on x86_64-linux-gnu

2024-05-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #6 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #2)
> Looks like the issue is during combine.
> 
> We go from CCGC with a sign_extend to a zero_extend with CCZ. that can't be
> right.

Why is that not correct?  zero_extend is preferred over sign_extend, and both
are equivalent when only checking for zero.

Is there something wrong in target code here, perhaps?

[Bug rtl-optimization/114768] Volatile reads can be optimized away

2024-04-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114768

--- Comment #6 from Segher Boessenkool  ---
Heh, crossed :-)  I can confirm my patch works (tested and everything).  I have
no idea about zero_extract, which is a blight that should be eradicated tooth
and nail!

[Bug rtl-optimization/114768] Volatile reads can be optimized away

2024-04-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114768

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #5 from Segher Boessenkool  ---
Created attachment 57984
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57984&action=edit
patch

[Bug target/114759] Power: multiple issues with -mrop-protect

2024-04-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114759

--- Comment #2 from Segher Boessenkool  ---
> 1. We always define the __ROP_PROTECT__ predefined macro when using 
> -mrop-protect, even when we've silently disabled ROP protection because of a 
> too old -mcpu=CPU value.  We should only emit __ROP_PROTECT__ when it's legal 
> to emit the ROP insns.

No.  Whenever the -mrop-protect option is in effect, we should do that
predefine.

If you want to refuse the option without a -mcpu= that can generate useful code
for it, that's fine, but that is not what we do.  Instead, we generate code that
will do the ROP-protection boogaloo on CPUs that implement support for that, and
does nothing otherwise.
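
From the user side, this means the macro can only be read as "the option was
in effect for this translation unit".  A minimal sketch, with a made-up
function name:

```c
#include <stdbool.h>

/* True when this translation unit was compiled with -mrop-protect in
   effect.  Under the semantics described above this says nothing about
   whether the CPU the program runs on acts on the ROP instructions.  */
bool built_with_rop_protect (void)
{
#ifdef __ROP_PROTECT__
  return true;
#else
  return false;
#endif
}
```

Code that needs to know whether protection is really active at run time would
have to find that out some other way; the predefine alone does not tell.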

> 2.  We always disable shrink-wrapping when -mrop-protect is used, [...]

Yes, this is problematic, and seems to be completely unnecessary.  When using
SWS at least -- but then we need to define a component for doing the
ROP-protection thing, of course.  After all, it has to be done before anything
else in the function.
By exactly the same argument we should *also* do ROP-protection in all leaf
functions, btw!

> 3.  We silently disable ROP protection for everything other than 
> -mcpu=power10.  The binutils assembler accepts the ROP insns back to Power8, 
> so we should emit them for Power8 and later.

The ISA claims it will work for anything after ISA 2.04, even.

> 4.  We give an error when -mrop-protect is used with any -mabi=ABI value not 
> equal to ELFv2, [...]

Yes, we should make it work everywhere.  Even on -m32.  But it requires
adjusting the ABI as well!

2) should be fixed, and 4) should be fixed by actually implementing it
everywhere!

[Bug rtl-optimization/96865] ICE in hash_rtx_cb, at cse.c:2548

2024-04-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96865

--- Comment #4 from Segher Boessenkool  ---
Well, I wanted to add Alex as well, but BZ does not allow that?  Says he does
not exist?

Is there some other mail address than that mentioned in MAINTAINERS, the one he
usually uses, that works, maybe @gcc.gnu.org?

[Bug rtl-optimization/96865] ICE in hash_rtx_cb, at cse.c:2548

2024-04-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96865

Segher Boessenkool  changed:

   What|Removed |Added

 CC||abel at ispras dot ru

--- Comment #3 from Segher Boessenkool  ---
Yup.  I thought there would be missing options needed for this to fail (-mcpu=
for example), but it fails with plain trunk.

Something with sel-sched.  It works fine without that.

Putting the maintainers of selective scheduling on Cc:.

[Bug target/114732] ge can't be reversed to unlt for bcd compares

2024-04-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114732

--- Comment #5 from Segher Boessenkool  ---
(Or, at-most-one-hot, that is!)

[Bug target/114732] ge can't be reversed to unlt for bcd compares

2024-04-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114732

--- Comment #4 from Segher Boessenkool  ---
(In reply to Segher Boessenkool from comment #3)
> -- Bit 0 is all-true, bit 2 is all-false, like in the vcmp* insns.

(And bits 1 and 3 are set to zeroes for those insns.  So it is all one-hot
there as well.  But the meaning is different.)

[Bug target/114732] ge can't be reversed to unlt for bcd compares

2024-04-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114732

--- Comment #3 from Segher Boessenkool  ---
1001, 0101, 0011 I mean of course.

In some ways CCmode models this better than CCFPmode, but we do not actually
model the SO bit (bit 3) at all in CCmode.  It is a nice feature of CCmode (that
we actually use as fundamental, in the backend code) that CCmode always has
exactly one of three bits "hot" (and CCFPmode always one of four).  Bit 3 (SO)
in CCmode is treated as not being part of the CC really, but an extra thing.
This doesn't work all that well of course.

So we really need at least three CC modes:

-- Exactly one of bits 0..3 hot, like CCFPmode;
-- Exactly one of bits 0..2 hot, bit 3 independently set, like CCmode (and
   that independent bit 3 modeled nicely as well, unlike what we have), and
   also like in the BCD insns;
-- Bit 0 is all-true, bit 2 is all-false, like in the vcmp* insns.

Do we need some other CC mode as well?  Do we want separately named CC modes
for the different variants of this (like the integer CC mode vs. the BCD one)?
We already have a separate CCUNSmode which is exactly like CCmode, as far as
the hardware cares, but the meaning is different (for CCUNS the LT and GT bits
are set based on an unsigned integer compare, not a signed one).  There also
is CCEQmode, which has only bit 2 valid (we use it for constructing one CR bit
from others, like with cror or crnot).
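
The three shapes listed above can be written down as validity predicates on a
4-bit CR field.  This is only a sketch of the bit patterns, not backend code:
the constant and function names are invented here, and the field is written
with bit 0 as the leftmost (most significant) of the four bits, so 1001 means
bits 0 and 3 are set.

```c
#include <stdbool.h>

/* Bit 0 is the leftmost bit of the 4-bit field.  */
enum { CR_BIT0 = 8, CR_BIT1 = 4, CR_BIT2 = 2, CR_BIT3 = 1 };

/* Number of set bits in a 4-bit value.  */
static int popcount4 (unsigned v)
{
  int n = 0;
  for (v &= 0xF; v != 0; v >>= 1)
    n += v & 1;
  return n;
}

/* Exactly one of bits 0..3 hot, like CCFPmode.  */
bool valid_one_hot_of_four (unsigned cr)
{
  return popcount4 (cr) == 1;
}

/* Exactly one of bits 0..2 hot, bit 3 set independently, like CCmode
   with the SO bit modeled, and like the BCD insns as listed above.  */
bool valid_one_hot_of_three_plus_bit3 (unsigned cr)
{
  return popcount4 (cr & (CR_BIT0 | CR_BIT1 | CR_BIT2)) == 1;
}

/* Bit 0 is all-true, bit 2 is all-false, bits 1 and 3 are zero, and at
   most one of bits 0 and 2 is set, like the vcmp* insns.  */
bool valid_all_true_all_false (unsigned cr)
{
  return (cr & (CR_BIT1 | CR_BIT3)) == 0
         && popcount4 (cr & (CR_BIT0 | CR_BIT2)) <= 1;
}
```

For example 1001, 0101 and 0011 pass the second predicate but fail the first,
while 0001 passes the first and fails the second.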

[Bug target/114732] ge can't be reversed to unlt for bcd compares

2024-04-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114732

--- Comment #2 from Segher Boessenkool  ---
The fourth CR bit for BCD insns does not mean "unordered", it means "invalid or
overflow".  It behaves exactly as unordered though, except that it can signal
overflow as well as one of the lt gt eq conditions at the same time (so 1100,
1010, 1001 are valid bit settings from it).  This part makes CCFP not ideal,
but the codegen as shown here is correct nevertheless.

[Bug rtl-optimization/112560] [14 Regression] ICE in try_combine on pr112494.c

2024-04-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560

--- Comment #12 from Segher Boessenkool  ---
You cannot use a :CC value as argument of an unspec, as explained before.

The result of a comparison is expressed as a comparison, in RTL.  This patch
allows malformed RTL in more places than before, not progress at all.

[Bug rtl-optimization/112560] [14 Regression] ICE in try_combine on pr112494.c

2024-04-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560

--- Comment #10 from Segher Boessenkool  ---
It is still wrong.  You're trying to sweep your wrong assumptions under the
rug, but they will only rear up elsewhere.  Just fix the actual *target*
problem!

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-04-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #57 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #56)
> The fix was reverted but will be re-instantiated for GCC 15 by me.

And I still protest.

PR101523 is a very serious problem, way way way more "P1" than any of the
"my target was inconvenienced by some bad testcases failing now" "P1"s there
are now.  Please undo this!

[Bug rtl-optimization/114664] -fno-omit-frame-pointer causes an ICE during the build of the greenlet package

2024-04-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114664

--- Comment #16 from Segher Boessenkool  ---
Yup, GPR31 is used for the emulated frame pointer, so this is user error:
saying a fixed-purpose register is clobbered makes no sense.  You are not
allowed to use any register that the compiler uses for function calling any
other way.

It is a bad idea to use -fno-omit-frame-pointer on Power, btw, just like on any
other architecture that does not have a frame pointer architecturally.  We
don't have one in any ABI even (for C at least, who knows what other languages
(or language implementations) do).  It is a grudge for targets that do not
generate proper DWARF, or you can use it to quickly validate some (suspected!)
bug is not in certain parts of the compiler; but we do not want to ever use it
by default, esp. because it is so costly.  Benchmark it, and quickly be
dissuaded from even thinking about ever doing this.  Progress is Good.

[Bug target/114004] GCC emits a superfluous instruction for simple test case on ppc

2024-04-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114004

--- Comment #2 from Segher Boessenkool  ---
So, the rlwinm keeps all the top 32 bits intact, but those are all zero to
begin with.  Somehow we don't see that, or don't take that into account anyway.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-04-05 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #54 from Segher Boessenkool  ---
Propose a patch, then?  With justification.  It should also work for 10x
bigger testcases.

[Bug testsuite/114518] [14 regression] gcc.target/powerpc/combine-2-2.c fails after r14-9692-g839bc42772ba7a

2024-03-29 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114518

Segher Boessenkool  changed:

   What|Removed |Added

   Last reconfirmed||2024-03-29
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #3 from Segher Boessenkool  ---
Ah good.  Reproduced, confirmed, all that.  Thanks.

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-03-29 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #21 from Segher Boessenkool  ---
(2.06, whoops)

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-03-29 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #20 from Segher Boessenkool  ---
(In reply to Michael Meissner from comment #19)
> When I wrote the VSX support many years ago, I intended that -mvsx enable
> all of ISA 2.06

But that is not how we do things, and it can never work predictably and
reliably.

If you want ISA 2.07 you have to use -mcpu=power8, or any other CPU that
implements that ISA level.  Using other options is counterintuitive, confusing,
and causes problems directly even.

> The reason is the kernel needs to
> be built without floating point support.

The kernel and many other things.  That's what -msoft-float, -mno-altivec, and
-mno-vsx are for.  Those options mean use no FPRs, no VRs, or no VSRs.  Nothing
more, nothing less.  

> If you say -mvsx, it should include the standard power7 integer instructions
> (-mpopcntd), power6 server instructions (i.e. -mhard-dfp, -mcmpb, -mrecip,
> -mpowerpc-gfxopt, and -mpowerpc-gpopt), etc.

No.  -mvsx means the compiler can use VSX things, is not prevented from it by
-mno-vsx.  There can be other reasons it can not use VSX, like, the targeted
CPU does not support VSX.

The option says absolutely nothing about any other instructions.

> VSX support assumes it can use lfiwax and lfiwzx.

Any CPU that supports VSX is ISA 2.07 at least, yes.

[Bug testsuite/114518] [14 regression] gcc.target/powerpc/combine-2-2.c fails after r14-9692-g839bc42772ba7a

2024-03-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114518

--- Comment #1 from Segher Boessenkool  ---
It fails with -m32 only for me?

[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523

2024-03-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #2 from Segher Boessenkool  ---
The PR101523 fix makes sure we do not get the same I2 back, because that
violates algorithmic assumptions of combine.  Importantly, the way it was,
things could be changed back time and time again, and that actually happened.
There is no "canonical form" in combine: what form combine prefers all depends
on what little piece of context is and is not considered.  Things can -- and
DID -- oscillate.

So, what is happening here?  The "dup" here is really a "splat"?  Should the
backend have some extra define_insn or define_split, or maybe even a peephole?

[Bug target/70928] Load simple float constants via VSX operations on PowerPC

2024-03-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70928

--- Comment #6 from Segher Boessenkool  ---
"All"...  not the non-finite denormals ;-)

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #52 from Segher Boessenkool  ---
Fixed.  (On trunk only, no backports planned, this goes back aaages).

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #51 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #46)
> Maybe combine already knows that it just "keeps i2" rather than replacing it?

It never does that.  Instead, it thinks it is making a new I2, but it ends up
being exactly the same instruction.  This is not a good thing to do: combine
can change the whole thing back to the previous shape, for example, when it
feels like it (combine does not make canonical forms ever!)

> When !newi2pat we seem to delete i2.  Anyway, somebody more familiar with
> combine should produce a good(TM) patch.

Yes, the most common combinations delete I2, they combine 2->1 or 3->1 or 4->1.
When this isn't possible combine tries to combine to two instructions.  It has
various strategies for this: the backend can do it explicitly (via a
define_split), or it can break apart the expression that was the src in the
one set that was the ->1 result, hoping that the two instructions it gets that
way are valid insns.  It tries only one way to do this, and it isn't very smart
about it, just very heuristic.

[Bug target/101523] Huge number of combine attempts

2024-03-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #39 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #37)
> Created attachment 57753 [details]
> quick attempt at a limit
> 
> So like this?

Hrm.  It should be possible to not have the same test 28 times.  Just at one
spot!

[Bug target/101523] Huge number of combine attempts

2024-03-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #38 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #36)
> > No, it definitely should be done.  As I showed back then, it costs less than
> > 1% extra compile time on *any platform* on average, and it reduced code size
> > by 1%-2% everywhere.
> > 
> > It also cannot get stuck, any combination is attempted only once, any
> > combination that succeeds eats up a loglink.  It is finite, (almost) linear
> > in fact.
> 
> So the slowness for the testcase comes from failed attempts.

Of course.  Most attempts do not succeed, there aren't instructions for most
"random" combinations of instructions feeding each other.  But combine blindly
tries everything, that is its strength!  It ends up finding many more things
than any recognition automaton does.

> > Something that is the real complaint here: it seems we do not GC often
> > enough, only after processing a BB (or EBB)?  That adds up for artificial
> > code like this, sure.
> 
> For memory use if you know combine doesn't have "dangling" links to GC memory
> you can call ggc_collect at any point you like.  Or, when you create
> throw-away RTL, ggc_free it explicitly (yeah, that only frees the
> "toplevel").

A lot of it *is* toplevel (well, completely disconnected RTX), just
temporaries, things we can just throw away.  At every try_combine call even,
kinda.  There might be some more RTX that needs some protection.  We'll see.

> > And the "param to give an upper limit to how many combination attempts are
> > done (per BB)" offer is on the table still, too.  I don't think it would
> > ever be useful (if you want your code to compile faster just write better
> > code!), but :-)
> 
> Well, while you say the number of successful combinations is linear the
> number of combine attempts apparently isn't

It is, and that is pretty easy to show even.  With retries it stays linear, but
with a hefty constant.  And on some targets (with more than three inputs for
some instructions, say) it can be a big constant anyway.

But linear is linear, and stays linear, for way too big code it is just as
acceptable as for "normal" code.  Just slow.  If you don't want the compiler to
take a long time compiling your way too big code, use -O0, or preferably do not
write insane code in the first place :-)


> (well, of course, if we ever
> combine from multi-use defs).  So yeah, a param might be useful here but
> instead of some constant limit on the number of combine attempts per
> function or per BB it might make sense to instead limit it on the number
> of DEFs?

We still use loglinks in combine.  These are nice to prove that things stay
linear, even (every time combine succeeds a loglink is used up).

The number of loglinks and insns (insns combine can do anything with) differs
by a small constant factor.

> I understand we work on the uses

We work on the loglinks, a def-use pair if you want.

> so it'll be a bit hard to
> apply this in a way to, say, combine a DEF only with the N nearest uses
> (but not any ones farther out),

There is only a loglink from a def to the very next use.  If that combines, the
insn that does the def is retained as well, if there is any other use.  But
there never is a combination of a def with a later use tried, if the earliest
use does not combine.
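
As an illustration only (not part of the original comment): a toy counting
model of the linearity argument.  This is a sketch under stated assumptions,
not combine's real data structures: struct toy_link, toy_count_attempts and
the retry constant are all hypothetical.  It assumes every link is tried once,
a success uses the link up, and only a success triggers a bounded number of
retries of later links.

```c
#include <stddef.h>

/* Hypothetical bound on how many later links are retried after a success.  */
enum { TOY_MAX_RETRIES_PER_SUCCESS = 3 };

/* Hypothetical stand-in for a def-use pair.  */
struct toy_link
{
  int combinable;  /* Would combining this def-use pair succeed?  */
  int consumed;    /* Set once a successful combination used the link up.  */
};

/* Count the combination attempts made over N links.  The result is at most
   N * (1 + TOY_MAX_RETRIES_PER_SUCCESS), i.e. linear in N.  */
size_t toy_count_attempts (struct toy_link *links, size_t n)
{
  size_t attempts = 0;
  for (size_t i = 0; i < n; i++)
    {
      attempts++;                  /* The one regular try of link I.  */
      if (!links[i].combinable)
        continue;                  /* A failure changes nothing: no retries.  */
      links[i].consumed = 1;       /* A success uses up the link.  */
      /* An insn changed, so a bounded number of later links are retried.  */
      for (size_t k = 1; k <= TOY_MAX_RETRIES_PER_SUCCESS && i + k < n; k++)
        attempts++;
    }
  return attempts;
}
```

In this model the worst case is a constant factor over the number of links,
which is the "linear, but with a hefty constant" behaviour described above.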

> and maintaining such a count per DEF would
> cost.  So more practical might be to limit the number of attempts to combine
> into an (unchanged?) insns?
> 
> Basically I would hope with a hard limit in place we'd not stop after the
> first half of a BB leaving trivial combinations in the second half
> unhandled but instead somehow throttle the "expensive" cases?

Ideally we'll not do *any* artificial limitations.  But GCC just throws its hat
in the ring in other cases as well, say, too big RA problems.  You do get not
as high quality code as wanted, but at least you get something compiled in
an acceptable timeframe :-)

[Bug target/101523] Huge number of combine attempts

2024-03-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #35 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #34)
> The change itself looks reasonable given costs, though maybe 2 -> 2
> combinations should not trigger when the cost remains the same?  In
> this case it definitely doesn't look profitable, does it?  Well,
> in theory it might hide latency and the 2nd instruction can issue
> at the same time as the first.

No, it definitely should be done.  As I showed back then, it costs less than 1%
extra compile time on *any platform* on average, and it reduced code size by
1%-2% everywhere.

It also cannot get stuck, any combination is attempted only once, any
combination that succeeds eats up a loglink.  It is finite, (almost) linear
in fact.

Any backend is free to say certain insns shouldn't combine at all.  This will
lead to reduced performance though.

- ~ - ~ -

Something that is the real complaint here: it seems we do not GC often enough,
only after processing a BB (or EBB)?  That adds up for artificial code like
this, sure.

And the "param to give an upper limit to how many combination attempts are done
(per BB)" offer is on the table still, too.  I don't think it would ever be
useful (if you want your code to compile faster just write better code!), but
:-)

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #31 from Segher Boessenkool  ---
I need a configure flag, hrm.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #29 from Segher Boessenkool  ---
I did manage to build one, but it does not know _Float64x and stuff.

Do you have a basic C-only testcase, maybe?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #28 from Segher Boessenkool  ---
For Q111: on rs6000:
;; Combiner totals: 53059 attempts, 52289 substitutions (7135 requiring new space),
;; 229 successes.

I don't have C++ cross-compilers built (those are not trivial to do, hrm).
I'll try to build a s390x one.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #25 from Segher Boessenkool  ---
So this testcase compiles on powerpc64-linux (-O2) in about 34s.  Is s390x
way worse, or is this in line with what you are seeing?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #24 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #21)
> Wouldn't it in this particular case be possible to recognize already in
> try_combine that separating the move out of the parallel cannot lead to
> additional optimization opportunities? To me it looks like we are just
> recreating the situation we had before merging the INSNs into a parallel. Is
> there a situation where this could lead to any improvement in the end?

It might be possible.  It's not trivial at all though, esp. if you consider
other patterns, other targets, everything.

Anything that grossly reduces what we try will not fly.

This testcase is very degenerate, if we can recognise something about that
and make combine handle that better, that could be done.  Or I'll do my
proposed "do not try more than 40 billion things" patch.

As it is now, combine only ever reconsiders anything if it *did* make changes.
So, if you see it reconsidering things a lot, you also see it making a lot of
changes.  And all those changes make for materially better generated code (that
is tested by combine always, before making changes).

Changing things so combine makes fewer changes directly means you want it to
optimise less well.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #17 from Segher Boessenkool  ---
Why does this happen so extremely often for s390x customers?  It should from
first principles happen way more often for e.g. powerpc, but we never see such
big problems, let alone "all of the time"!

So what is really happening?  And, when did this start, anyway, because
apparently at some point in time all was fine?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #16 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #14)
> If my analysis from comment #1 is correct, combine does superfluous steps
> here. Getting rid of this should not cause any harm, but should be
> beneficial for other targets as well. I agree that the patch I've proposed
> is kind of a hack. Do you think this could be turned into a proper fix?

When some insns have changed (or might have changed, combine does not always
know the details), combinations of the insn with later insns are tried again.
Sometimes this finds new combination opportunities.

Not retrying combinations after one of the insns has changed would be a
regression.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #13 from Segher Boessenkool  ---
(In reply to Sarah Julia Kriesch from comment #12)
> I expect also, that this bug is a bigger case.

A bigger case of what?  What do you mean?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #11 from Segher Boessenkool  ---
Okay, so it is a function with a huge BB, so this is not a regression at all,
there will have been incredibly many combination attempts since the day combine
has existed.

[Bug target/101893] There is no vgbbd on p7

2024-03-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101893

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Segher Boessenkool  ---
Yeah.  You could have done that yourself :-)

The bug can always be reopened, if it turns out not to be fixed.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #9 from Segher Boessenkool  ---
Yeah.

Without a testcase we do not know what is going on.  Likely it is a testcase
with some very big basic block, which naturally gives very many combination
opportunities: the problem by nature is at least quadratic.  There are various
ways to limit the work done for this, all amounting to "just give up if the
problem is too big", just like we do in many other places.

It also is interesting to see when this started happening.  One of the external
PRs indicated this has happened for some years already -- so notably this is
not a regression -- but what change caused this then?  It can even be the 2-2
thing, if it started far enough back.  Or, the real reason why we need to know
when it started: possibly a bug was introduced.

In all cases, we need the testcase.

(The reason this does not happen on x86 is that so many things on x86 are
stored in memory, while on less register-poor archs like 390 they are not.
Combine never does dependencies via memory).

[Bug target/65010] ppc backend generates unnecessary signed extension

2024-02-29 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65010

--- Comment #13 from Segher Boessenkool  ---
(In reply to pthaugen from comment #11)
> Another example to clean up. The back to back constant load/sign extend
> sequence of rtl insns is created in each block by the block reordering pass
> (.bbo) duplicating the common return block.

(.bbro)

That pass is way late, so you do not get any of the more basic optimisations
after it :-(  We either need to move this pass earlier, or do this before
cprop3 and cse2 as well already, or somehow do a minimal version of all that
basic stuff in bbro itself?

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-02-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #6 from Segher Boessenkool  ---
There is no attached testcase, btw.  This makes investigating this kind of
tricky ;-)

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-02-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #5 from Segher Boessenkool  ---
Hrm.  When did this start, can you tell?

[Bug target/113934] Switch avr to LRA

2024-02-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113934

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool  ---
LRA does not use LEGITIMIZE_RELOAD_ADDRESS.  The LRA code knows how to make all
addresses legal by itself.

[Bug c/89072] -Wall -Werror should be defaults

2024-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89072

--- Comment #13 from Segher Boessenkool  ---
I always have -Wmissing-declarations -Wformat=2 , for some reason those aren't
in -Wall, not even in -W .  Crazy if you ask me :-)
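
As an illustration only (not part of the original comment): a small C sketch
of code that plain -Wall accepts silently but that those two options are
expected to flag.  The function names are hypothetical.  The assumption is
that -Wformat=2 enables -Wformat-nonliteral, and that -Wmissing-declarations
warns about global functions defined without a previous declaration.

```c
#include <stdio.h>

/* No previous declaration is visible: -Wmissing-declarations is expected
   to warn about this definition.  */
int helper_without_declaration (int x)
{
  return x + 1;
}

/* The format string is not a literal, so it cannot be checked:
   -Wformat=2 is expected to warn at the printf call.  (This function
   also lacks a previous declaration.)  */
int print_with_nonliteral_format (const char *fmt, int value)
{
  return printf (fmt, value);
}
```

Compiling this with only -Wall should stay quiet; adding the two options
should produce the warnings.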

[Bug c/89072] -Wall -Werror should be defaults

2024-01-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89072

--- Comment #11 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #9)
> It is not always wrong, it is a reasonable choice for some projects during
> their development, if they are willing to fix or work around all new
> warnings, even if they are false positives.

Sure.  If people want the pain, they can have it.  But it is never okay to
cause other people to have -Werror -- they may have a different compiler
(version) that no one else has tested with, they may have different warnings
enabled, etc.

> But enabling any kind of -Werror by default is always wrong.

Yup.

[Bug c/89072] -Wall -Werror should be defaults

2024-01-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89072

--- Comment #7 from Segher Boessenkool  ---
-Werror always is wrong, for any sane users.  Always.  Not just "in general".

[Bug c/89072] -Wall -Werror should be defaults

2024-01-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89072

--- Comment #4 from Segher Boessenkool  ---
(In reply to Xi Ruoyao from comment #3)
> In GCC 14 several warnings will be turned to errors by default with C99 or a
> newer C standard.  But generally -Werror should *never* be the default. 
> Besides the reasons Segher and Rich have already given, the standard also
> disallows the compiler from randomly rejecting code just because it "looks
> suspicious".

Huh?  Where does the standard require that?  The closest to it is 4/8: "An 
implementation shall be accompanied by a document that defines all
implementation-defined and locale-specific characteristics and all
extensions." which essentially *allows* such restrictions.  It just has to
be documented.

Of course a good implementation will not reject valid code without a very
good reason to do that.

[Bug target/113341] Using GCC as the bootstrap compiler breaks LLVM on 32-bit PowerPC

2024-01-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113341

--- Comment #5 from Segher Boessenkool  ---
(In reply to John Paul Adrian Glaubitz from comment #3)
> (In reply to Segher Boessenkool from comment #2)
> > We need a reduced testcase.
> 
> Any suggestion on how to proceed here?

Nothing in particular, no.  We need that for *any* bug though!

[Bug target/113341] Using GCC as the bootstrap compiler breaks LLVM on 32-bit PowerPC

2024-01-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113341

--- Comment #2 from Segher Boessenkool  ---
We need a reduced testcase.

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #11 from Segher Boessenkool  ---
(In reply to David Brown from comment #8)
> As for using "=X" in the "opt == 3" case, I worry that that could lead to
> errors as the two assembly lines are independent.  The first says "put X
> anywhere", and the second - if it had "=X" - says "take X from anywhere". 
> These don't have to be the same "anywhere" unless the input and output are
> in the same statement - and if the compiler picked different anywheres, the
> code would not work.

This cannot lead to errors.  The compiler knows where "x" is (it put it there
itself!)

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #10 from Segher Boessenkool  ---
> But the dump from combine does not make sense:

What about this does not make sense to you?
> Failed to match this instruction:
and then still doing stuff?  That is normal.  I'll work on making that look
better / make more sense, it always irked me as well.

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #9 from Segher Boessenkool  ---
(In reply to Alexander Monakov from comment #6)
> From the context given in the gcc-help thread, the goal is to place an
> optimization barrier in a sequence of floating-point calculation. "+r" is
> inappropriate for floats, as it usually incurs a reload from a
> floating-point register to a GPR and back, and there's no universal
> constraint for FP regs (e.g. on amd64 it is "+x" for SSE registers, but "+t"
> for long double (or x87 FPU on 32-bit x86)).

Yup.  "Some target-specific register constraint" :-)

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #5 from Segher Boessenkool  ---
Oh, and if the goal of the code is to put and keep the datum in a register, the
code should really use "+r" anyway!
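
As an illustration only (not part of the original comment): a minimal sketch
of an empty asm with a "+r" operand used as an optimisation barrier on an
integer value.  The function names are hypothetical, and this assumes a
compiler that supports GNU extended asm.  For floating-point values a
target-specific register constraint would be the better choice, as discussed
in the other comments on this PR.

```c
/* Force X into a general purpose register and make the compiler assume
   the (empty) asm may have modified it.  */
static inline int opt_barrier_int (int x)
{
  __asm__ volatile ("" : "+r" (x));
  return x;
}

/* Hypothetical user: the sum has to exist in a register at the barrier,
   so the compiler cannot fold the addition into the multiplication.  */
int add_with_barrier (int a, int b)
{
  int sum = a + b;
  sum = opt_barrier_int (sum);
  return sum * 2;
}
```

Input and output are the same operand of one asm statement here, so there is
no question of the two "anywheres" differing.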

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #4 from Segher Boessenkool  ---
Nothing has changed here.

opt == 2 and opt == 3 should use "=X", not "+X", btw.

combine is perfectly correct that "X" allows *any operand whatsoever*, also
those that you cannot really use as an output.  Maybe we should not allow "X"
for output operands at all?

The error happens during reloading btw.  Which makes sense, I would be much
more surprised if the compiler figured out a way to move data into an
addition :-)

[Bug target/113115] [14 Regression] ICE In extract_constrain_insn_cached recog.cc with ppc64le-linux-gnu crosscompiler from r14-3592-g9ea1248604d7b6

2024-01-08 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113115

--- Comment #6 from Segher Boessenkool  ---
Using -mpower9-vector while not having -mcpu=power9 (or later) is wrong, and
should not work.  Using -mno-power9-vector is just weird.

If we can neuter the -mpower9-vector (etc.) options now, that would be good.
But there are some complications with the testsuite at least?

[Bug target/108208] Bad assembly? on large LLVM source files on powerpc-unknown-linux-gnu (Error: operand out of range)

2024-01-01 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108208

--- Comment #9 from Segher Boessenkool  ---
(In reply to John Paul Adrian Glaubitz from comment #8)
> (In reply to Segher Boessenkool from comment #7)
> > This PR is for the sysv ABI, while most discussion was about the "ELFv1" 
> > ABI.
> 
> Doesn't the subject clearly mention "powerpc-unknown-linux-gnu"?

Yes.  And most of the discussion (via the gentoo link) is about
powerpc64-linux.

So things were confusing, for me at least.

> I am trying -O3 now.

That almost always runs out of space easier, not less easily.  O3 prioritises
possible speed wins over anything else (*possible* speed wins).

> > If you get an error at line 577996 of a source file, chances are your code
> > is just completely unreasonably large, esp. on a smaller target like this :-)
> 
> I understand. But it's not always possible to change the code size,
> especially when the code is not mine but some random upstream code.
> 
> What I don't understand is that other 32-bit architectures don't seem to be
> affected. For example, hppa-unknown-linux-gnu is not affected.

None of the HPPA ABIs are affected by peculiarities of a particular PowerPC
ABI.

The bug report does not give enough information to see what is really going
on, so I have no further advice.

[Bug target/108208] Bad assembly? on large LLVM source files on powerpc-unknown-linux-gnu (Error: operand out of range)

2024-01-01 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108208

--- Comment #7 from Segher Boessenkool  ---
This PR is for the sysv ABI, while most discussion was about the "ELFv1" ABI.

Only the 64-bit ABIs have the code model ABI, for the powerpc*-*-*
configurations.  Some other architectures have it for more things, and some
for fewer, or even none.

If you get an error at line 577996 of a source file, chances are your code is
just completely unreasonably large, esp. on a smaller target like this :-)

[Bug target/108208] Bad assembly? on large LLVM source files on powerpc-unknown-linux-gnu (Error: operand out of range)

2024-01-01 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108208

--- Comment #4 from Segher Boessenkool  ---
See my previous comment?

You can either write better code, or use -mcmodel=large or similar, accepting
the not-so-stellar generated code you get then.

[Bug rtl-optimization/112758] [13/14 Regression] Inconsistent Bitwise AND Operation Result between int and long long int

2023-12-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112758

--- Comment #12 from Segher Boessenkool  ---
(In reply to Eric Botcazou from comment #11)
> > It says those upper bits are well-defined, i.e. whatever MD pattern is used
> > for it eventually will emit machine code that has the exact same result for
> > those upper bits.
> 
> No, that's not true, the set of "register operations" is restricted.

Who what where?  That is not how it is documented.  There is
word_register_operation_p as a bandaid to make it *somewhat* work, added
decades later, but it still won't fly :-(

Different parts of the compiler think it has much more stringent semantics btw.

[Bug rtl-optimization/112758] [13/14 Regression] Inconsistent Bitwise AND Operation Result between int and long long int

2023-12-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112758

--- Comment #10 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #6)
> I must say I have no idea what WORD_REGISTER_OPERATION says about the upper
> bits of a paradoxical SUBREG if it is a MEM and load_extend_op (inner_mode)
> is ZERO_EXTEND (zeros then?  Then this optimization is ok), or something
> else?  And what it says on REGs.

It says those upper bits are well-defined, i.e. whatever MD pattern is used for
it eventually will emit machine code that has the exact same result for those
upper bits.  This is almost impossible to prove for any non-trivial target, and
certainly extremely fragile.

[Bug rtl-optimization/112758] [13/14 Regression] Inconsistent Bitwise AND Operation Result between int and long long int

2023-12-08 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112758

--- Comment #4 from Segher Boessenkool  ---
WORD_REGISTER_OPERATIONS is extremely ill-defined.  Or, it is used for other
things than what it stands for, whichever way you want to look at it.

A backend that defines the macro to non-zero promises that for *any* operation
on any values in a smaller than full-register mode, the compiler can instead
do the operation in that full-register mode, and all the resulting bits will
be well-defined.  This is not true for most real non-trivial backends.
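
As an illustration only (not part of the original comment): a C sketch of
what "doing the operation in full-register mode" means for the upper bits.
The function names are hypothetical, and a 32-bit value stands in for the
full register.  The low 8 bits agree with the narrow result, while the upper
bits carry whatever the wide operation produced (here, a carry).

```c
#include <stdint.h>

/* The operation as written, in an 8-bit mode.  */
uint8_t add_narrow (uint8_t a, uint8_t b)
{
  return (uint8_t) (a + b);
}

/* The same operation done in a 32-bit "word" mode instead.  */
uint32_t add_word (uint8_t a, uint8_t b)
{
  return (uint32_t) a + (uint32_t) b;
}

/* The bits above the 8-bit value: the ones the macro makes a promise about.  */
uint32_t upper_bits (uint32_t word)
{
  return word >> 8;
}
```

For 200 + 100 the narrow result is 44 and the wide result is 300, so the
upper bits are 1, not 0.  The promise is that whatever machine code the
backend emits yields exactly these bits for every operation, which is the
part that is so hard to guarantee.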

There is word_register_operation_p to filter out the most obvious and egregious
cases where WORD_REGISTER_OPERATIONS is just a foolish thing, but this function
isn't used nearly enough, and it doesn't filter out enough either.

[Bug target/110606] ICE output_operand: '%&' used without any local dynamic TLS references on powerpc64le-linux-gnu

2023-12-05 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110606

--- Comment #8 from Segher Boessenkool  ---
What does "dead at sched2" mean?  Are they dead when sched2 starts, or made
dead by it?  Well it must be the former; what pass does make it dead, then?
split3 apparently?  Why is this not done in split2 already, any good reason?

[Bug target/112707] [14 regression] gcc 14 outputs invalid assembly on ppc: Error: unrecognized opcode: `fctid'

2023-12-05 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112707

--- Comment #13 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #12)
> I'll note that you don't always
> get an assembler error, since gcc still passes -many to the assembler for
> non --enable-checking gcc builds, which causes it to accept the fctid insn.

Hrm.  Was that an oversight?  Should we always do that now?  Can you prepare a
patch (and test on some common configs) please?

[Bug target/110606] ICE output_operand: '%&' used without any local dynamic TLS references on powerpc64le-linux-gnu

2023-11-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110606

--- Comment #5 from Segher Boessenkool  ---
The insn that it fails on is the result from a split using *tls_ld .

[Bug target/110606] ICE output_operand: '%&' used without any local dynamic TLS references on powerpc64le-linux-gnu

2023-11-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110606

--- Comment #4 from Segher Boessenkool  ---
It needs   -O2 -fPIC -fno-exceptions   to fail.

[Bug target/112707] [14 regression] gcc 14 outputs invalid assembly on ppc: Error: unrecognized opcode: `fctid'

2023-11-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112707

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #8 from Segher Boessenkool  ---
Yeah, it tested for ISA 2.04 before.  That was an attempt at including 476
probably?

We really should have a TARGET_FCTID, on for TARGET_POWERPC64 or for cpu 476
(so NOT user-selectable separately, of course!); not try to use pre-existing
flags for this, which might work but will forever stay confusing.

So either a separate OPTION_FCTID for use in rs6000-cpus.def, or TARGET_FCTID.
Either works for me.

(Background: in ISA 1.xx it was for 64-bit implementations only.  But it does
not need 64-bit registers or a 64-bit integer pipeline at all, it is an FP
instruction that works on FP registers, which always are 64-bit.  The
instruction was implemented on the 476).

[Bug target/112103] [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304

2023-11-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112103

--- Comment #2 from Segher Boessenkool  ---
In all those cases the code is perfectly fine, but also in all of those cases
the code is still suboptimal: the rldicl is just as superfluous as the second
rlwinm was!  :-)
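
As an illustration only (not part of the original comment): a C model of why
the extra instruction is superfluous.  This is a sketch that assumes the usual
rlwinm semantics (rotate the low 32 bits left, then AND with a mask of bits
MB..ME in IBM numbering, where bit 0 is the most significant bit) and handles
only masks with MB <= ME.  The helper names are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by N bits.  */
static uint32_t rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask with bits MB..ME set, IBM bit numbering, assuming MB <= ME.  */
static uint32_t mask32 (unsigned mb, unsigned me)
{
  uint32_t from_mb = 0xffffffffu >> mb;         /* Bits MB..31.  */
  uint32_t up_to_me = 0xffffffffu << (31 - me); /* Bits 0..ME.  */
  return from_mb & up_to_me;
}

/* Model of "rlwinm ra,rs,sh,mb,me" on a 64-bit register, for MB <= ME:
   the upper 32 bits of the result are always zero.  */
uint64_t rlwinm (uint64_t rs, unsigned sh, unsigned mb, unsigned me)
{
  return (uint64_t) (rotl32 ((uint32_t) rs, sh) & mask32 (mb, me));
}

/* Model of "rldicl ra,rs,0,32": clear the upper 32 bits.  */
uint64_t rldicl_0_32 (uint64_t rs)
{
  return rs & 0xffffffffu;
}
```

In this model, applying rldicl_0_32 to the result of rlwinm never changes the
value, which is the redundancy pointed out above.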

[Bug target/112103] [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304

2023-11-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112103

--- Comment #1 from Segher Boessenkool  ---
Those are:

$ diff -up rlwinm-0.s{.12,}
--- rlwinm-0.s.12   2023-11-09 18:28:49.362639203 +
+++ rlwinm-0.s  2023-11-09 18:30:46.422896735 +
@@ -6747,7 +6747,7 @@ f_1_16_31:
 .LFB345:
.cfi_startproc
rlwinm 3,3,1,16,31
-   rlwinm 3,3,0,0x
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -7645,7 +7645,7 @@ f_1_24_31:
 .LFB390:
.cfi_startproc
rlwinm 3,3,1,24,31
-   rlwinm 3,3,0,0xff
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -11235,7 +11235,7 @@ f_2_16_31:
 .LFB570:
.cfi_startproc
rlwinm 3,3,2,16,31
-   rlwinm 3,3,0,0x
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -12133,7 +12133,7 @@ f_2_24_31:
 .LFB615:
.cfi_startproc
rlwinm 3,3,2,24,31
-   rlwinm 3,3,0,0xff
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -15722,7 +15722,7 @@ f_7_16_31:
 .LFB795:
.cfi_startproc
rlwinm 3,3,7,16,31
-   rlwinm 3,3,0,0x
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -16620,7 +16620,7 @@ f_7_24_31:
 .LFB840:
.cfi_startproc
rlwinm 3,3,7,24,31
-   rlwinm 3,3,0,0xff
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -20207,7 +20207,7 @@ f_8_16_31:
 .LFB1020:
.cfi_startproc
rlwinm 3,3,8,16,31
-   rlwinm 3,3,0,0x
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -24691,7 +24691,7 @@ f_9_16_31:
 .LFB1245:
.cfi_startproc
rlwinm 3,3,9,16,31
-   rlwinm 3,3,0,0x
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -29174,7 +29174,7 @@ f_15_16_31:
 .LFB1470:
.cfi_startproc
rlwinm 3,3,15,16,31
-   rlwinm 3,3,0,0x
+   rldicl 3,3,0,32
blr
.long 0
.byte 0,0,0,0,0,0,0,0
@@ -67092,4 +67092,4 @@ f_31_31_31:
.cfi_endproc
 .LFE3375:
.size   f_31_31_31,.-.L.f_31_31_31
-   .ident  "GCC: (GNU) 12.0.1 20220406 (experimental)"
+   .ident  "GCC: (GNU) 14.0.0 20231103 (experimental)"

[Bug rtl-optimization/106594] [13/14 Regression] sign-extensions no longer merged into addressing mode

2023-10-30 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #27 from Segher Boessenkool  ---
(In reply to Roger Sayle from comment #21)
> Segher has proposed that object code size correlates with the quality of

It isn't a proposition, it is a simple and obvious fact.  But, this isn't
exactly what I say :-)

Code size strongly correlates with number of instructions, almost 1-1 on most
targets.  Number of instructions is exactly what combine tries to reduce.

Whether that makes the code actually better is something completely separate as
well.  If your instruction cost function (and please use insn_cost, it is much
easier to use, and thus gives better results than rtx_costs) is good, this of
course should work fine.  And there is a hook (TARGET_LEGITIMATE_COMBINED_INSN)
for the very nasty cases.

But the whole "fewer insns that do the same thing, is better" thing is not
actually true on some targets.  Such targets are incredibly hard to optimise
for.  There is no way combine can do a good job for such targets.  It is
incredibly hard for human programmers to write good machine code for such
systems by hand as well.

I do use object code size **of a huge sample** as a quick and dirty sniff test
to see if a change to combine is good or bad.  After that I always look at the
actual changes as well.  I do realise all pitfalls associated with this :-)

[Bug target/111367] Error: operand out of range (0x1391c is not between 0xffffffffffff8000 and 0x7fff)

2023-09-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111367

--- Comment #11 from Segher Boessenkool  ---
> > There really should be a comment why one alternative needs the %U{n} and the
> > other can ignore it, btw.  Nothing new there, but a head-scratcher :-)
> 
> OK, something like: "prefixed load/store insns only have D-form but no
> update and X-form"?

Exactly.  Something short is plenty, but if there is nothing there it is
surprising.  Surprising is bad :-)

[Bug target/111367] Error: operand out of range (0x1391c is not between 0xffffffffffff8000 and 0x7fff)

2023-09-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111367

--- Comment #9 from Segher Boessenkool  ---
I don't like that "wzd" attribute at all.  Please just put an "if" for the mode
around this -- everywhere else (including in a large part of this patch!) we
deal with SImode and DImode separately already.  Or perhaps you can use the
"ptrload" attribute, which includes the "l"?

There really should be a comment why one alternative needs the %U{n} and the
other can ignore it, btw.  Nothing new there, but a head-scratcher :-)

[Bug c/66425] (void) cast doesn't suppress __attribute__((warn_unused_result))

2023-09-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425

--- Comment #60 from Segher Boessenkool  ---
(In reply to Roman Krotov from comment #59)
> All, what I'm asking for, is to make something like -Wno-void-unused, which
> would suppress the warnings only for the (void) casted calls.

So you want to not warn for some (just *some*) explicitly unused cases, and do
warn for other explicitly unused cases, and all implicitly unused cases?  While
the author of the code explicitly asked for a warning message to be emitted in
all such cases: "The 'warn_unused_result' attribute causes a warning to be
emitted if a caller of the function with this attribute does not use its return
value."

> This is desperately needed by the projects like systemd (see the first link
> in my first comment) as a less severe variant than -Wno-unused-result, so
> that they won't get punished with less diagnostics.

They (like EVERYONE ELSE IN THE WORLD) should not use -Werror, if they do not
like punishment.  Warnings are warnings.  The author of your code (the header
files for the library code) wanted everyone to be warned about not using the
return value from a certain function.  He/she was almost certainly right about
that.  And it is easy to suppress the warning in the few cases where you really
want to.
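
A minimal sketch of one common way to do that (not taken from this PR; the
function names are hypothetical): storing the result in a variable counts as
a use, so the warning goes away without any new option.

```c
/* Hypothetical library function; the attribute asks GCC to warn whenever
   a caller drops the return value. */
__attribute__((warn_unused_result))
int must_check(int x)
{
    return x < 0 ? -1 : 0;
}

/* Returns the number of calls made, just so there is something to observe. */
int caller(void)
{
    int calls = 0;

    /* (void) must_check(1);    <- still warns, which is what this PR is about */

    /* Storing the result counts as a use, so there is no warning here. */
    int ignored = must_check(1);
    (void) ignored;
    calls++;

    /* Acting on the result is of course the intended use. */
    if (must_check(-1) != 0)
        calls++;

    return calls;
}
```

The throwaway variable makes the decision to ignore the result visible at the
call site, which is the point of the attribute.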

> I don't see any reason not to implement -Wno-void-unused with the similar
> description (stating that it's not recommended, if you want) to help the
> projects like systemd.

Define what it would do *exactly*, make a patch for it (including for the
documentation, amending all existing documentation as well), and do that in
such a way that it a) is correct, and b) makes any sense.  Then send the
patch to gcc-patches@.  If you do not want to do all that work (including the
very much non-trivial amount of follow-up work that will cause), then please
go away?  Don't tell us to do insane things that are an incredible amount of
work just because you had a bad idea and now want it to become reality.

> It won't change the meaning of the wur attribute, because it will be a
> non-default switch.

This makes no sense at all.

[Bug rtl-optimization/110717] Double-word sign-extension missed-optimization

2023-07-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #13 from Segher Boessenkool  ---
So.  Before expand we have

  _6 = (__int128) x_3(D);
  x.0_1 = _6 << 59;
  _2 = x.0_1 >> 59;
  _4 = (__int128 unsigned) _2;
  return _4;

That should have been optimised better :-(
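
For reference, C source along these lines would give GIMPLE of that shape.
This is a sketch, not the test case from the PR: the function name is made up,
and it relies on GCC's documented behaviour for converting to a signed type
and for right-shifting negative values.

```c
/* Sign-extend the low 69 bits (128 - 59) of a 128-bit value.
   The left shift moves bit 68 into the sign position; the arithmetic
   right shift then copies it back down over the upper 59 bits. */
unsigned __int128 sext69(unsigned __int128 x)
{
    __int128 shifted = (__int128) (x << 59);
    return (unsigned __int128) (shifted >> 59);
}
```

Ideally this is a single sign extension of the high doubleword, with the low
doubleword left untouched.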

The RTL code it expands to sets the same pseudo multiple times.  Bad bad bad.
This hampers many optimisations.  Like:
(insn 6 3 7 2 (set (reg:DI 124)
(lshiftrt:DI (reg:DI 129 [ x+8 ])
(const_int 5 [0x5]))) "110717.c":6:11 299 {lshrdi3}
 (nil))
(insn 7 6 8 2 (set (reg:DI 132)
(ashift:DI (reg:DI 128 [ x ])
(const_int 59 [0x3b]))) "110717.c":6:11 289 {ashldi3}
 (nil))
(insn 8 7 9 2 (set (reg:DI 132)
(ior:DI (reg:DI 124)
(reg:DI 132))) "110717.c":6:11 233 {*booldi3}
 (nil))
(They are subregs right after expand, totally unreadable; this is after
subreg1, slightly more readable, but essentially the same code still).

The web pass eventually gets rid of the double set in this case.

Because the shift-left-then-right survives all the way to combine, it (being
the greedy bastard that it is) will use the combiner patterns rs6000 has for
multi-precision shifts, before it would notice the two (multiprecision!)
shifts together are largely a no-op, so you get stuck at a local optimum.
Par for the course for combine :-/

[Bug rtl-optimization/110717] Double-word sign-extension missed-optimization

2023-07-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #12 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #9)
> Wonder how many important targets provide double-word shift patterns vs.
> ones which expand it through generic code.

Very long ago rs6000 had special code for this.  That was sub-optimal in
other ways, and the generic code generated almost ideal code (sometimes an
extra data movement insn).

> powerpc probably could be improved:
> foo:
> srwi 9,4,5
> mr 10,9
> rlwimi 4,9,5,0,31-5
> rlwimi 10,3,27,0,31-27
> srawi 3,10,27
> blr

This is hugely worse than what we used to do, it seems?

GCC 8 did

srdi 9,4,5
rldimi 9,3,59,0
rldimi 4,9,5,0
sradi 3,9,59
blr

GCC 9 started with the unnecessary move.

But we should get only one insert insn in any case!

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #5 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #2)
> The
> 
>(insn 13 4 14 2 (set (reg:V2SF 20 xmm0 [orig:91 x2 ] [91])
> (vec_select:V2SF (reg:V4SF 20 xmm0 [94])
> (parallel [
> (const_int 0 [0])
> (const_int 1 [0x1])
> ]))) "t.c":10:12 4394 {sse_storelps}
>  (nil))
> 
> insns are gone in split after reload.

Insns 13 and 14 are deleted by split2, yes.  Although the very next insn
(15) obviously uses the regs (20 and 21) those insns set?!

[Bug target/106895] powerpc64 unable to specify even/odd register pairs in extended inline asm

2023-07-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106895

--- Comment #12 from Segher Boessenkool  ---
> I guess that would be annoying if you couldn't have modifiers on constraints

There is no such thing as "operand modifiers".  There are *output* modifiers:
they change how an operand is *printed*, they do not change the operand in any
way, shape, or form.

> or a bad algorithm for working them out. Fair enough.

No idea what you mean here?

> > > or why TI doesn't work but PTI apparently would,
> > 
> > Because this is exactly what PTImode is *for*!
> 
> Right I accept it is, I meant I just would not have been able to work it out
> (assuming if PTI was documented it would be "Partial Tetra Integer" and be
> no more useful than the other P?I type documentation).

For the rs6000 port, multi-register operands are not restricted to aligned
register numbers ("even/odd pairs").  (Some other ports do have this).  We use
the existing PTI mode for that (it also can be allocated in GPRs only, never in
VSRs, unlike TImode).

"Partial" does not have much meaning here.  A minority of ports use partial
integer words for what they were introduced for originally: modes that are
smaller than a full register, say, a 24-bit mode when registers are 32 bits.

We use it as another integer mode that is the same size.  It is unfortunate
that we still have to resort to such tricks.

[Bug target/106895] powerpc64 unable to specify even/odd register pairs in extended inline asm

2023-07-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106895

--- Comment #10 from Segher Boessenkool  ---
(In reply to Nicholas Piggin from comment #9)
> I don't know why constraint is wrong and mode is right

Simple: you would need O(2**T*N) constraints for our existing N register
constraints, together with T features like this.  But only O(2**T) modes at
most.

> or why TI doesn't work but PTI apparently would,

Because this is exactly what PTImode is *for*!

> but I'll take anything that works. Could we
> get PTI implemented? Does it need a new issue opened?

It was implemented in 2013.  The restriction to only even pairs was a bugfix,
also from 2013.

If you have code like

  typedef __int128 __attribute__((mode(PTI))) even;

you get an error like

  error: no data type for mode 'PTI'

This needs fixing.  You can keep it in this PR?

[Bug target/106895] powerpc64 unable to specify even/odd register pairs in extended inline asm

2023-07-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106895

--- Comment #8 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #6)
> (In reply to Segher Boessenkool from comment #5)
> > Constraints are completely the wrong tool for this.  Just use modes, which
> > *are* the right tool?
> 
> Well you cannot specify modes in the asm, so I think you're saying we need to
> use the correct type that maps to an internal GCC mode that has the
> even/odd register behavior, so something like:
> 
>   unsigned int foo __attribute__ ((mode (XX)));
> 
> ...where XXmode is the new integer mode that gives us even/odd register
> pairs?  Of course we have to be careful about how this all works wrt -m32
> versus -m64.

No, the type there is "unsigned int".  I meant to say exactly what I did say:
just use modes.  Which you indeed do in user code by the mode attribute, yes.

And you do not need a new mode: PTImode should just work.  But the user
specifying that is currently broken it seems?

Without -mpowerpc64 you cannot *have* 128-bit integers in registers.  That
should be fixed, but you cannot have it in just *two* registers, which is what
is required here.  For most targets that then means -m64 is required.

[Bug target/106895] powerpc64 unable to specify even/odd register pairs in extended inline asm

2023-07-02 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106895

--- Comment #5 from Segher Boessenkool  ---
Constraints are completely the wrong tool for this.  Just use modes, which
*are* the right tool?

[Bug target/78904] zero-extracts are not effective

2023-06-23 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78904

--- Comment #17 from Segher Boessenkool  ---
(In reply to Roger Sayle from comment #16)
> Just to warn people in advance, the test case pr78904-1b.c is expected to
> start FAILing with the commit of
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622079.html and is
> scheduled to be resolved 24-48 hours later (over the weekend) by
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622078.html
> As explained in
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622083.html this is to
> investigate additional tweaks and whether alternate fixes are more
> appropriate.

Thanks for the warning Roger!  Much appreciated.

That fix is for x86 only though?  Is that really the only target affected?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-23 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #94 from Segher Boessenkool  ---
(In reply to Alexander Klepikov from comment #92)
> I remembered why I used two different insns - first to eliminate infinite
> loop with help of marking insn with attribute, and second because I could
> not set attribute when emitting insn from C code. We have 'get_attr_*'
> functions but we do not have 'set_attr_*'.

An attribute is part of the instruction *definition*, the define_insn; it isn't
a property you put on one instance of it.

[Bug testsuite/101002] Some powerpc tests fail with -mlong-double-64

2023-06-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101002

--- Comment #9 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #4)
> These die because the struct we're using to check the alignment of uses long
> double as the "big" aligned type.  We could either disable the tests using a
> "dg-require-effective-target longdouble128" or we could use a different more
> aligned type in the struct.  Maybe _Float128 or _Decimal128 or use an
> attribute aligned?   Thoughts?

Maybe just some vector type?  Those have 128-bit alignment even with
-mno-altivec, right?
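
Something like the following is what I have in mind.  It is only a sketch: the
type and struct names are invented, it uses GCC's generic vector extension, and
whether the alignment really is 128 bits with -mno-altivec is exactly the thing
that would need checking.

```c
#include <stddef.h>

/* A 16-byte generic vector type, used here only as the "big" aligned
   member instead of long double. */
typedef int v4si __attribute__((vector_size(16)));

struct probe {
    char c;
    v4si big;
};

/* Offset of the aligned member; padding after 'c' makes this a multiple
   of the vector type's alignment. */
size_t probe_offset(void)
{
    return offsetof(struct probe, big);
}

/* Alignment the compiler actually gives the vector type. */
size_t vector_alignment(void)
{
    return _Alignof(v4si);
}
```

Because the type does not depend on the long double format, the struct layout
stays the same under -mlong-double-64.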

> gcc.target/powerpc/pr85657-3.c
> gcc.target/powerpc/signbit-1.c
> pr85657-3.c:38:20: error: unknown type name ‘__ibm128’; did you mean
> ‘__int128’?
> 
> These die because we don't create the type __ibm128 when using
> -mlong-double-64, which seems strange since we do create the __float128 type
> used in the test cases.
> 
> Mike, I assume the __ibm128 type should always be created?

It always should, yes.  Always.  Unconditionally.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #88 from Segher Boessenkool  ---
(In reply to Oleg Endo from comment #85)
> > +/* { dg-final { scan-assembler 
> > "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:"
> >  { target { ! has_dyn_shift } } } }  */
> 
> Can you try to somehow write this in a simpler way?  Maybe omit some of the
> register number matches, as they don't matter etc.

Do not use double-quoted strings unless you need interpolation?  If you use {}
around the string you do not need to backslash-quote (and double-quote) so much
at all.

[0-9] is \d

whitespace is \s

See the Tcl re_syntax manual page :-)
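
As an illustration only (a shortened, hypothetical pattern, not a drop-in
replacement for the directive quoted above), a brace-quoted version could look
like this:

```c
/* Braces instead of double quotes, so each backslash is written only once;
   \d replaces [0-9] and \s replaces the explicit tab.  */
/* { dg-final { scan-assembler {_f_loop1_rshift:.*jsr\s+@r\d+.*_f_loop2_rshift:} { target { ! has_dyn_shift } } } } */
```

Inside braces Tcl does no substitution, so the regex engine sees the pattern
exactly as written.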
