[Bug c/102989] Implement C2x's n2763 (_BitInt)

2022-10-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #20 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #18)
> (In reply to Segher Boessenkool from comment #16)
> > (In reply to Jakub Jelinek from comment #15)
> > > PowerPC I think does, not sure about s390.
> > 
> > Does what?
> 
> Have a public place to submit issues against the powerpc abis.

Only the ELFv2 ABI really (it's on github).  The rest doesn't have (public)
maintained documents at all.

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2022-10-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #16 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #15)
> PowerPC I think does, not sure about s390.

Does what?

[Bug other/107353] [13 regression] Numerous ICEs after r13-3416-g1d561e1851c466

2022-10-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353

--- Comment #5 from Segher Boessenkool  ---
Please revert until it is fixed?  It breaks way too many targets.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #45 from Segher Boessenkool  ---
Yes, that is fine afaics.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #42 from Segher Boessenkool  ---
(In reply to H.J. Lu from comment #41)
> (In reply to Segher Boessenkool from comment #40)
> > Let me repeat: A const_int cannot be assigned to a MODE_CC.  It has no
> > meaning.
> > This is invalid RTL.  If it ever works, or worked, that is an accident.
> 
> Can we make it work with a target hook? It will allow more backend
> optimizations.

No, you cannot.  A lot of generic code will not work with your special
re-interpretation of basic RTL rules.  Just write correct code in your
backend, it is not hard.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #40 from Segher Boessenkool  ---
Let me repeat: A const_int cannot be assigned to a MODE_CC.  It has no meaning.
This is invalid RTL.  If it ever works, or worked, that is an accident.

A MODE_CC stands for a comparison (in the mathematical sense).  Saying it is
"1" would mean what?

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #38 from Segher Boessenkool  ---
You cannot put a const_int in a MODE_CC.  It is meaningless.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #33 from Segher Boessenkool  ---
(In reply to H.J. Lu from comment #32)
> > There is no actual comparison with 0, that is just notation.
> 
> True.  But simplify-rtx.cc simplifies
> 
> (ltu (reg 17) (const_int 0))
> 
> to false when reg 17 is set.

Is set?  What does that even mean?  Is set to what?

> Using the actual comparison isn't the issue.  The issue is how
> 
> (ltu (reg 17) (const_int 0))
> 
> should be simplified when reg 17 is known to be set.

You need to look at the setter.  It cannot be simplified otherwise.

[Bug testsuite/107240] [13 Regression] FAIL: gcc.dg/vect/vect-bitfield-write-2.c since r13-3219-g25413fdb2ac249

2022-10-14 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107240

--- Comment #5 from Segher Boessenkool  ---
So perhaps this needs instructions new on P8 (which fleshed out the integer
support amongst other things, but that sounds relevant here?)  Test that with
  { powerpc*-*-* && has_arch_pwr8 }
or such?  But please make sure this is the reason first :-)

[Bug testsuite/107240] [13 Regression] FAIL: gcc.dg/vect/vect-bitfield-write-2.c since r13-3219-g25413fdb2ac249

2022-10-14 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107240

--- Comment #3 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #1)
> I guess the first question is, is it expected that the
> vect-bitfield-write-2.c loop should be vectorized on power7 which only has
> Altivec and not VSX?

P7 has VSX.  It is the first processor with VSX.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #30 from Segher Boessenkool  ---
(In reply to H.J. Lu from comment #26)
> LTU/GEU are only used to check FLAGS_REG against constant 0.

That is not what
  (ltu (reg 17) (const_int 0))
means though?

Together with a previous
  (set (reg 17) (compare A B))

it means the result of A < B (unsigned).

There is no actual comparison with 0, that is just notation.

> simplify_const_relational_operation has
> 
>   /* We can't simplify MODE_CC values since we don't know what the
>      actual comparison is.  */
>   if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
>     return 0;

And combine *does* know how to find the actual comparison, in many cases.  Some
other passes can as well, there is no magic involved, you just need to look at
other insns as well (just one really, the CC setter).
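
A toy C model of that reading may help.  It is only a sketch, not GCC code:
the struct and the function names are made up for illustration, and the
compare operands are assumed to be 32-bit unsigned values.  The CC register is
modelled as "the compare that set it", and the relational codes are evaluated
against that recorded compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not GCC code: a MODE_CC register does not hold a number,
   it stands for the compare that set it.  All names are hypothetical.  */
struct cc_value {
    uint32_t a;   /* first operand of the compare */
    uint32_t b;   /* second operand of the compare */
};

/* Models (set (reg 17) (compare A B)).  */
struct cc_value cc_set_compare(uint32_t a, uint32_t b)
{
    struct cc_value cc = { a, b };
    return cc;
}

/* Models (ltu (reg 17) (const_int 0)): unsigned less-than applied to the
   recorded compare.  The const_int 0 is notation, nothing is compared to 0.  */
bool cc_ltu(struct cc_value cc)
{
    return cc.a < cc.b;
}

/* Models (geu (reg 17) (const_int 0)) on the same recorded compare.  */
bool cc_geu(struct cc_value cc)
{
    return cc.a >= cc.b;
}

/* Small self-check: the consumer only means something once the setter
   is known.  */
void cc_model_selftest(void)
{
    assert(cc_ltu(cc_set_compare(3, 5)));
    assert(!cc_ltu(cc_set_compare(5, 3)));
    assert(cc_geu(cc_set_compare(5, 5)));
}
```

In this model there is simply no way to ask what cc_ltu means without a
cc_set_compare feeding it, which is the point made above.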

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #29 from Segher Boessenkool  ---
(In reply to Hongtao.liu from comment #23)
> looking at i386.c put_condition_code used by *setcc_qi, it looks like (EQ
> (reg:CCCmode FLAG_REG) (const_int 0)) means get carry flag.
> Not (LTU: (REG:CCCmode FLAGS_REG) (const_int 0)).
> Now I got more confused.

(eq (reg:CCC 17) (const_int 0))  means that the comparison that did set reg 17
returned "equal".

CCCmode is just like CCmode, but *only* the carry flag is valid.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #28 from Segher Boessenkool  ---
> So the issue is with the consumer:
> 
> (insn 50 49 51 2 (parallel [
>             (set (reg:SI 93)
>                 (neg:SI (ltu:SI (reg:CCC 17 flags)
>                         (const_int 0 [0]))))
>             (clobber (reg:CC 17 flags))
>         ]) "107172.c":4:10 1258 {*x86_movsicc_0_m1_neg}
>      (expr_list:REG_DEAD (reg:CCC 17 flags)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> 
> There are many similar patterns in different backends.  They work as long as
> the flags register isn't a known constant since simplify-rtx.cc leaves them
> alone.  They become a problem only when the flags register is a known
> constant.

Such patterns are fine.  The problem is that this consumer of MODE_CC does not
fit together with the producer of that reg 17: it only has meaning together,
that is how this stuff works; and it has no meaning at all like this.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #21 from Segher Boessenkool  ---
(In reply to Hongtao.liu from comment #19)
> (In reply to H.J. Lu from comment #18)
> > (In reply to Segher Boessenkool from comment #16)
> > > Hi Roger,
> > > 
> > > (In reply to Roger Sayle from comment #15)
> > > > Yes, a COMPARE rtx can be used to set various flags on x86, but many
> > > > other operations also legitimately set and/or use MODE_CC, often in a
> > > > parallel with the primary operation.
> > > 
> > > *Any* MODE_CC setter sets the flags as-if from a compare.  This is what
> > > MODE_CC *is*.
> > > 
> > > Setting something as ne:CC and then using it as somethingelse:CC has no
> > > defined meaning.
> > 
> > This
> > 
> > (parallel [
> >             (set (reg:SI 97)
> >                 (neg:SI (ltu:SI (reg:CCC 17 flags)
> >                         (const_int 0 [0]))))
> >             (clobber (reg:CC 17 flags))
> >         ])
> > 
> > still won't work correctly if reg:CCC 17 flags is set by a compare of
> > 2 known values.
> 
> I guess Segher means it should be NE instead of LTU in the
> x86_movcc_0_m1_neg, since the setter is NE to const 0.

Yes.

>  (ne:CCC (reg:SI 87 [ a_lsm.8 ])
> (const_int 0 [0])))
> 
>  (define_expand "x86_movcc_0_m1_neg"
>    [(parallel
>      [(set (match_operand:SWI48 0 "register_operand")
> -         (neg:SWI48 (ltu:SWI48 (reg:CCC FLAGS_REG) (const_int 0))))
> +         (neg:SWI48 (ne:SWI48 (reg:CCC FLAGS_REG) (const_int 0))))
>        (clobber (reg:CC FLAGS_REG))])])
> 
> It can pass the PR, but failed pr101617.c, the f1 case.
> 
> generate:
>         testl   %edi, %edi
>         movl    $1, %edx
>         movl    $-1, %eax
>         cmove   %edx, %eax
> 
> origin:
>         negl    %edi
>         sbbl    %eax, %eax
>         orl     $1, %eax

And this is why using a relation (e.g. ltu, an RTX_COMPARE) instead of a
compare (an RTX_BIN_ARITH) as setter cannot work.  The setter and the getter
are modified independently by very many parts of the compiler, and then
everything falls apart.

The only valid things on the RHS of a MODE_CC set are a reg, a compare, or
an unspec.  Everything else is undefined and problematical as well.
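
For reference, the two instruction sequences quoted above can be modelled in C.
This is only a sketch under the usual x86 flag rules (neg sets the carry flag
when its operand is non-zero, "sbb %eax, %eax" yields minus the carry flag,
cmove moves when the zero flag is set); the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Carry flag after "negl %edi": set exactly when the operand was non-zero.  */
bool carry_after_neg(uint32_t x)
{
    return x != 0;
}

/* Model of: negl %edi ; sbbl %eax, %eax ; orl $1, %eax  */
int32_t seq_neg_sbb_or(uint32_t x)
{
    int32_t eax = carry_after_neg(x) ? -1 : 0;   /* eax - eax - CF */
    return eax | 1;
}

/* Model of: testl %edi, %edi ; movl $1, %edx ; movl $-1, %eax ;
   cmove %edx, %eax  */
int32_t seq_test_cmove(uint32_t x)
{
    int32_t edx = 1;
    int32_t eax = -1;
    if (x == 0)          /* ZF set by testl when x is zero */
        eax = edx;
    return eax;
}

/* Both models agree on a few sample inputs.  */
void seq_selftest(void)
{
    assert(seq_neg_sbb_or(0) == seq_test_cmove(0));
    assert(seq_neg_sbb_or(1) == seq_test_cmove(1));
    assert(seq_neg_sbb_or(UINT32_MAX) == seq_test_cmove(UINT32_MAX));
}
```

In this model both sequences return 1 for a zero input and -1 otherwise; the
original sequence is just shorter.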

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #16 from Segher Boessenkool  ---
Hi Roger,

(In reply to Roger Sayle from comment #15)
> Yes, a COMPARE rtx can be used to set various flags on x86, but many other
> operations also legitimately set and/or use MODE_CC, often in a parallel
> with the primary operation.

*Any* MODE_CC setter sets the flags as-if from a compare.  This is what
MODE_CC *is*.

Setting something as ne:CC and then using it as somethingelse:CC has no
defined meaning.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #14 from Segher Boessenkool  ---
(In reply to H.J. Lu from comment #13)
> (In reply to Segher Boessenkool from comment #12)
> > 
> > To determine the semantics of this piece of RTL you need to see the
> > setter(s) of reg 17 feeding this use.  In this case, the setter was
> >   (set (reg:CCC 17)
> >(ne:CCC (reg:SI 82)
> >(const_int 0 [0])))
> > which has no meaning for a use that uses "ltu".
> 
> What should a valid setter look like?  It should set reg 17 in CCC mode if
> reg 82 in SI mode isn't 0.

CCCmode can only represent the result of a comparison, like any other MODE_CC
thing.  The i386 CCCmode means only the carry bit can be used for this, so you
need to do an unsigned comparison against (const_int 1).  This will end up with
the opposite polarity of what you said I guess, so you need "geu" instead?
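
As a rough C illustration of that suggestion (a sketch only; the helper names
are hypothetical and the operand is assumed to be a 32-bit unsigned value):
comparing x against 1 as unsigned numbers sets the carry flag exactly when x
is zero, so "x is non-zero" is the carry-clear case, i.e. "geu".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Carry (borrow) produced by the unsigned compare of a with b:
   set exactly when a is below b.  */
bool carry_of_compare(uint32_t a, uint32_t b)
{
    return a < b;
}

/* "x != 0" expressed through the carry flag alone: compare x with 1
   and take geu, which is the carry-clear case.  */
bool nonzero_via_geu(uint32_t x)
{
    return !carry_of_compare(x, 1);
}

/* The geu form agrees with a plain non-zero test.  */
void nonzero_selftest(void)
{
    assert(nonzero_via_geu(0) == false);
    assert(nonzero_via_geu(1) == true);
    assert(nonzero_via_geu(UINT32_MAX) == true);
}
```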

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #12 from Segher Boessenkool  ---
(In reply to H.J. Lu from comment #11)
> Assuming (reg:CCC 17 flags) is set to 1 by compare properly, how should

A MODE_CC RTL reg is never set to "1".  It is set to the result of a
comparison, instead.  The semantics of a consumer of a MODE_CC depends on the
producer.

> (insn 50 49 51 2 (parallel [
>             (set (reg:SI 93)
>                 (neg:SI (ltu:SI (reg:CCC 17 flags)
>                         (const_int 0 [0]))))
>             (clobber (reg:CC 17 flags))
>         ]) "107172.c":4:10 1258 {*x86_movsicc_0_m1_neg}
>      (expr_list:REG_DEAD (reg:CCC 17 flags)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> 
> work?

The semantics of
  (ltu:SI (reg:CCC 17) (const_int 0))
is: the result of "ltu" of the producer of this reg 17, taken from 17 as mode
CCC (which means only the carry output is valid), and that result as a SImode
(which then depends on what STORE_FLAG_VALUE is for your target -- 1 or -1 for
most targets, but other values are more complicated).

It never means "1".  It never means "0".  Never.

To determine the semantics of this piece of RTL you need to see the setter(s)
of reg 17 feeding this use.  In this case, the setter was
  (set (reg:CCC 17)
   (ne:CCC (reg:SI 82)
   (const_int 0 [0])))
which has no meaning for a use that uses "ltu".
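
The STORE_FLAG_VALUE step can be sketched in C as follows.  This is a toy
model, not GCC code: the function names are hypothetical, and the value 1 used
in the self-check is only an assumption for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: the SImode value of a relation is STORE_FLAG_VALUE when the
   relation holds and 0 when it does not.  */
int32_t materialize_flag(bool relation_holds, int32_t store_flag_value)
{
    return relation_holds ? store_flag_value : 0;
}

/* Models (neg:SI (ltu:SI (reg:CCC 17) (const_int 0))) once the truth of
   the ltu relation has been determined from the setter of reg 17.  */
int32_t neg_of_flag(bool relation_holds, int32_t store_flag_value)
{
    return -materialize_flag(relation_holds, store_flag_value);
}

/* With STORE_FLAG_VALUE assumed to be 1, the result is 0 or -1.  */
void store_flag_selftest(void)
{
    assert(neg_of_flag(true, 1) == -1);
    assert(neg_of_flag(false, 1) == 0);
}
```

Note that the boolean input has to come from the setter; the RTL of the
consumer alone does not determine it.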

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #10 from Segher Boessenkool  ---
The input to combine has

(insn 49 10 50 2 (parallel [
            (set (reg:CCC 17 flags)
                (ne:CCC (reg:SI 82 [ a.1_2 ])
                    (const_int 0 [0])))
            (set (reg:SI 92)
                (neg:SI (reg:SI 82 [ a.1_2 ])))
        ]) "107172.c":4:10 680 {*negsi_ccc_1}
     (expr_list:REG_DEAD (reg:SI 82 [ a.1_2 ])
        (expr_list:REG_UNUSED (reg:SI 92)
            (nil))))
(insn 50 49 51 2 (parallel [
            (set (reg:SI 93)
                (neg:SI (ltu:SI (reg:CCC 17 flags)
                        (const_int 0 [0]))))
            (clobber (reg:CC 17 flags))
        ]) "107172.c":4:10 1258 {*x86_movsicc_0_m1_neg}
     (expr_list:REG_DEAD (reg:CCC 17 flags)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

This is incorrect already: insn 49 has to do a cmp, not a ne, for it to be
valid.  It was created by the ce1 pass.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #8 from Segher Boessenkool  ---
Bah, scratch that last part, of course it is valid (I thought this was using 0
in a MODE_CC but I just cannot read).

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2022-10-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #7 from Segher Boessenkool  ---
Please show the (relevant part of) output of -fdump-rtl-combine-all ?  At least
those parts where it decided (ltu:SI (const_int 1) (const_int 0)) is valid (it
isn't) and where optimising that to (const_int 0) is valid (it isn't).

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2022-10-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #32 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #31)
> (In reply to Segher Boessenkool from comment #30)
> > We have to disallow all (*all*) operands that require prefixed insns, until
> > we can handle those properly.
> 
> So if we can't disallow pcrel addresses in asm operands in 
> rs6000_legitimate_address_p, then where can we disallow them when they're
> used with all of the current memory constraints?  Ie, not the new pcrel
> address friendly constraint we don't have yet?

Maybe we can do something like

  "m!"(xx)

to mean prefixed addressing is allowed.  This would be handled adjacent to
where we handle "m<>" already (in recog.cc mostly).

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2022-10-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #30 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #29)
> (In reply to Segher Boessenkool from comment #28)
> > All prefixed addresses, pcrel or R=0, are valid always.  The original code
> > is correct.
> 
> Well they're only valid when compiling for power10, but we probably don't
> generate pcrel addresses unless we're compiling for power10, so ok.

Of course, generating prefixed instructions for a CPU that does not support
that is not valid.  The same is true for any optional instruction or addressing
form.  GCC does not generate those.

> > But lxsd cannot use "m" as constraint anyway.  It needs "wY", and that will
> > work fine here?
> 
> "wY" might be correct for lxsd, but I don't see how using "wY" instead of
> "m" will stop us from generating a pcrel here, since mem_operand_ds_form()
> doesn't disallow pcrel addresses.  Ie, lxsd is a red herring to the actual
> bug.

"m" is not correct for "lxsd" (and even "Y" is only because we now require
"Y<>" to allow update form insns).

I mistakenly thought that "wY" does not allow prefixed.  But it does.

We have to disallow all (*all*) operands that require prefixed insns, until
we can handle those properly.

[Bug target/98519] rs6000: @pcrel unsupported on this instruction error in pveclib

2022-10-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

--- Comment #28 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #27)
> (In reply to Michael Meissner from comment #23)
> > If we change rs6000_legitimate_address_p to return false if we have a
> > prefixed address and we are in asm, we get an insn not found error:
> > 
> > --- /home/meissner/tmp/gcc-tmp/TskwFJ_rs6000.c  2021-02-16 11:44:05.520201674 -0500
> > +++ gcc/config/rs6000/rs6000.c  2021-02-16 11:41:41.444740394 -0500
> > @@ -9532,7 +9532,7 @@ rs6000_legitimate_address_p (machine_mod
> >  
> >    /* Handle prefixed addresses (PC-relative or 34-bit offset).  */
> >    if (address_is_prefixed (x, mode, NON_PREFIXED_DEFAULT))
> > -    return 1;
> > +    return !recog_data.is_asm;
> >  
> >    /* Handle restricted vector d-form offsets in ISA 3.0.  */
> >    if (quad_offset_p)
> 
> I don't think this change is correct as is, since pcrel addresses could be
> legitimate in asm

All prefixed addresses, pcrel or R=0, are valid always.  The original code
is correct.

But lxsd cannot use "m" as constraint anyway.  It needs "wY", and that will
work fine here?

[Bug rtl-optimization/107050] duplicate load of return value when facing multiple branches

2022-09-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107050

--- Comment #2 from Segher Boessenkool  ---
Splitting blocks in shrink-wrap will cause degraded performance compared
to the status quo, on average.  If I understand what will be split how,
that is?  It certainly can be good to move more code, much much more than
prepare_shrink_wrap does, but that is a good trade-off most of the time
only because it makes the fast path faster, makes less code executed when
there is an early return: just randomly moving code to be executed later
makes code *slower*.

Where shrink-wrapping duplicates code here only one copy is executed, ever.

The question seems to really be why at -O1 global variable accesses are not
optimised very well?  The answer to that is this is -O1, if you want good
optimisation you should use -O2!

[Bug middle-end/107027] New: Please improve the documentation of global register variables

2022-09-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107027

Bug ID: 107027
   Summary: Please improve the documentation of global register
variables
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

The semantics of global register variables are a strict superset of the
semantics of local register variables, but this isn't clearly documented.
Having a global register variable as operand to an extended inline asm
guarantees the register declared for the register variable is used in the
asm; it is never copied to a temporary or similar.

[Bug target/100799] Stackoverflow in optimized code on PPC

2022-09-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

--- Comment #16 from Segher Boessenkool  ---
It cannot be -mcpu=power8, that cannot generate isel.  -mcpu=power9 comes
closer, but I still do not see exactly the same output, and crucially not
the strange store either.

What the what.

[Bug target/96072] ICE: Segmentation fault (in add_reg_note)

2022-09-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96072

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-09-13

--- Comment #3 from Segher Boessenkool  ---
None of these ICE, at any optimisation level, for any Linux ABI.  Is this
fixed, or are any special options needed?

[Bug target/100799] Stackoverflow in optimized code on PPC

2022-09-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

--- Comment #14 from Segher Boessenkool  ---
What is the exact command line (and relevant configuration!) required to
reproduce this?

[Bug target/106536] P9: gcc does not detect setb pattern

2022-09-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106536

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-09-13
             Status|UNCONFIRMED                 |NEW
                 CC|                            |segher at gcc dot gnu.org

--- Comment #1 from Segher Boessenkool  ---
Confirmed.

GCC uses conditional branches here in expand already.  It is hard to optimise
this over that.

Using -mcpu=power10 we don't get conditional branches:

        cmpld 7,3,4      # 8    [c=4 l=4]  *cmpdi_unsigned
        li 9,1           # 43   [c=4 l=4]  *movsi_internal1/8
        cmpld 0,3,4      # 45   [c=4 l=4]  *cmpdi_unsigned
        setnbc 10,28     # 44   [c=4 l=4]  *setnbc_unsigned_si
        isel 3,9,10,1    # 46   [c=4 l=4]  isel_unsigned_si/1
        extsw 3,3        # 24   [c=4 l=4]  extendsidi2/1
        blr              # 51   [c=4 l=4]  simple_return

but this of course isn't ideal yet either.  The branches aren't fully optimised
away until ce2, which is *after* combine.  ce1 didn't catch this, perhaps
because the two conditional assignments are a bit intertwined there?

[Bug rtl-optimization/106751] internal compiler error: in purge_dead_edges with inline-asm goto

2022-09-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106751

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #7 from Segher Boessenkool  ---
For me (powerpc64-linux) it fails with

===
106751.c:10:1: error: flow control insn inside a basic block
(jump_insn 6 3 13 2 (parallel [
(asm_operands/v ("") ("") 0 []
 []
 [
(label_ref:DI 9)
] 106751.c:5)
(clobber (reg:SI 98 ca))
]) "106751.c":5:3 -1
 (expr_list:REG_UNUSED (reg:SI 98 ca)
(nil))
 -> 9)
during RTL pass: loop2_invariant
===

That pass did
===
Set in insn 13 is invariant (0), cost 4, depends on 
Decided to move invariant 0 -- gain 4
Invariant 0 moved without introducing a new temporary register
changing bb of uid 13
  from 3 to 2
===

It moved the insn after a jump_insn, not a good idea:

===
(jump_insn 6 3 13 2 (parallel [
(asm_operands/v ("") ("") 0 []
 []
 [
(label_ref:DI 9)
] 106751.c:5)
(clobber (reg:SI 98 ca))
]) "106751.c":5:3 -1
 (expr_list:REG_UNUSED (reg:SI 98 ca)
(nil))
 -> 9)
(insn 13 6 9 2 (set (reg:SI 119)
(const_int 0 [0])) "106751.c":9:28 562 {*movsi_internal1}
 (nil))
;;  succ:   3 [always]  count:10631108 (estimated locally)
===

Maybe it does not see this is an unconditional jump insn?

[Bug target/106895] powerpc64 strange extended inline asm behaviour with register pairs

2022-09-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106895

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from Segher Boessenkool  ---
wQ is a memory constraint, not a register constraint.

We have no way to express even/odd in register constraints.  You can force
it some other way?

It's a lot easier if you use __atomic_* instead of inline asm?  Like:

void f(unsigned __int128 *addr, unsigned __int128 val)
{
  __atomic_store_n(addr, val, __ATOMIC_RELAXED);
}

Please reopen if you want something in particular to be changed.  Thanks!

[Bug middle-end/106833] Miss to handle OPAQUE_TYPE specially in verify_type

2022-09-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106833

--- Comment #12 from Segher Boessenkool  ---
(In reply to Kewen Lin from comment #10)
> (In reply to Segher Boessenkool from comment #9)
> > Although, preferably we should not allow assigning one opaque type to
> > another opaque type just because they will eventually use the same mode,
> > not without warning anyway?  Or is that unavoidable?  Compare assigning a
> > V4SI to a V4SF.
> 
> IIUC, you meant the assignment happening for two different opaque types,
> then it's a conversion?

Yes exactly.

> If so, I think we can check it in
> rs6000_invalid_conversion, currently it just simply checks the modes.

Yup.

> If we
> have two different opaque types mapping to one same mode, we can further
> check if the things like TYPE_CANONICAL match.

Like that.  It isn't urgent -- we currently have only one type for each of
our two opaque modes -- but if we allow too much here, we would need a separate
mode for each opaque thing we want to distinguish, which is contrary to the
point of having it :-)

> > I don't know if your patch does this, btw, and it isn't so easy to test, we
> > currently have only one type for each of our opaque modes.  Maybe test by
> > adding an extra builtin type :-)
> 
> This patch doesn't handle that, the main issue here is that some
> cv-qualified opaque type can cause ICE in type verification during LTO.
> IMHO, opaque types conversion issue looks like a separated issue and it can
> be handled in target hook invalid_conversion. But I guess you want a more
> generic check?  And as you pointed out, there is no such scenario that two
> opaque types have the same mode, not sure if we really want to handle it for
> now. :-)

I think it can be handled generically, no target code is needed for it.

[Bug middle-end/106833] Handle OPAQUE_TYPE in gimple_canonical_types_compatible_p

2022-09-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106833

--- Comment #9 from Segher Boessenkool  ---
Although, preferably we should not allow assigning one opaque type to another
opaque type just because they will eventually use the same mode, not without
warning anyway?  Or is that unavoidable?  Compare assigning a V4SI to a V4SF.

I don't know if your patch does this, btw, and it isn't so easy to test, we
currently have only one type for each of our opaque modes.  Maybe test by
adding an extra builtin type :-)

[Bug preprocessor/106840] glibc master build failure on ppc64le-linux-gnu since r13-2212-geb4879ab905308

2022-09-05 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106840

--- Comment #2 from Segher Boessenkool  ---
This is inside #ifdef __ASSEMBLER__ .  Running assembler code (or anything else
that isn't C) through the C preprocessor is the subject of one of my "why would
you ever do that" rants: the assembler macro processor is strictly more capable
already, and has much saner semantics (for assembler code).

[Bug middle-end/106833] Handle OPAQUE_TYPE in gimple_canonical_types_compatible_p

2022-09-05 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106833

--- Comment #7 from Segher Boessenkool  ---
(In reply to rguent...@suse.de from comment #6)
> Ah, that special "mode".  I think verify_types shouldn't do anything
> for OPAQUE_TYPES or alternatively trust the targets setup of
> TYPE_MAIN_VARIANT/TYPE_CANONICAL.  Maybe verify TYPE_CANONICAL
> and TYPE_MAIN_VARIANT are also OPAQUE_TYPE.

It's probably easiest to just test if the TYPE_MODEs match, for OPAQUE_TYPEs?

> So the solution should be fully inside verify_type.

Yeah.  OPAQUE might be a "hack" (or call it an "invention" :-) ), but the nice
thing about it is that it is very self-contained, no interactions anywhere.
That is the point even :-)

[Bug target/106736] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:333

2022-08-31 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106736

--- Comment #11 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #10)
> (In reply to Segher Boessenkool from comment #9)
> > When MMA is not enabled,
> ...
> > the __vector_{quad,pair} types should not exist, 
> 
> Unfortunately, target type initialization only occurs once at the very
> beginning

That is what indirection is for (or copying more likely, in this case).

> and if we don't initialize them because of the command line
> options in effect at the time, then we get problems like PR96125, so we have
> to initialize these types always, just like we do for built-in functions.

I don't understand?

> I also don't know of any way to attach flags to a type that says when a type
> is enabled/exists and when it doesn't.

You have to manually add code in strategic places.  Very fiddly, and very
fragile.  Or, can you do stuff in rs6000_option_override_internal?

[Bug target/106736] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:333

2022-08-31 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106736

--- Comment #9 from Segher Boessenkool  ---
When MMA is not enabled, the movxo and movoo patterns should never be reached
at all; the __vector_{quad,pair} types should not exist, and the
{XO,OO}mode-using code should then never be created.  So how did this happen
here?

[Bug target/106755] Incorrect code gen for altivec intrinsics with constant inputs

2022-08-26 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106755

--- Comment #5 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #2)
> So the tests (I've removed all static inline usage and always use
> -fno-inline) pass with -O1 and fail with -O2 and -O3.  Looking at all of the
> optimizations enabled by -O2 that are not in -O1 and using -fno-* for them,
> the only option that allows the tests to pass with -O2 is
> -fno-strict-aliasing.  That said, -Wall and -Wstrict-aliasing do not flag
> any warnings with the code.  I suppose they could miss some issues in the
> test case code???

There are 10 cases (in just the .cc files) that use "optimize > 1", and
another 5 that do "optimize >= 2".

The -O options do more than just being shorthand for some -f option selections.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #23 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #22)
> The longer I have been looking at these STRICT_LOW_PART issues the more I
> think that STRICT_LOW_PART is an awful way to express what we need:
> 
> - the information needed to understand what it is doing is distributed
> across 3 RTXs (strict_low_part (subreg:mode1 (reg:mode2 xx) OFS))
> - the big problems arise since the involved RTXs are separately optimized
> and we might end up with partial information without a clear definition of
> how to deal with that
> - actually it is really hard to handle the RTXs as one unit. Recursively
> walking RTXs needs to record whether we are in a STRICT_LOW_PART or not.
> 
> 
> I think it might make sense to explore other ways to express this:
> 
> 1. SUBREG flag - Looks easy, but it would be hard to catch all places which
> should care about that flag.
> 
> 2. Introduce a new RTX code which has a mode and an offset attached but does
> not require an additional SUBREG anymore.
> 
> 3. Since a STRICT_LOW_PART is essentially a bit insertion operation we could
> express it always with a ZERO_EXTRACT target operand and get rid of
> STRICT_LOW_PART entirely. A ZERO_EXTRACT would be somewhat more cumbersome
> to deal with, since it would always require to check the bit width and
> offset for all the cases which just use mode boundaries. But at least most
> passes know how to deal with them already.

4. With existing simple RTL:

(set (reg:DI x) (ior:DI (and:DI (reg:DI x) (const_int -4294967296))
(zero_extend:DI (reg:SI y))))

(ZERO_EXTRACT is never useful IMNSHO: it just makes the easy cases slightly
easier to write, and causes a lot of useless work everywhere else).
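
For readers less used to RTL, the option 4 expression can be modelled in C.
This is only a sketch: the function name is made up for illustration, and it
assumes x is a 64-bit value and y a 32-bit value, as the DI and SI modes
suggest.

```c
#include <assert.h>
#include <stdint.h>

/* C model of option 4: keep the high 32 bits of x (the AND with
   -4294967296, i.e. 0xffffffff00000000) and insert y, zero-extended,
   into the low 32 bits (the IOR).  insert_low_si is a hypothetical name.  */
uint64_t insert_low_si(uint64_t x, uint32_t y)
{
    return (x & UINT64_C(0xffffffff00000000)) | (uint64_t)y;
}

/* Usage sketch: only the low half of the register changes.  */
void demo_insert_low_si(void)
{
    assert(insert_low_si(UINT64_C(0x1122334455667788), 0xdeadbeefu)
           == UINT64_C(0x11223344deadbeef));
}
```

Nothing here needs a bit width or bit offset operand; the mask constant
carries that information.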

[Bug target/100736] ICE: unrecognizable insn

2022-08-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100736

--- Comment #6 from Segher Boessenkool  ---
There are so many things here, it's hard to start.  Two big things:

Firstly, this is not floating point at all, so -ffinite-math-only should not
make any difference.  We currently abuse CCFP (in a non-safe way), this should
be fixed.

Secondly, -mcpu=power9 (to get isel) or -mcpu=power10 (to get setbc) are more
interesting.  What is the generated machine code for those?

More things:

crnot is needed to get the polarity of the result correct.  We could instead
do a xori 1 (or similar) on the eventual GPR.  If we do say a cror and a crnot
we should make sure this can be combined to a crnor (all 14 logic functions are
supported), but if the inversion is done in the GPR (with such a xori, say),
this is much harder to optimise.

If we are not interested in overflows, always one of LT GT EQ is set, so we
never need any crlogical insn here.

To be exact, this is whenever we have valid inputs: if there is output
overflow cr6.3 will be set as well, but still exactly one of cr6.0, cr6.1,
cr6.2 will be set.

But if there is an invalid *input* cr6 is set to 0b0001 always.  I don't think
we need to care here (otoh it isn't obvious how to best model it!)

[Bug target/102146] [11 regression] several test cases fails after r11-8940

2022-08-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #21 from Segher Boessenkool  ---
Closing as fixed then (pr56605.c still fails on older branches, but that is
harmless).

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #19 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #18)
> (In reply to Segher Boessenkool from comment #17)
> ...
> > Yes, but that says the high 48 bits of the hardware reg are untouched, which
> > is not true (only the high 16 of the low 32 are guaranteed unmodified).
> 
> Right, if the original register mode does not match the mode of the full
> hardreg, we continue to need that mode as the upper bound. So with the
> subreg folding in reload we appear to lose information we need to interpret
> the STRICT_LOW_PART correctly.

Exactly.  This is why strict_low_part of anything other than a subreg is
ill-defined.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #17 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #16)
> (In reply to Segher Boessenkool from comment #15)
> > (In reply to Andreas Krebbel from comment #14)
> > > > So you are suggesting that every strict_low_part after reload can just be
> > > > removed?  If that is true, should we not just do exactly that then?
> > > 
> > > I think we have 3 options:
> > > (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
> > > (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
> > > (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 
> ...
> > > (3) E.g. it means that the bits of hardreg x in its hardware mode (the mode
> > > for UNITS_PER_WORD) which are not covered by MODE are not touched by the SET.
> > 
> > But say you have (strict_low_part (subreg:HI (reg:SI) 0)) and the hardware
> > is 64-bit.  That only means the low 32 bits of the reg aren't clobbered, the
> > high 32 bits are fair game.  That does not agree with your proposed
> > semantics.
> 
> In that case I would have expected reload to turn this into 
> (strict_low_part (reg:HI xx))
> already.

Yes, but that says the high 48 bits of the hardware reg are untouched, which
is not true (only the high 16 of the low 32 are guaranteed unmodified).

[Bug rtl-optimization/96475] direct threaded interpreter with computed gotos generates suboptimal dispatch loop

2022-08-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96475

Segher Boessenkool  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #26 from Segher Boessenkool  ---
(In reply to Paweł Bylica from comment #25)
> Is this issue resolved then?

If we don't want backports it is done yes.  Originally I wanted backports, but
given we needed fixup patches etc., this was a bit dangerous.  And now it is
two years ago.

Closing.  Thanks!

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #15 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #14)
> > So you are suggesting that every strict_low_part after reload can just be
> > removed?  If that is true, should we not just do exactly that then?
> 
> I think we have 3 options:
> (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
> (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
> (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 
> 
> (1) For that, all passes after reload must be able to deal with these
> SUBREGs. Since SUBREGs are rare after reload it is hard to say how robust
> that handling is right now.

On some targets you always have subregs to describe the action of certain
machine instructions (like mulh on PowerPC).

In the past there were subregs of mem as well.  Those shouldn't occur anymore,
but most (all?) late passes still know how to handle it.


> (2) Here the question to me is which passes after reload currently do
> something with the strict-low-part info. Clearly a non-option if we would
> lose any optimizations with that.

Yes.  This potentially even changes semantics, unless we are sure nothing uses
the "high part" at all.


> (3) E.g. it means that the bits of hardreg x in its hardware mode (the mode
> for UNITS_PER_WORD) which are not covered by MODE are not touched by the SET.

But say you have (strict_low_part (subreg:HI (reg:SI) 0)) and the hardware
is 64-bit.  That only means the low 32 bits of the reg aren't clobbered, the
high 32 bits are fair game.  That does not agree with your proposed semantics.
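
A C sketch of that example, modelling only what is guaranteed.  The function
name is hypothetical, and the model assumes the subreg selects the low 16 bits
of the SImode value, as strict_low_part requires.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a SET to (strict_low_part (subreg:HI (reg:SI) 0)):
   the low 16 bits are replaced, bits 16..31 of the SImode value are
   preserved.  The model stops at 32 bits on purpose: on 64-bit
   hardware nothing is promised about bits 32..63 of the hard reg.  */
uint32_t set_strict_low_hi(uint32_t si_reg, uint16_t val)
{
    return (si_reg & 0xffff0000u) | val;
}

void demo_strict_low_hi(void)
{
    assert(set_strict_low_hi(0x12345678u, 0xabcdu) == 0x1234abcdu);
}
```

Once the inner mode is dropped and only (reg:HI) is left, there is no way to
tell where the preserved region ends, which is the information loss discussed
here.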

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #13 from Segher Boessenkool  ---
(Sorry I missed this)

(In reply to Andreas Krebbel from comment #11)
> I've tried to change our movstrict backend patterns to use a predicate on
> the dest operand which enforces a subreg. However, since reload strips the
> subreg away when assigning hard regs we end up with a STRICT_LOW_PART of a
> reg again. At least after reload something like this should be acceptable -
> right?
> 
> 298r.ira:
> (insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4))
> (const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi}
>  (nil))
> 
> 299r.reload:
> (insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64]))
> (mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32]))
> "t.cc":37:17 1485 {movstrictsi}
>  (nil))

So you are suggesting that every strict_low_part after reload can just be
removed?  If that is true, should we not just do exactly that then?

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2022-08-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

Segher Boessenkool  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #29 from Segher Boessenkool  ---
Okay, closing then.  Thanks!

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2022-08-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #27 from Segher Boessenkool  ---
So this particular bug is no longer there, and this PR can be closed?

[Bug target/102146] [11 regression] several test cases fails after r11-8940

2022-08-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146

--- Comment #19 from Segher Boessenkool  ---
Hi guys,

What testcases are still failing?  I'm a bit lost :-)

[Bug target/103197] [10/11] ppc inline expansion of memcpy/memmove should not use lxsibzx/stxsibx for a single byte

2022-08-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103197

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #18 from Segher Boessenkool  ---
Fixed everywhere.

[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*

2022-08-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888

--- Comment #9 from Segher Boessenkool  ---
(In reply to Alan Modra from comment #8)
> (In reply to Segher Boessenkool from comment #7)
> > '-fpatchable-function-entry=N[,M]'
> >  Generate N NOPs right at the beginning of each function, with the
> >  function entry point before the Mth NOP.
> 
> Bad doco.  Should be "after the Mth NOP" I think.  Or better written to
> avoid the concept of a 0th nop.  Default for M is zero, placing all nops
> after the function entry and before normal function prologue code.

It is correct as written?

Also, "M" isn't used in the current compiler (and *cannot* be used: it is a
local variable that goes out of scope after being set, patch_area_start in
process_options).

[The text is carefully written so that "anywhere before the Mth nop" will be
a valid implementation as well, btw, that perhaps explains the tortured
language here.  But maybe there is another explanation for that.]

> > The nops have to be consecutive.
> 
> I hope you are making this statement based on

Based on just what is written.  "N nops right at the beginning of the
function".  Not very formal, but not open to other interpretations either.

> an analysis of the purpose of
> having M nops before the entry point and N-M after the entry point, because
> the documentation you are quoting doesn't take into account the fact that
> ELFv2 functions have two entry points.  We don't have "the" entry point.

If ELFv2 wants to do something with the LEP here, it should make some extra
flag here.  Abusing generic facilities for a different purpose never works.

> I admit I didn't analyse -fpatchable-function-entry usage in any depth
> before writing the patches in PR98125.  All I did was look at the linux
> kernel to the point of deciding that we want a patchable area after the
> local entry point to catch all calls to the function.  That would be what
> -fpatchable-function-entry=n does for ELFv2, and I think we all agree on
> that.

The PowerPC Linux kernel uses -mprofile-kernel instead, which works a lot
better for them AFAIUI.  Are people planning on changing that?

[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*

2022-08-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888

--- Comment #7 from Segher Boessenkool  ---
'-fpatchable-function-entry=N[,M]'
 Generate N NOPs right at the beginning of each function, with the
 function entry point before the Mth NOP.

The nops have to be consecutive.
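
For reference, the same N,M pair can also be requested per function with the
patchable_function_entry attribute.  A minimal sketch, assuming a target where
GCC accepts both values; the function and the values 4 and 2 are arbitrary
examples.

```c
#include <assert.h>

/* Per-function form of -fpatchable-function-entry=4,2.  The attribute
   only asks for the patch area; it does not change what the function
   computes.  add_one is a made-up example function.  */
__attribute__((patchable_function_entry(4, 2)))
int add_one(int x)
{
    return x + 1;
}

void demo_patchable(void)
{
    assert(add_one(41) == 42);
}
```

Looking at the assembler output (-S) for such a function shows where the nops
end up relative to the function label on a given target.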

[Bug fortran/106579] ieee_signaling_nan problem in fortran on powerpc64

2022-08-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106579

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool  ---
Let me say again that IEEE QP and double-double should not be the same kind.
This very obviously cannot work at all.

[Bug target/96786] rs6000: We output the wrong .machine for -mcpu=7450

2022-08-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96786

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Segher Boessenkool  ---
Yes, I fixed it in g:77eccbf39ed.  That needed the g:80fcc4b6afee fixup, and
will need more work in the future, but this PR is fixed indeed :-)

[Bug target/103498] Spec 2017 imagick_r is 2.62% slower on Power10 with pc-relative addressing compared to not using pc-relative addressing

2022-08-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103498

--- Comment #2 from Segher Boessenkool  ---
Mike, do you still see this?

[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*

2022-08-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888

--- Comment #3 from Segher Boessenkool  ---
Your second option isn't correct: all these nops should be consecutive.  Your
option 1 is fine :-)

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #28 from Segher Boessenkool  ---
(In reply to rsand...@gcc.gnu.org from comment #25)
> - On big-endian targets, vector loads and stores are assumed to put the
>   first memory element at the most significant end of the vector register.

I agree with everything here, except calling this "most significant".  That
just makes no sense for vectors.  It is element 0, but that is not more
significant than any other element :-)  Vectors aren't integers.
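
At the source level the numbering is by position only.  A small sketch using
the GCC generic vector extension (the typedef and function names are just for
illustration): element 0 is the first element in memory, whatever the
endianness.

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Load four ints from memory into a vector and return element idx.
   Element numbering follows memory order; no element is "more
   significant" than another.  */
int element_after_load(const int *p, int idx)
{
    v4si v;
    memcpy(&v, p, sizeof v);
    return v[idx];
}

void demo_element_order(void)
{
    const int mem[4] = { 10, 20, 30, 40 };
    assert(element_after_load(mem, 0) == 10);
    assert(element_after_load(mem, 3) == 40);
}
```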

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #27 from Segher Boessenkool  ---
IMO what vec_select calls element 0 is always in the first argument of the
vec_concat it works on, in BE as well as LE.  But yes this is quite
underdefined in our documentation, and who knows what is actually implemented,
in targets as well as in generic code :-(

[Bug rtl-optimization/106419] ICE in lra_assign, at lra-assigns.cc:1649

2022-07-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106419

--- Comment #10 from Segher Boessenkool  ---
(In reply to Kewen Lin from comment #9)
> (In reply to Segher Boessenkool from comment #8)
> > So for which pseudo and which hard register did this ICE, and what did the
> > code look like at that point?
> 
> The culprit pseudo is r133, the values of those related expressions in the
> check:
> 
> lra_reg_info[i].nrefs  -> 4
> 
> reg_renumber[i] -> 97
> 
> overlaps_hard_reg_set_p(lra_reg_info[i].conflict_hard_regs, E_SImode, 97) ->
> true
> 
> Before IRA, the code looks like:

> (insn 34 33 35 4 (set (reg:SI 97 ctr)
> (reg/v/f:SI 133 [ foo ])) "test.f":17:72 562 {*movsi_internal1}
>  (nil))  

> in IRA, the hard reg assignment is:

> choosing r3 for r133.

Doing ctr (reg 97) instead (as LRA seems to change it to?) is
counterproductive.

We have that

> (insn 33 32 34 4 (set (reg:SI 3 3)
> (reg/v/f:SI 137 [ g ])) "test.f":17:72 562 {*movsi_internal1}
>  (nil))

right before 34, so if we want to use hard reg 3 for pseudo 97 we could
swap insns 33 and 34 (both of which are trivial assignments), much nicer
than the current dance via memory.

But all of this is a distraction from the actual bug here, sorry.

[Bug rtl-optimization/106419] ICE in lra_assign, at lra-assigns.cc:1649

2022-07-26 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106419

--- Comment #8 from Segher Boessenkool  ---
So for which pseudo and which hard register did this ICE, and what did the
code look like at that point?

[Bug rtl-optimization/106419] ICE in lra_assign, at lra-assigns.cc:1649

2022-07-26 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106419

--- Comment #7 from Segher Boessenkool  ---
That mfctr;mtctr is extremely slow of course, and that mtctr is superfluous
completely (this is true for all registers, not just CTR, nothing special to
PowerPC even).  I know this is just -Og, but still :-)

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-07-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #11 from Segher Boessenkool  ---
I mean, if that patch is actually flawed, this is GCC 12 and later; if the
problem is more generic (combine, probably simplify-rtx to be exact) it is
more widespread.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-07-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #10 from Segher Boessenkool  ---
This happened after
  commit 0910c516a3d72af048af27308349167f25c406c2
  Author: Xionghu Luo 
  Date:   Tue Oct 19 04:02:04 2021 -0500
which probably caused it.  That means it would be GCC 12 and later.

[Bug target/106091] [11/12/13 Regression] during RTL pass: swaps ICE: verify_flow_info failed: missing REG_EH_REGION note at the end of bb 69 with -fnon-call-exceptions

2022-07-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106091

--- Comment #4 from Segher Boessenkool  ---
That patch looks good :-)

[Bug target/100799] Stackoverflow in optimized code on PPC

2022-07-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

--- Comment #13 from Segher Boessenkool  ---
(In reply to Alexander Grund from comment #11)
> Some more experiments with GCC 10.3, OpenBLAS 0.3.15 and FlexiBLAS 3.0.4:
> 
> Baseline: Broken at -O1, working at -Og
> 
> I got it to break with "-Og -fmove-loop-invariants".
> Then it worked again by adding "-fstack-protector-all".

Both are great info!

> But that is
> seemingly not advisable:
> https://developers.redhat.com/blog/2020/05/22/stack-clash-mitigation-in-gcc-
> part-3

-fstack-protector-strong is cheap enough that you can (and perhaps should)
enable it almost always.  Some distributions do this even?

-fstack-check= is an Ada thing.  -fstack-clash-protection is a different thing
as well (that's what that article is about).

Enabling ssp is not a great workaround of course, it is much too roundabout;
and I suspect the only reason it works is because it changes the stack layout.
Still, useful info, thanks :-)

[Bug target/100799] Stackoverflow in optimized code on PPC

2022-07-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

--- Comment #12 from Segher Boessenkool  ---
(In reply to Alexander Grund from comment #10)
> (In reply to Peter Bergner from comment #2)
> > The failure with GCC 7 and later coincides with the PPC port starting to
> > default to LRA instead of reload.
> 
> Is there a compiler flag that can switch the default back as a workaround?

No, the PowerPC GCC port only supports LRA since g:7a5cbf29beb2 (from 2017).

[Bug rtl-optimization/106210] [10/11/12/13 Regression] missing shrink wrap for simple case since r9-3594-g8d2d39587d941a40

2022-07-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106210

--- Comment #6 from Segher Boessenkool  ---
The prepare_shrink_wrap code handles only very limited very simple cases.

After g:8d2d39587d94 there is another copy at this point (which is an
*improvement*, it gives more freedom).  I don't see how this trips up
prepare_shrink_wrap though?

Btw, for rs6000 this is no longer shrink-wrapped in GCC 6 already, long
before that commit.  It saves r3 in r31 then (was r9), and that makes
requires_stack_frame_p return true (because that reg needs to be saved on
the stack before use, being non-volatile and all).

[Bug c/106335] New: struct copies with volatile fields are done using memcpy

2022-07-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106335

Bug ID: 106335
   Summary: struct copies with volatile fields are done using
memcpy
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

struct s { volatile int x[42]; } a;
void f(struct s b) { a = b; }

results in machine code calling memcpy(), which is not valid.
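
For comparison, this is what the copy looks like when every element access is
written out as a separate volatile read and write.  It is only a sketch to make
the expected access pattern concrete (copy_s is a hypothetical helper), not a
claim about what the fix should generate.

```c
#include <assert.h>
#include <stddef.h>

struct s { volatile int x[42]; };

/* Copy element by element: one volatile read and one volatile write
   per array element, in index order.  */
void copy_s(struct s *dst, const struct s *src)
{
    for (size_t i = 0; i < sizeof dst->x / sizeof dst->x[0]; i++)
        dst->x[i] = src->x[i];
}

void demo_copy_s(void)
{
    static struct s a, b;
    for (size_t i = 0; i < 42; i++)
        b.x[i] = (int)i;
    copy_s(&a, &b);
    for (size_t i = 0; i < 42; i++)
        assert(a.x[i] == (int)i);
}
```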

[Bug target/100694] PPC: initialization of __int128 is very inefficient

2022-07-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #4 from Segher Boessenkool  ---
On aarch64 we have (in expand):

;; i_4 = i_3 << 64;

(insn 10 9 11 (set (subreg:DI (reg/v:TI 94 [ i ]) 8)
(subreg:DI (reg/v:TI 93 [ i ]) 0)) "100694.c":4:6 -1
 (nil))

(insn 11 10 0 (set (subreg:DI (reg/v:TI 94 [ i ]) 0)
(const_int 0 [0])) "100694.c":4:6 -1
 (nil))

But on rs6000 we get:

;; i_4 = i_3 << 64;

(insn 10 9 11 (set (subreg:DI (reg/v:TI 119 [ i ]) 0)
(ashift:DI (subreg:DI (reg/v:TI 118 [ i ]) 8)
(const_int 0 [0]))) "100694.c":4:6 -1
 (nil))

(insn 11 10 0 (set (subreg:DI (reg/v:TI 119 [ i ]) 8)
(const_int 0 [0])) "100694.c":4:6 -1
 (nil))

What the what.
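
The source statement behind both dumps is a shift by 64.  A C sketch of what it
should compute (the actual 100694.c is not shown here, so the function below is
an assumption, using the unsigned type to keep the shift well-defined): the low
doubleword moves to the high half and the low half becomes zero, with no shift
insn needed.

```c
#include <assert.h>
#include <stdint.h>

/* i << 64 on a 128-bit integer.  */
unsigned __int128 shl64(unsigned __int128 i) { return i << 64; }

/* Helpers to look at the two doublewords of the result.  */
uint64_t high_half(unsigned __int128 i) { return (uint64_t)(i >> 64); }
uint64_t low_half(unsigned __int128 i) { return (uint64_t)i; }

void demo_shl64(void)
{
    unsigned __int128 r =
        shl64((unsigned __int128)UINT64_C(0x0123456789abcdef));
    assert(high_half(r) == UINT64_C(0x0123456789abcdef));
    assert(low_half(r) == 0);
}
```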

[Bug target/100694] PPC: initialization of __int128 is very inefficient

2022-07-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #3 from Segher Boessenkool  ---
Should this not be handled by the subreg passes?

[Bug lto/91287] LTO disables linking with scalar MASS library (Fortran only)

2022-06-30 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91287

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #44 from Segher Boessenkool  ---
This was fixed on 9 and later.  Closing.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-06-30 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #7 from Segher Boessenkool  ---
(The original insns, before this combination.)

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-06-30 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #6 from Segher Boessenkool  ---
What is wrong there?  It isn't obvious.  You may need to show insns 188 and 199
in non-slim form, "slim" is very lossy.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p

2022-06-29 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #8 from Segher Boessenkool  ---
There is structural RTL checking in rtl.h (see RTL_CHECK{1,2,C1,C2,C3} and the
various ELT and INT accessors).  This would be easier to use here if we used
some STRICT_LOW_PART_P everywhere :-)

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p

2022-06-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #6 from Segher Boessenkool  ---
It looks like quite a few more backends use strict_low_part on random RTL,
which is completely meaningless :-(

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p

2022-06-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #5 from Segher Boessenkool  ---
Thanks for tracking this down!

Interesting it survived so long.  We could use some RTL checking on this :-)

[Bug rtl-optimization/106101] [12/13 Regression] ICE in reg_bitfield_target_p

2022-06-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #3 from Segher Boessenkool  ---
STRICT_LOW_PART is required to contain a SUBREG though.

[Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero

2022-06-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #11 from Segher Boessenkool  ---
Wrt rs6000: we have shift+mask+compare in just one insn (it is basic powerpc),
and our
  (define_insn "*and<mode>3_imm_dot_shifted"
pattern outputs this as just an "andi." insn when it can.  But indeed the shift
wasn't optimised away for us either.

[Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero

2022-06-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #10 from Segher Boessenkool  ---
So on Arm we get

Trying 6 -> 8:
6: r119:SI=r123:SI>>0x8
  REG_DEAD r123:SI
8: {cc:CC_NZ=cmp(r119:SI&0x6,0);clobber scratch;}
  REG_DEAD r119:SI
Failed to match this instruction:
(parallel [
(set (reg:CC_NZ 100 cc)
(compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 123)
(const_int 8 [0x8]))
(const_int 6 [0x6]))
(const_int 0 [0])))
(clobber (scratch:SI))
])
Failed to match this instruction:
(set (reg:CC_NZ 100 cc)
(compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 123)
(const_int 8 [0x8]))
(const_int 6 [0x6]))
(const_int 0 [0])))

instead of something like

(set (reg:CC_NZ 100 cc)
 (compare:CC_NZ (and:SI (reg:SI 123)
(const_int 1536))
(const_int 0)))

which is correct for every CC mode even, not just NZ?
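
The equivalence of the two forms for a comparison against zero is easy to check
in C.  A sketch with made-up function names; 1536 is 0x6 << 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What combine currently fails to match: shift, then mask, then test.  */
bool test_shifted(uint32_t x) { return ((x >> 8) & 0x6) != 0; }

/* The simpler form: mask with the pre-shifted constant, then test.  */
bool test_masked(uint32_t x) { return (x & 1536) != 0; }

void demo_mask_fold(void)
{
    /* Bits 9 and 10 are the only ones that matter, so sweeping the
       low 16 bits covers every combination of them.  */
    for (uint32_t x = 0; x < 0x10000; x++)
        assert(test_shifted(x) == test_masked(x));
    assert(test_shifted(0xffffffffu) == test_masked(0xffffffffu));
}
```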

[Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero

2022-06-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #9 from Segher Boessenkool  ---
This is all handled in combine, nothing is specific to rs6000 (only the
description of all of our insns is, of course, but there is really no way
around that, nor should there be :-) )

Why does combine not optimise this for Arm?  Of course it would be good if
this would be optimised early as well, but that does not mean we should not
try to optimise it late as well!

[Bug middle-end/106059] [13 regression] cc.dg/vect/pr79347.c fails after r13-1171-g9f55aee9dca759

2022-06-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106059

--- Comment #5 from Segher Boessenkool  ---
Thank you for the quick fix!

[Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero

2022-06-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #7 from Segher Boessenkool  ---
For Power, both the original testcase and the one in comment 5 generate perfect
code, for all -mcpu= I tested.  Should this be a target bug?

[Bug target/105991] [12/13 Regression] rldicl+sldi+add generated instead of rldimi

2022-06-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105991

Segher Boessenkool  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #8 from Segher Boessenkool  ---
Yes, this needs a backport.

[Bug testsuite/106059] [13 regression] cc.dg/vect/pr79347.c fails after r13-1171-g9f55aee9dca759

2022-06-23 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106059

--- Comment #1 from Segher Boessenkool  ---
Well, this patch should not have changed behaviour at all!

[Bug middle-end/106016] [PowerPC] crash with attempt to initialize array of MMA accumulators

2022-06-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106016

Segher Boessenkool  changed:

   What|Removed |Added

  Component|target  |middle-end

--- Comment #10 from Segher Boessenkool  ---
No, this is *not* a target issue.  Let's try middle-end, then.

[Bug target/105991] [12/13 Regression] rldicl+sldi+add generated instead of rldimi

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105991

--- Comment #4 from Segher Boessenkool  ---
(In reply to Marek Polacek from comment #0)
> It doesn't look like a wrong code problem, but it seems more optimal to use
> rldimi (rotate left, mask insert) rather than rotate left by 0 bits, AND
> with a mask, shift left, and add.

Confirmed.  The original code is much better (and yes, the current is correct
as well).
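
A C model of why the two sequences compute the same thing.  The testcase is not
quoted here, so the 32-bit field position and the function names are
assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The three-insn shape: clear the high bits of a (rldicl with a rotate
   of 0), shift b left (sldi), then add.  */
uint64_t combine_add(uint64_t a, uint64_t b)
{
    return (a & UINT64_C(0xffffffff)) + (b << 32);
}

/* The insert shape (what a single rldimi does): rotate b into place
   and insert it under a mask, keeping the other bits of a.  */
uint64_t combine_insert(uint64_t a, uint64_t b)
{
    const uint64_t mask = UINT64_C(0xffffffff00000000);
    return (a & ~mask) | ((b << 32) & mask);
}

void demo_insert(void)
{
    assert(combine_add(UINT64_C(0x1111111122222222), UINT64_C(0x33333333))
           == UINT64_C(0x3333333322222222));
    assert(combine_add(UINT64_C(0xffffffffffffffff), UINT64_C(0xabcdef0112345678))
           == combine_insert(UINT64_C(0xffffffffffffffff), UINT64_C(0xabcdef0112345678)));
}
```

The add and the IOR agree because the two operands never have a set bit in the
same position, so no carry can occur.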

[Bug target/106017] [PowerPC] No array-to-pointer conversion for MMA accumulator

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106017

--- Comment #6 from Segher Boessenkool  ---
FWIW, reinterpret_cast allows exactly the same things as C casts (but with the
obvious C++ extensions: member objects, member functions, C++'s concept of
lvalue, that kind of thing).  It is not similar to bit_cast at all.

[Bug target/106016] [PowerPC] crash with attempt to initialize array of MMA accumulators

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106016

--- Comment #6 from Segher Boessenkool  ---
Like that yes.  Pre-approved if it survives regcheck, too.  Thanks!

Please add the testcase as well of course :-)

[Bug target/106017] [PowerPC] No array-to-pointer conversion for MMA accumulator

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106017

Segher Boessenkool  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

--- Comment #5 from Segher Boessenkool  ---
Okay, I'll handle it.

[Bug target/106016] [PowerPC] crash with attempt to initialize array of MMA accumulators

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106016

--- Comment #3 from Segher Boessenkool  ---
Yeah.  It should just return 1 like the other scalar types?

[Bug target/106017] [PowerPC] No array-to-pointer conversion for MMA accumulator

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106017

--- Comment #3 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #2)
> We do not want or allow automatic conversions between the opaque
> __vector_pair and __vector_quad types and other types and those are
> correctly disallowed there.

Of course, but that is not what this is about...

> As for the pointer conversions tested there, I guess they came along for the
> ride?  Nemanja, do you remember the history there?  Or does LLVM allow the
> pointer conversions and it's just GCC that complains?

... this is.

Possibly the restriction prevents some ICEs elsewhere, but those just need to
be solved then, not hidden.

[Bug target/106015] [PowerPC] pointer to MMA accumulator not convertible to char pointer

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106015

--- Comment #2 from Segher Boessenkool  ---
Confirmed.  Likely the same cause as PR106017.

[Bug target/106016] [PowerPC] crash with attempt to initialize array of MMA accumulators

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106016

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-06-17

--- Comment #1 from Segher Boessenkool  ---
Confirmed.  Thanks for reporting these bugs!

[Bug target/106017] [PowerPC] No array-to-pointer conversion for MMA accumulator

2022-06-17 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106017

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-06-17

--- Comment #1 from Segher Boessenkool  ---
Confirmed.

C allows converting a pointer to data to any other pointer to data (possibly
modulo alignment restrictions).  What is *not* valid is accessing an object via
a type not compatible with its effective type (access via a character type is
always allowed, though).

So the restriction in rs6000_invalid_conversion errors for valid C programs.
What was it intended to accomplish?
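
For illustration, a minimal sketch of the two things that C guarantees here.
The struct blob type and both helper functions are hypothetical stand-ins made
up for this example; they are not the real __vector_quad type or anything from
the rs6000 backend.

```c
#include <stddef.h>

/* Hypothetical stand-in for some data type; NOT the real __vector_quad. */
struct blob { unsigned int words[4]; };

/* Converting a data pointer to void * and back yields a pointer that
   compares equal to the original.  */
static int round_trips(struct blob *b)
{
    void *v = b;
    struct blob *back = v;
    return back == b;
}

/* Reading the bytes of any object through a character type is valid,
   whatever the object's effective type is.  */
static unsigned int byte_sum(const struct blob *b)
{
    const unsigned char *p = (const unsigned char *)b;
    unsigned int sum = 0;
    for (size_t i = 0; i < sizeof *b; i++)
        sum += p[i];
    return sum;
}
```

What would not be valid is, say, reading the same object through a float
lvalue; the conversion of the pointer itself is not the problem.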

[Bug c++/87656] Useful flags to enable with -Wall or -Wextra

2022-06-03 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87656

--- Comment #17 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #16)
> Note, what is most important with this are configure scripts, if we start
> warning on something still widely used in configure snippets, we'll get
> silently different results of configure checks.

A configure check (that isn't specifically for some warning) that gives
different results if some random warning happens is fundamentally broken
already.  I would hope existing checks are more robust (but I certainly
believe they are not :-( )

> For old style definitions, the question is if we want to warn about
> void foo () {} style of functions or just those which actually have some
> arguments.

We can have a =2 to warn for everything, and =1 for just the more serious
things?  Easy to switch default for -Wall and -W that way, too.

[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck

2022-06-02 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #47 from Segher Boessenkool  ---
(In reply to Sam James from comment #46)
> Even partially making the build less recursive would likely help a fair bit.

It will help a bit, sure, but not nearly as much as you perhaps hope for.

There are quite a few "synchronisation" points where nothing after them can be
done until everything before them has been done.  Partly this is just because
we have a three-stage bootstrap, but also there are some generator programs
that everything else depends on (on their output, that is), and those are real
chokepoints.

Also, recursive make is a scourge of humanity, for sure, but fixing this has
to be done in auto first and foremost.

[Bug debug/105586] [11/12/13 Regression] -fcompare-debug failure (length) with -O2 -fno-if-conversion -mtune=power4 -fno-guess-branch-probability

2022-05-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105586

--- Comment #2 from Segher Boessenkool  ---
We have

+(debug_insn 11 10 81 2 (var_location:QI u8_1 (mem/c:QI (plus:DI (unspec:DI [
+(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
+(reg:DI 2 2)
+] UNSPEC_TOCREL)
+(const_int 3 [0x3])) [1 g+3 S1 A8])) "../105586.c":11:8 -1
+ (nil))

(the first such unspec in the file), and sched1 does

+;;   0--> b  0: i  11 loc [unspec[`*.LANCHOR0',%2] 47+0x3]   :nothing:GENERAL_REGS+0(0)FLOAT_REGS+0(0)CR_REGS+0(0)SPECIAL_REGS+0(0)

with it.  Things in debug_insns should not influence code generation.

[Bug target/103605] [PowerPC] fmin/fmax should be inlined always with xsmindp/xsmaxdp

2022-05-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103605

--- Comment #8 from Segher Boessenkool  ---
(In reply to jos...@codesourcery.com from comment #4)
> > xsmindp
> > The minimum of a QNaN and any value is that value. The minimum of any
> > value and an SNaN is that SNaN converted to a QNaN.
> > xsmindp(NaN, 3.0) = 3.0
> > xsmindp(3.0, NaN) = NaN
> 
> That seems right for fmin, provided that (QNaN, SNaN) arguments in either 
> order produce a QNaN result (with "invalid" raised).

They do, and they return a QNaN with the payload of the first operand, in both
cases.
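
As a rough sketch of the quiet-NaN rule that C's fmin follows (a quiet NaN is
treated as missing data, so the other operand is returned): the helper name
below is hypothetical, and it deliberately does not model signalling NaNs, the
"invalid" exception, NaN payloads, or signed zeros, so it says nothing about
what the xsmindp instruction itself does in those cases.

```c
#include <math.h>

/* Hypothetical helper: models only the quiet-NaN rule of fmin.
   Not a model of xsmindp, SNaNs, payloads, or exception flags.  */
static double min_ignoring_qnan(double a, double b)
{
    if (isnan(a))
        return b;   /* a is "missing": result is b (a NaN if b is a NaN too) */
    if (isnan(b))
        return a;   /* b is "missing": result is a */
    return a < b ? a : b;
}
```

With this rule the result does not depend on the operand order when exactly
one operand is a quiet NaN; only the both-NaN case returns a NaN.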

[Bug target/105325] power10: Error: operand out of range

2022-04-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105325

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

[Bug testsuite/105427] [12 regression] gcc.target/powerpc/pr92398.p9-.c fails after r12-8265-gad56a60f58c1ed

2022-04-28 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105427

--- Comment #2 from Segher Boessenkool  ---
Maybe it needs a dg-skip-if for the has_arch_XXX, instead of in the dg-do
target clause?
