[Bug target/90513] asm thunks do not work on PowerPC64/VxWorks (kernel mode)

2019-05-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90513

--- Comment #9 from Segher Boessenkool  ---
With a local entry offset?  Do you mean it has non-zero top three bits of
st_other?
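For reference, here is a minimal sketch of how the ELFv2 ABI encodes the local entry point offset in the three most significant bits of `st_other`. The macro and function names are illustrative and are not taken from binutils. In the thunk from comment #6, the two TOC-setup instructions place the local entry 8 bytes after the global entry. That corresponds to field value 3, or `st_other` bits 0x60.

```c
#include <stdint.h>

/* Illustrative names: the local-entry field is bits 5..7 of st_other. */
#define LOCAL_ENTRY_SHIFT 5
#define LOCAL_ENTRY_MASK  (7u << LOCAL_ENTRY_SHIFT)

/* Byte offset from a function's global entry point to its local entry
   point, following the ELFv2 ABI encoding.  Returns -1 for the reserved
   field value 7. */
static int local_entry_offset(uint8_t st_other)
{
    unsigned v = (st_other & LOCAL_ENTRY_MASK) >> LOCAL_ENTRY_SHIFT;

    if (v <= 1)
        return 0;          /* 0 or 1: no separate local entry point */
    if (v == 7)
        return -1;         /* reserved */
    return 1 << v;         /* 2..6 -> 4, 8, 16, 32, 64 bytes */
}
```

The low bits of `st_other` (symbol visibility) are masked off, so they do not affect the result.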

[Bug target/90453] PowerPC/AltiVec VSX: Provide vec_pack/vec_unpackh/vec_unpackl for 32<->64

2019-05-19 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90453

--- Comment #7 from Segher Boessenkool  ---
Note that vec_pack works for unsigned as well.

For vec_unpack[hl] of unsigned you can do a vec_merge[hl] instead (with the
first arg a zero vector).
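The two notes above can be made concrete with a portable scalar model of the element-wise semantics. This is not AltiVec code. It assumes big-endian element numbering, and the helper names are hypothetical. Packing 64-bit elements to 32 bits is a plain truncation, so signedness does not matter. Merging a zero vector (as the first argument) with the data zero-extends each 32-bit element of one half. On little-endian targets, the merge operands would presumably need to be swapped.

```c
#include <stdint.h>

/* Model of vec_pack on two vectors of two 64-bit elements: each element
   is truncated to its low 32 bits (modulo, no saturation), so the same
   operation serves signed and unsigned inputs. */
static void model_pack(const uint64_t a[2], const uint64_t b[2], uint32_t out[4])
{
    out[0] = (uint32_t) a[0];
    out[1] = (uint32_t) a[1];
    out[2] = (uint32_t) b[0];
    out[3] = (uint32_t) b[1];
}

/* Model of vec_mergeh (zero, v) for 32-bit elements, big-endian numbering:
   the result is {0, v[0], 0, v[1]}.  Read as two 64-bit elements (even
   word = high half), that is v[0] and v[1] zero-extended. */
static void model_unpackh_unsigned(const uint32_t v[4], uint64_t out[2])
{
    uint32_t merged[4] = { 0, v[0], 0, v[1] };

    out[0] = ((uint64_t) merged[0] << 32) | merged[1];
    out[1] = ((uint64_t) merged[2] << 32) | merged[3];
}

/* For comparison: the signed vec_unpackh sign-extends instead. */
static void model_unpackh_signed(const int32_t v[4], int64_t out[2])
{
    out[0] = v[0];
    out[1] = v[1];
}
```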

[Bug target/90453] PowerPC/AltiVec VSX: Provide vec_pack/vec_unpackh/vec_unpackl for 32<->64

2019-05-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90453

Segher Boessenkool  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #5 from Segher Boessenkool  ---
vec_unpack* works on vectors of signed integers only, not unsigned.

You need to target at least power8 (-mcpu=power8) to get the long long
versions of this, i.e. the vpkudum and vupk[hl]sw instructions.

This is all documented correctly, as far as I see?  In the ISA doc,
in the ABI doc, and in the GCC docs?  (Power8 is ISA 2.07, we could add
some clarification for that).

[Bug target/90453] PowerPC/AltiVec VSX: Provide vec_pack/vec_unpackh/vec_unpackl for 32<->64

2019-05-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90453

--- Comment #3 from Segher Boessenkool  ---
What should the semantics of this be?  There are four 32-bit elts each
in packedl and packedr, which of those go where in unpacked?

I think what you want to do can be expressed with just two or maybe three
existing builtins, but it's not clear to me exactly what you want.

[Bug target/90453] PowerPC/AltiVec VSX: Provide vec_pack/vec_unpackh/vec_unpackl for 32<->64

2019-05-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90453

Segher Boessenkool  changed:

   What|Removed |Added

 Target|powerpc |powerpc*-*-*
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-05-18
 CC||segher at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Segher Boessenkool  ---
What does "32<->64" mean?

[Bug target/90513] asm thunks do not work on PowerPC64/VxWorks (kernel mode)

2019-05-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90513

--- Comment #6 from Segher Boessenkool  ---
Confirmed.  We have for the thunk

.set.LTHUNK0,_ZN12Intermediate1vEv


.align 2
.p2align 4,,15
.globl _ZThn8_N12Intermediate1vEv
.type   _ZThn8_N12Intermediate1vEv, @function
_ZThn8_N12Intermediate1vEv:
.LFB27:
.cfi_startproc
.LCF2:
0:  addis 2,12,.TOC.-.LCF2@ha
addi 2,2,.TOC.-.LCF2@l
.localentry _ZThn8_N12Intermediate1vEv,.-_ZThn8_N12Intermediate1vEv
addi 3,3,-8
b .LTHUNK0
.cfi_endproc
.LFE27:
.size   _ZThn8_N12Intermediate1vEv,.-_ZThn8_N12Intermediate1vEv


so this will not work unless the jump is optimised by the loader to jump to the
local entry point.  The compiler should not require the loader to do this.

[Bug c/90476] prepossessor should error if #line 0

2019-05-14 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90476

--- Comment #3 from Segher Boessenkool  ---
Where is it documented (in GCC), then?  I can't find it.

[Bug c/89410] [7/8 Regression] ICE in calculate_line_spans, at diagnostic-show-locus.c:1237 after #line

2019-05-14 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89410

--- Comment #24 from Segher Boessenkool  ---
(In reply to Jonny Grant from comment #23)
> Would it be better if I created a separate PR for this?  #line 0  ?

Yes please, it's a separate issue, and will get lost here.  Thanks.

[Bug bootstrap/90418] [10 Regression] powerpc-darwin9 bootstrap fails after r271013

2019-05-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90418

--- Comment #8 from Segher Boessenkool  ---
It also fails all over on powerpc-linux.  Pretty much all targets just
do something like

  /* Extra stack adjustment for exception handler return.  */
  if (crtl->calls_eh_return)
    emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
                           EH_RETURN_STACKADJ_RTX));

  /* Now we can return.  */
  emit_jump_insn (gen_simple_return ());


A fix should be target-independent, or it should fix all targets.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2019-05-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #7 from Segher Boessenkool  ---
From the combine dump of without_sel:

Trying 8, 9 -> 10:
8: r127:V4SI=r124:V4SI^r131:V4SI
  REG_DEAD r131:V4SI
9: r122:V4SI=r127:V4SI&r130:V4SI
  REG_DEAD r130:V4SI
  REG_DEAD r127:V4SI
   10: r128:V4SI=r124:V4SI^r122:V4SI
  REG_DEAD r124:V4SI
  REG_DEAD r122:V4SI
Failed to match this instruction:
(set (reg:V4SI 128 [ l ])
(xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 124 [ l ])
(reg:V4SI 131))
(reg:V4SI 130))
(reg/v:V4SI 124 [ l ])))



That's not canonical form on RTL, and it's not a useful form either.
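The RTL above computes `((r124 ^ r131) & r130) ^ r124`. That is just a bitwise select: it takes bits from r131 where the mask r130 is set and bits from r124 elsewhere, which is what `vec_sel (r124, r131, r130)` does per element. Below is a small scalar check of the identity. The helper names are made up for illustration.

```c
#include <stdint.h>

/* The form combine produced: ((l ^ x) & m) ^ l.  Where a mask bit is 1
   this yields (l ^ x) ^ l == x; where it is 0 it yields l. */
static uint32_t xor_and_xor(uint32_t l, uint32_t x, uint32_t m)
{
    return ((l ^ x) & m) ^ l;
}

/* Bitwise select: bits of x where m is 1, bits of l where m is 0.
   This is the per-element operation of vec_sel (l, x, m). */
static uint32_t bit_select(uint32_t l, uint32_t x, uint32_t m)
{
    return (l & ~m) | (x & m);
}
```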

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2019-05-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Segher Boessenkool  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #6 from Segher Boessenkool  ---
But we should translate the xor/and/xor back to something saner.

Thanks for the testcase!

[Bug target/90363] or1k: Extra mask insn after load from memory

2019-05-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90363

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #1 from Segher Boessenkool  ---
Trying 13 -> 14:
   13: r51:QI=[r50:SI+low(`*.LANCHOR0')]
  REG_DEAD r50:SI
   14: r43:SI=zero_extend(r51:QI)
  REG_DEAD r51:QI
Failed to match this instruction:
(set (reg:SI 43 [ g_doswap.0_2+-3 ])
    (zero_extend:SI (mem/v/c:QI (lo_sum:SI (reg/f:SI 50)
            (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) [0 g_doswap+0 S1 A8])))

The mem arg in that does not match nonimmediate_operand, since it is volatile.
You want something like reg_or_mem_operand.

[Bug target/90323] powerpc should convert equivalent sequences to vec_sel()

2019-05-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Segher Boessenkool  changed:

   What|Removed |Added

 Target|powerpc |powerpc*-*-*
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-05-06
 CC||segher at gcc dot gnu.org
Version|8.3.0   |10.0
Summary|ppc should convert  |powerpc should convert
   |equivalent sequences to |equivalent sequences to
   |vec_sel()   |vec_sel()
 Ever confirmed|0   |1

--- Comment #1 from Segher Boessenkool  ---
Please provide a compilable testcase?  With flags, expected code, and
what you see instead?

[Bug target/71390] PowerPC GCC should warn if use does -mcpu=, and an old assembler was used

2019-04-30 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71390

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #2 from Segher Boessenkool  ---
Most HAVE_AS_* were deleted (in r264675).  [ Only HAVE_AS_MFCRF is left, and
we should get rid of that as well...  We shouldn't use the two-argument mfcr
at all, this is mfocrf in modern lingo. ]

Problems like PR70957 can now not happen at all.  Instead, when the compiler
generates code the assembler does not like, the user gets an error.  So just
don't use too old assemblers!

Closing this PR as WONTFIX.

[Bug target/65342] [7/8/9/10 Regression] FAIL: gfortran.dg/intrinsic_(un)?pack_1.f90 -O1 execution test on powerpc-apple-darwin9/10 after r210201

2019-04-30 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65342

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #24 from Segher Boessenkool  ---
(In reply to Alan Modra from comment #10)
> > permitted? (i.e. modifying %1, which is an input operand)
> 
> Yes.  You're outputting assembly, practically anything goes.

But the generated machine code will modify that register without the
compiler knowing.
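The usual way to let an asm modify a register that also carries an input is to make it an in/out operand with the `+` modifier, or to tie an output to the input with a matching constraint. That way the compiler knows the value changes. Below is a minimal sketch. The function name is hypothetical, and the x86 and PowerPC instruction strings are only illustrations, with a plain C fallback for other targets.

```c
/* Adds 1 to x inside inline asm.  Because x is a "+" operand, GCC knows
   the register holding it is both read and written by the asm. */
static unsigned bump(unsigned x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("addl $1, %0" : "+r" (x));
#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__POWERPC__))
    /* "b": any GPR except r0, since addi reads r0 as the constant 0. */
    __asm__ ("addi %0,%0,1" : "+b" (x));
#else
    x += 1;
#endif
    return x;
}
```

Writing the same asm with `x` only as an input (`: : "r" (x)`) would be the bug described above: the register changes, but GCC may still assume it holds the old value.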

[Bug rtl-optimization/89721] __builtin_mffs sometimes optimized away

2019-04-30 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89721

--- Comment #5 from Segher Boessenkool  ---
No, it needs backports.  Thanks for reminding me!

[Bug c/90036] [8/9/10 Regression] false positive: directive argument is null [-Werror=format-overflow=]

2019-04-29 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90036

--- Comment #6 from Segher Boessenkool  ---
(In reply to Martin Sebor from comment #5)
> A conversion specification is what follows the % character (i.e., just the
> 's' in something like "%3s", with the 's' being called a conversion
> specifier).

7.21.6.1/4.  's' is the "conversion specifier character", but the whole
thing is the "conversion specification", including the percent sign.

7.21.6.1/3 says a directive is a conversion specification or an ordinary
character, so imho it isn't great to refer to directives in the warning (also,
it's the first time I heard it called that; I hazard I'm not the only one).

> The use of plain null here comes from -Wnonnull: null argument where non-null
> required.  I don't see that as a problem but I also wouldn't have an issue
> with changing both to "null pointer" (like -Wformat prints) just as long as
> it's done consistently.

Right.  There are two goals to warnings:

1) They should be *correct*;
2) they should be helpful.

Sometimes these two bite each other.  Rephrasing can help sometimes.

[Bug rtl-optimization/90249] [9/10 Regression] Code size regression on thumb2 due to sub-optimal register allocation starting with r265398

2019-04-29 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249

--- Comment #6 from Segher Boessenkool  ---
(In reply to Wilco from comment #5)
> (In reply to Segher Boessenkool from comment #4)
> > That is code *size*.  Code size is expected to grow a tiny bit, because of
> > *better* register allocation.
> > 
> > But we could not do make_more_copies at -Os, if that helps?  (The hard
> > register changes themselves are required for correctness).
> 
> Better register allocation implies lower average codesize due to fewer
> spills, fewer callee-saves, fewer moves etc.

That depends on the case.  And we are dealing with a quite specialised case
here.

> I still don't understand what specific problem make_more_copies is trying to
> solve. Is it trying to do life-range splitting of argument registers?

Nope.  It is simply that before the hard-reg change we very often combined the
argument register moves with other insns into a different form than those other
insns had on their own, notably in cases where this is only possible because we
know how those values are extended, etc.  make_more_copies simply inserts another
reg-reg move so that this new move can be combined instead, since we no longer
combine the hard register move.  Without this we get a lot of actual code quality
regressions.

[Bug rtl-optimization/90249] [9/10 Regression] Code size regression on thumb2 due to sub-optimal register allocation starting with r265398

2019-04-29 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249

--- Comment #4 from Segher Boessenkool  ---
That is code *size*.  Code size is expected to grow a tiny bit, because of
*better* register allocation.

But we could skip make_more_copies at -Os, if that helps?  (The hard register
changes themselves are required for correctness).

[Bug c/90036] [8/9/10 Regression] false positive: directive argument is null [-Werror=format-overflow=]

2019-04-27 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90036

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
x.cpp:22:11: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
   22 | printf("%s = %s\n", [0], [0]); // warning
  | ~~^

Side issue...  It's not clear what a "directive" or a "directive argument"
is here, without guessing.  "%s" is called a "conversion specifier".  "null"
is not a defined anything either, and the highlight should ideally be on the
argument, with maybe an extra info one for the conversion spec.

So maybe something like

x.cpp:22:11: error: argument to ‘%s’ is a null pointer [-Werror=format-overflow=]
   22 | printf("%s = %s\n", [0], [0]); // warning
  | ^~  ^~

[Bug target/87213] ICE in final_scan_insn_1, at final.c:3070

2019-04-26 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87213

Segher Boessenkool  changed:

   What|Removed |Added

 Target||powerpc*-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-04-26
  Known to work||7.0, 9.0
   Host|powerpc64le-linux-gnu   |
 Ever confirmed|0   |1
  Known to fail||8.0

--- Comment #1 from Segher Boessenkool  ---
It does not fail on GCC 7 or GCC 9, but it does still fail like this on GCC 8.
Confirmed.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #14 from Segher Boessenkool  ---
I committed as r270601, on gcc-9-branch

2019-04-26  Segher Boessenkool  

PR other/90257
Revert the revert:
2019-04-21  H.J. Lu  

PR target/90178
Revert:
2018-11-21  Uros Bizjak  

Revert the revert:
2013-10-26  Vladimir Makarov  

Revert:
2013-10-25  Vladimir Makarov  

* lra-spills.c (lra_final_code_change): Remove useless move insns.


so this is okay for rs6000 on GCC 9 for now.

[Bug other/90257] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-25 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

Segher Boessenkool  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug other/90257] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-25 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #1 from Segher Boessenkool  ---
This patch from a few days ago craters our specint scores by a few percent.

I'm marking this P1.

[Bug rtl-optimization/90249] [9 Regression] Code size regression on thumb2 due to sub-optimal register allocation starting with r265398

2019-04-25 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249

Segher Boessenkool  changed:

   What|Removed |Added

Summary|[9/10 Regression] Code size |[9 Regression] Code size
   |regression on thumb2 due to |regression on thumb2 due to
   |sub-optimal register|sub-optimal register
   |allocation starting with|allocation starting with
   |r265398 |r265398

--- Comment #2 from Segher Boessenkool  ---
What difference is there on some code of significant size?  Do you see
regressions then?

Of course there are some tiny examples where it now does worse, just like
there are examples where it now does better.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-23 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #59 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #58)
> If we don't want to go with #c35 at least for GCC 9, would the #c44 patch be
> still useful without it (does it ever trigger say on the kernel where it
> didn't trigger before)?

The patch in comment 44 is obviously good, it improves the size by 0.090%
as noted (this is a kernel build, multi_v5_defconfig iirc).

I'd say it is perfectly safe for GCC 9, but I'm not an Arm maintainer :-)

[Bug inline-asm/90181] Feature request: provide a way to explicitly select specific named registers in constraints

2019-04-23 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90181

--- Comment #7 from Segher Boessenkool  ---
(In reply to nfxjfg from comment #6)
> Yes, it's clear that the constraint can't be _just_ the register name,
> since they'll clash with builtin constraints now or with future
> architectures (which may add arbitrary register names). The proposed
> "*registername" is pretty nice, though. Having this would be great.

Hrm, "*" already has a meaning with current GCC (it essentially is ignored
in inline asm)...  It might be better to have some new syntax that gives an
error with older GCC.

> I didn't find a RISC-V builtin for ecall (maybe I looked in the wrong
> place). That wouldn't be sufficient anyway.

Right, you would need a builtin for every calling convention for syscalls.
There aren't too many of those, though?

> Another situation where I
> wanted to specify many fixed register constraints was a piece of inline code
> that did some syscalls without touching the stack (it needed all inputs as
> registers, and in specific registers, and have some registers for free use
> by the asm code itself).

A biggish piece of asm like that might be better as actual assembler code
than as inline asm, you may want to consider that?

[Bug c/90167] invalid example in GCC documentation wrt. effective type rules

2019-04-23 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90167

--- Comment #3 from Segher Boessenkool  ---
But you are not accessing as the union type.  You do the access with the
type of one of its members.  And that is UB.

The part of the standard you quote is about things like

union a_union f(double *p) { return *(union a_union *)p; }
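The sketch below illustrates the distinction. It assumes an IEEE 754 `double` of the same size as `uint64_t`, and the union and function names are hypothetical. GCC documents reading a different member through the union object itself as supported. `memcpy` is the fully portable way to do this. Reading a `double` object through a pointer cast to a union type is the undefined case discussed here.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

union dbl_bits {        /* hypothetical helper type */
    double d;
    uint64_t u;
};

/* Type punning through the union object itself: GCC documents this as
   allowed, even with -fstrict-aliasing. */
static uint64_t bits_via_union(double x)
{
    union dbl_bits t;
    t.d = x;
    return t.u;
}

/* Portable ISO C: copy the object representation. */
static uint64_t bits_via_memcpy(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Undefined behaviour (kept as a comment only): the object x has type
   double, but it is read as uint64_t through a cast to a union pointer.
   static uint64_t bits_bad(double x) { return ((union dbl_bits *) &x)->u; } */
```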

[Bug tree-optimization/89847] Simplify subexpressions of % constant

2019-04-22 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89847

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-04-22
 CC||segher at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Segher Boessenkool  ---
They didn't test the right targets ;-)

While for x86_64 you get

movl    %edi, %eax
sall    $5, %eax
subl    %edi, %eax
addl    $27961, %eax
andl    $15, %eax
ret

and for aarch64 you get

lsl w1, w0, 5
sub w0, w1, w0
mov w1, 27961
add w0, w0, w1
and w0, w0, 15
ret

for sparc{,64} you get

sethi   %hi(27648), %g1
or  %g1, 313, %g1
sub %g1, %o0, %o0
jmp %o7+8
 and    %o0, 15, %o0

(the mul-by-31 was optimised away by combine).

While for 32-bit powerpc you get

mulli 3,3,31
addi 3,3,27961
rlwinm 3,3,0,28,31
blr

(if you don't set a modern -mcpu=, anyway), for powerpc64 you get

subfic 3,3,9
rldicl 3,3,0,60
blr

This again is done by combine:

Trying 10, 11 -> 12:
   10: r129:SI=r128:SI-r132:DI#4
  REG_DEAD r132:DI
  REG_DEAD r128:SI
   11: r130:SI=r129:SI+0x6d39
  REG_DEAD r129:SI
   12: r125:SI=r130:SI&0xf
  REG_DEAD r130:SI
Failed to match this instruction:
(set (reg:SI 125)
(and:SI (minus:SI (const_int 9 [0x9])
(subreg:SI (reg:DI 132) 4))
(const_int 15 [0xf])))
Successfully matched this instruction:
(set (reg:SI 130)
(minus:SI (const_int 9 [0x9])
(subreg:SI (reg:DI 132) 4)))
Successfully matched this instruction:
(set (reg:SI 125)
(and:SI (reg:SI 130)
(const_int 15 [0xf])))


Ideally this would be done in gimple already, of course.  Combine cannot
handle this in general.
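The simplification in the sparc and powerpc64 output follows from working modulo 16. Since 31 ≡ −1 and 27961 = 16·1747 + 9 ≡ 9, it follows that `(31*x + 27961) % 16 == (9 - x) % 16` for unsigned 32-bit `x`. Wraparound does not affect this, because 2^32 is a multiple of 16. The functions below are a hypothetical reconstruction of the kind of expression involved, not the PR's testcase.

```c
#include <stdint.h>

/* Shape suggested by the generated code: multiply by 31, add 27961,
   reduce modulo 16 (unsigned, so % 16 is the same as & 15). */
static uint32_t mod_original(uint32_t x)
{
    return (x * 31u + 27961u) % 16u;
}

/* Equivalent form: 31 == -1 and 27961 == 9 (mod 16), so only one
   subtract-from-constant and one mask remain. */
static uint32_t mod_simplified(uint32_t x)
{
    return (9u - x) & 15u;
}
```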

[Bug target/88474] Inline built-in hypot for -ffast-math

2019-04-22 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88474

Segher Boessenkool  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 CC||segher at gcc dot gnu.org
 Resolution|FIXED   |---

--- Comment #4 from Segher Boessenkool  ---
It isn't implemented for any other targets yet.  When I use __builtin_hypot
with -ffast-math (I tried on powerpc64-linux) I get a call to __hypot_finite,
instead of just three machine instructions, like e.g.

fmul 2,2,2
fmadd 1,1,1,2
fsqrt 1,1

which is what you get for

double hypot(double x, double y) { return __builtin_sqrt(x*x + y*y); }

Reopened.  (Or do you want this PR to be just for x87?  If so, why?)

[Bug tree-optimization/89811] uint32_t load is not recognized if shifts are done in a fixed-size loop

2019-04-22 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89811

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool  ---
On PowerPC, for "bad" we get

addi 9,3,2
lbz 0,1(3)
lbz 3,0(3)
lhbrx 10,0,9
rlwimi 0,10,8,0,31-8
rlwimi 3,0,8,0,31-8
rldicl 3,3,0,32
blr

(BE -m64); it managed to recognise the top two bytes as a byte-reverse load,
but not the lower two.

(And yup, "loop" uses no byte-reverse at all.)
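For context, here is the general shape of code this PR is about: assembling a little-endian 32-bit value from bytes, once with explicit shifts and once with the shifts in a fixed-size loop. These are hypothetical reconstructions, not the PR's "bad" and "loop" functions. On a big-endian target such as powerpc64 (BE), either one could ideally become a single byte-reversed load (`lwbrx`).

```c
#include <stdint.h>

/* Little-endian load written out with explicit shifts. */
static uint32_t load_le32_explicit(const unsigned char *p)
{
    return (uint32_t) p[0]
         | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16
         | (uint32_t) p[3] << 24;
}

/* The same load with the shifts in a fixed-size loop. */
static uint32_t load_le32_loop(const unsigned char *p)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++)
        v |= (uint32_t) p[i] << (8 * i);
    return v;
}
```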

[Bug tree-optimization/89804] optimization opportunity: move variable from stack to register

2019-04-22 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89804

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
That sounds not too hard to fix, no?

Expand should expand and not do all kinds of other things.  Also, doing this
optimisation in RTL is much harder to do than in gimple, I think.

[Bug c/89774] Add flag to force single precision

2019-04-22 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Segher Boessenkool  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2019-04-22
 CC||segher at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=90070,
   ||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=22326
 Resolution|INVALID |---
 Ever confirmed|0   |1

--- Comment #9 from Segher Boessenkool  ---
We currently only do it for trivial cases, as the example in comment 6 shows
as well.  This is done during expand, which is the wrong place for it.

PR90070 is asking for better optimisation of this: do the operation in single
precision, and use single-precision constants, if this does not change the
result (or there is some -ffast-math option).

PR22326 is also closely related.  I don't think we can close any of these PRs
as a dup of another, they are all asking for slightly different things :-)
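Below is an illustration of the kind of trivial case meant here. The functions are hypothetical, not the testcase from comment 6, and the sketch assumes IEEE 754 arithmetic. With a `double` constant, the multiplication is formally done in double precision. However, 2.0 is exactly representable as a float and doubling a float is exact, so the single-precision form gives the same result for finite results. For a constant such as 0.1, which is inexact in both formats, the two forms are not guaranteed to agree. That is why the rewrite is only valid when it does not change the result, or under some -ffast-math option.

```c
/* As typically written: 2.0 is a double constant, so x is promoted and
   the product is rounded back to float on return. */
static float scale_as_double(float x)
{
    return x * 2.0;
}

/* Single-precision form: 2.0f is exact, and doubling a float is exact
   as long as the result is finite. */
static float scale_as_float(float x)
{
    return x * 2.0f;
}
```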

[Bug inline-asm/90181] Feature request: provide a way to explicitly select specific named registers in constraints

2019-04-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90181

Segher Boessenkool  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2019-04-20
 CC||segher at gcc dot gnu.org
 Resolution|WONTFIX |---
 Ever confirmed|0   |1

--- Comment #4 from Segher Boessenkool  ---
(In reply to nfxjfg from comment #0)
> Currently, inline assembler constraints have no way to select an explicitly
> named register. Apparently you're supposed to use register variables. There
> is even text that register variables exist only for this use case.

That is not what it says.  Originally a local register variable lived in
the specified register always.  This quickly was found out to not really
work.  After many years it was finally documented as just not supported
for anything but assembler operands.

Hopefully it will actually do *only* this in the not too far future.  We
should be able to make (almost) all gotchas here magically disappear.

> For example, suppose you want to pass something through the register a7 on
> the RISC-V platform. You need to do:
> 
>   void call_ecall(size_t num)
>   {
> register size_t r_a7 __asm("a7") = num;
> __asm volatile("ecall" : : "r" (r_a7) : "memory");
>   }
> 
> This gets awkward fast. It adds a lot of extra noise if you have many
> registers to pass (the ecall instruction provides an example where this may
> be needed).

Does the riscv port not have a builtin for this?

> The semantics are also not entirely clear: will r_a7 occupy the a7 register
> for the entire function

It does not matter: you are only allowed to pass it to the asm, and nothing
else is defined behaviour.

>   void call_ecall(size_t num)
>   {
> __asm volatile("ecall" : : "a7" (num) : "memory");
>   }

Because a7 is not a constraint.  It also cannot *be* one, in general;
for example, many archs have a register "r0" but the constraint "r0"
already means something else.

So we need some new syntax for this.  I suggested "*a7" before.

Confirmed.  It's a reasonable request, and it is a feature that would make
GCC better, and isn't too hard to define or implement.  Reopening.

[Bug target/90193] [8/9 Regression] asm goto with TLS "m" input operand generates incorrect assembler in O1 and O2

2019-04-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90193

--- Comment #8 from Segher Boessenkool  ---
(As Alexander said in comment 1...  I need to learn how to read some day).

[Bug target/90193] [8/9 Regression] asm goto with TLS "m" input operand generates incorrect assembler in O1 and O2

2019-04-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90193

--- Comment #7 from Segher Boessenkool  ---
The same splitter is what causes the bb of the asm to be marked as
always falling through, which is why that non-fallthrough label is
eventually deleted.

[Bug target/90193] [8/9 Regression] asm goto with TLS "m" input operand generates incorrect assembler in O1 and O2

2019-04-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90193

--- Comment #6 from Segher Boessenkool  ---
It emits an insn instead of a jump_insn for the asm, during split1, in

(define_split
  [(match_operand 0 "tls_address_pattern")]
  "TARGET_TLS_DIRECT_SEG_REFS"
  [(match_dup 0)]
  "operands[0] = ix86_rewrite_tls_address (operands[0]);")

[Bug target/90193] [8/9 Regression] asm goto with TLS "m" input operand generates incorrect assembler in O1 and O2

2019-04-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90193

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
It actually ICEs if you have checking enabled.

[Bug c/90167] invalid example in GCC documentation wrt. effective type rules

2019-04-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90167

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||segher at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Segher Boessenkool  ---
The code accesses d, of type double, as an int.  That is not a compatible
type.  It does not matter how it got there, or what pointer-cast trickery
with unions it used.

You can access a union type as the type of any of its members.  But a double
is not a union type.

[Bug tree-optimization/88055] [9 regression] ICE in extract_insn, at recog.c:2305 on ppc64le

2019-04-19 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88055

Segher Boessenkool  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Segher Boessenkool  ---
Fixed.

[Bug tree-optimization/88055] [9 regression] ICE in extract_insn, at recog.c:2305 on ppc64le

2019-04-19 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88055

--- Comment #10 from Segher Boessenkool  ---
Author: segher
Date: Fri Apr 19 16:58:01 2019
New Revision: 270460

URL: https://gcc.gnu.org/viewcvs?rev=270460&root=gcc&view=rev
Log:
tree-call-cdce: If !HONOR_NANS do not make code with NaNs (PR88055)

If we don't HONOR_NANS we should not try to use any unordered
comparison results.  Best case those will just be optimized away;
realistically, they ICE.  For example, the rs6000 backend has some
code that specifically checks we never do this.


PR tree-optimization/88055
* tree-call-cdce.c (comparison_code_if_no_nans): New function.
(gen_one_condition): Use it if !HONOR_NANS.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-call-cdce.c
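Some background on why unordered comparisons matter here. When NaNs are honoured, the logical negation of `x < y` is the unordered comparison UNGE ("unordered or greater-or-equal"), not GE. The reason is that both `x < y` and `x >= y` are false when either operand is a NaN. Without NaNs the two coincide, so code that only has to work for !HONOR_NANS can use the plain ordered code. Below is a small C illustration. The function names are hypothetical, and it assumes default floating-point flags (no -ffast-math).

```c
#include <math.h>

/* Negation of x < y: true when x >= y OR the operands are unordered
   (RTL code UNGE). */
static int not_less(double x, double y)
{
    return !(x < y);
}

/* Ordered x >= y (RTL code GE): false whenever a NaN is involved. */
static int greater_equal(double x, double y)
{
    return x >= y;
}
```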

[Bug rtl-optimization/79405] [8/9/10 Regression] Infinite loop in fwprop

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79405

Segher Boessenkool  changed:

   What|Removed |Added

   Priority|P1  |P4

--- Comment #13 from Segher Boessenkool  ---
We have a fine workaround, committed ages ago, so this isn't a P1 anymore.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #54 from Segher Boessenkool  ---
(In reply to Wilco from comment #52)
> (In reply to Segher Boessenkool from comment #48)
> > With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
> > That does not fix this PR though :-/
> 
> But it does fix most of the codesize regression.

Yes, and it often creates *better* code, as far as I can see.

> The shrinkwrapping testcase
> seems a preexisting problem that was exposed by the combine changes,

It is.

> so it
> doesn't need to hold up the release. The regalloc change might fix
> addr-modes-float.c too.

I'd like to see the RA fix in GCC 9.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #53 from Segher Boessenkool  ---
(In reply to Richard Earnshaw from comment #51)
> In the more general case splitting this would produce worse code, not
> better, since then we'd end up with two instructions rather than one.

Sure, it _often_ is good to have it merged.  Quite clearly more often than
not it's good, so if you have to pick only one way, this is the way to go.

Hopefully we can do better though.  But not for stage 4, sure.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #50 from Segher Boessenkool  ---
The insn is

(insn 7 3 8 2 (parallel [
(set (reg:CC 100 cc)
(compare:CC (reg:SI 0 r0 [116])
(const_int 0 [0])))
(set (reg/v:SI 4 r4 [orig:112 a ] [112])
(reg:SI 0 r0 [116]))
]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
 (nil))

and that isn't split, and then prepare_shrink_wrap gives up on it.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #49 from Segher Boessenkool  ---
(In reply to Wilco from comment #47)
> (In reply to Segher Boessenkool from comment #46)
> > With all three patches together (Peter's, mine, Jakub's), I get a code size
> > increase of only 0.047%, much more acceptable.  Now looking what that diff
> > really *is* :-)
> 
> I think with Jakub's change you don't need to disable the movsi_compare0
> pattern in combine. If regalloc works as expected, it will get split into a
> compare so shrinkwrap can handle it.

prepare_shrink_wrap can not handle that.  prepare_shrink_wrap needs to be
improved for other reasons, of course.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #48 from Segher Boessenkool  ---
With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
That does not fix this PR though :-/

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #46 from Segher Boessenkool  ---
With all three patches together (Peter's, mine, Jakub's), I get a code size
increase of only 0.047%, much more acceptable.  Now looking what that diff
really *is* :-)

[Bug target/16798] PowerPC - Opportunity to use recording form instruction.

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16798

--- Comment #9 from Segher Boessenkool  ---
With all three patches together (Peter's, mine, Jakub's), I get a code size
increase of only 0.047%, much more acceptable.  Now looking what that diff
really *is* :-)

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #42 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #40)
> The question is what the code size differences would be with those changes
> (i.e. how often does it help not to have *movsi_compare0 make RA decisions
> worse vs. how often we actually have those two instructions separated by
> other insns).

Yeah.  If someone writes patches adding the peepholes, I can test it, but I'm
no hero at writing peepholes, esp. for an arch I do not fully understand :-/

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #41 from Segher Boessenkool  ---
(In reply to Wilco from comment #38)
> Well the question really is what is bad about movsi_compare0 that could be
> easily fixed?

"Easily fixed"...  There is no such thing here.

Because it is a parallel everything has to work on the compare and the move
together.  Various things do not handle that, things that only handle simple
moves for example.  Like prepare_shrink_wrap in this testcase.  And for many
other things you have to split the parallel before you can do the transform
you want.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #39 from Segher Boessenkool  ---
On a linux kernel defconfig build it increases code size by 0.567%.
That seems a bit much :-(

The peephole only recognises

  mov rA,rB
  cmp rB,#0

and not

  mov rA,rB
  cmp rA,#0

or

  cmp rB,#0
  mov rA,rB

and we see a lot of the latter, after my patch anyway.
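A rough sketch of what a matcher accepting all three orders would have to check. This is a toy model in C with made-up types (toy_insn, TOY_MOV, TOY_CMP0), not an actual define_peephole2 from arm.md. It shows why the cmp-first order only qualifies when the compare reads rB: "cmp rA,#0" before the mov would test the old value of rA.

```c
#include <stdbool.h>

/* Hypothetical toy instruction: either "mov dst, src" or "cmp src, #0". */
enum toy_op { TOY_MOV, TOY_CMP0 };

struct toy_insn {
    enum toy_op op;
    int dst; /* destination register of a TOY_MOV (unused for TOY_CMP0) */
    int src; /* source of a TOY_MOV, or the register a TOY_CMP0 tests */
};

/* Accept all three orders: mov rA,rB; cmp rB,#0  /  mov rA,rB; cmp rA,#0
   /  cmp rB,#0; mov rA,rB.  Each pair could become one flag-setting move. */
static bool is_mov_cmp0_pair(const struct toy_insn *a, const struct toy_insn *b)
{
    if (a->op == TOY_MOV && b->op == TOY_CMP0)
        return b->src == a->src || b->src == a->dst;
    if (a->op == TOY_CMP0 && b->op == TOY_MOV)
        return a->src == b->src; /* must test rB, not the not-yet-written rA */
    return false;
}

/* The narrower check described above: only mov rA,rB; cmp rB,#0. */
static bool is_mov_cmp0_pair_narrow(const struct toy_insn *a,
                                    const struct toy_insn *b)
{
    return a->op == TOY_MOV && b->op == TOY_CMP0 && b->src == a->src;
}
```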

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #37 from Segher Boessenkool  ---
Yes, it is a balancing act.  Which option works better?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #35 from Segher Boessenkool  ---
Peter's patch solves this particular problem, but not the PR unfortunately.

I finally understand Jakub's comment 30.  This patch solves the PR (also
without Peter's patch):

===
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..67dddb2 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6340,7 +6340,7 @@ (define_insn "*movsi_compare0"
(const_int 0)))
(set (match_operand:SI 0 "s_register_operand" "=r,r")
(match_dup 1))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && reload_completed"
   "@
cmp%?\\t%0, #0
subs%?\\t%0, %1, #0"
===

[Bug c/89410] [7/8 Regression] ICE in calculate_line_spans, at diagnostic-show-locus.c:1237 after #line

2019-04-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89410

--- Comment #22 from Segher Boessenkool  ---
#line 0   isn't valid C code.  If it causes problems we should just
error on it (and perhaps even when it doesn't (yet) cause problems).

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #31 from Segher Boessenkool  ---
It's how you do a parallel of a mov and a flags set, which of course you
can have before RA, and you want created by combine, typically.  Or do I
misunderstand the question?

(I thought Arm have a "movs" op for this, btw?)

[Bug target/17108] Store with update not generated for a simple loop

2019-04-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17108

--- Comment #9 from Segher Boessenkool  ---
Author: segher
Date: Wed Apr 17 09:45:57 2019
New Revision: 270407

URL: https://gcc.gnu.org/viewcvs?rev=270407=gcc=rev
Log:
rs6000: Improve the load/store-with-update patterns (PR17108)

Many of these patterns only worked in 32-bit mode, and some only worked
in 64-bit mode.  This patch makes these use Pmode, fixing the PR.  On
the other hand, the stack updates have to use the same mode for the
stack pointer as for the value stored, so let's simplify that a bit.

Many of these patterns pass the wrong mode to
avoiding_indexed_address_p (it should be the mode of the datum
accessed, not the mode of the pointer).

Finally, I merge some patterns into one (using iterators).


PR target/17108
* config/rs6000/rs6000.c (rs6000_split_multireg_move): Adjust pattern
name.
(rs6000_emit_allocate_stack_1): Simplify condition.  Adjust pattern
name.
* config/rs6000/rs6000.md (bits): Add entries for SF and DF.
(*movdi_update1): Use Pmode.
(movdi__update): Fix argument to avoiding_indexed_address_p.
(movdi__update_stack): Rename to ...
(movdi_update_stack): ... this.  Fix comment.  Change condition. Don't
use Pmode.
(*movsi_update1): Use Pmode.
(*movsi_update2): Use Pmode.
(movsi_update): Rename to ...
(movsi__update): ... this.  Use Pmode.
(movsi_update_stack): Fix condition.
(*movhi_update1): Use Pmode.  Fix argument to
avoiding_indexed_address_p.
(*movhi_update2): Ditto.
(*movhi_update3): Ditto.
(*movhi_update4): Ditto.
(*movqi_update1): Ditto.
(*movqi_update2): Ditto.
(*movqi_update3): Ditto.
(*movsf_update1, *movdf_update1): Merge, rename to...
(*mov_update1): This.  Use Pmode.  Fix argument to
avoiding_indexed_address_p.  Add "size" attribute.
(*movsf_update2, *movdf_update2): Merge, rename to...
(*mov_update2): This.  Ditto.
(*movsf_update3): Use Pmode.  Fix argument to
avoiding_indexed_address_p.
(*movsf_update4): Ditto.
(allocate_stack): Simplify condition.  Adjust pattern names.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/rs6000.c
trunk/gcc/config/rs6000/rs6000.md

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-16 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #27 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #26)
> ;; a4(r117,l0) conflicts: a3(r112,l0)
> ;; total conflict hard regs:
> ;; conflict hard regs:
> 
> ;; a5(r116,l0) conflicts:  cp0:a0(r111)<->a4(r117)@330:move
>   cp1:a2(r114)<->a3(r112)@41:shuffle
>   cp2:a3(r112)<->a5(r116)@125:shuffle
>   pref0:a0(r111)<-hr0@2000
>   pref1:a4(r117)<-hr0@660
>   pref2:a5(r116)<-hr0@1000
>   regions=1, blocks=6, points=10
> allocnos=6 (big 0), copies=3, conflicts=0, ranges=6
> 
> Note: I'm assuming we're missing a \n after p116's empty conflicts above?

The code is

  fputs (" conflicts:", file);
  n = ALLOCNO_NUM_OBJECTS (a);
  for (i = 0; i < n; i++)
    {
      ira_object_t obj = ALLOCNO_OBJECT (a, i);
      ira_object_t conflict_obj;
      ira_object_conflict_iterator oci;

      if (OBJECT_CONFLICT_ARRAY (obj) == NULL)
        continue;
      [...]
    }

and the

;; total conflict hard regs:

etc. prints are in that [...].
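To illustrate why nothing follows " conflicts:" for r116, here is a toy model of the loop quoted above. The struct and the text printed in place of [...] are assumptions for illustration, not IRA's real data structures or its exact output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for an IRA object: just an
   optional list of conflicting register numbers. */
struct toy_object {
    const int *conflicts; /* NULL models OBJECT_CONFLICT_ARRAY (obj) == NULL */
    int n_conflicts;
};

/* Append s to the NUL-terminated string in buf, truncating if it is full. */
static void append(char *buf, size_t size, const char *s)
{
    size_t len = strlen(buf);
    if (len + 1 < size)
        snprintf(buf + len, size - len, "%s", s);
}

/* Same shape as the quoted loop: " conflicts:" is always printed, but
   everything after it, including the newline, is printed only for objects
   that have a conflict array. */
static void toy_print_conflicts(char *buf, size_t size,
                                const struct toy_object *objs, int n)
{
    append(buf, size, " conflicts:");
    for (int i = 0; i < n; i++) {
        if (objs[i].conflicts == NULL)
            continue; /* skips the [...] part, newline included */
        for (int j = 0; j < objs[i].n_conflicts; j++) {
            char item[32];
            snprintf(item, sizeof item, " r%d", objs[i].conflicts[j]);
            append(buf, size, item);
        }
        append(buf, size, "\n;; total conflict hard regs:\n");
    }
}
```

So an allocno whose only object has no conflict array leaves the line unterminated, and whatever is dumped next (the copies, here) continues on the same line.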

[Bug other/88790] No warning for misleading indentation

2019-04-16 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88790

--- Comment #4 from Segher Boessenkool  ---
(Yup, worked).

[Bug other/88790] No warning for misleading indentation

2019-04-16 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88790

Segher Boessenkool  changed:

   What|Removed |Added

 CC||daniel.marjamaki at gmail dot com

--- Comment #3 from Segher Boessenkool  ---
Let's try that...

[Bug middle-end/90070] Add optimization for optimizing small integer values by fp integral constant

2019-04-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90070

--- Comment #6 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #5)
> Oh that is PR 22326 

Indeed it is.  And your conclusion there ("we need some pass that does
this properly", instead of the current thing during expand) still holds,
too.  (How do you do this btw, remembering all PRs?! :-) )

[Bug rtl-optimization/89794] combine incorrectly forwards register value through auto-inc operation

2019-04-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89794

--- Comment #8 from Segher Boessenkool  ---
Author: segher
Date: Mon Apr 15 11:33:29 2019
New Revision: 270368

URL: https://gcc.gnu.org/viewcvs?rev=270368=gcc=rev
Log:
combine: Count auto_inc properly (PR89794)

The code that checks if an auto-increment from i0 or i1 is not lost is
a bit shaky.  The code to check the same for i2 is non-existent, and
cannot be implemented in a similar way at all.  So, this patch counts
all auto-increments, and makes sure we end up with the same number as
we started with.  This works because we still have a check that we
will not duplicate any.

We should do this some better way, but not while we are in stage 4.


PR rtl-optimization/89794
* combine.c (count_auto_inc): New function.
(try_combine): Count how many auto_inc expressions there were in the
original instructions.  Ensure we have the same number in the new
instructions.  Remove the code that tried to ensure auto_inc side
effects on i1 and i0 are not lost.

gcc/testsuite/
PR rtl-optimization/89794
* gcc.dg/torture/pr89794.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/torture/pr89794.c
Modified:
trunk/ChangeLog
trunk/gcc/combine.c
trunk/gcc/testsuite/ChangeLog
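The counting idea from this log message can be sketched like this. It uses a made-up toy expression type rather than GCC's rtx, and toy_count_auto_inc / toy_auto_inc_preserved are hypothetical names, not the actual count_auto_inc in combine.c. As the log says, a separate check against duplicating auto-increments is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy expression: a register, a memory reference, or an
   auto-increment of a register (standing in for PRE_INC, POST_INC etc.). */
enum toy_code { TOY_REG, TOY_MEM, TOY_POST_INC };

struct toy_rtx {
    enum toy_code code;
    const struct toy_rtx *op0, *op1; /* NULL when unused */
};

/* Count auto-increment nodes anywhere inside x. */
static int toy_count_auto_inc(const struct toy_rtx *x)
{
    if (x == NULL)
        return 0;
    return (x->code == TOY_POST_INC) + toy_count_auto_inc(x->op0)
           + toy_count_auto_inc(x->op1);
}

/* Accept a combination only if the new insns contain exactly as many
   auto-increments as the original insns did, so none was lost. */
static bool toy_auto_inc_preserved(const struct toy_rtx *const *old_insns, int n_old,
                                   const struct toy_rtx *const *new_insns, int n_new)
{
    assert(n_old >= 0 && n_new >= 0);
    int before = 0, after = 0;
    for (int i = 0; i < n_old; i++)
        before += toy_count_auto_inc(old_insns[i]);
    for (int i = 0; i < n_new; i++)
        after += toy_count_auto_inc(new_insns[i]);
    return before == after;
}
```

Counting works the same no matter which of i0, i1 or i2 held the auto-increment, which is what makes it simpler than checking each insn separately.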

[Bug middle-end/90070] Add optimization for optimizing small integer values by fp integral constant

2019-04-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90070

Segher Boessenkool  changed:

   What|Removed |Added

 Target|powerpc64le-gnu-linux,  |powerpc*-*-*
   |powerpc64-gnu-linux |
   Host|powerpc64le-gnu-linux,  |
   |powerpc64-gnu-linux |
  Build|powerpc64le-gnu-linux,  |
   |powerpc64-gnu-linux |

--- Comment #4 from Segher Boessenkool  ---
You'll have a crossing anyway (it is y+5*x with x an integer and y a float),
but a single fma is faster than doing the mul as integer, almost everywhere.

When we write e.g.

float f(float x) { return 5.0 * x; }

GCC is smart enough to do the mul in single precision (although C says it is
double precision, and only later rounded to SP, the result is identical):

addis 9,2,.LC0@toc@ha
lfs 0,.LC0@toc@l(9)
fmuls 1,1,0
blr

but for

float f(float x, float y) { return 5.0*x + y; }

it does not (and AFAICS it gives identical results here, too, even without
-ffast-math, which makes no difference currently):

addis 9,2,.LC1@toc@ha
lfd 0,.LC1@toc@l(9)
fmadd 1,1,0,2
frsp 1,1
blr
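Why the single-precision multiply is safe for 5.0 * x: the exact product of two floats has at most 48 significant bits, so the double multiply is exact and the only rounding is the final conversion to float, which is the same single rounding fmuls does. A small C check over a sample of float bit patterns (not exhaustive; it assumes IEEE-754 float and double with the default rounding mode) might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* What C specifies for "5.0 * x": multiply in double, round to float on return. */
static float mul5_via_double(float x)
{
    return (float)(5.0 * (double)x);
}

/* The multiplication done directly in single precision. */
static float mul5_single(float x)
{
    volatile float five = 5.0f; /* volatile stops the compiler folding this into the other form */
    return five * x;
}

/* Compare bit patterns, so -0.0f and 0.0f would count as different. */
static bool same_bits(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Walk a spread of float bit patterns; count inputs where the two forms differ. */
static unsigned long count_mismatches(void)
{
    unsigned long mismatches = 0;
    for (uint64_t bits = 0; bits <= UINT32_MAX; bits += 65537) {
        uint32_t b32 = (uint32_t)bits;
        float x;
        memcpy(&x, &b32, sizeof x);
        if (x != x)
            continue; /* skip NaNs */
        if (!same_bits(mul5_via_double(x), mul5_single(x)))
            mismatches++;
    }
    return mismatches;
}
```

This only covers the plain multiply; it says nothing by itself about the 5.0*x + y fmadd case.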

[Bug target/17108] Store with update not generated for a simple loop

2019-04-12 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17108

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #8 from Segher Boessenkool  ---
We currently generate (for -O2 -m64, -O3 unrolls it completely, see comment 7)

li 9,8
mtctr 9
.p2align 4,,15
.L2:
stfs 1,0(3)
addi 3,3,4
bdnz .L2
blr



and for -m32 we get

li 9,8
addi 3,3,-4
mtctr 9
.p2align 4,,15
.L2:
stfsu 1,4(3)
bdnz .L2
blr




The difference is partly the selected -mcpu=, but that doesn't explain it
completely.
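The testcase itself isn't quoted here. Below is a hypothetical C loop with the same shape as the code above (8 stores of d, advancing one float each time), next to a model of what a store-with-update such as stfsu does. Addresses are modelled as array indices (one unit per float, i.e. 4 bytes) so that starting one element before the array stays well-defined C.

```c
#include <assert.h>

/* Hypothetical testcase shape: store d to p[0..7], advancing p after each
   store (like the -m64 loop: stfs 1,0(3) then addi 3,3,4). */
static void fill8(float *p, float d)
{
    for (int i = 0; i < 8; i++)
        *p++ = d;
}

/* Toy model of a store-with-update (stfsu val,disp(base)): store val at
   base + disp, then write that effective address back into base.
   Indices into mem stand in for addresses. */
static void store_with_update(float *mem, long *base, long disp, float val)
{
    long ea = *base + disp;
    assert(ea >= 0);
    mem[ea] = val;
    *base = ea;
}

/* The -m32 shape: start one element before the first store (addi 3,3,-4),
   then 8 store-with-updates with displacement +1 element (stfsu 1,4(3)). */
static void fill8_with_update(float *mem, long start, float d)
{
    long base = start - 1;
    for (int i = 0; i < 8; i++)
        store_with_update(mem, &base, 1, d);
}
```

Both write the same eight elements; the update form just folds the pointer bump into the store, which is what saves the addi in the loop body.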

The gimple passes (probably ivopts) have decided to do a pre_inc here; all
differences are at RTL level.  Except for -mcpu=power9 they didn't.

A case where it works as expected, -O2 -m32 -mcpu=power4, the auto_inc_dec
pass does not help (this is caused by rtx_cost issues):

starting bb 3
   11: [r122:SI]=r127:SF
   11: [r122:SI]=r127:SF
found mem(11) *(r[122]+0)
   10: r122:SI=r122:SI+0x4
   10: r122:SI=r122:SI+0x4
found pre inc(10) r[122]+=4
   11: [r122:SI]=r127:SF
found mem(11) *(r[122]+0)
trying SIMPLE_PRE_INC
cost failure old=16 new=408

(I have a patch for that).



but then combine comes along and does

Trying 10 -> 11:
   10: r122:SI=r122:SI+0x4
   11: [r122:SI]=r127:SF
Successfully matched this instruction:
(parallel [
(set (mem:SF (plus:SI (reg:SI 122 [ ivtmp.10 ])
(const_int 4 [0x4])) [1 MEM[base: _17, offset: 0B]+0 S4
A32])
(reg/v:SF 127 [ d ]))
(set (reg:SI 122 [ ivtmp.10 ])
(plus:SI (reg:SI 122 [ ivtmp.10 ])
(const_int 4 [0x4])))
])
allowing combination of insns 10 and 11
original costs 4 + 4 = 8
replacement cost 4



-m64 however says

Trying 10 -> 11:
   10: r122:DI=r122:DI+0x4
   11: [r122:DI]=r127:SF
Failed to match this instruction:
(parallel [
(set (mem:SF (plus:DI (reg:DI 122 [ ivtmp.11 ])
(const_int 4 [0x4])) [1 MEM[base: _17, offset: 0B]+0 S4
A32])
(reg/v:SF 127 [ d ]))
(set (reg:DI 122 [ ivtmp.11 ])
(plus:DI (reg:DI 122 [ ivtmp.11 ])
(const_int 4 [0x4])))
])



Oh dear, we do not have the float load/store-with-update instructions for -m64.
On all modern 64-bit CPUs these are cracked, so they execute the same as the
separate addi and store instructions, but it costs code space.  And if we do
not want them we should make them more expensive, not just pretend the insns
do not exist :-)

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #23 from Segher Boessenkool  ---
It says (I added some debug)

   Insn 50(l0): point = 27
ignoring for conflicts:
(reg:SI 0 r0 [ a ])

but non_conflicting_reg_copy_p isn't called at all where it is improving
the allocation.

[Bug middle-end/90070] Add optimization for optimizing small integer values by fp integral constant

2019-04-12 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90070

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-04-12
 Ever confirmed|0   |1

--- Comment #2 from Segher Boessenkool  ---
Another way to do it is as an FMA in float mode, which should be faster
everywhere (everywhere that has FMA).  Current GCC doesn't do that either,
not if you write 5* (it does a mulli), nor if you write 5.0* (it does the
calculation in double precision, and rounds to single precision afterwards;
it would give the exact same result if it did the calculation in single
precision directly, afaics, both when using FMA and when not).

Confirmed.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #16 from Segher Boessenkool  ---
(Which would make insn 50 go away, if you prefer to look at it that way).

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #15 from Segher Boessenkool  ---
  Forming thread by copy 0:a0r111-a4r117 (freq=500):
Result (freq=3500): a0r111(2500) a4r117(1000)
  Forming thread by copy 2:a3r112-a5r116 (freq=125):
Result (freq=4500): a3r112(1500) a5r116(3000)
  Forming thread by copy 1:a2r114-a3r112 (freq=62):
Result (freq=5500): a2r114(1000) a3r112(1500) a5r116(3000)
  Pushing a1(r113,l0)(cost 0)
  Pushing a4(r117,l0)(cost 0)
  Pushing a0(r111,l0)(cost 0)
  Pushing a2(r114,l0)(cost 0)
  Pushing a3(r112,l0)(cost 0)
  Pushing a5(r116,l0)(cost 0)
  Popping a5(r116,l0)  -- assign reg 3
  Popping a3(r112,l0)  -- assign reg 4
  Popping a2(r114,l0)  -- assign reg 3
  Popping a0(r111,l0)  -- assign reg 0
  Popping a4(r117,l0)  -- assign reg 0
  Popping a1(r113,l0)  -- assign reg 2
Assigning 4 to a5r116
Disposition:
0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
5:r116 l0 4    4:r117 l0 0


r116 does not conflict with *any* other pseudo.  It is alive in the first
two insns of the function, which are

(insn 50 3 7 2 (set (reg:SI 116)
(reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 181
{*arm_movsi_insn}
 (nil))
(insn 7 50 8 2 (parallel [
(set (reg:CC 100 cc)
(compare:CC (reg:SI 116)
(const_int 0 [0])))
(set (reg/v:SI 112 [ a ])
(reg:SI 116))
]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
 (expr_list:REG_DEAD (reg:SI 116)
(nil)))

r0 _is_ used by a successor (as the argument for the call to foo), but we
could use r0 for r116 anyway, since what we assign to it is r0 :-)

[Bug target/88055] ICE in extract_insn, at recog.c:2305 on ppc64le

2019-04-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88055

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

--- Comment #9 from Segher Boessenkool  ---
I have a patch.

[Bug target/89271] [9 Regression] gcc.target/powerpc/vsx-simode2.c stopped working in GCC 9

2019-04-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89271

--- Comment #20 from Segher Boessenkool  ---
I currently get (on BE; the testcase forces -mcpu=power8):

std 3,-16(1)
addi 9,1,-12
lxsiwzx 32,0,9
#APP
 # 10 "vsx-simode2.c" 1
xxlor 32,32,32  # v, v constraints
 # 0 "" 2
#NO_APP
mfvsrwz 3,32
blr

so yes this test _should_ fail, we do the wrong thing.  NO_REGS is
chosen for this reg class, no vector class is considered.

[Bug target/84369] test case gcc.dg/sms-10.c fails on power9

2019-04-10 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

Segher Boessenkool  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pthaugen at linux dot ibm.com

--- Comment #4 from Segher Boessenkool  ---
power9.md has

(define_insn_reservation "power9-store" 0

whereas pretty much everything else has a non-zero number here.  This number
is only for true dependences, so read-after-write, so 0 does not make super
much sense anyway.

Assigning this to Pat.  Pat, feel free to kick it back to me, or to whoever
else you want ;-)

[Bug rtl-optimization/90007] [9 Regression] ICE in extract_constrain_insn_cached, at recog.c:2223

2019-04-10 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007

--- Comment #7 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #3)
> Why does sel-sched try to propagate hard registers into insns before RA? 
> The whole point of the combiner changes was not to do that, so that the RA
> can do better job.

That, and *correctness*.  Propagating hard registers can lead to things that
cannot be reloaded.  Even in the simple case here you cannot necessarily
replace the hard reg with a pseudo and end up with valid code.

[Bug target/89794] combine incorrectly forwards register value through auto-inc operation

2019-04-10 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89794

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

--- Comment #7 from Segher Boessenkool  ---
I have a patch.

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-09 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763

--- Comment #44 from Segher Boessenkool  ---
(In reply to Jeffrey A. Law from comment #43)
> The problem with your suggestions Segher is that we'd have to do them for
> every target which defines insns with a zero_extract destination and that's
> been the well understood way to handle this stuff for over 2 decades.

It has only worked in some cases and not in others, for all of those decades.
And what cases those are exactly changes with the phase of the moon, well, with
any otherwise irrelevant change.

This is part of the reason why rs6000 doesn't have insv patterns any more,
btw (since r226005).  (The other part is that our rl*imi insns can only in
very limited cases be described with insv).

> Improving combine avoids that problem.

Sure, but combine just gives up for RMW insns in many cases (and it has to).
Some other passes do the same thing, I would think?  Using the same pseudo
for two things causes problems.

> Of course we have to balance the
> pros/cons of any patch in that space as well which is hard to do without an
> official patch to evaluate.  What I've got is just proof of concept for the
> most common case, but it does show some promise.

Oh, I'm not against any such patch /per se/, if it is safe and suitable for
stage 4, and an improvement (not a regression for some targets), I'll okay
it of course.  

> Also note that Steve's patch just addresses combine_bfi IIUC.  My POC
> addresses insv_?.c as well as the existing combine_bfi test (but I haven't
> tested it against the deeper tests in Steve's patch.

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763

--- Comment #42 from Segher Boessenkool  ---
The "movk" failures...  This is from insv_1.c:

Trying 7, 6 -> 8:
7: r95:DI=0x1d6b00000000
6: r93:DI=r97:DI&0xffff0000ffffffff
  REG_DEAD r97:DI
8: r94:DI=r93:DI|r95:DI
  REG_DEAD r95:DI
  REG_DEAD r93:DI
  REG_EQUAL r93:DI|0x1d6b00000000
Failed to match this instruction:
(set (reg:DI 94)
(ior:DI (and:DI (reg:DI 97)
(const_int -281470681743361 [0xffff0000ffffffff]))
(const_int 32345398706176 [0x1d6b00000000])))
Successfully matched this instruction:
(set (reg:DI 95)
(and:DI (reg:DI 97)
(const_int -281470681743361 [0xffff0000ffffffff])))
Failed to match this instruction:
(set (reg:DI 94)
(ior:DI (reg:DI 95)
(const_int 32345398706176 [0x1d6b00000000])))

It should have matched what it originally came up with, afaics?  This is
exactly what movk does?  (Don't rely on the input and output regs to agree,
like with insv; that only happens by chance.  Instead, use separate operands,
with "0" constraint, etc.)
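For reference, a sketch of what movk computes and how it lines up with the first combined expression, using the decimal constants from the dump: the mask -281470681743361 is 0xffff0000ffffffff (all bits except 32..47) and 32345398706176 is 0x1d6b << 32. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Semantics of AArch64 "movk xd, #imm16, lsl #shift": replace one 16-bit
   halfword of xd with imm16 and keep the other 48 bits. */
static uint64_t movk(uint64_t xd, uint16_t imm16, unsigned shift)
{
    assert(shift % 16 == 0 && shift <= 48);
    uint64_t field = (uint64_t)0xffff << shift;
    return (xd & ~field) | ((uint64_t)imm16 << shift);
}

/* The expression combine built for insv_1.c:
   (ior (and r97 -281470681743361) 32345398706176). */
static uint64_t combined_form(uint64_t r97)
{
    return (r97 & (uint64_t)-281470681743361LL) | UINT64_C(32345398706176);
}
```

In other words the whole ior/and is "movk r94, 0x1d6b, lsl 32" with r97 as the separate input operand, which is why a pattern with input and output as separate operands (tied with "0") can match it while an insv-style pattern cannot.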

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #12 from Segher Boessenkool  ---
(In reply to Segher Boessenkool from comment #11)
> (In reply to Wilco from comment #8)
> > mov r4, r0
> > cmp r4, #0
> 
> Why does it copy r0 to r4 and then compare r4?  On more modern machines it
> is faster to compare r0 itself, and it would allow shrink-wrapping to work
> fine here

We get this in combine:

Trying 2 -> 7:
2: r112:SI=r116:SI
  REG_DEAD r116:SI
7: cc:CC=cmp(r112:SI,0)
Successfully matched this instruction:
(parallel [
(set (reg:CC 100 cc)
(compare:CC (reg:SI 116)
(const_int 0 [0])))
(set (reg/v:SI 112 [ a ])
(reg:SI 116))
])

(that's *movsi_compare0).


This is preceded by

(insn 50 3 7 2 (set (reg:SI 116)
(reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 179
{*arm_movsi_insn}
 (nil))


And it stays that way until IRA, which does

Disposition:
0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
5:r116 l0 4    4:r117 l0 0

If r116 had been allocated hard reg 0 all would be fine (and we know r116
dies in insn 7 already, there is a REG_DEAD note on it).

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #11 from Segher Boessenkool  ---
(In reply to Wilco from comment #8)
>   push{r4, lr}
>   mov r4, r0
>   cmp r4, #0

Why does it copy r0 to r4 and then compare r4?  On more modern machines it
is faster to compare r0 itself, and it would allow shrink-wrapping to work
fine here (well, need to move the assignment to r4 down to the block where
it is used, but something will certainly do that, and it is one of the
shrink-wrapping improvements I want to do for GCC 10).

> It seems shrinkwrapping is more random, sometimes it's done as expected,
> sometimes it is not. It was more consistent on older GCC's.

Shrink-wrapping is very predictable.  But no block where a non-volatile
register is used or set will get shrink-wrapped.  This limitation has
existed since forever.

[Bug rtl-optimization/80960] [7/8/9 Regression] Huge memory use when compiling a very large test case

2019-04-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #19 from Segher Boessenkool  ---
(In reply to rguent...@suse.de from comment #18)
> Hmm, so if we'd have numbered stmts in an EBB we could check the
> distance between set and use and not combine when that gets too big?

Yeah.  Or we could even not make a LOG_LINK in the first place between
statements that are too far apart.

> > Combine also makes garbage for every try, and none of that is cleaned
> > up during combine.  Maybe we should change that?  (I can try next week).
> 
> Not sure how easy that is but yes, it might help quite a bit due
> to less churn on the cache.  Just ggc_free()ing the "toplevel"
> RTX of failed attempts might already help a bit.  It's of course
> kind-of a hack then but with an appropriate comment it would be
> fine I guess (recursively ggc_free()ing might run into sharing
> issues so that probably won't work).

combine does not change anything *between* combination attempts, and
all attempts go via the same function (try_combine), so calling ggc_collect
should be fine.  Manually ggc_free'ing things would be a hack alright.

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763

--- Comment #41 from Segher Boessenkool  ---
Seeing that the code in your examples can be expressed as a bitfield insert
requires that combine sees that only the low 8 bits of reg 93 can be non-zero,
by the way.  It usually does not know this.  It could in this case if it was
combining insn 6 as well.  Did it try that before?  What happened?

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763

--- Comment #40 from Segher Boessenkool  ---
You'll get much better results if you don't use insv in your machine
description; writing it with the input and output separate (and then
using a "0" constraint) is much friendlier to the optimisers.

[Bug rtl-optimization/80960] [7/8/9 Regression] Huge memory use when compiling a very large test case

2019-04-03 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #17 from Segher Boessenkool  ---
(In reply to rguent...@suse.de from comment #16)

> I would guess so.  I wonder if the combiner could restrict itself
> here?  Maybe LUID "distances" are an approximation?  Doesn't the
> combiner use DF uses so the number of combinations shouldn't increase
> with basic-block size but only with the number of uses?  Of course
> since we don't have SSA the uses probably include those that cross
> other defs...

Combine doesn't try too many pairs: it tries every def only with its
first use, so that is linear in # insns.  But the check if a combination
is valid uses reg_used_between_p, which is no good for insns a big
distance apart.

> That said, algorithmically first building up a kind-of-SSA to
> restrict things combine will try might help to avoid this kind
> of issues.

Yup.  Not going to happen in stage4, of course :-/

There are a few other things which aren't linear, but this is the
worst one (the rest only happens occasionally, or only on successful
combinations).

> Since combine does a df_analyze we should have a way to record
> the number of insns in a block without another IL walk, it could
> also fall back to 2->1 and 2->2 insn combinations after visiting
> a new PARAM max-combine-bb-insns-3-3 number of insns in an EBB.

The 3->1 (or 3->2) isn't really the problem; there just are many more
to try than 2->[12].

> Actually it already does two walks over the whole function in
> combine_instructions it seems, so recording # insns per EBB should
> be possible?  (if that's really the key metric causing the issue)

The average distance between a set and its first use is the key metric.
The numbers make it feel like that is still pretty constrained here
(I haven't run numbers on it), but 100 is already a lot if there are
1M insns in the block (or whatever).  None of these numbers is terrible
on its own, but combined it takes up quite a chunk of time.

Combine also makes garbage for every try, and none of that is cleaned
up during combine.  Maybe we should change that?  (I can try next week).

[Bug rtl-optimization/86928] ICE in compute_live, at sel-sched.c:3097

2019-04-03 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86928

--- Comment #6 from Segher Boessenkool  ---
(In reply to Alexander Monakov from comment #5)
> I didn't have any better ideas, so fixed via comment #2.

Thanks!

[Bug rtl-optimization/80960] [7/8/9 Regression] Huge memory use when compiling a very large test case

2019-04-02 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #15 from Segher Boessenkool  ---
It seems that this happens for huge basic blocks, where the combiner
tries to combine pairs of instructions that are far apart.  This is unlikely
to work often, and the cost is quadratic in # insns, because of the way
checking where a register is used works.

The param to do only 2->1 (and 2->2) combinations should help a lot, and
make combine not take longer than the rest of the compiler does.  Does it?
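A rough model of the cost problem and of the kind of cut-off discussed in this PR (a maximum distance between a set and its use). The toy types and the max_distance parameter are hypothetical, not an existing combine param. The point is only that each validity check is a linear scan over the insns in between, so without a cap the total work grows with the distance times the number of pairs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy insn: the register it sets and the register it reads
   (-1 for none). */
struct toy_setuse {
    int set_reg;
    int use_reg;
};

/* Linear scan, in the spirit of the reg_*_between_p checks: does any insn
   strictly between positions from and to set reg? */
static bool toy_reg_set_between(const struct toy_setuse *insns, int from,
                                int to, int reg)
{
    for (int i = from + 1; i < to; i++)
        if (insns[i].set_reg == reg)
            return true;
    return false;
}

/* Decide whether to try combining the def at def_pos into its first use at
   use_pos.  Pairs further apart than max_distance are rejected up front,
   so the O(distance) scan never runs for them. */
static bool toy_may_try_combine(const struct toy_setuse *insns, int def_pos,
                                int use_pos, int max_distance)
{
    assert(def_pos < use_pos);
    if (use_pos - def_pos > max_distance)
        return false;
    int input = insns[def_pos].use_reg;
    if (input < 0)
        return true; /* the def reads no register, nothing can clobber it */
    /* Substituting the def into the use is only valid if its input still
       holds the same value there. */
    return !toy_reg_set_between(insns, def_pos, use_pos, input);
}
```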

[Bug inline-asm/87984] [7/8/9 Regression] wrong code for local reg var input to asm

2019-04-02 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87984

--- Comment #34 from Segher Boessenkool  ---
(In reply to rguent...@suse.de from comment #33)
> On Sat, 30 Mar 2019, segher at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87984
> > 
> > --- Comment #32 from Segher Boessenkool  ---
> > Historically, a local register asm variable *did* live in that register
> > for its entire scope.  This stopped working correctly, even with the many
> > caveats there were for it, and many years ago the manual added language
> > saying that only using such a var in an extended asm in or out is supported,
> > and there was language warning you to keep the lifetime short, etc.
> > 
> > This did *not* change the implementation.  Any other use still is explicitly
> > unsupported, and all such testcases are invalid code.
> 
> Hmm, but that means the only effect of a local reg var would be
> implicit input/output constraints, right?

Explicit.  Yes.  That is the documented only supported use.  But it is not
currently the only thing it *does*.

> Of course there's also
> calls(?) that would need to remat all local register vars.
> 
> The asm part could be easily represented on GIMPLE by making those
> constraints explicit.  The call issue would need explicit save/restore
> code which is then exposed to optimization passes.
> 
> But then...
> 
> > It would be nice if GCC was changed such that such vars were expanded to a
> > pseudo like any other var, and copies to/from a hard reg just around the 
> > asm.
> > Gimple doesn't need to do *anything* for that, just keep track that the var
> > is declared as local register var, and the gimple it had now at expand is
> > just fine:
> 
> ... all this could be done at RTL expansion time as well.

Yes, exactly.  Gimple could treat local register vars just like any other
pseudo.  Then at expand time, you copy it into its hard reg right before an
asm, and back out after it (maybe skip either if the var is not an input resp.
an output of the asm), and everything remat and lifetime etc. will work out
automatically.  Unless I am missing something.

But this is *not* what we currently do, and it is not what is documented, and
as far as I can see the testcase here is invalid code.

> Still in GIMPLE we'd have to treat calls as modifying/using
> local reg vars?  That leaves us with forcing of virtual operands
> on all calls eventually using those vars.

I think it will all work out fine without treating local register vars any
differently from any other local variable.

[Bug inline-asm/87984] [7/8/9 Regression] wrong code for local reg var input to asm

2019-03-30 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87984

--- Comment #32 from Segher Boessenkool  ---
Historically, a local register asm variable *did* live in its register
for its entire scope.  This stopped working correctly, even with the many
caveats there were for it, and many years ago the manual added language
saying that the only supported use of such a var is as an input or output
of an extended asm, and warning you to keep its lifetime short, etc.

This did *not* change the implementation.  Any other use still is explicitly
unsupported, and all such testcases are invalid code.

It would be nice if GCC was changed such that such vars were expanded to a
pseudo like any other var, with copies to/from the hard reg made just around
the asm.  Gimple doesn't need to do *anything* for that, just keep track of
the fact that the var is declared as a local register var; the gimple we
currently have at expand time is just fine:

===
f ()
{
  register int a __asm__ (*eax);
  int o;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  a = 1;
  __asm__("add %1, %0" : "=g" o_14 : "r" a, "0" 0);
  clear_eax ();
  __asm__("add %1, %0" : "=g" o_21 : "r" a, "0" o_14);
  clear_eax ();
  __asm__("add %1, %0" : "=g" o_28 : "r" a, "0" o_21);
  clear_eax ();
  return o_28;
;;succ:   EXIT

}
===

But currently "a" is expanded as a hard reg, not a pseudo, and the code does
not do what you want at all, just as the manual warns.

===
;; Generating RTL for gimple basic block 2

;; a = 1;

(insn 5 4 0 (set (reg/v:SI 0 ax [ a ])
        (const_int 1 [0x1])) "cax.c":6:18 -1
     (nil))
===

(and it gets worse after that).
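Within the documented rules, the usual way to write the testcase's sequence is to assign the local register variable immediately before each asm that uses it, so that nothing except that asm's input operand relies on the variable being in eax. A minimal x86 sketch, assuming the intent is to add 1 to o three times with a call in between; clear_eax() and add_three_times() are hypothetical names, and the #else branch exists only so the sketch builds on other targets:

```c
/* Hypothetical stand-in for the testcase's clear_eax(): it overwrites eax,
   as any call may do with a call-clobbered register. */
static __attribute__((noinline)) void clear_eax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("xorl %%eax, %%eax" ::: "eax");
#endif
}

/* Adds 1 to o three times, with a call between the asms. */
static int add_three_times(void)
{
    int o = 0;
    for (int i = 0; i < 3; i++) {
#if defined(__x86_64__) || defined(__i386__)
        /* Assign the local register var right before the asm, so the only
           use that depends on it being in eax is this asm's input. */
        register int a __asm__("eax") = 1;
        __asm__("addl %1, %0" : "+r"(o) : "r"(a));
#else
        o += 1;  /* fallback so the sketch also builds on non-x86 targets */
#endif
        clear_eax();
    }
    return o;
}
```

Because a is re-initialized inside the loop, the eax value left behind by the previous clear_eax() call no longer matters.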

[Bug inline-asm/87984] [7/8/9 Regression] wrong code for local reg var input to asm

2019-03-29 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87984

--- Comment #31 from Segher Boessenkool  ---
If an asm makes a function non-pure, that asm should be volatile in the
first place?  Or are there any cases where that is not true?

[Bug c/83855] [performance] Improve cse optimization for insn with inout ops

2019-03-25 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83855

Segher Boessenkool  changed:

               What|Removed                        |Added

             Status|UNCONFIRMED                    |RESOLVED
                 CC|                               |segher at gcc dot gnu.org
         Resolution|---                            |INVALID

--- Comment #3 from Segher Boessenkool  ---
The internals manual explains this:

Note that @code{match_dup} should not be used to tell the compiler that
a particular register is being used for two operands (example:
@code{add} that adds one register to another; the second register is
both an input operand and the output operand).  Use a matching
constraint (@pxref{Simple Constraints}) for those.  @code{match_dup} is for
the cases where one operand is used in two places in the template, such as
an instruction
that computes both a quotient and a remainder, where the opcode takes
two input operands but the RTL template has to refer to each of those
twice; once for the quotient pattern and once for the remainder pattern.
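Matching constraints are also available in C extended asm (the inline-asm testcase for PR87984 above uses "0" the same way). A minimal x86 sketch of the case the manual describes, a two-operand add whose destination is also an input; add_matching() is a hypothetical name and the #else branch is a portable fallback:

```c
/* Returns x + y.  x86's two-operand add overwrites its destination, so the
   output (operand 0) and the first input (operand 1) must share a location;
   the matching constraint "0" on operand 1 says exactly that. */
static int add_matching(int x, int y)
{
    int out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %2, %0" : "=r"(out) : "0"(x), "r"(y));
#else
    out = x + y;  /* portable fallback for non-x86 targets */
#endif
    return out;
}
```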

[Bug target/89776] sse-movmskb-1.c testcase fails on PPC64 BE 32 bit Power8

2019-03-23 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89776

Segher Boessenkool  changed:

               What|Removed                        |Added

             Target|powerpc-*-*-*                  |powerpc*-*-*
             Status|UNCONFIRMED                    |RESOLVED
         Resolution|---                            |FIXED

--- Comment #1 from Segher Boessenkool  ---
Fixed on trunk (this code is not on the 8 branch yet).

[Bug sanitizer/82501] AddressSanitizer does not handle negative offset for first global variable

2019-03-21 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82501

--- Comment #28 from Segher Boessenkool  ---
Patches should go to gcc-patches@.  That is where reviews happen, too.

[Bug target/88055] ICE in extract_insn, at recog.c:2305 on ppc64le

2019-03-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88055

Segher Boessenkool  changed:

               What|Removed                        |Added

           Priority|P2                             |P1

--- Comment #8 from Segher Boessenkool  ---
Moving this to P1.  See comment 4.

[Bug target/89746] powerpc-none-eabi-gcc emits code using stfiwx to misaligned address

2019-03-19 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89746

--- Comment #7 from Segher Boessenkool  ---
Author: segher
Date: Tue Mar 19 16:58:42 2019
New Revision: 269802

URL: https://gcc.gnu.org/viewcvs?rev=269802&root=gcc&view=rev
Log:
rs6000: Unaligned stfiwx on older CPUs (PR89746)

The "classic" PowerPCs (6xx/7xx) are not STRICT_ALIGNMENT, but their
floating point units are.  This is not normally a problem, since the ABIs
make everything FP aligned.  The RTL patterns converting FP to integer,
however, get a potentially unaligned destination, and we do not want to
do an stfiwx to that on such older CPUs.

This fixes it.  It does not change anything for TARGET_MFCRF targets
(POWER4 and later).  It also won't change anything for strict-alignment
targets, or CPUs without hardware FP of course, or CPUs that do not
implement stfiwx (older 4xx/5xx/8xx).

It does not change the corresponding fixuns* pattern, because that
cannot be enabled on any CPU that cannot handle unaligned FP well.


PR target/89746
* config/rs6000/rs6000.md (fix_trunc<mode>si2_stfiwx): If we have a
non-TARGET_MFCRF target, and the dest is memory but not 32-bit aligned,
go via a stack temporary.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/rs6000.md

[Bug target/89736] New test pr87532-mc.c fails on compiler not defaulting to VSX

2019-03-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89736

--- Comment #2 from Segher Boessenkool  ---
I found this on a Power7 (maybe -m32, not sure).

Your patch is eerily like what I did to fix this in testing, but the comment
right below says it does not use -mvsx on purpose?

[Bug target/89746] powerpc-none-eabi-gcc emits code using stfiwx to misaligned address

2019-03-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89746

Segher Boessenkool  changed:

               What|Removed                        |Added

           Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

--- Comment #6 from Segher Boessenkool  ---
Created attachment 45987
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45987&action=edit
PoC

[Bug target/89746] powerpc-none-eabi-gcc emits code using stfiwx to misaligned address

2019-03-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89746

--- Comment #5 from Segher Boessenkool  ---
Yes, it is just a code quality issue.

I have the attached patch, and it works; it needs to be updated so that the
alignment check is only done for CPUs where it is needed.

[Bug target/89746] powerpc-none-eabi-gcc emits code using stfiwx to misaligned address

2019-03-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89746

--- Comment #2 from Segher Boessenkool  ---
(This is on a PowerPC 750).

The compiler makes an unaligned store for this, because as far as it knows
this is just an SImode store:

  d_5 = (int) f_4(D);
  _10 = (unsigned int) d_5;
  MEM[(short int *)p_7(D) + 6B] = _10;

and *normal* unaligned stores of SImode are just fine -- they just cause
an extra access if they cross an 8B boundary.  OTOH, floating point loads
and stores cause a misalignment interrupt if unaligned.
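The two alignment rules above can be written out as arithmetic. A minimal sketch, assuming p itself is 8-byte aligned so the store in the gimple goes to an address that is 6 mod 8; the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* An access of size bytes at addr spans two 8-byte granules when its
   offset within the granule plus its size exceeds 8. */
static bool crosses_8b_boundary(uintptr_t addr, unsigned size)
{
    return (addr % 8) + size > 8;
}

/* Assumed rule for these CPUs: a 32-bit FP store (such as stfiwx) needs a
   4-byte-aligned address, or it takes a misalignment interrupt. */
static bool ok_for_fp_word_store(uintptr_t addr)
{
    return addr % 4 == 0;
}
```

With addr % 8 == 6 and a 4-byte store, 6 + 4 = 10 > 8, so the integer store crosses a boundary (an extra access, but no fault), while 6 % 4 != 0 makes the same store misaligned for stfiwx.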

[Bug target/89736] New test pr87532-mc.c fails on compiler not defaulting to VSX

2019-03-16 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89736

Segher Boessenkool  changed:

               What|Removed                        |Added

             Target|                               |powerpc*-*-*
           Priority|P3                             |P5
             Status|UNCONFIRMED                    |ASSIGNED
   Last reconfirmed|                               |2019-03-16
           Assignee|unassigned at gcc dot gnu.org  |kelvin at gcc dot gnu.org
     Ever confirmed|0                              |1

[Bug target/89736] New: New test pr87532-mc.c fails on compiler not defaulting to VSX

2019-03-16 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89736

            Bug ID: 89736
           Summary: New test pr87532-mc.c fails on compiler not defaulting to VSX
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

Hi Kelvin,

The test is deliberately not limited to VSX:

/* This test should run the same on any target that supports altivec/dfp
   instructions.  Intentionally not specifying cpu in order to test
   all code generation paths.  */

OTOH, it does use "vector long long", which requires VSX, and it uses vector
int128 too.  DFP seems to be a red herring?

I'm not sure what is best to do here; maybe split the test in two?

[Bug rtl-optimization/89721] __builtin_mffs sometimes optimized away

2019-03-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89721

Segher Boessenkool  changed:

               What|Removed                        |Added

             Status|NEW                            |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

[Bug rtl-optimization/89721] __builtin_mffs sometimes optimized away

2019-03-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89721

--- Comment #3 from Segher Boessenkool  ---
Fixed on trunk so far.

[Bug rtl-optimization/89721] __builtin_mffs sometimes optimized away

2019-03-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89721

--- Comment #2 from Segher Boessenkool  ---
Author: segher
Date: Fri Mar 15 22:09:15 2019
New Revision: 269716

URL: https://gcc.gnu.org/viewcvs?rev=269716&root=gcc&view=rev
Log:
LRA: side_effects_p stmts' output is not invariant (PR89721)

PR89721 shows LRA treating an unspec_volatile's result as invariant,
which of course isn't correct.  This patch fixes it.


PR rtl-optimization/89721
* lra-constraints.c (invariant_p): Return false if side_effects_p holds.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-constraints.c
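Why side_effects_p matters here: an unspec_volatile such as the one behind __builtin_mffs has no changing inputs, so a check that only looks at inputs would call its result invariant, which fits the PR's symptom of the call being optimized away. A toy model of the rule in the ChangeLog; struct toy_expr and toy_invariant_p() are hypothetical and are not the rtx-based invariant_p in lra-constraints.c:

```c
#include <stdbool.h>

/* Toy expression, NOT GCC's rtx: just the two facts the rule needs. */
struct toy_expr {
    bool inputs_invariant;  /* all operands are invariant (true if none) */
    bool side_effects;      /* e.g. an unspec_volatile reading the FPSCR */
};

/* Mirrors the ChangeLog's rule: return false if there are side effects,
   even when the inputs alone would make the value look invariant. */
static bool toy_invariant_p(const struct toy_expr *e)
{
    if (e->side_effects)
        return false;
    return e->inputs_invariant;
}
```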
