[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-02-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #46 from Jakub Jelinek  ---
Author: jakub
Date: Thu Feb 21 12:04:26 2019
New Revision: 269067

URL: https://gcc.gnu.org/viewcvs?rev=269067&root=gcc&view=rev
Log:
PR bootstrap/88714
* config/arm/constraints.md (q): Remove.
* config/arm/ldrdstrd.md (*arm_ldrd, *arm_strd): Use rk constraint
instead of q.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/constraints.md
trunk/gcc/config/arm/ldrdstrd.md

2019-02-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #45 from Jakub Jelinek  ---
Author: jakub
Date: Mon Feb 18 12:52:36 2019
New Revision: 268985

URL: https://gcc.gnu.org/viewcvs?rev=268985&root=gcc&view=rev
Log:
PR bootstrap/88714
* config/arm/arm.md (*arm_movdi, *movdf_soft_insn): Use "r" instead of
"q" constraint.
* config/arm/vfp.md (*movdi_vfp): Likewise.
* config/arm/ldrdstrd.md (*arm_ldrd, *arm_strd): Use "r" instead of
"q" constraint for operands[0].

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm.md
trunk/gcc/config/arm/ldrdstrd.md
trunk/gcc/config/arm/vfp.md

2019-02-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #44 from Jakub Jelinek  ---
Author: jakub
Date: Mon Feb 11 10:39:59 2019
New Revision: 268766

URL: https://gcc.gnu.org/viewcvs?rev=268766&root=gcc&view=rev
Log:
PR bootstrap/88714
* config/arm/ldrdstrd.md (*arm_ldrd, *arm_strd): Use q constraint
instead of r.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/ldrdstrd.md

2019-02-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #43 from Jakub Jelinek  ---
Fixed.

2019-02-07 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #42 from Matthew Malcomson  ---
Author: matmal01
Date: Thu Feb  7 14:54:15 2019
New Revision: 268644

URL: https://gcc.gnu.org/viewcvs?rev=268644&root=gcc&view=rev
Log:
[Patch] [arm] Fix 88714, Arm LDRD/STRD peepholes.

These peepholes match a pair of SImode loads or stores that can be
implemented with a single LDRD or STRD instruction.
When compiling for TARGET_ARM, these peepholes originally created a set
pattern in DI mode to be caught by movdi patterns.

This approach failed to take into account the possibility that the two
matched insns operated on memory with different aliasing information.
The peepholes lost the aliasing information on one of the insns, which
could then cause the scheduler to make an invalid transformation.

This patch changes the peepholes so they generate a PARALLEL expression
of the two relevant loads or stores, which means the aliasing
information of both is kept.  Such a PARALLEL pattern is what the
peepholes currently produce for TARGET_THUMB2.

In order to match these new insn patterns, we add two new define_insn's.  These
define_insn's use the same checks as the peepholes to find valid insns.

Note that the patterns now created by the peepholes for LDRD and STRD
are very similar to those created by the peepholes for LDM and STM.
Many patterns could be matched by the LDM and STM define_insns, which
means we rely on the order the define_insn patterns are defined in the
machine description, with those for LDRD/STRD defined before those for
LDM/STM.

The difference between the peepholes for LDRD/STRD and those for LDM/STM
is mainly that those for LDRD/STRD have some logic to ensure that the
two registers are consecutive and the first one is even.

Bootstrapped and regtested on arm-none-linux-gnu.
Demonstrated fix of bug 88714 by bootstrapping on armv7l.
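
As a rough illustration of the register rule mentioned in the log above
(consecutive registers, the first one even), here is a minimal sketch.  The
function name is hypothetical and this is not the code of
valid_operands_ldrd_strd; it only models the two register conditions and
ignores every other restriction the real check may apply.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when two core register numbers could
   form the destination (or source) pair of an ARM-state LDRD/STRD under
   the rule described in the commit log.  */
static bool
toy_regs_ok_for_ldrd (unsigned first_reg, unsigned second_reg)
{
  /* The first register of the pair must be even-numbered.  */
  if (first_reg % 2 != 0)
    return false;

  /* The second register must be the one immediately after the first.  */
  return second_reg == first_reg + 1;
}
```

Under this model r2/r3 is an acceptable pair, while r3/r4 (odd first
register) and r2/r4 (not consecutive) are not.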


gcc/ChangeLog:

2019-02-07  Matthew Malcomson  
Jakub Jelinek  

PR bootstrap/88714
* config/arm/arm-protos.h (valid_operands_ldrd_strd,
arm_count_ldrdstrd_insns): New declarations.
* config/arm/arm.c (mem_ok_for_ldrd_strd): Remove broken handling of
MINUS.
(valid_operands_ldrd_strd): New function.
(arm_count_ldrdstrd_insns): New function.
* config/arm/ldrdstrd.md: Change peepholes to generate PARALLEL SImode
sets instead of single DImode set and define new insns to match this.

gcc/testsuite/ChangeLog:

2019-02-07  Matthew Malcomson  
Jakub Jelinek  

PR bootstrap/88714
* gcc.c-torture/execute/pr88714.c: New test.
* gcc.dg/rtl/arm/ldrd-peepholes.c: New test.

Added:
trunk/gcc/testsuite/gcc.c-torture/execute/pr88714.c
trunk/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm-protos.h
trunk/gcc/config/arm/arm.c
trunk/gcc/config/arm/ldrdstrd.md
trunk/gcc/testsuite/ChangeLog

2019-02-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #41 from Jakub Jelinek  ---
Author: jakub
Date: Thu Feb  7 14:27:09 2019
New Revision: 268619

URL: https://gcc.gnu.org/viewcvs?rev=268619&root=gcc&view=rev
Log:
Backported from mainline
2019-01-11  Jakub Jelinek  

PR bootstrap/88714
* passes.c (finish_optimization_passes): Call print_combine_total_stats
inside of pass_combine_1 dump rather than pass_profile_1.

Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/passes.c

2019-02-01 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #39 from Matthew Malcomson  ---
(In reply to Jakub Jelinek from comment #38)
> I don't mind if you take over, I don't really have good opportunities to
> test on arm anyway.  Though, do you have copyright assignment on file (or
> covered by ARM or Linaro or similar assignments)?

OK, will do.

I'm covered by the ARM assignment.

2019-02-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #38 from Jakub Jelinek  ---
I don't mind if you take over, I don't really have good opportunities to test
on arm anyway.  Though, do you have copyright assignment on file (or covered by
ARM or Linaro or similar assignments)?

2019-02-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #40 from Jakub Jelinek  ---
Oops, sorry, ignore the question, I see you in MAINTAINERS as well as in
several commits.

2019-02-01 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #37 from Matthew Malcomson  ---
Good point (and interesting about the HOST_WIDE_INT_MIN exception -- I didn't
know that).

To avoid duplication of effort would you prefer I make the change or do you
want to handle it?

2019-02-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #36 from Jakub Jelinek  ---
Furthermore, nothing really guarantees you it must match,
gen_operands_ldrd_strd doesn't call plus_constant, it calls
mem_ok_for_ldrd_strd on each mem and subtracts the offsets.  So, probably a
helper that does exactly that should be used in the condition of the
define_insns.  That already calls arm_legitimate_address_p too. 
mem_ok_for_ldrd_strd is broken too:
  else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == MINUS)
    {
      *base = XEXP (addr, 0);
      *offset = XEXP (addr, 1);
      return (REG_P (*base) && CONST_INT_P (*offset));
    }
The handling of MINUS that way makes no sense.  If it wants to handle MINUS,
offset should be HOST_WIDE_INT rather than rtx and it should do:
  else if (GET_CODE (addr) == PLUS
           && REG_P (XEXP (addr, 0))
           && CONST_INT_P (XEXP (addr, 1)))
    {
      *base = XEXP (addr, 0);
      *offset = INTVAL (XEXP (addr, 1));
      return true;
    }
  else if (GET_CODE (addr) == MINUS
           && REG_P (XEXP (addr, 0))
           && CONST_INT_P (XEXP (addr, 1)))
    {
      *base = XEXP (addr, 0);
      *offset = -UINTVAL (XEXP (addr, 1));
      return true;
    }
or just don't try to handle MINUS at all, MINUS with CONST_INT as op2 is not
canonical with the exception of HOST_WIDE_INT_MIN, but that is not possible for
SImode.
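
A standalone sketch of the idea above, using toy types instead of rtx: each
address is split into a base register and a signed offset, MINUS is negated
through an unsigned type, and the pair is accepted only when the second
address is exactly 4 bytes after the first.  All names here (toy_addr,
toy_split_address, toy_mems_four_apart) are hypothetical and are not the
arm.c functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the REG, PLUS and MINUS address forms.  */
enum toy_addr_code { TOY_REG, TOY_PLUS, TOY_MINUS };

struct toy_addr
{
  enum toy_addr_code code;
  int base_reg;      /* Register number of the base.  */
  int64_t constant;  /* Constant operand of PLUS/MINUS; unused for REG.  */
};

/* Split ADDR into a base register and a signed byte offset.  */
static bool
toy_split_address (const struct toy_addr *addr, int *base, int64_t *offset)
{
  *base = addr->base_reg;
  switch (addr->code)
    {
    case TOY_REG:
      *offset = 0;
      return true;
    case TOY_PLUS:
      *offset = addr->constant;
      return true;
    case TOY_MINUS:
      /* Negate in unsigned arithmetic so the most negative constant does
         not overflow; the conversion back wraps on common compilers.  */
      *offset = (int64_t) (0 - (uint64_t) addr->constant);
      return true;
    }
  return false;
}

/* True if SECOND addresses the word 4 bytes after FIRST, same base.  */
static bool
toy_mems_four_apart (const struct toy_addr *first,
                     const struct toy_addr *second)
{
  int base1, base2;
  int64_t off1, off2;

  if (!toy_split_address (first, &base1, &off1)
      || !toy_split_address (second, &base2, &off2))
    return false;
  return base1 == base2 && (uint64_t) off2 - (uint64_t) off1 == 4;
}
```

Because the offset is a plain integer, the order check (first before second)
falls out of the subtraction, and no new address expression has to be built.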

2019-02-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #35 from Jakub Jelinek  ---
That is a bad idea.  plus_constant will create new RTL expressions any time it
is called.

2019-02-01 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #34 from Matthew Malcomson  ---
Yes, I needed to redo that check for an offset of 4 -- I compared the
expression of the first MEM with the result of `plus_constant` with 4 on the
expression of the second MEM in the condition.

2019-02-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #33 from Jakub Jelinek  ---
How could you avoid the arm.c changes from my patch if you are using rtx_equal
on the MEM's addr and first operand of PLUS?  I believe either that arm.c
change is needed, or the predicate used on the new define_insns needs to repeat
the analysis done in gen_operands_ldrd_strd - verify that the two MEMs are 4
bytes apart (just, unlike gen_operands_ldrd_strd, require that the first one
is before the second one).

2019-02-01 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #32 from Matthew Malcomson  ---
Created attachment 45584
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45584&action=edit
Single define_insn version of above patch

FWIW I've attached the patch I'd made.

The only interesting differences are that I'd added only one define_insn as I
don't believe the existing patterns' difference in constraints is needed and I
made some RTL testcases.


(I've just now added the testcase you found).

2019-01-31 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #31 from Matthew Malcomson  ---
(In reply to Jakub Jelinek from comment #30)
> (In reply to Matthew Malcomson from comment #29)
> > I've been working on a patch that does very similar to the draft patch
> > posted above, and I notice a few things I've tried to avoid in it.
> > I doubt there are any actual bugs, since I don't know if the patterns that
> > trigger actual faults can occur at the moment.
> > 
> > Using the `address_operand` predicate and 'p' constraint to ensure the
> > address is a valid address would use the mode SImode of the operand rather
> > than checking it's valid for the DImode of the emitted ldrd.
> 
> Sure, but does it really matter?
> This is a post reload pattern created by the peephole2s, so nothing that can
> be matched out of the blue sky like combiner normally matches.
> So, if it didn't pass the conditions in the peephole2s, the patterns
> wouldn't be created.

True -- as I mentioned I don't know if a problematic pattern could actually
occur, so I doubt this is actually a problem.

> Are there any addresses that pass arm_legitimate_address_p (DImode, x, true)
> and fail address_operand (x, SImode)?  From brief skimming I couldn't find
> anything.
> So, would you be happy if the && arm_legitimate_address_p (DImode, XEXP
> (operands[n], 0), true)
> condition is added to the insn conditions (after the rtx_equal_p check)?

That sounds good to me.

> 
> > There's a similar problem to the `address_operand` one above with using the
> > `arm_count_output_move_double_insns` function.
> > 
> > It's called on the original operands, which means it eventually calls
> > `output_move_double` with the first two operands (which are in SImode).
> > 
> > This function has some calls to `reg_overlap_mentioned_p`, which depends on
> > the number of hard registers for a given registers mode.
> > 
> > I've only found cases where the `arm_count_output_move_double_insns`
> > function returns something other than what it should in cases that only
> > match because of the `address_operand` problem above.
> > 
> > This could be replaced by a wrapper that generates DImode registers
> > specifically for checking this.
> 
> For non-vfp or iwmmxt, the length is always 8, are there cases in the vfp
> insn that the length is not 8?

I believe the length *can* be 4 for non-vfp, vfp, or iwmmxt (the case below
produces a single ldrd when compiled with each of them).

int __RTL (startwith ("peephole2")) foo_x4 (int *a)
{
(function "foo_x4"
  (insn-chain
    (cnote 1 NOTE_INSN_DELETED)
    (block 2
      (edge-from entry (flags "FALLTHRU"))
      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
      (cinsn 101 (set (reg:SI r2)
                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4 A64]))
                 "/home/matmal01/test.c":18)
      (cinsn 102 (set (reg:SI r3)
                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4 A32]))
                 "/home/matmal01/test.c":18)
      (cinsn 103 (set (reg:SI r0)
                      (plus:SI (reg:SI r2) (reg:SI r3)))
                 "/home/matmal01/test.c":18)
      (edge-to exit (flags "FALLTHRU"))
    ) ;; block 2
  ) ;; insn-chain
  (crtl
    (return_rtx
      (reg/i:SI r0)
    ) ;; return_rtx
  ) ;; crtl
) ;; function "foo_x4"
}




Something else I've just noticed:
When compiling for vfp or iwmmxt, the ldm2_ define_insn matches the simpler
case below as it comes first in the md order.
That means we get a ldm instruction instead of the ldrd.

int __RTL (startwith ("peephole2")) foo_x5 (int *a)
{
(function "foo_x5"
  (insn-chain
    (cnote 1 NOTE_INSN_DELETED)
    (block 2
      (edge-from entry (flags "FALLTHRU"))
      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
      (cinsn 101 (set (reg:SI r2)
                      (mem/c:SI (reg:SI r0) [0 S4 A64]))
                 "/home/matmal01/test.c":18)
      (cinsn 102 (set (reg:SI r3)
                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32]))
                 "/home/matmal01/test.c":18)
      (cinsn 103 (set (reg:SI r0)
                      (plus:SI (reg:SI r2) (reg:SI r3)))
                 "/home/matmal01/test.c":18)
      (edge-to exit (flags "FALLTHRU"))
    ) ;; block 2
  ) ;; insn-chain
  (crtl
    (return_rtx
      (reg/i:SI r0)
    ) ;; return_rtx
  ) ;; crtl
) ;; function "foo_x5"
}

2019-01-31 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #30 from Jakub Jelinek  ---
(In reply to Matthew Malcomson from comment #29)
> I've been working on a patch that does very similar to the draft patch posted
> above, and I notice a few things I've tried to avoid in it.
> I doubt there are any actual bugs, since I don't know if the patterns that
> trigger actual faults can occur at the moment.
> 
> 
> 
> Using the `address_operand` predicate and 'p' constraint to ensure the
> address is a valid address would use the mode SImode of the operand rather
> than checking it's valid for the DImode of the emitted ldrd.

Sure, but does it really matter?
This is a post reload pattern created by the peephole2s, so nothing that can be
matched out of the blue sky like combiner normally matches.
So, if it didn't pass the conditions in the peephole2s, the patterns wouldn't
be created.
Are there any addresses that pass arm_legitimate_address_p (DImode, x, true)
and fail address_operand (x, SImode)?  From brief skimming I couldn't find
anything.
So, would you be happy if the && arm_legitimate_address_p (DImode, XEXP
(operands[n], 0), true)
condition is added to the insn conditions (after the rtx_equal_p check)?

> There's a similar problem to the `address_operand` one above with using the
> `arm_count_output_move_double_insns` function.
> 
> It's called on the original operands, which means it eventually calls
> `output_move_double` with the first two operands (which are in SImode).
> 
> This function has some calls to `reg_overlap_mentioned_p`, which depends on
> the number of hard registers for a given registers mode.
> 
> I've only found cases where the `arm_count_output_move_double_insns` function
> returns something other than what it should in cases that only match because
> of the `address_operand` problem above.
> 
> This could be replaced by a wrapper that generates DImode registers
> specifically for checking this.

For non-vfp or iwmmxt, the length is always 8, are there cases in the vfp insn
that the length is not 8?

> I think generation of patterns of the form 
> (plus:SI (plus:SI (reg) (const_int)) (const_int)) 
> which can happen with these peepholes isn't very nice.

Why?  I've done that intentionally, so that it is easy to verify it is 4 bytes
apart, otherwise one needs to handle all the different cases where address is
this and that etc.  This whole MEM isn't an operand in the instruction, just
mere RTL.  Combiner doesn't run after peephole2 and if something tries to
canonicalize that some way, it will simply fail to be recognized and it will
not try that canonicalization.

2019-01-31 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Matthew Malcomson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matmal01 at gcc dot gnu.org

--- Comment #29 from Matthew Malcomson  ---
Hi Jakub,

I've been working on a patch that does something very similar to the draft
patch posted above, and I notice a few things I've tried to avoid in it.
I doubt there are any actual bugs, since I don't know if the patterns that
trigger actual faults can occur at the moment.



Using the `address_operand` predicate and 'p' constraint to ensure the address
is a valid address would use the mode SImode of the operand rather than
checking it's valid for the DImode of the emitted ldrd.

If this happens we generate an ICE in the `adjust_address` call just before
`output_move_double`.

I don't know if such a pattern can actually be generated, but we could use
`arm_legitimate_address_p (DImode, XEXP (operands[1], 0), true)` in the
condition to avoid it just in case.



There's a similar problem to the `address_operand` one above with using the
`arm_count_output_move_double_insns` function.

It's called on the original operands, which means it eventually calls
`output_move_double` with the first two operands (which are in SImode).

This function has some calls to `reg_overlap_mentioned_p`, which depends on the
number of hard registers for a given register's mode.

I've only found cases where the `arm_count_output_move_double_insns` function
returns something other than what it should in cases that only match because of
the `address_operand` problem above.

This could be replaced by a wrapper that generates DImode registers
specifically for checking this.

---

I think generation of patterns of the form 
(plus:SI (plus:SI (reg) (const_int)) (const_int)) 
which can happen with these peepholes isn't very nice.
I can't find any constraint against these patterns in the canonicalization
rules (maybe there should be?) so I can't say this is an actual problem.


As an example: the following
int __RTL (startwith ("peephole2")) foo_x4 (int *a)
{
(function "foo_x4"
  (insn-chain
    (cnote 1 NOTE_INSN_DELETED)
    (block 2
      (edge-from entry (flags "FALLTHRU"))
      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
      (cinsn 101 (set (reg:SI r2)
                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4 A64]))
                 "/home/matmal01/test.c":18)
      (cinsn 102 (set (reg:SI r3)
                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4 A32]))
                 "/home/matmal01/test.c":18)
      (cinsn 103 (set (reg:SI r0)
                      (plus:SI (reg:SI r2) (reg:SI r3)))
                 "/home/matmal01/test.c":18)
      (edge-to exit (flags "FALLTHRU"))
    ) ;; block 2
  ) ;; insn-chain
  (crtl
    (return_rtx
      (reg/i:SI r0)
    ) ;; return_rtx
  ) ;; crtl
) ;; function "foo_x4"
}

Produces
(insn 104 3 103 2 (parallel [
            (set (reg:SI 2 r2)
                (mem/c:SI (plus:SI (reg:SI 0 r0)
                        (const_int 8 [0x8])) [0 S4 S4 A64]))
            (set (reg:SI 3 r3)
                (mem/c:SI (plus:SI (plus:SI (reg:SI 0 r0)
                            (const_int 8 [0x8]))
                        (const_int 4 [0x4])) [0 S4 S4 A32]))
        ]) -1
     (nil))


Maybe we could use the existing operands, and match with
`rtx_equal_p (..., plus_constant (...))`
so that the plus_constant can take care of adding the constants together.
This is what we do in the load_pair patterns for aarch64.




There are a few other tidy-up points around the define_insn patterns, but
overall I believe they can be merged into one pattern.
The difference between the 'q' and 'r' constraints is that they use either
CORE_REGS or GENERAL_REGS, where CORE_REGS allows r13 and GENERAL_REGS doesn't.
I guess this is from a line in infocenter that mentions r12 is strongly
recommended to not be used as the first register for ldrdb, as this is stopped
by requiring both the first and second register to not be r13.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihjffga.html

2019-01-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #28 from Jakub Jelinek  ---
#c27 now successfully bootstrapped where it previously failed, regtest still
pending.

2019-01-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #27 from Jakub Jelinek  ---
Created attachment 45566
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45566&action=edit
gcc9-pr88714.patch

Untested full patch, will try to bootstrap it now on armv7hl, no access to
other variants though.

A few GCC10 backend cleanup comments: "" constraints in match_operand should
be omitted, it would be nice to replace GET_CODE (x) == REG or
GET_CODE (x) != MEM etc. tests with REG_P (x) or !MEM_P (x) etc., and the
formatting is sometimes quite weird.

2019-01-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #26 from Jakub Jelinek  ---
Created attachment 45455
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45455&action=edit
gcc9-pr88714.patch

I needed a temporary solution for our distro packages and with this patch
armv7hl passes profiledbootstrap.  That said, I think preserving the
MEM_ALIAS_SET and MEM_EXPR is important for proper scheduling etc. decisions
and so it would be better to add new patterns.

2019-01-16 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #25 from ktkachov at gcc dot gnu.org ---
Thanks, I've reproduced the failure with the reduced testcase (aborts at -O2
but not at -O0)

2019-01-16 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #24 from Jakub Jelinek  ---
Created attachment 45438
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45438&action=edit
gcc9-pr88714-poc.patch

Proof of concept that fixes the short testcase.
One would need to write remaining non-thumb patterns (strd in vfp.md, ldrd +
strd in arm.md and ldrd + strd in iwmmxt.md, all close to the movdi patterns,
unless there is a possibility to unify them (but, e.g. iwmmxt.md uses r instead
of q, etc.?)) and do it always in ldrdst*.md.  Not really sure about the
predicates, constraints etc. either; I will defer that to those familiar with
the backend and architecture.

2019-01-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #23 from Jakub Jelinek  ---
On the #c22 testcase this started with r242549, but guess it has been latent
before.

2019-01-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #22 from Jakub Jelinek  ---
Self-contained testcase which actually fails because of this bug, even e.g.
when compiled with -O2 and gcc 8.2.1.  That doesn't mean this bug shouldn't be
P1, because preventing bootstrap on a primary target is extremely severe.

struct S { int a, b, c; int *d; };
struct T { int *e, *f, *g; } *t = 0;
int *o = 0;

__attribute__((noipa))
void bar (int *x, int y, int z, int w)
{
  if (w == -1)
    {
      if (x != 0 || y != 0 || z != 0)
        __builtin_abort ();
    }
  else if (w != 0 || x != t->g || y != 0 || z != 12)
    __builtin_abort ();
}

__attribute__((noipa)) void
foo (struct S *x, struct S *y, int *z, int w)
{
  *o = w;
  if (w)
    bar (0, 0, 0, -1);
  x->d = z;
  if (y->d)
    y->c = y->c + y->d[0];
  bar (t->g, 0, y->c, 0);
}

int
main ()
{
  int a[4] = { 8, 9, 10, 11 };
  struct S s = { 1, 2, 3, &a[0] };
  struct T u = { 0, 0, &a[3] };
  o = &a[2];
  t = &u;
  foo (&s, &s, &a[1], 5);
  if (s.c != 12 || s.d != &a[1])
    __builtin_abort ();
  return 0;
}

2019-01-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #21 from Jakub Jelinek  ---
Short testcase -O2 -mtune=cortex-a9 -mfloat-abi=hard -mfpu=vfpv3-d16
-mtls-dialect=gnu -marm -march=armv7-a+fp:

struct S { int a, b, c; int *d; };
void bar (int, int, int, int);

void
foo (struct S *x, struct S *y, int *z)
{
  x->d = z;
  if (y->d)
    y->c = y->c + y->d[0];
  bar (0, 0, y->c, 0);
}

This one actually isn't miscompiled (dunno how to convince sched2 that it wants
to schedule the ldrd before the x->d store), but if you put a breakpoint on
true_dependence_1 if mem->mode == E_DImode || x->mode == E_DImode, then you
should be able to see that it considers swapping those and doesn't find
aliasing reason not to.

What the peephole2 does is similar to what e.g. store-merging does, which for
alias sets does:
  if (!n1->alias_set
      || alias_ptr_types_compatible_p (n1->alias_set, n2->alias_set))
    n->alias_set = n1->alias_set;
  else
    n->alias_set = ptr_type_node;
i.e. uses alias set 0 if they aren't compatible.

Another possibility would be to use an alternate pattern for the ldrd when it
is matched by such a peephole2, instead of presenting it as a DImode read
present it as 2 SImode reads, so (set r2 (mem:SI ...)) (set r3 (mem:SI ...)). 
That way you could use the original MEM_ALIAS_SET and MEM_EXPRs.  Seems arm.md
even has similar patterns like *thumb2_ldrd_base.  So, in ldrdstrd.md do a
similar thing for TARGET_ARM as for TARGET_THUMB2, just the TARGET_ARM patterns
would need to also verify the two registers are consecutive (can be done in the
insn condition) and make sure it handles any cases where the two memory
addresses are 4 bytes apart (again, can be done in the insn condition).
I think this would be better than to drop alias set to 0, which then can
prevent optimal scheduling etc.
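
To make the trade-off concrete, here is a deliberately simplified model of
alias sets.  It is an assumption-laden sketch, not GCC's alias.c: a set is
just an integer, set 0 conflicts with everything, and two non-zero sets
conflict only when equal (real alias sets also have subset relations).  The
function names are hypothetical; the numbers 4 and 12 used in the note below
mirror the alias sets printed in the RTL dumps of comment #19.

```c
#include <stdbool.h>

/* Toy alias set: 0 means "may alias anything".  */
typedef int toy_alias_set;

/* Conservative merge for one wide access that replaces two narrow ones:
   keep the set only if both accesses agree, otherwise fall back to 0.  */
static toy_alias_set
toy_merge_alias_sets (toy_alias_set a, toy_alias_set b)
{
  return a == b ? a : 0;
}

/* Simplified conflict test: set 0 conflicts with everything, otherwise
   only identical sets conflict.  */
static bool
toy_alias_sets_conflict (toy_alias_set a, toy_alias_set b)
{
  return a == 0 || b == 0 || a == b;
}
```

In this model, a store in set 12 does not conflict with a merged load that
kept only set 4, which is what lets the scheduler move it; merging 4 and 12
to 0 restores the conflict at the cost of blocking every reordering, which
is why keeping two SImode MEMs with their own alias information is the
preferable fix.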

2019-01-15 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #20 from ktkachov at gcc dot gnu.org ---
Thanks for investigating this.
At an initial glance, I guess this is something gen_operands_ldrd_strd in
config/arm/arm.c should handle

2019-01-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org,
                   |                            |ramana at gcc dot gnu.org,
                   |                            |rearnsha at gcc dot gnu.org

--- Comment #19 from Jakub Jelinek  ---
To me, this looks like buggy arm peephole2.
In *.compgotos pass we have:
(insn 96 95 97 18 (set (mem/f:SI (plus:SI (reg/v/f:SI 4 r4 [orig:137 vr_ ]
[137])
(const_int 12 [0xc])) [12 MEM[(struct vn_reference_s
*)vr__23(D)].vuse+0 S4 A32])
(reg/v/f:SI 12 ip [orig:135 vuse ] [135]))
"/tmp/tree-ssa-sccvn.ii":87248:12 650 {*arm_movsi_vfp}
 (expr_list:REG_DEAD (reg/v/f:SI 12 ip [orig:135 vuse ] [135])
(expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:137 vr_ ] [137])
(nil))))
(insn 97 96 98 18 (set (reg/f:SI 3 r3 [orig:118 _11 ] [118])
(mem/f:SI (plus:SI (reg/f:SI 1 r1 [orig:123 prephitmp_29 ] [123])
(const_int 12 [0xc])) [12 prephitmp_29->vuse+0 S4 A32]))
"/tmp/tree-ssa-sccvn.ii":87249:11 650 {*arm_movsi_vfp}
 (nil))
(insn 98 97 99 18 (set (reg:SI 2 r2 [orig:120 _14 ] [120])
(mem:SI (plus:SI (reg/f:SI 1 r1 [orig:123 prephitmp_29 ] [123])
(const_int 8 [0x8])) [4 prephitmp_29->hashcode+0 S4 A32])) 650
{*arm_movsi_vfp}
 (nil))

The first stmt is the vr->vuse = ... store from vr->vuse = vuse_ssa_val (vuse);
The next two stmts load vr->hashcode and vr->vuse, but unfortunately the GIMPLE
optimizers weren't able to figure out that
vr is equal to vr__23(D):
  # _42 = PHI 
  # prephitmp_29 = PHI 
  MEM[(struct vn_reference_s *)vr__23(D)].vuse = _42;
  _11 = prephitmp_29->vuse;
  pretmp_49 = prephitmp_29->hashcode;
at that point (note, vr is address taken variable).

Then comes peephole2 and does:
Splitting with gen_peephole2_11
scanning new insn with uid = 217.
deleting insn with uid = 98.
deleting insn with uid = 97.
verify found no changes in insn with uid = 217.

and constructs
(insn 96 95 217 18 (set (mem/f:SI (plus:SI (reg/v/f:SI 4 r4 [orig:137 vr_ ]
[137])
(const_int 12 [0xc])) [12 MEM[(struct vn_reference_s
*)vr__23(D)].vuse+0 S4 A32])
(reg/v/f:SI 12 ip [orig:135 vuse ] [135]))
"/tmp/tree-ssa-sccvn.ii":87248:12 650 {*arm_movsi_vfp}
 (expr_list:REG_DEAD (reg/v/f:SI 12 ip [orig:135 vuse ] [135])
(expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:137 vr_ ] [137])
(nil))))
(insn 217 96 99 18 (set (reg:DI 2 r2)
(mem:DI (plus:SI (reg/f:SI 1 r1 [orig:123 prephitmp_29 ] [123])
(const_int 8 [0x8])) [4 prephitmp_29->hashcode+0 S8 A32])) -1
 (nil))
out of this.  The insn 217 is a ldrd.  The bug is that the DImode MEM uses the
same MEM_ALIAS_SET and same MEM_EXPR as
that of the SImode prephitmp_29->hashcode read, even when it now covers two
fields of the structure.  So, either it needs to throw away MEM_EXPR and clear
MEM_ALIAS_SET, or find something conservatively correct covering both.

Finally, sched2 comes and swaps the two, because the (incorrect) aliasing info
makes alias.c believe it can swap the two:
(insn:TI 217 116 96 12 (set (reg:DI 2 r2)
(mem:DI (plus:SI (reg/f:SI 1 r1 [orig:123 prephitmp_29 ] [123])
(const_int 8 [0x8])) [4 prephitmp_29->hashcode+0 S8 A32])) 652
{*movdi_vfp}
 (nil))
(insn:TI 96 217 109 12 (set (mem/f:SI (plus:SI (reg/v/f:SI 4 r4 [orig:137 vr_ ]
[137])
(const_int 12 [0xc])) [12 MEM[(struct vn_reference_s
*)vr__23(D)].vuse+0 S4 A32])
(reg/v/f:SI 12 ip [orig:135 vuse ] [135]))
"/tmp/tree-ssa-sccvn.ii":87248:12 650 {*arm_movsi_vfp}
 (expr_list:REG_DEAD (reg/v/f:SI 12 ip [orig:135 vuse ] [135])
(expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:137 vr_ ] [137])
(nil))))

and thus, instead of using the new vr->vuse value for the vr->hashcode
computation we use the old one.
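
The effect of that swap can be modelled in plain C.  This is only a sketch
under simplifying assumptions: the struct and function names are made up,
vuse is reduced to a plain integer, and the two functions spell out by hand
the order of memory operations before and after the bad scheduling decision.

```c
/* Toy stand-in for the two adjacent fields read by the ldrd.  */
struct toy_ref
{
  unsigned hashcode;
  unsigned vuse;
};

/* Program order: the store (insn 96) happens before the paired load
   (insn 217), so the load sees the new vuse value.  */
static unsigned
toy_hash_program_order (struct toy_ref *vr, unsigned new_vuse)
{
  vr->vuse = new_vuse;
  unsigned hashcode = vr->hashcode;
  unsigned vuse = vr->vuse;
  return hashcode + vuse;
}

/* Order after the invalid swap: both fields are loaded first, so the
   vuse value used in the sum is the stale one.  */
static unsigned
toy_hash_bad_schedule (struct toy_ref *vr, unsigned new_vuse)
{
  unsigned hashcode = vr->hashcode;
  unsigned vuse = vr->vuse;
  vr->vuse = new_vuse;
  return hashcode + vuse;
}
```

Both variants leave the same value in memory for vuse; they differ only in
which vuse value feeds the hash computation.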

2019-01-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #18 from Jakub Jelinek  ---
So, I've hacked up an assembly version which contains two versions of this
function (good and bad) plus a wrapper function:
void *
vn_reference_lookup_2b (ao_ref *op, tree vuse, unsigned int cnt, void *vr_);
void *
vn_reference_lookup_2c (ao_ref *op, tree vuse, unsigned int cnt, void *vr_);

void *
vn_reference_lookup_2a (ao_ref *op, tree vuse, unsigned int cnt, void *vr_)
{
  vn_reference_t vr = (vn_reference_t)vr_;
  vn_reference_s a = *vr;
  void *r1 = vn_reference_lookup_2b (op, vuse, cnt, vr_);
  vn_reference_s b = *vr;
  *vr = a;
  void *r2 = vn_reference_lookup_2c (op, vuse, cnt, vr_);
  if (r1 != r2 || __builtin_memcmp (vr, &b, sizeof (b)))
    fancy_abort (__FILE__, __LINE__, __FUNCTION__);
  return r1;
}

adjusted in the assembly, so that it is actually that vn_reference_lookup_2
that calls the good and bad versions.
This ICEs on the second call to vn_reference_lookup_2.
vuse is .MEM_59, so is vr->vuse on entry and vr->hashcode is 0xd16d45ea.
The
  if (vr->vuse)
vr->hashcode = vr->hashcode - (vr->vuse)->base.u.version;
is performed correctly in both, changing vr->hashcode to 0xd16d45af (i.e.
subtracting 59), next vr->vuse is updated to .MEM_48.
The problem is with the
  if (vr->vuse)
    vr->hashcode = vr->hashcode + (vr->vuse)->base.u.version;
In the good version it does what the source tells it to do: it adds 48, making
vr->hashcode 0xd16d45df, and calls find_slot_with_hash with that value.
But in the bad version, we actually store 0xd16d45ea again, i.e. the old
version 59 is added back instead of 48.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #17 from Segher Boessenkool  ---
It's not obvious to me what machine code is wrong here.  Maybe it is obvious
to someone who is better at Arm code than I am?

Does it all work if you use -fno-if-conversion2 though?  Or, what other
later pass causes it?  Or is the RTL code immediately after combine already
bad?

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #16 from Jakub Jelinek  ---
Some more progress.
I've used
--- gcc/combine.c.jj 2019-01-10 11:43:17.050333949 +0100
+++ gcc/combine.c   2019-01-15 14:47:28.009094300 +0100
@@ -2319,6 +2319,9 @@ contains_muldiv (rtx x)
 }
 }


+int cxcnt = -1;
+int cxcurcnt = 0;
+
 /* Determine whether INSN can be used in a combination.  Return nonzero if
not.  This is used in try_combine to detect early some cases where we
can't perform combinations.  */
@@ -2361,7 +2364,8 @@ cant_combine_insn_p (rtx_insn *insn)
 #endif
  || (HARD_REGISTER_P (dest)
  && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest))
- && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dest))))))
+ && (targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dest)))
+ || (getenv ("COMBINE_FIRST") && cxcurcnt == cxcnt)))))
 return 1;

   return 0;
@@ -14993,6 +14997,12 @@ make_more_copies (void)
 {
   basic_block bb;

+  if (cxcnt == -1 && getenv ("COMBINE_CNT"))
+cxcnt = atoi (getenv ("COMBINE_CNT"));
+  ++cxcurcnt;
+  if (getenv ("COMBINE_SECOND") && cxcurcnt == cxcnt)
+return;
+
   FOR_EACH_BB_FN (bb, cfun)
 {
   rtx_insn *insn;

hack to undo both or any one of the two changes r265398 did on the function of
my choice (initially, for the binary search, I was using cxcurcnt >= cxcnt
instead of cxcurcnt == cxcnt in the two spots), and found that with
COMBINE_CNT=74 COMBINE_FIRST=1 COMBINE_SECOND=1
sort.i works as in stage1, so it is
_ZL21vn_reference_lookup_2P6ao_refP9tree_nodejPv that actually matters.
COMBINE_CNT=74 COMBINE_SECOND=1 generates the same (good assembly) as
COMBINE_CNT=74 COMBINE_FIRST=1 COMBINE_SECOND=1, while
COMBINE_CNT=74 COMBINE_FIRST=1 doesn't work the same as COMBINE_CNT=200.
The "bad" to "good" assembly difference is:
.type   _ZL21vn_reference_lookup_2P6ao_refP9tree_nodejPv, %function
 _ZL21vn_reference_lookup_2P6ao_refP9tree_nodejPv:
.fnstart
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
movw r0, #:lower16:global_options
-   mov ip, r1
-   movt r0, #:upper16:global_options
push {r4, r5, r6, lr}
.save {r4, r5, r6, lr}
-   ldr r0, [r0, #88]
+   movt r0, #:upper16:global_options
+   mov r5, r3
.pad #8
sub sp, sp, #8
-   str r3, [sp]
-   ldr r1, [r0, #540]
-   cmp r1, r2
+   ldr r3, [r0, #88]
+   str r5, [sp]
+   ldr r3, [r3, #540]
+   cmp r3, r2
bcc .L2103
-   movw r5, #:lower16:.LANCHOR1
-   mov r4, r3
-   movt r5, #:upper16:.LANCHOR1
-   ldr r3, [r5, #176]
+   movw r4, #:lower16:.LANCHOR1
+   mov ip, r1
+   movt r4, #:upper16:.LANCHOR1
+   ldr r3, [r4, #176]
cmp r3, #0
-   strne   ip, [r3]
-   ldr r3, [r4, #12]
+   strne   r1, [r3]
+   ldr r3, [r5, #12]
cmp r3, #0
ldrne   r2, [r3, #4]
-   ldrne   r3, [r4, #8]
+   ldrne   r3, [r5, #8]
subne   r3, r3, r2
-   strne   r3, [r4, #8]
-   cmp ip, #0
+   strne   r3, [r5, #8]
+   cmp r1, #0
beq .L2104
-   ldr r6, [r5, #12]
+   ldr r6, [r4, #12]
b   .L2101
 .L2127:
-   ldr ip, [r2, #4]
+   ldr ip, [r3, #4]
 .L2099:
-   ldr r3, [r5, #8]
+   ldr r3, [r4, #8]
cmp r3, ip
beq .L2125
ldrb r3, [ip, #3]   @ zero_extendqisi2
tst r3, #2
beq .L2126
 .L2101:
ldr r2, [ip, #4]
add r1, sp, #4
mov r0, r6
str ip, [sp, #4]
bl _ZN10hash_tableI17vn_ssa_aux_hasher11xcallocatorE14find_with_hashERKP9tree_nodej
-   ldr r2, [r0]
-   cmp r2, #0
+   ldr r3, [r0]
+   cmp r3, #0
beq .L2098
-   ldrb r3, [r2, #16]   @ zero_extendqisi2
-   tst r3, #1
+   ldrb r2, [r3, #16]   @ zero_extendqisi2
+   tst r2, #1
bne .L2127
 .L2098:
ldr ip, [sp, #4]
b   .L2099
 .L2126:
-   ldr r1, [sp]
+   ldr r3, [sp]
 .L2097:
-   ldrd r2, [r1, #8]
-   str ip, [r4, #12]
-   ldr r0, [r5, #28]
-   cmp r3, #0
-   ldrne   r3, [r3, #4]
+   str ip, [r5, #12]
+   ldr r1, [r3, #12]
+   ldr r2, [r3, #8]
+   ldr r0, [r4, #28]
+   cmp r1, #0
+   ldrne   r1, [r1, #4]
ldr r0, [r0, #8]
-   addne   r2, r2, r3
-   mov r3, #0
-   strne   r2, [r1, #8]
+   addne   r2, r2, r1
mov r1, sp
+   strne   r2, [r3, #8]
+   mov r3, #0
bl _ZN10hash_tableI19vn_reference_hasher11xcallocatorE19find_slot_with_hashERKP14vn_reference_sj13insert_option
cmp r0, #0
ldrne   

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #15 from Jakub Jelinek  ---
Ah, except that isn't all that r265398 did.  It has both the make_more_copies
part and
  || (HARD_REGISTER_P (dest)
- && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest))
- && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dest))))))
+ && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))))
hunk in cant_combine_insn_p.  If I revert both, then it works properly, but as
I said, not doing make_more_copies alone or reverting this
class_likely_spilled_p check alone doesn't fix it.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #14 from Jakub Jelinek  ---
That said, if I compile this with r267800 (cross-compiler, but identical output
to the attached one) and then, in gdb, return at the start of make_more_copies
for the problematic do_rpo_vn function (effectively undoing r265398 for that
function), it still fails (i.e. produces a different sort.s from stage1).

Could somebody familiar with ARM have a look at this?

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #13 from Jakub Jelinek  ---
Created attachment 45428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45428&action=edit
tree-ssa-sccvn.s.xz

And resulting (bad) assembly

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #12 from Jakub Jelinek  ---
Created attachment 45427
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45427&action=edit
tree-ssa-sccvn.ii.xz

Preprocessed source

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #11 from Jakub Jelinek  ---
If I add __attribute__((optimize (0))) to:
static unsigned
do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   bool iterate, bool eliminate)
and recompile stage2 tree-ssa-sccvn.o + relink stage2 cc1, then sort.i is the
same between stage1/cc1 and stage2/cc1.

../stage1-gcc/cc1plus tree-ssa-sccvn.ii -quiet -mtune=cortex-a9
-mfloat-abi=hard -mfpu=vfpv3-d16 -mtls-dialect=gnu -marm -march=armv7-a+fp -g
-gtoggle -O2 -fno-PIE -fno-checking -fno-exceptions -fno-rtti
-fasynchronous-unwind-tables -fno-ipa-ra -o tree-ssa-sccvn.s

compiled tree-ssa-sccvn.ii without that optimize (0) attribute still works
differently from stage1/cc1.
Let me attach tree-ssa-sccvn.ii and tree-ssa-sccvn.s.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #10 from Jakub Jelinek  ---
In armv7hl --enable-checking=release profiledbootstrap I see:
checking for strtoull... ../../libdecnumber/decNumber.c: In function 'decLnOp':
../../libdecnumber/decNumber.c:5581:13: error: number of counters in profile
data for function 'decLnOp' does not match its profile data (counter 'arcs',
expected 54 and have 55) [-Werror=coverage-mismatch]
 5581 | decNumber * decLnOp(decNumber *res, const decNumber *rhs,
  | ^~~
../../libdecnumber/decNumber.c:5581:13: error: the control flow of function
'decLnOp' does not match its profile data (counter 'time_profiler')
[-Werror=coverage-mismatch]
../../libdecnumber/decNumber.c: In function 'decExpOp':
../../libdecnumber/decNumber.c:5221:13: error: number of counters in profile
data for function 'decExpOp' does not match its profile data (counter 'arcs',
expected 52 and have 53) [-Werror=coverage-mismatch]
 5221 | decNumber * decExpOp(decNumber *res, const decNumber *rhs,
  | ^~~~
../../libdecnumber/decNumber.c:5221:13: error: the control flow of function
'decExpOp' does not match its profile data (counter 'time_profiler')
[-Werror=coverage-mismatch]
../../libdecnumber/decNumber.c: In function 'decMultiplyOp':
../../libdecnumber/decNumber.c:4831:20: error: number of counters in profile
data for function 'decMultiplyOp' does not match its profile data (counter
'arcs', expected 43 and have 46) [-Werror=coverage-mismatch]
 4831 | static decNumber * decMultiplyOp(decNumber *res, const decNumber *lhs,
  |^
../../libdecnumber/decNumber.c:4831:20: error: the control flow of function
'decMultiplyOp' does not match its profile data (counter 'time_profiler')
[-Werror=coverage-mismatch]
../../libdecnumber/decNumber.c: In function 'decDivideOp':
../../libdecnumber/decNumber.c:4211:20: error: number of counters in profile
data for function 'decDivideOp' does not match its profile data (counter
'arcs', expected 111 and have 113) [-Werror=coverage-mismatch]
 4211 | static decNumber * decDivideOp(decNumber *res,
  |^~~
../../libdecnumber/decNumber.c:4211:20: error: the control flow of function
'decDivideOp' does not match its profile data (counter 'single')
[-Werror=coverage-mismatch]
../../libdecnumber/decNumber.c:4211:20: error: the control flow of function
'decDivideOp' does not match its profile data (counter 'time_profiler')
[-Werror=coverage-mismatch]
../../libdecnumber/decNumber.c: In function 'decNumberSquareRoot':
../../libdecnumber/decNumber.c:2797:13: error: number of counters in profile
data for function 'decNumberSquareRoot' does not match its profile data
(counter 'arcs', expected 64 and have 65) [-Werror=coverage-mismatch]
 2797 | decNumber * decNumberSquareRoot(decNumber *res, const decNumber *rhs,
  | ^~~
../../libdecnumber/decNumber.c:2797:13: error: the control flow of function
'decNumberSquareRoot' does not match its profile data (counter 'time_profiler')
[-Werror=coverage-mismatch]
This might be related: if one stage generates slightly different code from the
other stage starting with PRE, then the profile mismatches would make a lot of
sense.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #9 from Jakub Jelinek  ---
Author: jakub
Date: Fri Jan 11 12:05:54 2019
New Revision: 267839

URL: https://gcc.gnu.org/viewcvs?rev=267839&root=gcc&view=rev
Log:
PR bootstrap/88714
* passes.c (finish_optimization_passes): Call print_combine_total_stats
inside of pass_combine_1 dump rather than pass_profile_1.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/passes.c

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #8 from Jakub Jelinek  ---
Note, the stage1-gcc compiled tree-ssa-sccvn.o is identical no matter whether
-fno-checking or -fchecking=1 was used, and doesn't fail -fcompare-debug with
either, so it is simply that something is miscompiled somewhere.
BTW, stage3-gcc/cc1 results in the same sort.s as stage1-gcc/cc1, only
stage2-gcc/cc1 is different (if it contains the stage1-gcc/cc1plus compiled and
optimized tree-ssa-sccvn.o).
I guess we want side-by-side debugging sessions of stage2-gcc/cc1 on sort.i:
one where it has tree-ssa-sccvn.o built by the host compiler, and one where it
has the -O2 -fno-checking -g tree-ssa-sccvn.o built by the stage1 cc1plus.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-10
     Ever confirmed|0                           |1

--- Comment #7 from Jakub Jelinek  ---
As for the rest, seems that when tree-ssa-sccvn.c is compiled by the
stage1-gcc/ with -g -O2 -fno-checking -gtoggle options (among others), it emits
something, while when compiled with stage2-gcc/ it emits something different.
Copying tree-ssa-sccvn.o from stage1-gcc/ (prev-gcc) into stage2-gcc/ (gcc) and
rebuilding cc1 makes it generate the same assembly and same -fdump-tree-pre-all
dump for sort.i as stage1-gcc/ generates, otherwise there are differences like:
-exp_gen[6] := { countp_30 (0022), {mem_ref<0B>,countp_30}@.MEM_25 (0007),
{mem_ref<4294967292B>,countp_30}@.MEM_80 (0008), {plus_expr,_10,_11} (0009),
{pointer_plus_expr,countp_30,4} (0033) }
+exp_gen[6] := { countp_30 (0022), {mem_ref<0B>,countp_30}@.MEM_25 (0007), _11
(0008), {plus_expr,_10,_11} (0009), {pointer_plus_expr,countp_30,4} (0033) }
and various others later on.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #6 from Jakub Jelinek  ---
The profile_estimate difference is a bug, introduced in r191883 and later
extended in r193821, that I have a fix for.  It can be ignored here: the
combiner totals should have gone into the combine dump instead.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-09 Thread mikpelinux at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #5 from Mikael Pettersson  ---
With -da -fdump-tree-all, stage1 and stage2 output starts to differ in
043t.profile_estimate and then more visibly in 130t.pre:

diff -ru stage1/sort.i.043t.profile_estimate
stage2/sort.i.043t.profile_estimate
--- stage1/sort.i.043t.profile_estimate 2019-01-09 19:39:25.973607372 +0100
+++ stage2/sort.i.043t.profile_estimate 2019-01-09 19:39:40.893537693 +0100
@@ -252,5 +252,5 @@



-;; Combiner totals: 55 attempts, 49 substitutions (22 requiring new space),
+;; Combiner totals: 56 attempts, 49 substitutions (21 requiring new space),
 ;; 4 successes.
diff -ru stage1/sort.i.130t.pre stage2/sort.i.130t.pre
--- stage1/sort.i.130t.pre  2019-01-09 19:39:24.673613443 +0100
+++ stage2/sort.i.130t.pre  2019-01-09 19:39:39.993541897 +0100
@@ -226,13 +226,12 @@
   size_t j;
   size_t i;
   unsigned int count[256];
-  unsigned int prephitmp_1;
-  unsigned int pretmp_2;
   unsigned int _5;
   unsigned char _6;
   int _7;
   unsigned int _8;
   unsigned int _9;
+  unsigned int _11;
   unsigned int _12;
   sizetype _13;
   sizetype _14;
@@ -244,13 +243,11 @@
   unsigned int _20;
   void * * _21;
   void * _22;
-  unsigned int prephitmp_36;
   unsigned int _53;
-  unsigned int pretmp_58;
-  unsigned int pretmp_84;
-  unsigned int prephitmp_85;
-  unsigned int pretmp_86;
-  unsigned int prephitmp_87;
+  unsigned int prephitmp_71;
+  unsigned int pretmp_72;
+  unsigned int prephitmp_76;
+  unsigned int pretmp_77;

[local count: 2684354]:
   _5 = n_41(D) * 4;
@@ -284,18 +281,16 @@
 goto ; [11.00%]

[local count: 9556302]:
-  pretmp_58 = MEM[(unsigned int *) + 4B];
-  pretmp_2 = MEM[(unsigned int *)];
+  pretmp_77 = MEM[(unsigned int *) + 4B];

[local count: 10737418]:
-  # prephitmp_1 = PHI 
-  # prephitmp_36 = PHI 
+  # prephitmp_76 = PHI 

[local count: 1063004406]:
   # countp_30 = PHI <[(void *) + 4B](6), countp_46(8)>
-  # prephitmp_85 = PHI 
-  # prephitmp_87 = PHI 
-  _12 = prephitmp_85 + prephitmp_87;
+  # prephitmp_71 = PHI 
+  _11 = MEM[(unsigned int *)countp_30 + 4294967292B];
+  _12 = _11 + prephitmp_71;
   *countp_30 = _12;
   countp_46 = countp_30 + 4;
   if ([(void *) + 1024B] > countp_46)
@@ -304,8 +299,7 @@
 goto ; [1.01%]

[local count: 1052266993]:
-  pretmp_84 = MEM[(unsigned int *)countp_30 + 4B];
-  pretmp_86 = *countp_30;
+  pretmp_72 = MEM[(unsigned int *)countp_30 + 4B];
   goto ; [100.00%]

[local count: 10737418]:

Subsequent dump files also differ, but seem to mirror the above diffs.

Meanwhile I did a bootstrap without specifying --enable-checking=release, and
that one succeeded.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #4 from Jakub Jelinek  ---
Thanks.
So, can you for that sort.i do -da -fdump-tree-all when compiled both with
stage1 and stage2 and see where things start to differ?
Or, try to change either:
STAGE1_TFLAGS += -fno-checking
STAGE2_CFLAGS += -fno-checking
STAGE2_TFLAGS += -fno-checking
in toplevel Makefile.in to -fchecking=1 or
STAGE3_CFLAGS += -fchecking=1
STAGE3_TFLAGS += -fchecking=1
after it to -fno-checking and see if the comparison failures go away.  That
would verify your -fno-checking idea.  If that proves to be true, the point
where sort.i.* starts to differ could hint at which TU from stage1 or stage2
could be rebuilt with -fno-checking or -fchecking=1 to see if the difference
goes away.  Or do a binary search among the *.o files.
Sorry I can't help more, but this really isn't debuggable with cross-compilers.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-08 Thread mikpelinux at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #3 from Mikael Pettersson  ---
Created attachment 45384
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45384&action=edit
pre-processed source for libiberty/sort.c

One of the smallest .o files that differ is from libiberty's sort.c
(pre-processed source attached as sort.i; sorry haven't had time to minimize
it).  stage1 and stage2 generate different code for this file:

> stage1-gcc/xgcc -Bstage1-gcc -O2 -S -o sort.s-stage1 sort.i
> stage2-gcc/xgcc -Bstage2-gcc -O2 -S -o sort.s-stage2 sort.i
> diff -u sort.s-stage[12] | wc
    109     460    2005
> diff -u sort.s-stage[12] | head
--- sort.s-stage1   2019-01-08 22:10:50.288929388 +0100
+++ sort.s-stage2   2019-01-08 22:10:59.148975673 +0100
@@ -21,21 +21,23 @@
 sort_pointers:
@ args = 0, pretend = 0, frame = 1024
@ frame_needed = 0, uses_anonymous_args = 0
-   push {r4, r5, r6, r7, r8, lr}
-   lsl r7, r0, #2
+   push {r4, r5, r6, r7, r8, r9, lr}
+   lsl r8, r0, #2

I wasn't able to trigger anything with -fcompare-debug, using either of the
stage1 or stage2 compilers.

Looking through the build log, I noticed that stage1 compiles stage2 with
-fno-checking, while stage2 compiles stage3 with -fchecking=1.  (This is
deliberate according to the top-level Makefile.tpl.)  Stage1 generates the same
code for sort.i with -fno-checking or -fchecking=1, and so does stage2.

Finally I checked the stage3 compiler and it generates the exact same code as
the stage1 does.

To me it looks like -fno-checking (possibly in combination with
--enable-checking=release) causes some breakage somewhere.

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-08 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

--- Comment #2 from Segher Boessenkool  ---
Or, do we have any machine in the compile farm on which this can be reproduced?
If so, could you give instructions for that please?

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
Can you pick one of the smaller object files that differ, and verify if it is a
debug info issue or something else?
Like see if building the corresponding source with stage1 or stage2 compiler
generates an error with -fcompare-debug (+ the flags normally used)?
Or run both stage1 and stage2 compiler with the same options (i.e. remove the
-gtoggle from one of flag sets) and compare if the same assembly is created.

If it is a -fcompare-debug issue, can you attach the preprocessed source for
that, plus the full xgcc/xg++ etc. command line?

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |build
           Priority|P3                          |P1
                 CC|                            |segher at gcc dot gnu.org

[Bug bootstrap/88714] [9 regression] bootstrap comparison failure on armv7l since r265398

2019-01-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88714

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.0