Re: PR79286, ira combine_and_move_insns in loops

2017-02-23 Thread Jeff Law

On 02/23/2017 02:52 AM, Alan Modra wrote:

On Thu, Feb 23, 2017 at 12:46:17AM -0700, Jeff Law wrote:

And thus we keep the equivalence.  Ultimately may_trap_p considers a PIC
memory reference as non-trapping.


Which is obviously a bug, because this access is segfaulting..
Not that I want to poke at the bug.  :)
They're explicitly called out as not causing faults.  I'm not sure I 
want to revisit that history right now, whatever it is,  It's clearly 
wrong IMHO though.


Jeff


Re: PR79286, ira combine_and_move_insns in loops

2017-02-23 Thread Alan Modra
On Thu, Feb 23, 2017 at 12:46:17AM -0700, Jeff Law wrote:
> And thus we keep the equivalence.  Ultimately may_trap_p considers a PIC
> memory reference as non-trapping.

Which is obviously a bug, because this access is segfaulting..
Not that I want to poke at the bug.  :)

> I really wonder if we should just drop the may_trap_p test and always do the
> domination check.

Yeah, seems reasonable to me.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR79286, ira combine_and_move_insns in loops

2017-02-22 Thread Jeff Law

On 02/22/2017 09:32 AM, Dominique d'Humières wrote:

Let me stand up an i686 linux instance and see if I can twiddle things without 
compromising the test.


Also darwin is a -fpic platform.
Which is the key here :-)  The test works fine without PIC, but once PIC 
is introduced, it blows up.


It's pretty easy to see in the assembly code and the dumps.

Prior to IRA we have:

bb3 (loop header):
[ ... ]
(insn 8 7 9 3 (set (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [1  S4 A32])
(reg/v:SI 89 [ e ])) "j.c":9 56 {*pushsi2}
 (expr_list:REG_ARGS_SIZE (const_int 8 [0x8])
(nil)))
[ ... ]
bb5 (conditional within the loop):
(insn 28 26 30 5 (set (reg/v:SI 89 [ e ])
(mem/u:SI (plus:SI (reg:SI 87)
(const:SI (plus:SI (unspec:SI [
(symbol_ref:SI ("d") [flags 0x2] 
)

] UNSPEC_GOTOFF)
(const_int 331350016 [0x13c0] [1 
d+331350016 S4 A32])) "j.c":11 75 {*movsi_internal}

 (expr_list:REG_DEAD (reg:SI 87)
(nil)))


After IRA we have:

bb3 (loop header)

(insn 8 7 9 3 (set (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [1  S4 A32])
(mem/u:SI (plus:SI (reg:SI 87)
(const:SI (plus:SI (unspec:SI [
(symbol_ref:SI ("d") [flags 0x2] 
)

] UNSPEC_GOTOFF)
(const_int 331350016 [0x13c0] [1 
d+331350016 S4 A32])) "j.c":9 56 {*pushsi2}

 (expr_list:REG_ARGS_SIZE (const_int 8 [0x8])
(nil)))

And of course that goes boom.

Note that we don't have a REG_EQUAL prior to IRA.  That allows us to 
zero in on this code within IRA which adds the note to insn 28:


3485  /* cse sometimes generates function invariants, but 
doesn't put a
3486 REG_EQUAL note on the insn.  Since this note would 
be redundant,

3487 there's no point creating it earlier than here.  */
3488  if (! note && ! rtx_varies_p (src, 0))
3489note = set_unique_reg_note (insn, REG_EQUAL, 
copy_rtx (src));


may_trap_p returns false for the note:

expr_list:REG_EQUAL (mem/u:SI (plus:SI (reg:SI 87)
(const:SI (plus:SI (unspec:SI [
(symbol_ref:SI ("d") [flags 0x2] 0xb79450fc d>)

] UNSPEC_GOTOFF)
(const_int 331350016 [0x13c0] [1 
d+331350016 S4 A32])



And thus we keep the equivalence.  Ultimately may_trap_p considers a PIC 
memory reference as non-trapping.


I really wonder if we should just drop the may_trap_p test and always do 
the domination check.


jeff







Re: PR79286, ira combine_and_move_insns in loops

2017-02-22 Thread Dominique d'Humières

> Le 22 févr. 2017 à 17:06, Jeff Law  a écrit :
> 
> On 02/22/2017 04:32 AM, Dominique d'Humières wrote:
>> 
>>> Le 21 févr. 2017 à 23:48, Alan Modra  a écrit :
>>> 
>>> On Sat, Feb 18, 2017 at 12:39:08PM +0100, Dominique d'Humières wrote:
> I'm slightly concerned about the test and how it'll behave on targets 
> with small address spaces. If it's a problem we can fault in adjustments.
 
 The test fails on x86_64-apple-darwin16 with -m32 and -O1 and above.
>>> 
>>> Have you investigated just why the test fails?
>> 
>> Segmentation fault
>> 
>> Note that the assembly is the same for revisions r245268 and r245564, i.e., 
>> before and after r245521.
> Given the array index, I kindof expected problems on 32bit platforms.
> 
> Let me stand up an i686 linux instance and see if I can twiddle things 
> without compromising the test.

Also darwin is a -fpic platform.

Dominique

> 
> jeff



Re: PR79286, ira combine_and_move_insns in loops

2017-02-22 Thread Jeff Law

On 02/22/2017 04:32 AM, Dominique d'Humières wrote:



Le 21 févr. 2017 à 23:48, Alan Modra  a écrit :

On Sat, Feb 18, 2017 at 12:39:08PM +0100, Dominique d'Humières wrote:

I'm slightly concerned about the test and how it'll behave on targets with 
small address spaces. If it's a problem we can fault in adjustments.


The test fails on x86_64-apple-darwin16 with -m32 and -O1 and above.


Have you investigated just why the test fails?


Segmentation fault

Note that the assembly is the same for revisions r245268 and r245564, i.e., 
before and after r245521.

Given the array index, I kindof expected problems on 32bit platforms.

Let me stand up an i686 linux instance and see if I can twiddle things 
without compromising the test.


jeff



Re: PR79286, ira combine_and_move_insns in loops

2017-02-22 Thread Dominique d'Humières

> Le 21 févr. 2017 à 23:48, Alan Modra  a écrit :
> 
> On Sat, Feb 18, 2017 at 12:39:08PM +0100, Dominique d'Humières wrote:
>>> I'm slightly concerned about the test and how it'll behave on targets with 
>>> small address spaces. If it's a problem we can fault in adjustments.
>> 
>> The test fails on x86_64-apple-darwin16 with -m32 and -O1 and above.
> 
> Have you investigated just why the test fails?  

Segmentation fault

Note that the assembly is the same for revisions r245268 and r245564, i.e., 
before and after r245521.

Dominique

> I don't know how to go
> about building a cross-toolchain for darwin, due to lack of darwin
> support in gas and ld.
> 
> -- 
> Alan Modra
> Australia Development Lab, IBM



Re: PR79286, ira combine_and_move_insns in loops

2017-02-21 Thread Alan Modra
On Sat, Feb 18, 2017 at 12:39:08PM +0100, Dominique d'Humières wrote:
> > I'm slightly concerned about the test and how it'll behave on targets with 
> > small address spaces. If it's a problem we can fault in adjustments.
> 
> The test fails on x86_64-apple-darwin16 with -m32 and -O1 and above.

Have you investigated just why the test fails?  I don't know how to go
about building a cross-toolchain for darwin, due to lack of darwin
support in gas and ld.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PR79286, ira combine_and_move_insns in loops

2017-02-18 Thread Dominique d'Humières
> I'm slightly concerned about the test and how it'll behave on targets with 
> small address spaces. If it's a problem we can fault in adjustments.

The test fails on x86_64-apple-darwin16 with -m32 and -O1 and above.

TIA

Dominique



Re: PR79286, ira combine_and_move_insns in loops

2017-02-16 Thread Jeff Law

On 02/10/2017 05:29 PM, Alan Modra wrote:

On Fri, Feb 03, 2017 at 01:55:33AM -0700, Jeff Law wrote:

That seems pretty pessimistic -- do we have dominance information at this
point?  If so we could check that the assignment to the register dominates
the use.   If they are in the same block, then you have to look at LUIDs or
somesuch.

That would address the problem you're trying to solve without pessimizing
the code.

Fair enough.  Revised and regression tested x86_64-linux.

PR rtl-optimization/79286
* ira.c (def_dominates_uses): New function.
(update_equiv_regs): Don't create an equivalence for insns that
may trap where the register def does not dominate the use.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.

Thanks.  Installed.

I'm slightly concerned about the test and how it'll behave on targets 
with small address spaces.  If it's a problem we can fault in adjustments.


jeff



Re: PR79286, ira combine_and_move_insns in loops

2017-02-10 Thread Alan Modra
On Fri, Feb 03, 2017 at 01:55:33AM -0700, Jeff Law wrote:
> That seems pretty pessimistic -- do we have dominance information at this
> point?  If so we could check that the assignment to the register dominates
> the use.   If they are in the same block, then you have to look at LUIDs or
> somesuch.
> 
> That would address the problem you're trying to solve without pessimizing
> the code.

Fair enough.  Revised and regression tested x86_64-linux.

PR rtl-optimization/79286
* ira.c (def_dominates_uses): New function.
(update_equiv_regs): Don't create an equivalence for insns that
may trap where the register def does not dominate the use.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.

diff --git a/gcc/ira.c b/gcc/ira.c
index 96b4b62..6fb8aaf 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3300,6 +3300,49 @@ adjust_cleared_regs (rtx loc, const_rtx old_rtx 
ATTRIBUTE_UNUSED, void *data)
   return NULL_RTX;
 }
 
+/* Given register REGNO is set only once, return true if the defining
+   insn dominates all uses.  */
+
+static bool
+def_dominates_uses (int regno)
+{
+  df_ref def = DF_REG_DEF_CHAIN (regno);
+
+  struct df_insn_info *def_info = DF_REF_INSN_INFO (def);
+  /* If this is an artificial def (eh handler regs, hard frame pointer
+ for non-local goto, regs defined on function entry) then def_info
+ is NULL and the reg is always live before any use.  We might
+ reasonably return true in that case, but since the only call
+ of this function is currently here in ira.c when we are looking
+ at a defining insn we can't have an artificial def as that would
+ bump DF_REG_DEF_COUNT.  */
+  gcc_assert (DF_REG_DEF_COUNT (regno) == 1 && def_info != NULL);
+
+  rtx_insn *def_insn = DF_REF_INSN (def);
+  basic_block def_bb = BLOCK_FOR_INSN (def_insn);
+
+  for (df_ref use = DF_REG_USE_CHAIN (regno);
+   use;
+   use = DF_REF_NEXT_REG (use))
+{
+  struct df_insn_info *use_info = DF_REF_INSN_INFO (use);
+  /* Only check real uses, not artificial ones.  */
+  if (use_info)
+   {
+ rtx_insn *use_insn = DF_REF_INSN (use);
+ if (!DEBUG_INSN_P (use_insn))
+   {
+ basic_block use_bb = BLOCK_FOR_INSN (use_insn);
+ if (use_bb != def_bb
+ ? !dominated_by_p (CDI_DOMINATORS, use_bb, def_bb)
+ : DF_INSN_INFO_LUID (use_info) < DF_INSN_INFO_LUID (def_info))
+   return false;
+   }
+   }
+}
+  return true;
+}
+
 /* Find registers that are equivalent to a single value throughout the
compilation (either because they can be referenced in memory or are
set once from a single constant).  Lower their priority for a
@@ -3498,9 +3541,18 @@ update_equiv_regs (void)
= gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns);
 
  /* If this register is known to be equal to a constant, record that
-it is always equivalent to the constant.  */
+it is always equivalent to the constant.
+Note that it is possible to have a register use before
+the def in loops (see gcc.c-torture/execute/pr79286.c)
+where the reg is undefined on first use.  If the def insn
+won't trap we can use it as an equivalence, effectively
+choosing the "undefined" value for the reg to be the
+same as the value set by the def.  */
  if (DF_REG_DEF_COUNT (regno) == 1
- && note && ! rtx_varies_p (XEXP (note, 0), 0))
+ && note
+ && !rtx_varies_p (XEXP (note, 0), 0)
+ && (!may_trap_p (XEXP (note, 0))
+ || def_dominates_uses (regno)))
{
  rtx note_value = XEXP (note, 0);
  remove_note (insn, note);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr79286.c 
b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
new file mode 100644
index 000..e6d0e93
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
@@ -0,0 +1,15 @@
+int a = 0, c = 0;
+static int d[][8] = {};
+
+int main ()
+{
+  int e;
+  for (int b = 0; b < 4; b++)
+{
+  __builtin_printf ("%d\n", b, e);
+  while (a && c++)
+   e = d[30][0];
+}
+
+  return 0;
+}


-- 
Alan Modra
Australia Development Lab, IBM


Re: PR79286, ira combine_and_move_insns in loops

2017-02-03 Thread Jeff Law

On 02/02/2017 02:31 AM, Alan Modra wrote:

Revised patch that cures the lra related -m32 -Os regression too.

The code that I'm patching here is changing a REG_EQUAL note to
REG_EQUIV, ie. asserting that the value of the reg is always the value
set by the current instruction.  Which is not always true when the
insn is in a loop and the use of the reg happens before the def.  Of
course that implies the value of the reg is initially undefined, so
it's not unreasonable to make the initial reg value the same.
Except when the code setting the initial value may trap, as it does in
the testcase.

Bootstrap and regression test on x86_64-linux in progress.  OK
assuming no regressions?

I also took a look at gcc/*.o to see whether there were any code
quality regressions.  No differences found except the expected change
in ira.o.

PR rtl-optimization/79286
* ira.c (update_equiv_regs): Do not create an equivalence for
mems that may trap.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.
So I haven't been able to find where we've hashed through this issue 
before.  My recollection was that if we encountered a use before the def 
that we would avoid recording the equivalence when we later see the 
REG_EQUAL note.


That works within a BB.  If the use/set are in different blocks then 
don't we just need to check that the set dominates the use?


Jeff




Re: PR79286, ira combine_and_move_insns in loops

2017-02-03 Thread Jeff Law

On 02/02/2017 02:31 AM, Alan Modra wrote:

Revised patch that cures the lra related -m32 -Os regression too.

The code that I'm patching here is changing a REG_EQUAL note to
REG_EQUIV, ie. asserting that the value of the reg is always the value
set by the current instruction.  Which is not always true when the
insn is in a loop and the use of the reg happens before the def.  Of
course that implies the value of the reg is initially undefined, so
it's not unreasonable to make the initial reg value the same.
Except when the code setting the initial value may trap, as it does in
the testcase.

Bootstrap and regression test on x86_64-linux in progress.  OK
assuming no regressions?

I also took a look at gcc/*.o to see whether there were any code
quality regressions.  No differences found except the expected change
in ira.o.

PR rtl-optimization/79286
* ira.c (update_equiv_regs): Do not create an equivalence for
mems that may trap.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.
That seems pretty pessimistic -- do we have dominance information at 
this point?  If so we could check that the assignment to the register 
dominates the use.   If they are in the same block, then you have to 
look at LUIDs or somesuch.


That would address the problem you're trying to solve without 
pessimizing the code.


THe hell of it is I know I've poked at this problem in the past.

jeff



Re: PR79286, ira combine_and_move_insns in loops

2017-02-02 Thread Alan Modra
Revised patch that cures the lra related -m32 -Os regression too.

The code that I'm patching here is changing a REG_EQUAL note to
REG_EQUIV, ie. asserting that the value of the reg is always the value
set by the current instruction.  Which is not always true when the
insn is in a loop and the use of the reg happens before the def.  Of
course that implies the value of the reg is initially undefined, so
it's not unreasonable to make the initial reg value the same.
Except when the code setting the initial value may trap, as it does in
the testcase.

Bootstrap and regression test on x86_64-linux in progress.  OK
assuming no regressions?

I also took a look at gcc/*.o to see whether there were any code
quality regressions.  No differences found except the expected change
in ira.o.

PR rtl-optimization/79286
* ira.c (update_equiv_regs): Do not create an equivalence for
mems that may trap.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.

diff --git a/gcc/ira.c b/gcc/ira.c
index 96b4b62..8e79929 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3500,7 +3500,9 @@ update_equiv_regs (void)
  /* If this register is known to be equal to a constant, record that
 it is always equivalent to the constant.  */
  if (DF_REG_DEF_COUNT (regno) == 1
- && note && ! rtx_varies_p (XEXP (note, 0), 0))
+ && note
+ && !rtx_varies_p (XEXP (note, 0), 0)
+ && !may_trap_p (XEXP (note, 0)))
{
  rtx note_value = XEXP (note, 0);
  remove_note (insn, note);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr79286.c 
b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
new file mode 100644
index 000..e6d0e93
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
@@ -0,0 +1,15 @@
+int a = 0, c = 0;
+static int d[][8] = {};
+
+int main ()
+{
+  int e;
+  for (int b = 0; b < 4; b++)
+{
+  __builtin_printf ("%d\n", b, e);
+  while (a && c++)
+   e = d[30][0];
+}
+
+  return 0;
+}

-- 
Alan Modra
Australia Development Lab, IBM


PR79286, ira combine_and_move_insns in loops

2017-02-01 Thread Alan Modra
This patch cures PR79286 by restoring the REG_DEAD note test used
prior to r235660, but modified to only exclude insns that may trap.
I'd like to allow combine/move without a REG_DEAD note in loops
because insns in loops often lack such notes, and I recall seeing
quite a few cases at the time I wrote r235660 where loops benefited
from allowing the combine/move to happen.

I've been battling hardware instability on my x86_64 box all day, so
hopefully this finally passes bootstrap and regression testing
overnight.  OK to apply assuming no regressions?

PR rtl-optimization/79286
* ira.c (combine_and_move_insns): Don't combine or move when
use_insn does not have a REG_DEAD note and def_insn may trap.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.

diff --git a/gcc/ira.c b/gcc/ira.c
index 96b4b62..cdde775 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3682,6 +3682,14 @@ combine_and_move_insns (void)
   gcc_assert (DF_REG_DEF_COUNT (regno) == 1 && DF_REF_INSN_INFO (def));
   rtx_insn *def_insn = DF_REF_INSN (def);
 
+  /* Even though we know this reg is set exactly once and used
+exactly once, check that the reg dies in the use insn or that
+the def insn can't trap.  This is to exclude degenerate cases
+in loops where the use occurs before the def.  See PR79286.  */
+  if (!find_reg_note (use_insn, REG_DEAD, regno_reg_rtx[regno])
+ && may_trap_p (PATTERN (def_insn)))
+   continue;
+
   /* We may not move instructions that can throw, since that
 changes basic block boundaries and we are not prepared to
 adjust the CFG to match.  */
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr79286.c 
b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
new file mode 100644
index 000..e6d0e93
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
@@ -0,0 +1,15 @@
+int a = 0, c = 0;
+static int d[][8] = {};
+
+int main ()
+{
+  int e;
+  for (int b = 0; b < 4; b++)
+{
+  __builtin_printf ("%d\n", b, e);
+  while (a && c++)
+   e = d[30][0];
+}
+
+  return 0;
+}

-- 
Alan Modra
Australia Development Lab, IBM