[Bug rtl-optimization/70160] [6 Regression] gcc ICE at -O2 (seg fault) and above on valid code on x86_64-linux-gnu

2016-03-11 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70160

Ilya Enkovich  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Ilya Enkovich  ---
Fixed

[Bug rtl-optimization/70160] [6 Regression] gcc ICE at -O2 (seg fault) and above on valid code on x86_64-linux-gnu

2016-03-11 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70160

--- Comment #8 from Ilya Enkovich  ---
Author: ienkovich
Date: Fri Mar 11 11:25:29 2016
New Revision: 234135

URL: https://gcc.gnu.org/viewcvs?rev=234135=gcc=rev
Log:
gcc/

PR target/70160
* config/i386/i386.c (scalar_chain::convert_reg): Skip uses
of uninitialized values.

gcc/testsuite/

PR target/70160
* gcc.target/i386/pr70160.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr70160.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/70160] [6 Regression] gcc ICE at -O2 (seg fault) and above on valid code on x86_64-linux-gnu

2016-03-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70160

--- Comment #7 from Jakub Jelinek  ---
(In reply to Ilya Enkovich from comment #6)
> Thanks a lot for your analysis!  When chain is built we scan all register
> definitions and their uses.  Thus it includes all register definitions and
> uses.  Uninitialized uses are missed though.  Simply skipping them in
> convert_reg should be OK.  I'm testing this patch:

Ah, ok.

> I'm not sure what's wrong with debug insns.  If register is converted into a
> vector one then why can't it be used in debug insns?  It is still the same
> DI reg with the same value, we just replace its uses with V2DI subreg uses
> to force SSE hard reg allocation.

I don't have a testcase.  If the original pseudo still contains the right value
after the STV pass, then sure, there is no need to adjust debug insns.

[Bug rtl-optimization/70160] [6 Regression] gcc ICE at -O2 (seg fault) and above on valid code on x86_64-linux-gnu

2016-03-10 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70160

Ilya Enkovich  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ienkovich at gcc dot 
gnu.org

--- Comment #6 from Ilya Enkovich  ---
Thanks a lot for your analysis!  When chain is built we scan all register
definitions and their uses.  Thus it includes all register definitions and
uses.  Uninitialized uses are missed though.  Simply skipping them in
convert_reg should be OK.  I'm testing this patch:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fa7d3ff..3d8dbc4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3372,8 +3372,11 @@ scalar_chain::convert_reg (unsigned regno)
bitmap_clear_bit (conv, DF_REF_INSN_UID (ref));
  }
   }
-else if (NONDEBUG_INSN_P (DF_REF_INSN (ref)))
+/* Skip debug insns and uninitialized uses.  */
+else if (DF_REF_CHAIN (ref)
+&& NONDEBUG_INSN_P (DF_REF_INSN (ref)))
   {
+   gcc_assert (scopy);
replace_rtx (DF_REF_INSN (ref), reg, scopy);
df_insn_rescan (DF_REF_INSN (ref));
   }


I'm not sure what's wrong with debug insns.  If register is converted into a
vector one then why can't it be used in debug insns?  It is still the same DI
reg with the same value, we just replace its uses with V2DI subreg uses to
force SSE hard reg allocation.

[Bug rtl-optimization/70160] [6 Regression] gcc ICE at -O2 (seg fault) and above on valid code on x86_64-linux-gnu

2016-03-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70160

Jakub Jelinek  changed:

   What|Removed |Added

 CC||ienkovich at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
The STV pass is just weird.
For the addition of insns to chains from candidates (wonder why for candidates
it doesn't consider CONST_INT srcs, those, perhaps with somewhat higher cost,
can be either loaded from memory, or if the constant is some cheap SSE
constant, computed that way (0 or all ones)) it uses the UD and DU chains
(seems walking in both directions greedily), but then convert_reg works not
just on the definitions and uses from the current chain, but all.  For the ICE
e.g. the following works:
--- i386.c.jj1  2016-03-09 10:19:20.0 +0100
+++ i386.c  2016-03-10 11:28:37.465500423 +0100
@@ -3374,8 +3374,22 @@ scalar_chain::convert_reg (unsigned regn
   }
 else if (NONDEBUG_INSN_P (DF_REF_INSN (ref)))
   {
-   replace_rtx (DF_REF_INSN (ref), reg, scopy);
-   df_insn_rescan (DF_REF_INSN (ref));
+   if (!scalar_copy)
+ {
+   /* This can happen for uninitialized uses, if the register
+  is defined later on.  */
+   df_ref ref2;
+   df_link *chain;
+   for (ref2 = DF_INSN_UID_USES (DF_REF_INSN_UID (ref));
+ref2; ref2 = DF_REF_NEXT_LOC (ref2))
+ if (DF_REF_REGNO (ref2) == regno)
+   gcc_assert (DF_REF_CHAIN (ref2) == NULL);
+ }
+   else
+ {
+   replace_rtx (DF_REF_INSN (ref), reg, scopy);
+   df_insn_rescan (DF_REF_INSN (ref));
+ }
   }

   BITMAP_FREE (conv);
or if I didn't want to verify it, perhaps just else if (scalar_copy &&
NONDEBUG_INSN_P ...) would be enough, replacing a reg with NULL is always a
bug,
but wonder how this can work well even if e.g. the same pseudo is set multiple
times in different, unrelated chains - then Id think convert_reg will be called
on the same regno multiple times, for each chain, but have in all cases a
global action on all defs and all uses of that pseudo.
Another problem I see is with DEBUG_INSNs - the analysis phase correctly
ignores those, but e.g. convert_reg ignores them too, while I'd expect if it
replaces all other uses with scopy (for the scalar_reg case), then it sould
replace those too in the debug insns, and otherwise it should figure out if the
original pseudo lives somewhere else (subreg of something, whatever) and
replace those in the debug insns too, or worst case reset those debug insns.

Anyway, not familiar enough with STV, so I'll leave my analysis here and leave
this to Ilya.

[Bug rtl-optimization/70160] [6 Regression] gcc ICE at -O2 (seg fault) and above on valid code on x86_64-linux-gnu

2016-03-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70160

--- Comment #4 from Jakub Jelinek  ---
So, combine optimizes away the looping, most likely because of the undefined
initial value of the iterator.
And at the start of STV pass we have:
(insn 13 11 15 2 (set (reg:DI 96 [ _23 ])
(const_int 1 [0x1])) 85 {*movdi_internal}
 (nil))
...
(insn 17 15 18 2 (set (mem/c:DI (symbol_ref:SI ("a") [flags 0x2]  ) [1 a+0 S8 A64])
(reg/v:DI 91 [ x ])) pr70160.c:10 85 {*movdi_internal}
 (expr_list:REG_DEAD (reg/v:DI 91 [ x ])
(nil)))
...
(insn 33 22 0 2 (set (reg/v:DI 91 [ x ])
(reg:DI 96 [ _23 ])) 85 {*movdi_internal}
 (nil))
as the only instructions referring to DImode, all this in a single bb that
falls thru to EXIT.  The insn 17 sets memory to an uninitialized DImode reg,
and then insn 33 sets the same pseudo to some other DImode value (dead
assignment, as nothing uses it further).
Searching for mode conversion candidates...
  insn 17 is marked as a candidate
  insn 33 is marked as a candidate
Created a new instruction chain #1
Building chain #1...
  Adding insn 17 to chain #1
Collected chain #1...
  insns: 17
Computing gain for chain #1...
  Instruction conversion gain: 0
  Registers conversion cost: 0
  Total gain: 0
Chain #1 conversion is not profitable
Created a new instruction chain #2
Building chain #2...
  Adding insn 33 to chain #2
  r96 def in insn 13 isn't convertible
  Mark r96 def in insn 13 as requiring both modes in chain #2
Collected chain #2...
  insns: 33
  defs to convert: r96
Computing gain for chain #2...
  Instruction conversion gain: 6
  Registers conversion cost: 5
  Total gain: 1
Converting chain #2...

[Bug rtl-optimization/70160] [6 Regression] gcc ICE at -O2 (seg fault) and above on valid code on x86_64-linux-gnu

2016-03-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70160

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*
   Priority|P3  |P1
  Component|tree-optimization   |rtl-optimization