[Bug rtl-optimization/55829] [4.8 Regression] ICE: in curr_insn_transform, at lra-constraints.c:3069 with -msse3

2013-01-09 Thread ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55829



--- Comment #6 from Uros Bizjak ubizjak at gmail dot com 2013-01-09 16:33:02 
UTC ---

(In reply to comment #3)

> BTW, there is a slight inconsistency between the two patterns, the first
> pattern uses sselog1 type for both the unpcklpd %0, %0 and %vmovddup %1, %0 and
> V2DFmode mode attribute, while the second pattern uses sselog type for both of
> those and DFmode mode attribute for the movddup case.



Actually, the sselog/sselog1 difference is OK; it only makes a difference in
the calculation of the memory attribute.  By default, sselog looks at
operand[2], which is missing when the pattern has only two operands, so sselog1
(and all _1 types) looks at operand[1] instead.

Regarding mode: the length calculation depends on it.  For V2DF non-AVX SSE
insns prefix_data16 is added, and for DF non-AVX SSE insns prefix_rep is added.
For sselog insns, V2DF vs. DF makes no difference in length, but movddup uses
prefix_rep, so the correct mode for movddup is DF.  I will submit a trivial
patch to fix this inconsistency.
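
As a rough illustration of the prefix point, the sketch below records the legacy (non-AVX) SSE encodings of the two instructions in question: unpcklpd is 66 0F 14 /r and movddup is F2 0F 12 /r.  The struct and function names are invented for this example and are not part of GCC; the sketch only shows that each instruction carries exactly one mandatory prefix byte, which is why the choice between the two prefixes does not change the length.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy-SSE encodings of the two instructions discussed in the comment.
   The names below are hypothetical; only the byte values are real.  */
struct sse_opcode {
  const char *mnemonic;
  unsigned char prefix;     /* mandatory prefix byte */
  unsigned char opcode[2];  /* two-byte opcode */
};

const struct sse_opcode unpcklpd = { "unpcklpd", 0x66, { 0x0f, 0x14 } };
const struct sse_opcode movddup  = { "movddup",  0xf2, { 0x0f, 0x12 } };

/* Nonzero if the mandatory prefix is the operand-size (data16) prefix.  */
int has_data16_prefix (const struct sse_opcode *op)
{
  return op->prefix == 0x66;
}

/* Nonzero if the mandatory prefix is one of the rep-class prefixes.  */
int has_rep_prefix (const struct sse_opcode *op)
{
  return op->prefix == 0xf2 || op->prefix == 0xf3;
}

/* Prefix plus opcode bytes, not counting ModRM, SIB or displacement.  */
size_t opcode_bytes (const struct sse_opcode *op)
{
  return 1 + sizeof op->opcode;
}

/* Checks the claims made in the text against the table above.  */
void check_prefixes (void)
{
  assert (has_data16_prefix (&unpcklpd));
  assert (has_rep_prefix (&movddup));
  assert (opcode_bytes (&unpcklpd) == opcode_bytes (&movddup));
}
```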


2013-01-09 Thread ubizjak at gmail dot com



--- Comment #7 from Uros Bizjak ubizjak at gmail dot com 2013-01-09 17:12:16 
UTC ---

Author: vmakarov

Date: Wed Jan  9 17:02:11 2013

New Revision: 195057



URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195057

Log:

2013-01-09  Vladimir Makarov  vmaka...@redhat.com



PR rtl-optimization/pr55829

* lra-constraints.c (match_reload): Add code for absent output.

(curr_insn_transform): Add code for reloads of matched inputs

without output.



2013-01-09  Vladimir Makarov  vmaka...@redhat.com



PR rtl-optimization/pr55829

* gcc.target/i386/pr55829.c: New.





Added:

trunk/gcc/testsuite/gcc.target/i386/pr55829.c

Modified:

trunk/gcc/ChangeLog

trunk/gcc/lra-constraints.c

trunk/gcc/testsuite/ChangeLog


2013-01-09 Thread jakub at gcc dot gnu.org



Jakub Jelinek jakub at gcc dot gnu.org changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2013-01-09 
17:49:56 UTC ---

Fixed, thanks.


2013-01-09 Thread ubizjak at gmail dot com



--- Comment #9 from Uros Bizjak ubizjak at gmail dot com 2013-01-09 17:52:19 
UTC ---

gcc now generates:



        movq    p1(%rip), %r12  # 56    *movdi_internal_rex64/2 [length = 7]
        movq    %r12, (%rsp)    # 57    *movdi_internal_rex64/4 [length = 4]
        movddup (%rsp), %xmm1   # 23    *vec_concatv2df/3       [length = 5]



Is there a reason not to load directly from p1, which would avoid the extra moves?



movddup p1(%rip), %xmm1
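
To put a number on the difference, the sketch below adds up the encoding sizes of the instructions involved, assuming the usual x86-64 encoding rules (a REX prefix for 64-bit moves involving %r12, a SIB byte when the base register is %rsp, and a 32-bit displacement for %rip-relative addressing).  The struct and names are made up for this example.  The first three results agree with the [length = N] annotations in the dump; the fourth is the size the direct load would have under the same rules.

```c
#include <assert.h>

/* Byte counts of the parts of one x86-64 instruction encoding.
   This breakdown is an illustration, not how GCC computes lengths.  */
struct insn_encoding {
  int prefix;  /* REX or mandatory SSE prefix bytes */
  int opcode;  /* opcode bytes */
  int modrm;   /* ModRM byte */
  int sib;     /* SIB byte, needed when the base register is %rsp */
  int disp;    /* displacement bytes (4 for %rip-relative) */
};

int insn_length (struct insn_encoding e)
{
  return e.prefix + e.opcode + e.modrm + e.sib + e.disp;
}

/* movq p1(%rip), %r12:    REX, 8b, ModRM, disp32 */
const struct insn_encoding load_gpr      = { 1, 1, 1, 0, 4 };
/* movq %r12, (%rsp):      REX, 89, ModRM, SIB */
const struct insn_encoding store_stack   = { 1, 1, 1, 1, 0 };
/* movddup (%rsp), %xmm1:  f2, 0f 12, ModRM, SIB */
const struct insn_encoding movddup_stack = { 1, 2, 1, 1, 0 };
/* movddup p1(%rip), %xmm1: f2, 0f 12, ModRM, disp32 */
const struct insn_encoding movddup_rip   = { 1, 2, 1, 0, 4 };

/* Total size of the three-instruction sequence currently generated.  */
int current_sequence_length (void)
{
  return insn_length (load_gpr) + insn_length (store_stack)
         + insn_length (movddup_stack);
}
```

Under these assumptions the current sequence takes 16 bytes and three instructions, while the single direct load would take 8 bytes.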


2013-01-09 Thread vmakarov at gcc dot gnu.org



--- Comment #10 from Vladimir Makarov vmakarov at gcc dot gnu.org 2013-01-09 
18:15:52 UTC ---

(In reply to comment #9)

> gcc now generates:
>
>         movq    p1(%rip), %r12  # 56    *movdi_internal_rex64/2 [length = 7]
>         movq    %r12, (%rsp)    # 57    *movdi_internal_rex64/4 [length = 4]
>         movddup (%rsp), %xmm1   # 23    *vec_concatv2df/3       [length = 5]
>
> is there a reason not to load directly from p1, to avoid extra moves:
>
> movddup p1(%rip), %xmm1



I checked the reload pass; it has the same problem (and generates even worse
code: one more insn, and it uses a nonzero displacement).  It is possible to
fix, but it will not be easy.  In any case, I don't think it will be fixed soon,
as I have more important LRA PRs.

I'll put it on my TODO list.


2013-01-08 Thread jakub at gcc dot gnu.org



--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org 2013-01-08 
09:58:05 UTC ---

Created attachment 29103

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29103

gcc48-pr55829.patch



Yeah, comparing the vec_dupv2df and *vec_concatv2df patterns shows that for the
former we accept, for sse3 but not avx, x <- 0, x <- x and x <- m, while for
the latter we accept only x <- 0, x and x <- m, 1, and not x <- x, 1, where
movddup has 2 different register arguments.  With this change it doesn't ICE
anymore, even though it doesn't actually emit that form of movddup.  The
vec_concat of 2x the (reg:DF 62) pseudo, where (reg:DF 62) is assigned r12 (it
is used in the following loop, which contains calls), is reloaded by LRA into
two stores of r12 into mem, loaded once into xmm1 and used once from mem, i.e.
for whatever reason the x <- 0, m alternative is chosen, but postreload then
turns it into movddup with both arguments xmm1 (x <- 0, 0).

I think this patch can be useful and does give the RA more freedom, but it is
unclear whether it merely makes some LRA bug latent.  Vlad?


2013-01-08 Thread jakub at gcc dot gnu.org



Jakub Jelinek jakub at gcc dot gnu.org changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org



--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org 2013-01-08 
10:02:16 UTC ---

BTW, there is a slight inconsistency between the two patterns: the first
pattern uses sselog1 type for both the unpcklpd %0, %0 and %vmovddup %1, %0 and
V2DFmode mode attribute, while the second pattern uses sselog type for both of
those and DFmode mode attribute for the movddup case.


2013-01-08 Thread vmakarov at redhat dot com



Vladimir Makarov vmakarov at redhat dot com changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com



--- Comment #4 from Vladimir Makarov vmakarov at redhat dot com 2013-01-08 
16:09:58 UTC ---

(In reply to comment #2)

 

> I think this patch can be useful and does give the RA more freedom, but it is
> unclear whether it doesn't make some LRA bug latent.  Vlad?



I am working on it on the LRA side.  I hope the patch will be ready today.


2013-01-08 Thread ubizjak at gmail dot com



--- Comment #5 from Uros Bizjak ubizjak at gmail dot com 2013-01-08 16:27:23 
UTC ---

(In reply to comment #3)

> BTW, there is a slight inconsistency between the two patterns, the first
> pattern uses sselog1 type for both the unpcklpd %0, %0 and %vmovddup %1, %0 and
> V2DFmode mode attribute, while the second pattern uses sselog type for both of
> those and DFmode mode attribute for the movddup case.



I will look at these once the LRA fix is committed.


2013-01-07 Thread rguenth at gcc dot gnu.org



Richard Biener rguenth at gcc dot gnu.org changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
           Priority|P3                          |P1


2013-01-03 Thread pinskia at gcc dot gnu.org



Andrew Pinski pinskia at gcc dot gnu.org changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-01-04
     Ever Confirmed|0                           |1



--- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org 2013-01-04 
03:38:40 UTC ---

Confirmed; here is a more reduced testcase:

extern double p2[];
extern double ck[];
int chk_pd(void);
int sse3_test (void)
{
  int i = 0;
  int fail = 0;
  __m128d t1 = (__m128d){*p2, 0};
  __m128d t2 = __builtin_ia32_shufpd (t1, t1, 0);
  double p10 = p2[0];
  for (; i < 80; i += 1)
    {
      ck[0] = p10;
      __builtin_ia32_storeupd (p2, t2);
      fail += chk_pd ();
    }
}



--- CUT ---

Note that the first dump that differs with -fno-expensive-optimizations is the
ira dump.
Also note that if we change t1/t2 into:
  __m128d t2 = (__m128d){*p2, *p2};
it works.  The difference between those two is:

(insn 17 13 7 2 (set (reg/v:V2DF 65 [ t2 ])
        (vec_concat:V2DF (reg:DF 80 [ D.1764 ])
            (reg:DF 80 [ D.1764 ]))) t6.c:11 1467 {*vec_concatv2df}
     (nil))

(insn 10 9 5 2 (set (reg/v:V2DF 63 [ t2 ])
        (vec_duplicate:V2DF (reg:DF 62 [ D.1756 ]))) t6.c:9 1466 {vec_dupv2df}
     (nil))



Note that those two RTL expressions mean exactly the same thing.  Maybe we
should convert a vec_concat of the same value into a vec_duplicate, but that is
a different issue altogether and would only make this ICE latent.
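
The equivalence can be seen at the source level with a small sketch that uses GCC's generic vector extension instead of the x86 builtins, so it does not need -msse3.  The type and function names here are invented for illustration, and the sketch makes no claim about which RTL pattern the compiler picks for either function; it only shows that both forms produce the same two lanes.

```c
#include <assert.h>

/* Two doubles in one 16-byte vector; v2df is a name chosen for this
   sketch, standing in for __m128d.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Builds the result directly from two copies of X, like
   (__m128d){*p2, *p2} in the testcase variant.  */
v2df concat_same (double x)
{
  v2df v = { x, x };
  return v;
}

/* Builds { x, 0 } first and then copies lane 0 into both lanes, which
   is what the shufpd with selector 0 does in the testcase.  */
v2df dup_low (double x)
{
  v2df t = { x, 0.0 };
  v2df r = { t[0], t[0] };
  return r;
}

/* Nonzero if both lanes of A and B compare equal.  */
int same_lanes (v2df a, v2df b)
{
  return a[0] == b[0] && a[1] == b[1];
}

/* Both constructions agree for an arbitrary sample value.  */
void check_equivalence (void)
{
  assert (same_lanes (concat_same (1.5), dup_low (1.5)));
}
```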


2012-12-31 Thread pinskia at gcc dot gnu.org



Andrew Pinski pinskia at gcc dot gnu.org changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.8.0