[Bug tree-optimization/43846] [4.5 Regression] array vs members, total scalarization issues

2010-05-24 Thread jamborm at gcc dot gnu dot org


--- Comment #11 from jamborm at gcc dot gnu dot org  2010-05-24 09:43 ---
(In reply to comment #9)
> (In reply to comment #7)
> > This is now fixed on both the trunk and the 4.5 branch.
>
> This commit produces a broken libkhtml.so.5.4.0 from kdelibs-4.4.3.
> In detail, it produces different/broken binaries for khtml/css/parser.cpp
> and khtml/svg/SVGGradientElement.cpp.

Please file this as a separate bug and CC me.  I can't promise I'll be
able to look at it this week though.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43846




2010-05-24 Thread pluto at agmk dot net


--- Comment #12 from pluto at agmk dot net  2010-05-24 11:04 ---
(From update of attachment 20731)
Moved to a separate bug, PR44258.


-- 

pluto at agmk dot net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #20731|0                           |1
        is obsolete|                            |



2010-05-23 Thread pluto at agmk dot net


--- Comment #9 from pluto at agmk dot net  2010-05-23 11:53 ---
(In reply to comment #7)
> This is now fixed on both the trunk and the 4.5 branch.

This commit produces a broken libkhtml.so.5.4.0 from kdelibs-4.4.3.
In detail, it produces different/broken binaries for khtml/css/parser.cpp
and khtml/svg/SVGGradientElement.cpp.

In the end we get a nice GPF during knode/kmail/konqueror startup:

[KCrash Handler]
#5  memcpy () at ../sysdeps/x86_64/memcpy.S:78
#6  0x7f546e63fc5e in QString::QString(QChar const*, int) () from /usr/lib64/libQtCore.so.4
#7  0x7f5469f70e2e in qString (ps=<value optimized out>) at /usr/src/debug/kdelibs-4.4.3/khtml/css/cssparser.h:84
#8  DOM::CSSParser::parseValue (ps=<value optimized out>) at /usr/src/debug/kdelibs-4.4.3/khtml/css/cssparser.cpp:518
#9  0x7f5469f95075 in cssyyparse (parser=0x7fff08c22820) at /usr/src/debug/kdelibs-4.4.3/khtml/css/parser.cpp:2969
#10 0x7f5469f67d00 in DOM::CSSParser::runParser (this=0x7fff08c22820) at /usr/src/debug/kdelibs-4.4.3/khtml/css/cssparser.cpp:151
(...)



2010-05-23 Thread pluto at agmk dot net


--- Comment #10 from pluto at agmk dot net  2010-05-23 21:25 ---
Created an attachment (id=20731)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20731&action=view)
parser.i from kdelibs.



2010-04-28 Thread jamborm at gcc dot gnu dot org


--- Comment #6 from jamborm at gcc dot gnu dot org  2010-04-28 13:10 ---
Subject: Bug 43846

Author: jamborm
Date: Wed Apr 28 13:09:56 2010
New Revision: 158826

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158826
Log:
2010-04-28  Martin Jambor  <mjam...@suse.cz>

	PR tree-optimization/43846
	* tree-sra.c (struct access): New flag grp_assignment_read.
	(build_accesses_from_assign): Set grp_assignment_read.
	(sort_and_splice_var_accesses): Propagate grp_assignment_read.
	(enum mark_read_status): New type.
	(analyze_access_subtree): Propagate grp_assignment_read, create
	accesses also if both direct_read and root->grp_assignment_read.

	* testsuite/gcc.dg/tree-ssa/sra-10.c: New test.


Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/tree-ssa/sra-10.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_5-branch/gcc/tree-sra.c



2010-04-28 Thread jamborm at gcc dot gnu dot org


--- Comment #7 from jamborm at gcc dot gnu dot org  2010-04-28 13:15 ---
This is now fixed on both the trunk and the 4.5 branch.


-- 

jamborm at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



2010-04-28 Thread tbptbp at gmail dot com


--- Comment #8 from tbptbp at gmail dot com  2010-04-28 13:43 ---
Allow me to extend to you my most profuse praises and blessings; may all the
women in your vicinity fall pregnant and your male progeny be granted abounding
chest hair.



2010-04-23 Thread jamborm at gcc dot gnu dot org


--- Comment #5 from jamborm at gcc dot gnu dot org  2010-04-23 14:52 ---
Subject: Bug 43846

Author: jamborm
Date: Fri Apr 23 14:52:06 2010
New Revision: 158668

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158668
Log:
2010-04-23  Martin Jambor  <mjam...@suse.cz>

	PR tree-optimization/43846
	* tree-sra.c (struct access): New flag grp_assignment_read.
	(build_accesses_from_assign): Set grp_assignment_read.
	(sort_and_splice_var_accesses): Propagate grp_assignment_read.
	(enum mark_read_status): New type.
	(analyze_access_subtree): Propagate grp_assignment_read, create
	accesses also if both direct_read and root->grp_assignment_read.

	* testsuite/gcc.dg/tree-ssa/sra-10.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/sra-10.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-sra.c



2010-04-22 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2010-04-22 09:07 ---
Hm, frob1 looks like

_Z5frob1RK5foo_tRS_:
.LFB18:
movss   (%rdi), %xmm3
movss   4(%rdi), %xmm2
movaps  %xmm3, %xmm4
movaps  %xmm2, %xmm0
mulss   %xmm3, %xmm4
movss   8(%rdi), %xmm1
mulss   %xmm2, %xmm0
addss   %xmm4, %xmm0
movaps  %xmm1, %xmm4
mulss   %xmm1, %xmm4
addss   %xmm4, %xmm0
rsqrtss %xmm0, %xmm4
mulss   %xmm4, %xmm0
mulss   %xmm4, %xmm0
mulss   .LC1(%rip), %xmm4
addss   .LC0(%rip), %xmm0
mulss   %xmm4, %xmm0
mulss   %xmm0, %xmm3
mulss   %xmm0, %xmm2
mulss   %xmm1, %xmm0
movss   %xmm3, (%rsi)
movss   %xmm2, 4(%rsi)
movss   %xmm0, 8(%rsi)
ret

and frob2 like

_Z5frob2RK5bar_tRS_:
.LFB19:
movss   (%rdi), %xmm3
movss   4(%rdi), %xmm2
movaps  %xmm3, %xmm4
movaps  %xmm2, %xmm0
mulss   %xmm3, %xmm4
movss   8(%rdi), %xmm1
mulss   %xmm2, %xmm0
addss   %xmm4, %xmm0
movaps  %xmm1, %xmm4
mulss   %xmm1, %xmm4
addss   %xmm4, %xmm0
rsqrtss %xmm0, %xmm4
mulss   %xmm4, %xmm0
mulss   %xmm4, %xmm0
mulss   .LC1(%rip), %xmm4
addss   .LC0(%rip), %xmm0
mulss   %xmm4, %xmm0
mulss   %xmm0, %xmm3
mulss   %xmm0, %xmm2
mulss   %xmm1, %xmm0
movss   %xmm3, -24(%rsp)
movss   %xmm2, -20(%rsp)
movq    -24(%rsp), %rax
movss   %xmm0, -16(%rsp)
movq    %rax, (%rsi)
movl    -16(%rsp), %eax
movl    %eax, 8(%rsi)
ret

so it's an aggregate copy that is not scalarized in frob2:

  b_1(D)->x = D.2444_20;
  b_1(D)->y = D.2443_19;
  b_1(D)->z = D.2442_18;
  return;

vs.

  D.2464.m[0] = D.2473_20;
  D.2464.m[1] = D.2472_19;
  D.2464.m[2] = D.2471_18;
  *b_1(D) = D.2464;
  return;

all inlining happens during early inlining and frob1 and frob2 are reasonably
similar after early inlining.

But then we have early SRA which does

;; Function void frob1(const foo_t&, foo_t&) (_Z5frob1RK5foo_tRS_)

Candidate (2452): D.2452
Candidate (2434): v
Candidate (2435): D.2435
Will attempt to totally scalarize D.2435 (UID: 2435):
Will attempt to totally scalarize D.2452 (UID: 2452):
Marking v offset: 0, size: 32:  to be replaced.
Marking v offset: 32, size: 32:  to be replaced.
Marking v offset: 64, size: 32:  to be replaced.
...

;; Function void frob2(const bar_t&, bar_t&) (_Z5frob2RK5bar_tRS_)

Candidate (2481): D.2481
Candidate (2464): D.2464
Candidate (2463): v
Marking v offset: 0, size: 32:  to be replaced.
Marking v offset: 32, size: 32:  to be replaced.
Marking v offset: 64, size: 32:  to be replaced.
...
! Disqualifying D.2464 - No scalar replacements to be created.

so it doesn't consider the struct with the array for total scalarization
for some reason.  Martin?
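
For readers without the original attachment, a hypothetical reduction consistent with the dumps above might look like the sketch below. Only the names frob1, frob2, foo_t, bar_t and the fields x, y, z and m[] come from the symbols and GIMPLE quoted in this comment; the helper normalized() and the function bodies are assumptions. The rsqrtss sequence in the assembly also suggests the original was built with something like -O2 -ffast-math, which is likewise a guess.

```cpp
#include <cmath>

// Hypothetical reconstruction: field and type names follow the dumps,
// everything else is illustrative only.
struct foo_t { float x, y, z; };   // three separate members
struct bar_t { float m[3]; };      // same data as a small array

// Return a unit-length copy by value, so that a temporary aggregate
// appears in the caller once this helper has been inlined.
static inline foo_t normalized(const foo_t &v) {
    float s = 1.0f / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    foo_t r = { v.x * s, v.y * s, v.z * s };
    return r;
}

static inline bar_t normalized(const bar_t &v) {
    float s = 1.0f / std::sqrt(v.m[0] * v.m[0] + v.m[1] * v.m[1] + v.m[2] * v.m[2]);
    bar_t r = { { v.m[0] * s, v.m[1] * s, v.m[2] * s } };
    return r;
}

// Member version: the dump shows the stores going straight to *b.
void frob1(const foo_t &a, foo_t &b) { b = normalized(a); }

// Array version: the dump shows a temporary filled element by element
// and then copied to *b as a whole.
void frob2(const bar_t &a, bar_t &b) { b = normalized(a); }
```

Both functions compute the same values; the difference discussed here is only in whether the temporary survives as a stack object.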


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-04-22 09:07:40
               date|                            |
            Summary|4.5.0 regression, array vs  |[4.5 Regression] array vs
                   |members, dead code removal  |members, total scalarization
                   |issues                      |issues
   Target Milestone|---                         |4.5.1



2010-04-22 Thread jamborm at gcc dot gnu dot org


--- Comment #2 from jamborm at gcc dot gnu dot org  2010-04-22 12:35 ---
(In reply to comment #1)
> so it doesn't consider the struct with the array for total scalarization
> for some reason.  Martin?

Well, that was a deliberate decision when fixing PR 42585 (see
type_consists_of_records_p).  The code is simpler because it does not
have to know how to iterate over the array index domain.

Of course, we can alleviate this restriction and learn how to iterate.
However, all the accesses for the whole array are already created,
that is not the issue.  The problem basically is that when we see the
sequence

  D.2035.m[0] = D.2044_20;
  D.2035.m[1] = D.2043_19;
  D.2035.m[2] = D.2042_18;
  *b_1(D) = D.2035;

(and there are no other accesses to D.2035) the condition that tries
to prevent us from creating unnecessary replacements kicks in and we
decide not to scalarize.  The intent of the current code (possibly
among other reasons) was to avoid going through a replacement when the
whole structure was then passed as an argument to a function and
similar situations.  But it should not be very difficult to change the
condition (in analyze_access_subtree) to handle both situations right.

Doing this, rather than total scalarization for arrays (which should
be only useful as a substitute for a copy propagation) should enable
us to handle even huge arrays.

I'll get to this right after dealing with PR 43835.
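
The change in reasoning described above can be illustrated with a toy predicate. This is not the code in tree-sra.c: ComponentUse, its fields and both functions are hypothetical names invented for this sketch, and the real condition in analyze_access_subtree involves more state than is modelled here.

```cpp
// Toy model only; none of these names exist in GCC.
struct ComponentUse {
    bool written;              // e.g. D.2035.m[0] = D.2044_20;
    bool read_directly;        // e.g. ... = D.2035.m[0];
    bool read_via_assignment;  // e.g. *b_1(D) = D.2035;
};

// Behaviour as described before the fix: a replacement is only considered
// worthwhile when the component itself is both written and read.
bool worth_replacing_before(const ComponentUse &u) {
    return u.written && u.read_directly;
}

// Behaviour as proposed: a read of the whole aggregate on the right-hand
// side of an assignment also counts. A use as a call argument sets neither
// read flag in this model, so it still does not trigger a replacement.
bool worth_replacing_after(const ComponentUse &u) {
    return u.written && (u.read_directly || u.read_via_assignment);
}
```

In this model the sequence quoted above has written and read_via_assignment set but not read_directly, so only the second predicate accepts it.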


-- 

jamborm at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |jamborm at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-04-22 09:07:40         |2010-04-22 12:35:41
               date|                            |



2010-04-22 Thread davidxl at gcc dot gnu dot org


--- Comment #3 from davidxl at gcc dot gnu dot org  2010-04-22 17:04 ---
(In reply to comment #2)
> (In reply to comment #1)
> >
> > so it doesn't consider the struct with the array for total scalarization
> > for some reason.  Martin?
> >
>
> Well, that was a deliberate decision when fixing PR 42585 (see
> type_consists_of_records_p).  The code is simpler because it does not
> have to know how to iterate over the array index domain.
>
> Of course, we can alleviate this restriction and learn how to iterate.
> However, all the accesses for the whole array are already created,
> that is not the issue.  The problem basically is that when we see the
> sequence
>
>   D.2035.m[0] = D.2044_20;
>   D.2035.m[1] = D.2043_19;
>   D.2035.m[2] = D.2042_18;
>   *b_1(D) = D.2035;
>
> (and there are no other accesses to D.2035) the condition that tries
> to prevent us from creating unnecessary replacements kicks in and we
> decide not to scalarize.

This code sequence looks like a good motivating factor for
scalarizing/expansion. In fact, small arrays should be treated the same way as
records if all accesses are through compile time constant indices. This is a
common scenario after full unrolling. 

> The intent of the current code (possibly
> among other reasons) was to avoid going through a replacement when the
> whole structure was then passed as an argument to a function and
> similar situations.

If the temp aggregate is passed to a call and the calling convention is not
exposed at the IL level, then it is not a good SRA candidate, as no copy
elimination (of either code or storage) will be exposed. In this case, the temp
aggregate is used as the RHS of an assignment, so it is a good candidate to
expand. So is the reverse case:

aggregate1 = aggregate2;
 ..
... = aggregate1.e1;
... = aggregate1.e2;
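
As a concrete illustration of the two shapes being discussed, here is a minimal sketch at the source level. The type vec3, the third field e3 and both function names are hypothetical and chosen only for this example; whether a given compiler version scalarizes tmp in either function is exactly the question of this bug.

```cpp
struct vec3 { float e1, e2, e3; };

// Forward case: the temporary is written component by component and then
// read once, as a whole, on the right-hand side of an assignment.
void store_components(vec3 &out, float a, float b, float c) {
    vec3 tmp;
    tmp.e1 = a;
    tmp.e2 = b;
    tmp.e3 = c;
    out = tmp;
}

// Reverse case: the temporary is written once by an aggregate copy and
// then read component by component.
float sum_components(const vec3 &in) {
    vec3 tmp = in;
    return tmp.e1 + tmp.e2 + tmp.e3;
}
```

In both functions tmp never escapes to a call, so replacing it with three scalars can remove the stack copy entirely.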

David

> But it should not be very difficult to change the
> condition (in analyze_access_subtree) to handle both situations right.
>
> Doing this, rather than total scalarization for arrays (which should
> be only useful as a substitute for a copy propagation) should enable
> us to handle even huge arrays.
>
> I'll get to this right after dealing with PR 43835.


-- 

davidxl at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xinliangli at gmail dot com



2010-04-22 Thread jamborm at gcc dot gnu dot org


--- Comment #4 from jamborm at gcc dot gnu dot org  2010-04-22 17:18 ---
Created an attachment (id=20464)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20464&action=view)
Proposed fix

I'm currently testing this patch and will submit it tomorrow if everything goes
OK.

