[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2007-06-17 Thread pinskia at gcc dot gnu dot org


--- Comment #11 from pinskia at gcc dot gnu dot org  2007-06-18 05:28 ---
The trunk no longer produces a loop so this has been fixed unless you can get a
testcase where we still produce worse code.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.0.3                       |4.0.3 4.3.0
            Summary|[4.1/4.2/4.3 Regression] -  |[4.1/4.2 Regression] -ftree-
                   |ftree-ch generates worse    |ch generates worse code
                   |code                        |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-24 Thread mmitchel at gcc dot gnu dot org


--- Comment #10 from mmitchel at gcc dot gnu dot org  2006-05-25 02:34 ---
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.1.1                       |4.1.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-04 Thread pinskia at gcc dot gnu dot org


--- Comment #9 from pinskia at gcc dot gnu dot org  2006-05-04 21:25 ---
(In reply to comment #8)
> WRT this code generated by tree-ch:
>   D.1305_41 = Int_Loc_3 + 1;
>   if (Int_Loc_3 <= D.1305_41) goto L0; else goto L2;
> 
> AFAICT there's exactly one value for which the comparison can be false, IMO it
> would be better to test directly that value instead of generating a new SSA
> name and another expression.

Well, CH should not do this, as it never sees the two expressions together, only
the one COND_EXPR.  If we do a VRP after CH, it will not fix it currently either,
because VRP does not record that many symbolic ranges (I forgot that PR number,
it was filed by me). If VRP did that and we added a VRP after CH but before
IV-OPTS, maybe this will fix itself.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-03 Thread dann at godzilla dot ics dot uci dot edu


--- Comment #5 from dann at godzilla dot ics dot uci dot edu  2006-05-03 18:54 ---
IMO Comment #4 does not look close enough at what is actually happening.
IMO tree-ch is the root cause here.

The code looks like this before .ch
Before .ch
  goto bb 2 (L1);

L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;


after .ch it looks like this: 
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L0; else goto L2; -- this just
complicates the CFG. Look below to see what the effects of doing this are in
later passes. Plus just look at the comparison ...

  # Int_Index_37 = PHI Int_Index_58(1), Int_Loc_3(0);
L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  D.1305_26 = Int_Loc_3 + 1;
  if (D.1305_26 >= Int_Index_58) goto L0; else goto L2;

L2:;

Given the above CFG, critical edge splitting transforms this into:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L6; else goto L7;

L7:;
  goto bb 2 (L2);

L6:;

  # Int_Index_37 = PHI Int_Index_58(5), Int_Loc_3(3);
L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1305_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 1 (L0);

L9:;

L2:;

Given the above CFG PRE will dutifully fill with code a lot of the empty basic
blocks: 

after pre
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L6; else goto L7;

L7:;
  pretmp.34_45 = Int_Loc.0_4 * 200;
  pretmp.36_57 = (int[50] *) pretmp.34_45;
  pretmp.38_25 = Arr_2_Par_Ref_30 + pretmp.36_57;
  goto bb 2 (L2);

L6:;
  pretmp.30_26 = Int_Loc.0_4 * 200;
  pretmp.31_19 = (int[50] *) pretmp.30_26;
  pretmp.32_1 = pretmp.31_19 + Arr_2_Par_Ref_30;

  # Int_Index_37 = PHI Int_Index_58(5), Int_Loc_3(3);
L0:;
  D.1301_54 = pretmp.30_26;
  D.1302_55 = pretmp.31_19;
  D.1303_56 = pretmp.32_1;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1305_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 1 (L0);

L9:;

  # prephitmp.39_23 = PHI D.1303_56(6), pretmp.38_25(4);
  # prephitmp.37_53 = PHI D.1302_55(6), pretmp.36_57(4);
  # prephitmp.35_49 = PHI D.1301_54(6), pretmp.34_45(4);
L2:;


Now when using -fno-tree-ch 

before critical edge splitting the code looks like this:
  goto bb 2 (L1);

L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;


after crited it looks like this: (i.e. no change) 

  goto bb 2 (L1);

L0:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;

and after PRE

  goto bb 2 (L1);

L0:;
  D.1301_54 = pretmp.31_49;
  D.1302_55 = pretmp.32_45;
  D.1303_56 = pretmp.33_41;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI Int_Loc_3(0), Int_Index_58(1);
L1:;
  D.1305_26 = pretmp.30_19;
  if (Int_Index_1 <= D.1305_26) goto L0; else goto L2;

L2:;
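
For readers unfamiliar with what tree-ch (loop header copying) does to the CFG in this comment, here is a rough C-level sketch. The function names, the `int (*arr)[50]` parameter and the helper `run_both_and_compare` are made up for illustration; the loop shape is only reconstructed from the dumps in this comment and is not the actual testcase.

```c
#include <assert.h>
#include <string.h>

/* Shape before .ch: the exit test sits at the top of the loop (label L1),
   and the body jumps back to it after every iteration.  */
static void fill_before_ch(int (*arr)[50], int loc)
{
    int idx = loc;
    while (idx <= loc + 1) {
        arr[loc][idx] = loc;
        idx++;
    }
}

/* Shape after .ch: the header test is copied in front of the loop as a
   guard, and the loop becomes a do-while with its test at the bottom.  */
static void fill_after_ch(int (*arr)[50], int loc)
{
    int idx = loc;
    if (idx <= loc + 1) {          /* copied header: Int_Loc vs. Int_Loc + 1 */
        do {
            arr[loc][idx] = loc;
            idx++;
        } while (idx <= loc + 1);  /* original test, now at the loop latch */
    }
}

/* Hypothetical helper: run both shapes on zeroed arrays and compare.  */
static int run_both_and_compare(int loc)
{
    int a[10][50], b[10][50];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fill_before_ch(a, loc);
    fill_after_ch(b, loc);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both shapes store to the same two elements; the difference is purely in control flow, which is what critical edge splitting and PRE then act on.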


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-03 Thread pinskia at physics dot uc dot edu


--- Comment #6 from pinskia at physics dot uc dot edu  2006-05-03 19:00 ---
Subject: Re:  [4.1/4.2 Regression] -ftree-ch generates worse code

 
 
 
> --- Comment #5 from dann at godzilla dot ics dot uci dot edu  2006-05-03 18:54 ---
> IMO Comment #4 does not look close enough at what is actually happening.
> IMO tree-ch is the root cause here.
> 
> Given the above CFG, critical edge splitting transforms this into:
> Given the above CFG PRE will dutifully fill with code a lot of the empty basic
> blocks: 

None of the above issues are the real issue.  TREE CH is doing the correct
thing by simplifying the loop.  PRE is doing the correct thing by getting rid
of redundancies.

The main issue is really the RA not being so good.

-- Pinski


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-03 Thread steven at gcc dot gnu dot org


--- Comment #7 from steven at gcc dot gnu dot org  2006-05-03 21:33 ---
Re. comment #5, user code could also have a CFG like that, so we should handle
this case properly (and we do, tree-ch is doing the right thing afaict).  Re.
comment #6, I don't see what the register allocator has to do with this at all. 

The bottom line is that for the case where we produce good code, IVOPTs selects
a simple addressing mode and produces a simple loop exit condition; and for the
complicated code, IVOPTs picks an addressing mode that requires a lea and an
extra register.

Look back at that loop for a moment. With tree-ch, ignoring dead code (the sets
to SSA names 5[456] are dead!), the .cunroll dump (i.e. just before IVOPTs)
looks like this:

  # Int_Index_37 = PHI Int_Index_58(6), Int_Loc_3(4);
L0:;
  (*pretmp.28_49)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1563_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 5 (L0);

That looks rather nice to me. But just after IVOPTs (in the .ivopts dump) we
have turned that simple nice code into this mess:

  # ivtmp.38_26 = PHI ivtmp.38_35(6), 0(4);
L0:;
  D.1622_34 = (int *) pretmp.28_49;
  D.1623_33 = (int *) Int_1_Par_Val_2;
  D.1624_22 = (int *) ivtmp.38_26;
  D.1625_21 = D.1623_33 + D.1624_22;
  MEM[base: D.1622_34, index: D.1625_21, step: 4B, offset: 20B] = Int_Loc_3;
  ivtmp.38_35 = ivtmp.38_26 + 1;
  D.1626_20 = (unsigned int) Int_1_Par_Val_2;
  D.1627_17 = D.1626_20 + ivtmp.38_35;
  D.1628_16 = D.1627_17 + 5;
  Int_Index_15 = (One_Fifty) D.1628_16;
  if (D.1563_41 >= Int_Index_15) goto L8; else goto L9;

L8:;
  goto bb 5 (L0);

If this is caused by the register allocator, I'd like to know why you'd think
that.  And if this is the doing of tree-ch, then I'd like to know what you
expect tree-ch to do instead.  But as far as I can tell, this is just a very
poor choice by IVOPTs.
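
The two outcomes contrasted in this comment can be pictured at the C level roughly as follows. This is only an illustration of the two induction-variable choices, with made-up names (`store_ptr_iv`, `store_counter_iv`, `row`, `first`, `last`, `val`); it is not what IVOPTs literally emits.

```c
#include <assert.h>

/* "Good" choice: keep the original index for the exit test and step a
   pointer by one element per iteration (simple base+offset store).  */
static void store_ptr_iv(int *row, int first, int last, int val)
{
    int *p = row + first;
    for (int idx = first; idx <= last; idx++) {
        *p = val;
        p++;
    }
}

/* "Poor" choice: count from zero and rebuild both the store address and
   the value used in the exit test from that counter on every iteration.
   Like the loop in the dump, this assumes first <= last on entry.  */
static void store_counter_iv(int *row, int first, int last, int val)
{
    unsigned int iv = 0;
    int idx;
    do {
        row[first + (int) iv] = val;              /* base + index * 4 */
        iv++;
        idx = (int) ((unsigned int) first + iv);  /* recomputed for the compare */
    } while (idx <= last);
}
```

Both functions write the same elements; the second needs an extra addition per iteration for the address and another for the compare, which is where the extra lea and the extra register come from.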


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-03 Thread dann at godzilla dot ics dot uci dot edu


--- Comment #8 from dann at godzilla dot ics dot uci dot edu  2006-05-03 21:53 ---
WRT this code generated by tree-ch:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto L0; else goto L2;

AFAICT there's exactly one value for which the comparison can be false, IMO it
would be better to test directly that value instead of generating a new SSA
name and another expression.
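
The "exactly one value" claim can be checked exhaustively on a small type. The sketch below uses wrapping 8-bit unsigned arithmetic as a stand-in; that is an assumption made for illustration, since the variable in the dump is a signed int, where the overflowing case is undefined behaviour in C rather than a wrap. The function name `count_false_cases` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count the 8-bit values x for which "x <= x + 1" is false when the
   addition wraps around, and remember the last such x in *witness.  */
static int count_false_cases(uint8_t *witness)
{
    int count = 0;
    for (int v = 0; v <= UINT8_MAX; v++) {
        uint8_t x = (uint8_t) v;
        uint8_t next = (uint8_t) (x + 1);   /* wraps from 255 to 0 */
        if (!(x <= next)) {
            count++;
            *witness = x;
        }
    }
    return count;
}
```

The only failing value is the maximum of the type, so the guard could in principle be replaced by a single comparison against that constant.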


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-05-02 Thread steven at gcc dot gnu dot org


--- Comment #4 from steven at gcc dot gnu dot org  2006-05-02 17:38 ---
The inner loop in the .cunroll, .ivopts and .final_cleanup dumps with GVN-PRE
disabled looks like this:

  # Int_Index_37 = PHI Int_Index_58(5), Int_Loc_3(3);
L0:;
  (*D.1561_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1563_41 >= Int_Index_58) goto L8; else goto L2;

L8:;
  goto bb 4 (L0);

and

  # ivtmp.34_26 = PHI ivtmp.34_19(5), ivtmp.34_1(3);
  # Int_Index_37 = PHI Int_Index_58(5), Int_Loc_3(3);
L0:;
  D.1613_59 = (int *) ivtmp.34_26;
  MEM[base: D.1613_59, offset: 20B] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  ivtmp.34_19 = ivtmp.34_26 + 4B;
  if (D.1563_41 >= Int_Index_58) goto L8; else goto L2;

L8:;
  goto bb 4 (L0);

and

L0:;
  MEM[base: (int *) ivtmp.34, offset: 20B] = Int_Loc;
  Int_Index = Int_Index + 1;
  ivtmp.34 = ivtmp.34 + 4B;
  if (D.1563 >= Int_Index) goto L0; else goto L2;

which compiles to:
.L4:
        addl    $1, %eax
        movl    %ecx, 20(%edx)
        addl    $4, %edx
        cmpl    %eax, %ebx
        jge     .L4



With PRE enabled, we get this:

  # Int_Index_37 = PHI Int_Index_58(6), Int_Loc_3(4);
L0:;
  D.1559_54 = pretmp.27_59;
  D.1560_55 = pretmp.28_45;
  D.1561_56 = pretmp.28_49;
  (*pretmp.28_49)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1563_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 5 (L0);

and

  # ivtmp.38_26 = PHI ivtmp.38_35(6), 0(4);
L0:;
  D.1559_54 = pretmp.27_59;
  D.1560_55 = pretmp.28_45;
  D.1561_56 = pretmp.28_49;
  D.1622_34 = (int *) pretmp.28_49;
  D.1623_33 = (int *) Int_1_Par_Val_2;
  D.1624_22 = (int *) ivtmp.38_26;
  D.1625_21 = D.1623_33 + D.1624_22;
  MEM[base: D.1622_34, index: D.1625_21, step: 4B, offset: 20B] = Int_Loc_3;
  ivtmp.38_35 = ivtmp.38_26 + 1;
  D.1626_20 = (unsigned int) Int_1_Par_Val_2;
  D.1627_17 = D.1626_20 + ivtmp.38_35;
  D.1628_16 = D.1627_17 + 5;
  Int_Index_15 = (One_Fifty) D.1628_16;
  if (D.1563_41 >= Int_Index_15) goto L8; else goto L9;

L8:;
  goto bb 5 (L0);

and

L0:;
  MEM[base: (int *) prephitmp.33, index: (int *) Int_1_Par_Val + (int *) ivtmp.38, step: 4B, offset: 20B] = Int_Loc;
  ivtmp.38 = ivtmp.38 + 1;
  if ((One_Fifty) ((unsigned int) Int_1_Par_Val + 5 + ivtmp.38) <= D.1563) goto L0; else goto L2;

and from there:
.L5:
        leal    (%edi,%edx), %eax
        addl    $1, %edx
        movl    %ecx, 20(%ebx,%eax,4)
        leal    (%ecx,%edx), %eax
        cmpl    %esi, %eax
        jle     .L5

So it's a mix of PRE and IVOPTs that gives this strange code.

BTW, regarding "It's strange that tree-ch messes up": next time, please don't
blame random passes if you haven't fully analyzed the problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-04-16 Thread mmitchel at gcc dot gnu dot org


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-04-02 Thread pinskia at gcc dot gnu dot org


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
   Target Milestone|---                         |4.1.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-03-31 Thread dberlin at dberlin dot org


--- Comment #3 from dberlin at gcc dot gnu dot org  2006-03-31 22:41 ---
Subject: Re:  [4.1/4.2 Regression] -ftree-ch generates worse code


> Compare pretmp.28_49 with pretmp.32_11, why are the arguments in a different
> order? Is there something unstable in the PRE algorithm?
 

No, we just call fold on the expressions we build, and whatever it gives
us, we use :)
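
Operand order by itself does not change the value of a commutative operation, so a pass comparing such expressions only needs to agree on some canonical order first. Below is a toy sketch of that idea. `struct plus_expr`, `canonicalize` and `same_expr` are invented for this illustration and are not GCC's fold or PRE data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy "op0 + op1" expression whose operands are identified by small
   integer ids (think SSA version numbers).  */
struct plus_expr {
    int op0;
    int op1;
};

/* Put the operands of the commutative PLUS into a fixed order.  */
static struct plus_expr canonicalize(struct plus_expr e)
{
    if (e.op0 > e.op1) {
        int tmp = e.op0;
        e.op0 = e.op1;
        e.op1 = tmp;
    }
    return e;
}

/* Two PLUS expressions denote the same value if they match after
   canonicalization.  */
static bool same_expr(struct plus_expr a, struct plus_expr b)
{
    a = canonicalize(a);
    b = canonicalize(b);
    return a.op0 == b.op0 && a.op1 == b.op1;
}
```

With this, "30 + 45" and "45 + 30" compare equal, while "30 + 45" and "30 + 23" do not.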


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-03-30 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2006-03-30 16:25 ---
Note that this may be also PRE confusing SCEV in presence of loop headers. 
I.e. a sort of dup of PR26939.  Confirmed though.  A regression from 4.0.3,
which is also fine.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |26939
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
 GCC target triplet|i686-pc-linux-gnu           |
           Keywords|                            |missed-optimization
      Known to work|                            |4.0.3
   Last reconfirmed|-00-00 00:00:00             |2006-03-30 16:25:17
               date|                            |
            Summary|-ftree-ch generates worse   |[4.1/4.2 Regression] -ftree-
                   |code                        |ch generates worse code


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



[Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code

2006-03-30 Thread dann at godzilla dot ics dot uci dot edu


--- Comment #2 from dann at godzilla dot ics dot uci dot edu  2006-03-30 16:43 ---
(In reply to comment #1)
> Note that this may be also PRE confusing SCEV in presence of loop headers. 

Talking about PRE, here's a maybe interesting observation in the PRE dump:

L7:;
  pretmp.30_53 = Int_Loc.0_4 * 200;
  pretmp.32_23 = (int[50] *) pretmp.30_53;
  pretmp.32_11 = pretmp.32_23 + Arr_2_Par_Ref_30;
  goto bb 4 (L2);

L6:;
  pretmp.27_59 = Int_Loc.0_4 * 200;
  pretmp.28_45 = (int[50] *) pretmp.27_59;
  pretmp.28_49 = Arr_2_Par_Ref_30 + pretmp.28_45;

  # Int_Index_37 = PHI Int_Index_58(7), Int_Loc_3(5);
L0:;
  D.1544_54 = pretmp.27_59;
  D.1545_55 = pretmp.28_45;
  D.1546_56 = pretmp.28_49;
  (*D.1546_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1548_41 >= Int_Index_58) goto L8; else goto L9;

L8:;
  goto bb 3 (L0);

L9:;

  # prephitmp.33_40 = PHI D.1546_56(8), pretmp.32_11(6);
  # prephitmp.33_18 = PHI D.1545_55(8), pretmp.32_23(6);
  # prephitmp.31_25 = PHI D.1544_54(8), pretmp.30_53(6);


Compare pretmp.28_49 with pretmp.32_11, why are the arguments in a different
order? Is there something unstable in the PRE algorithm?

One has to wonder what the effects of tree-ch are on more complex loops.
It might be interesting to test SPEC with and without tree-ch...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944