[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2005-01-28 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-29 
02:34 ---
*** Bug 19680 has been marked as a duplicate of this bug. ***

-- 
   What|Removed |Added

 CC||tbptbp at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-12-04 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-05 
04:29 ---
*** Bug 17647 has been marked as a duplicate of this bug. ***

-- 
   What|Removed |Added

 CC||uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-28 Thread giovannibajo at libero dot it

--- Additional Comments From giovannibajo at libero dot it  2004-11-28 
23:38 ---
While the patch looks great to me, it is not feasible for 4.0, as you said. 
Since this is a 4.0 regression, we should probably look for a way to fix this 
problem in a less intrusive (even if not totally correct) way on 4.0. Do you 
have any ideas?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-28 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2004-11-28 
22:56 ---
I have the (experimental) patch for addressing mode selection on trees
(http://atrey.karlin.mff.cuni.cz/~rakdver/diff_lower_address.diff).
It indeed helps; we get

  i = 0;

:;
  mem[aa + 4B * i]{*D.1047} = mem[a + 4B * i]{*D.1048};
  mem[bb + 4B * i]{*D.1050} = mem[b + 4B * i]{*D.1051};
  i = (int) ((unsigned int) i + 1);
  if (n > i) goto ; else goto ;

:;

in .vars dump and

.L4:
        movl    (%ebp,%edx,4), %eax
        movl    %eax, (%esi,%edx,4)
        movl    (%edi,%edx,4), %eax
        movl    %eax, (%ebx,%edx,4)
        incl    %edx
        cmpl    %edx, %ecx
        jg      .L4

in the assembler, which seems fine (except that the memory references are not
reordered; maybe some of the aliasing information gets lost due to the patch
currently).

The patch definitely won't make it for 4.0, but I would like to get it or
something similar into 4.1.
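
For readers without comment #0 at hand, the .vars dump above corresponds
roughly to a copy loop like the one below. This is a reconstruction from the
dump, not the original testcase: the name fcpy_loop is hypothetical, and the
signed int type of i and n is an assumption based on the casts in the dump.

```c
/* Reconstruction of the copy loop implied by the .vars dump.  fcpy_loop is
   a hypothetical name; the original testcase in comment #0 may differ. */
void
fcpy_loop(float *restrict a,  float *restrict b,
          float *restrict aa, float *restrict bb, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        aa[i] = a[i];   /* mem[aa + 4B * i] = mem[a + 4B * i] */
        bb[i] = b[i];   /* mem[bb + 4B * i] = mem[b + 4B * i] */
    }
}
```

Each access uses a base pointer plus the index scaled by 4, which is the
shape of address that the x86 loop above folds into a single operand.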

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-26 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2004-11-26 
08:12 ---
The problem indeed is the ivopts - dom interaction. Ivopts decides that since
reg + 4 * reg is a cheap addressing mode, there is no reason to do anything
else than what it does.  To cure this we need to be able to allow ivopts
to express more clearly that it does not want an expression to be played with;
I think the best solution is to have a tree code that would map directly to the
memory access (including the addressing mode).  I am working on the patch.

-- 
   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu dot org|rakdver at gcc dot gnu dot org
 Status|NEW |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-25 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-26 
05:11 ---
Actually I missed that you have to use -fomit-frame-pointer, so this is not 
related to PR 18137 after all.

-- 
   What|Removed |Added

  BugsThisDependsOn|18137   |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-25 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-25 
23:55 ---
PR 18137 is the one about reload messing up and pulling the loads of the 
arguments into the loop.

-- 
   What|Removed |Added

  BugsThisDependsOn||18137


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-25 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-25 
23:29 ---
This is mostly an ivopts problem.
But note that we still don't get optimal code with -fno-ivopts:
.L4:
        movl    8(%ebp), %ebx
        movl    (%ebx,%edx,4), %eax
        movl    20(%ebp), %ebx
        movl    %eax, (%esi,%edx,4)
        movl    (%edi,%edx,4), %eax
        movl    %eax, (%ebx,%edx,4)
        incl    %edx
        cmpl    %edx, %ecx
        jg      .L4

But that is because we are pulling the loads of the arguments into the loop
(that is a different bug, but I think I should mark it as a regression).

We still get the same asm as given in comment #0 with -fivopts still on.

-- 
   What|Removed |Added

 CC||rakdver at gcc dot gnu dot org
  Component|rtl-optimization|tree-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-13 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-13 
18:14 ---
Here is the reduced testcase for the problem; it has nothing to do with loops 
at all:
void
fcpy(float *restrict a,  float *restrict b,
     float *restrict aa, float *restrict bb, unsigned n)
{
    aa[n]=a[n];
    bb[n]=b[n];
}

DOM is doing CSE of n*4 which is the right thing to do.
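
To make the CSE concrete, the sketch below writes out by hand, in C, roughly
what the testcase looks like once n*4 is shared: the byte offset is computed
once and added to each base pointer. The names fcpy_cse and off are
hypothetical and are not taken from a GCC dump.

```c
#include <stddef.h>

/* Hand-written approximation of the testcase after n*4 has been CSE'd.
   fcpy_cse and off are illustrative names, not taken from a GCC dump. */
void
fcpy_cse(float *restrict a,  float *restrict b,
         float *restrict aa, float *restrict bb, unsigned n)
{
    size_t off = (size_t) n * sizeof(float);   /* the shared n*4 byte offset */

    *(float *) ((char *) aa + off) = *(float *) ((char *) a + off);
    *(float *) ((char *) bb + off) = *(float *) ((char *) b + off);
}
```

On a target with only reg + reg addressing this form is natural. On x86 each
access could instead fold the scaling into a base + index*4 operand, so the
shared offset does not need a register of its own.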

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-13 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-13 
17:54 ---
For PPC at least, IV-OPTS should notice that we have auto-increment
(update-form) addressing, decrement the pointers before the loop, and then
increment all of them inside the loop, i.e.:
void
fcpy(float *restrict a,  float *restrict b,
     float *restrict aa, float *restrict bb, unsigned n)
{
    unsigned i;
    aa-=1; a-=1; bb-=1; b-=1;
    for(i = 0; i < n; i++) {
        aa+=1; a+=1; bb+=1; b+=1;
        *bb=*b;
        *aa=*a;
    }
}
So we get:
L4:
lfsu f0,4(r4)
lfsu f13,4(r3)
stfsu f0,4(r6)
stfsu f13,4(r5)
bdnz L4
which is optimal for PPC.
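
In portable C, forming a pointer one element before the start of an array is
undefined, so the source above is best read as a description of the desired
code generation rather than as code to compile. A minimal sketch of the same
pointer-bumping idea that stays within defined behaviour, using post-increment
instead, might look like this (fcpy_bump is a hypothetical name):

```c
/* Pointer-bumping variant of the copy loop.  Each pointer is advanced after
   its access, so no pointer ever points before the start of its array.
   fcpy_bump is an illustrative name, not from the bug report. */
void
fcpy_bump(float *restrict a,  float *restrict b,
          float *restrict aa, float *restrict bb, unsigned n)
{
    unsigned i;

    for (i = 0; i < n; i++) {
        *bb++ = *b++;
        *aa++ = *a++;
    }
}
```

Whether a compiler turns this into update-form loads and stores depends on
the target and the optimizer; the sketch only shows the source-level shape.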

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-13 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-11-13 
17:52 ---
At least x86 and ARM have {reg + reg OP const} addressing 
modes.  Unfortunately we rip such expressions apart already 
in the gimplifier.  This is something we cannot fix properly 
on trees.  TER could perhaps do it, but that pass should 
really go away itself, and we don't know anything about 
addressing modes on trees anyway.   Looks like we need to 
teach an RTL loop optimizer about this... 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-13 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-13 
17:42 ---
Though I should note that PPC is much better on the mainline than before:
gcc 4.0.0:
L4:
lfsx f0,r3,r2
stfsx f0,r5,r2
lfsx f13,r4,r2
stfsx f13,r6,r2
addi r2,r2,4
bdnz L4

gcc 3.3 (Apple's):
L9:
slwi r7,r11,2
addi r11,r11,1
lfsx f0,r7,r3
stfsx f0,r7,r5
lfsx f1,r7,r4
stfsx f1,r7,r6
bdnz L9

So really this is a target-specific bug :).

Also, here is the loop for x86_64:
.L4:
        movl    (%rdx,%r10), %eax
        incl    %ecx
        movl    %eax, (%rdx,%r9)
        movl    (%rdx,%rdi), %eax
        movl    %eax, (%rdx,%rsi)
        addq    $4, %rdx
        cmpl    %ecx, %r8d
        jg      .L4

Note that by changing the type of n and i to unsigned we get slightly better code:
.L4:
        movl    -16(%ebp), %ebx
        leal    0(,%ecx,4), %eax
        incl    %ecx
        cmpl    %ecx, 24(%ebp)
        movl    (%ebx,%eax), %edx
        movl    -20(%ebp), %ebx
        movl    %edx, (%edi,%eax)
        movl    (%esi,%eax), %edx
        movl    %edx, (%ebx,%eax)
        jne     .L4

So IV-OPTs is not doing its job correctly in one place.

-- 
   What|Removed |Added

 GCC target triplet||i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463


[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register

2004-11-13 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-13 
17:22 ---
Confirmed, the problem is that DOM does:
  D.1192 = (unsigned int) i;
  D.1194 = (float * restrict) D.1192 * 4B;
  *(aa2 + D.1194) = *(a2 + D.1194);
  *(bb2 + D.1194) = *(b2 + D.1194);

Note how we use D.1194 in all three places. For PPC this is the correct thing 
to do, but not for x86, which has three-operand loads.

-- 
   What|Removed |Added

 Status|UNCONFIRMED |NEW
  Component|middle-end  |tree-optimization
 Ever Confirmed||1
   Keywords||missed-optimization
   Last reconfirmed date|-00-00 00:00:00 |2004-11-13 17:22:54
   Target Milestone|--- |4.0.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463