[Bug tree-optimization/18463] [4.0 Regression] Moving floating point through an integer register
--- Additional Comments From steven at gcc dot gnu dot org 2005-01-29 02:34 ---
*** Bug 19680 has been marked as a duplicate of this bug. ***

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tbptbp at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-05 04:29 ---
*** Bug 17647 has been marked as a duplicate of this bug. ***

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at kss-loka dot si

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From giovannibajo at libero dot it 2004-11-28 23:38 ---
While the patch looks great to me, it is not feasible for 4.0, as you said.
Since this is a 4.0 regression, we should probably look for a way to fix this
problem in a less intrusive (even if not totally correct) way on 4.0. Do you
have any ideas?

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From rakdver at gcc dot gnu dot org 2004-11-28 22:56 ---
I have the (experimental) patch for addressing mode selection on trees
(http://atrey.karlin.mff.cuni.cz/~rakdver/diff_lower_address.diff). It indeed
helps; we get

  i = 0;
:;
  mem[aa + 4B * i]{*D.1047} = mem[a + 4B * i]{*D.1048};
  mem[bb + 4B * i]{*D.1050} = mem[b + 4B * i]{*D.1051};
  i = (int) ((unsigned int) i + 1);
  if (n > i) goto ; else goto ;
:;

in .vars dump and

.L4:
        movl    (%ebp,%edx,4), %eax
        movl    %eax, (%esi,%edx,4)
        movl    (%edi,%edx,4), %eax
        movl    %eax, (%ebx,%edx,4)
        incl    %edx
        cmpl    %edx, %ecx
        jg      .L4

in the assembler, which seems fine (except that the memory references are not
reordered; maybe some of the aliasing information gets lost due to the patch
currently).

The patch definitely won't make it for 4.0, but I would like to get it or
something similar to 4.1.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From rakdver at gcc dot gnu dot org 2004-11-26 08:12 ---
The problem indeed is the ivopts - dom interaction. Ivopts decides that since
reg + 4 * reg is a cheap addressing mode, there is no reason to do anything
else than what it does.

To cure this we need to allow ivopts to express more clearly that it does not
want an expression to be played with; I think the best solution is to have a
tree code that would map directly to the memory access (including the
addressing mode). I am working on the patch.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rakdver at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
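To make the idea of "a tree code that maps directly to the memory access"
concrete, here is a rough C model of an access written as base + step * index,
the same shape as the mem[aa + 4B * i] references that appear in the .vars
dump elsewhere in this thread. This is only a sketch: struct mem_addr,
load_f32, store_f32 and fcpy_mem are hypothetical names made up for
illustration and are not GCC's internal representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one access: base + step * index.
   The three parts stay together instead of being split into a
   separately computed byte offset. */
struct mem_addr {
    unsigned char *base;   /* e.g. aa */
    size_t step;           /* e.g. 4B, the element size */
    size_t index;          /* e.g. i */
};

/* Read one float from the described location. */
float load_f32(struct mem_addr m)
{
    float v;
    memcpy(&v, m.base + m.step * m.index, sizeof v);
    return v;
}

/* Write one float to the described location. */
void store_f32(struct mem_addr m, float v)
{
    memcpy(m.base + m.step * m.index, &v, sizeof v);
}

/* The copy loop, with every access expressed as a whole descriptor. */
void fcpy_mem(float *a, float *b, float *aa, float *bb, int n)
{
    for (int i = 0; i < n; i++) {
        struct mem_addr src_a = { (unsigned char *)a,  sizeof(float), (size_t)i };
        struct mem_addr dst_a = { (unsigned char *)aa, sizeof(float), (size_t)i };
        struct mem_addr src_b = { (unsigned char *)b,  sizeof(float), (size_t)i };
        struct mem_addr dst_b = { (unsigned char *)bb, sizeof(float), (size_t)i };
        store_f32(dst_a, load_f32(src_a));
        store_f32(dst_b, load_f32(src_b));
    }
}

/* Small self-check: the descriptor-based loop copies both arrays. */
void check_fcpy_mem(void)
{
    float a[3] = {1.0f, 2.0f, 3.0f}, b[3] = {4.0f, 5.0f, 6.0f};
    float aa[3], bb[3];
    fcpy_mem(a, b, aa, bb, 3);
    for (int i = 0; i < 3; i++) {
        assert(aa[i] == a[i]);
        assert(bb[i] == b[i]);
    }
}
```

The point of the model is only that each access carries its own base, step and
index, so a later pass has no shared "index * 4" temporary to pull out.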
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-26 05:11 ---
Actually I missed that you have to use -fomit-frame-pointer, so this is not
related to PR 18137 after all.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|18137                       |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-25 23:55 ---
PR 18137 is the one which is about reload fucking up and pulling the load of
the arguments into the loop.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |18137

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-25 23:29 ---
This is mostly an iv-opts problem. But note we still don't get optimal code
with -fno-ivopts:

.L4:
        movl    8(%ebp), %ebx
        movl    (%ebx,%edx,4), %eax
        movl    20(%ebp), %ebx
        movl    %eax, (%esi,%edx,4)
        movl    (%edi,%edx,4), %eax
        movl    %eax, (%ebx,%edx,4)
        incl    %edx
        cmpl    %edx, %ecx
        jg      .L4

But that is because we are pulling the load of the arguments into the loop
(that is a different bug, but I think I should mark that as a regression).

We still get the same asm as given in comment #0 with -fivopts still on.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu dot
                   |                            |org
          Component|rtl-optimization            |tree-optimization

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-13 18:14 ---
Here is the reduced testcase for the problem; it has nothing to do with loops
at all:

void fcpy(float *restrict a, float *restrict b,
          float *restrict aa, float *restrict bb, unsigned n)
{
  aa[n]=a[n];
  bb[n]=b[n];
}

DOM is doing CSE of n*4, which is the right thing to do.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-13 17:54 ---
For PPC at least, IV-OPTS should note that we have post increment, decrement
the pointers before the loop, and then increment all of them inside the loop,
i.e.:

void fcpy(float *restrict a, float *restrict b,
          float *restrict aa, float *restrict bb, unsigned n)
{
  unsigned i;
  aa-=1;
  a-=1;
  bb-=1;
  b-=1;
  for(i = 0; i < n; i++)
  {
    aa+=1;
    a+=1;
    bb+=1;
    b+=1;
    *bb=*b;
    *aa=*a;
  }
}

So we get:

L4:
        lfsu f0,4(r4)
        lfsu f13,4(r3)
        stfsu f0,4(r6)
        stfsu f13,4(r5)
        bdnz L4

which is optimal for PPC.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
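A note for anyone trying the pointer-bumping idea at source level: the version
in the comment above decrements each pointer before the loop, which forms a
pointer before the start of the array, and ISO C does not define that. The
sketch below gets the same "advance every pointer once per iteration" shape
using post-increment instead. The names fcpy_ref, fcpy_bump and
check_fcpy_bump are made up for illustration, and nothing here claims that GCC
will turn this variant into the lfsu/stfsu loop shown above.

```c
#include <assert.h>

/* Reference version: plain indexed copy. */
void fcpy_ref(float *restrict a, float *restrict b,
              float *restrict aa, float *restrict bb, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        aa[i] = a[i];
        bb[i] = b[i];
    }
}

/* Pointer-bumping version: no index arithmetic in the loop body; each
   pointer advances by one element per iteration and never points before
   the start of its array. */
void fcpy_bump(float *restrict a, float *restrict b,
               float *restrict aa, float *restrict bb, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        *bb++ = *b++;
        *aa++ = *a++;
    }
}

/* Small self-check: both versions copy the same data. */
void check_fcpy_bump(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    float aa1[4], bb1[4], aa2[4], bb2[4];
    fcpy_ref(a, b, aa1, bb1, 4);
    fcpy_bump(a, b, aa2, bb2, 4);
    for (int i = 0; i < 4; i++) {
        assert(aa1[i] == aa2[i]);
        assert(bb1[i] == bb2[i]);
    }
}
```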
--- Additional Comments From steven at gcc dot gnu dot org 2004-11-13 17:52 ---
At least x86 and ARM have {reg + reg OP const} addressing modes.
Unfortunately we rip such expressions apart already in the gimplifier.
This is something we cannot fix properly on trees. TER could perhaps do
it, but that pass should really go away itself, and we don't know
anything about addressing modes on trees anyway. Looks like we need to
teach an RTL loop optimizer about this...

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-13 17:42 ---
Though I should note that PPC is much better on the mainline than before:

gcc 4.0.0:
L4:
        lfsx f0,r3,r2
        stfsx f0,r5,r2
        lfsx f13,r4,r2
        stfsx f13,r6,r2
        addi r2,r2,4
        bdnz L4

gcc 3.3 (Apple's):
L9:
        slwi r7,r11,2
        addi r11,r11,1
        lfsx f0,r7,r3
        stfsx f0,r7,r5
        lfsx f1,r7,r4
        stfsx f1,r7,r6
        bdnz L9

So really this is a target specific bug :).

Also, here is the loop for x86_64:

.L4:
        movl    (%rdx,%r10), %eax
        incl    %ecx
        movl    %eax, (%rdx,%r9)
        movl    (%rdx,%rdi), %eax
        movl    %eax, (%rdx,%rsi)
        addq    $4, %rdx
        cmpl    %ecx, %r8d
        jg      .L4

Note that changing the type of n and i to unsigned gives slightly better code:

.L4:
        movl    -16(%ebp), %ebx
        leal    0(,%ecx,4), %eax
        incl    %ecx
        cmpl    %ecx, 24(%ebp)
        movl    (%ebx,%eax), %edx
        movl    -20(%ebp), %ebx
        movl    %edx, (%edi,%eax)
        movl    (%esi,%eax), %edx
        movl    %edx, (%ebx,%eax)
        jne     .L4

So IV-OPTs is not doing its job correctly in one place.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
 GCC target triplet|                            |i?86-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-13 17:22 ---
Confirmed, the problem is that DOM does:

  D.1192 = (unsigned int) i;
  D.1194 = (float * restrict) D.1192 * 4B;
  *(aa2 + D.1194) = *(a2 + D.1194);
  *(bb2 + D.1194) = *(b2 + D.1194);

Note how we use D.1194 in all three places. For PPC this is the correct thing
to do, but not for x86, which has three-operand loads.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|middle-end                  |tree-optimization
     Ever Confirmed|                            |1
           Keywords|                            |missed-optimization
   Last reconfirmed|-00-00 00:00:00             |2004-11-13 17:22:54
               date|                            |
   Target Milestone|---                         |4.0.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463
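For readers who prefer source to dumps, here is a hand-written C analogue of
what the DOM output above amounts to. It assumes the original loop (comment
#0, not quoted in this thread) is a plain indexed copy with a signed int
counter; that assumption, and the names fcpy_indexed, fcpy_shared_offset and
check_same_result, are mine and are for illustration only. The first function
keeps each access in base + index form; the second computes the scaled byte
offset once and reuses it for every access, like D.1194.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: every access is base[i], which a target with a scaled
   base + index*4 addressing mode can encode directly. */
void fcpy_indexed(float *restrict a, float *restrict b,
                  float *restrict aa, float *restrict bb, int n)
{
    for (int i = 0; i < n; i++) {
        aa[i] = a[i];
        bb[i] = b[i];
    }
}

/* Shared-offset form: the byte offset i * 4 is computed once per
   iteration (the role of D.1194 in the dump) and added to each base. */
void fcpy_shared_offset(float *restrict a, float *restrict b,
                        float *restrict aa, float *restrict bb, int n)
{
    for (int i = 0; i < n; i++) {
        size_t off = (size_t)(unsigned int)i * sizeof(float);
        *(float *)((char *)aa + off) = *(float *)((char *)a + off);
        *(float *)((char *)bb + off) = *(float *)((char *)b + off);
    }
}

/* Small self-check: the two forms are semantically identical; only the
   shape handed to later passes differs. */
void check_same_result(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {5.0f, 6.0f, 7.0f, 8.0f};
    float aa1[4], bb1[4], aa2[4], bb2[4];
    fcpy_indexed(a, b, aa1, bb1, 4);
    fcpy_shared_offset(a, b, aa2, bb2, 4);
    for (int i = 0; i < 4; i++) {
        assert(aa1[i] == aa2[i]);
        assert(bb1[i] == bb2[i]);
    }
}
```

Comparing the assembly for the two functions on a given target (for example
with -O2 -S) is one way to see whether the shared offset costs anything there;
the result will depend on the compiler version and flags.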