[Bug rtl-optimization/15792] missed subreg optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792

Andrew Pinski changed:

           What    |Removed                     |Added
      Known to fail|                            |

--- Comment #12 from Andrew Pinski ---
(In reply to Gabriel Ravier from comment #11)
> Seems like the issue is present again, except it's test1 that gets the
> better asm now. Perhaps this should be re-opened ?

This bug was about 32bit x86 and the code looks good in GCC 9, 10, 11, and 12
and the trunk.
If you were testing on x86_64, you need to use __int128_t to see what the
original issue was about:

void gh();
void test(__int128_t x)
{
  long g = (long)x|((long)(x>>64));
  if (g)
    gh();
}
void test1(__int128_t x)
{
  if (x)
    gh();
}

GCC 4.8+ produces:
test1:
        .cfi_startproc
        orq     %rdi, %rsi
        jne     .L7
        rep ret

For both.
There was an extra mov in GCC 4.5.0-4.7.0 for test though.
In GCC 4.4.0, test1 was two compare and jumps (ok).
GCC 4.1.2 had the bad code generation which was mentioned in comment #0.
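The reason a single orq can serve both functions is that a 128-bit value is
nonzero exactly when the OR of its two 64-bit halves is nonzero. The sketch
below checks that equivalence on a few sample values. It is only an
illustration, not code from the bug report: the helper names are hypothetical,
it uses unsigned __int128 rather than __int128_t to keep the shift well
defined, and it assumes a GCC-compatible compiler on a 64-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring test(): OR the two 64-bit halves. */
int halves_nonzero(unsigned __int128 x)
{
    uint64_t lo = (uint64_t)x;          /* low 64 bits  */
    uint64_t hi = (uint64_t)(x >> 64);  /* high 64 bits */
    return (lo | hi) != 0;
}

/* Hypothetical helper mirroring test1(): test the whole value. */
int whole_nonzero(unsigned __int128 x)
{
    return x != 0;
}

/* Check that both forms agree on a handful of sample values. */
void check_equivalence(void)
{
    const unsigned __int128 one = 1;
    const unsigned __int128 samples[] = {
        0, 1, one << 63, one << 64, (one << 100) | 5, ~(unsigned __int128)0
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(halves_nonzero(samples[i]) == whole_nonzero(samples[i]));
}
```

Since the two forms are equivalent, an optimizer is free to emit the same
or-and-branch sequence for test and test1.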
[Bug rtl-optimization/15792] missed subreg optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792

Gabriel Ravier changed:

           What    |Removed                     |Added
                 CC|                            |gabravier at gmail dot com

--- Comment #11 from Gabriel Ravier ---
Seems like the issue is present again, except it's test1 that gets the better
asm now. Perhaps this should be re-opened ?
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #10 from rask at gcc dot gnu dot org 2007-11-10 00:15 ---
This was fixed in 4.3.0.

--

rask at gcc dot gnu dot org changed:

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
           Keywords|                            |ra
      Known to fail|                            |4.1.2 4.2.0 4.2.1 4.2.2
      Known to work|                            |4.3.0
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #9 from ian at airs dot com 2006-02-07 08:23 ---
I now have a reasonably simple reload patch which eliminates the unnecessary
move. For the test case in comment #4, I get this code with
-O2 -momit-leaf-frame-pointer:

foo:
        movl    12(%esp), %eax
        movl    16(%esp), %edx
        addl    4(%esp), %eax
        adcl    8(%esp), %edx
        orl     %eax, %edx
        jne     .L7
        rep ; ret
        .p2align 4,,7
.L7:
        jmp     gh

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792
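For readers less used to 32-bit x86, the add/adc/or sequence in comment #9 can
be modelled in C roughly as follows. This is a sketch under stated
assumptions, not code from the patch: the function names are hypothetical,
and the arithmetic is modelled as unsigned (wrapping) 64-bit addition split
into 32-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the generated code: add the low halves, add the
   high halves plus carry, then OR the two result halves and test for zero. */
int sum_nonzero_split(uint32_t y_lo, uint32_t y_hi,
                      uint32_t z_lo, uint32_t z_hi)
{
    uint32_t lo = z_lo + y_lo;          /* addl: wraps modulo 2^32     */
    uint32_t carry = lo < y_lo;         /* carry flag produced by addl */
    uint32_t hi = z_hi + y_hi + carry;  /* adcl: add with carry        */
    return (lo | hi) != 0;              /* orl sets ZF, jne tests it   */
}

/* Reference: the same test on the full 64-bit sum. */
int sum_nonzero(uint64_t y, uint64_t z)
{
    return (y + z) != 0;
}

/* Hypothetical self-check: both forms must agree for a given input pair. */
void check_split_matches(uint64_t y, uint64_t z)
{
    assert(sum_nonzero_split((uint32_t)y, (uint32_t)(y >> 32),
                             (uint32_t)z, (uint32_t)(z >> 32))
           == sum_nonzero(y, z));
}
```

The final OR only needs somewhere to put its result, which is why the
destination can be one half of the register pair that holds the sum, as in
the orl %eax, %edx above.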
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #7 from tony dot linthicum at amd dot com 2006-02-06 17:13 ---
So do I, at least for the original code (i.e. test and test1). I'm curious,
though, if you've tried the example that I listed above (foo). I still get
subregs with that one, though I honestly don't recall at the moment whether or
not it makes the register allocator screw up or not (I *think* it does, but
I'd have to check). Either way, though, the presence of the subregs provides
the needed fodder for RA badness so I'm curious if it's present in what you're
working on.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #8 from ian at airs dot com 2006-02-07 00:30 ---
Yes, I still get an unnecessary move in your test case which uses addition.

One reason this happens is because the addition can not be split until after
the reload pass is complete. That is because the add relies on the condition
code registers, but reload can clobber the condition code registers between
any arbitrary pair of instructions.

Another reason this happens is that the compiler knows how to set the
condition flags using a bitwise or, but it does so using a scratch register to
hold the destination of the bitwise or. The register allocator is not clever
enough to see that if it has a DImode pair of registers which dies in the
insn, that it can use the second register in the DImode pair as the scratch
register. If the register allocator saw that, then it could use that register
as the scratch register and avoid allocating a new scratch register and
copying the value into it.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #6 from ian at airs dot com 2006-02-02 18:18 ---
With the version of RTH's subreg lowering pass which I am working on, I get
identical code for both functions:

test1:
        movl    8(%esp), %eax
        orl     4(%esp), %eax
        jne     .L7
        ret
        .p2align 4,,7
.L7:
        jmp     gh

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #4 from tony dot linthicum at amd dot com 2006-01-20 15:48 ---
I've been looking at this a bit, and tried the patch. It does indeed fix the
problem in test1 above, but it does not appear to be the complete solution.
The load of 'x' in test1 is actually split fairly early, and from what I can
tell, the superfluous move is actually the result of the register allocator
doing a poor job of live range analysis when confronted with subregs. I
suspect this is why most things (i.e. those things other than branches) are
not split into subregs until after reload. Unfortunately, the subreg lowering
won't touch a subreg if it's seen a reference to the inner register so we get
the same unnecessary move if the code looks like:

foo(long long y, long long z)
{
  unsigned long long x;
  x = y + z;
  if (x)
    gh();
}

I'm going to experiment with moving where the subreg lowering code occurs and
moving up the splitting into subregs and see if I can get the desired results.
I'm pretty new to GCC, so if any of the above seems like I'm off in the weeds
then please let me know.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #5 from pinskia at gcc dot gnu dot org 2006-01-20 15:52 ---
(In reply to comment #4)
> I'm going to experiment with moving where the subreg lowering code occurs and
> moving up the splitting into subregs and see if I can get the desired results.
> I'm pretty new to GCC, so if any of the above seems like I'm off in the weeds
> then please let me know.

This seems right, but the other issue is that the register allocator allocates
a DI value as two consecutive registers treated as one (that might be only
part of the cause).

--

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
                 CC|                            |tony dot linthicum at amd
                   |                            |dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792
[Bug rtl-optimization/15792] missed subreg optimization
--- Comment #3 from pinskia at gcc dot gnu dot org 2006-01-18 04:45 ---
The problem here is that we don't split up the subregister early, before
register allocation. If we split it up before combine, we would be able to
combine the or and get more optimal results. A patch like
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg00554.html should help.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792