[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 Daniel Santos changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #10 from Daniel Santos --- (In reply to Richard Earnshaw from comment #8) > Unfortunately, computers don't to infinite precision arithmetic by default. > That would perform a different comparison in that it checks that r0 > r1, > not whether r0 - r1 > 0. The difference, for signed comparisons, is when > overflow occurs. Looks like I've forgotten to close this bug after retesting. Thank you again Richard for clarifying this. When modifying the compare function as described above, optimization is perfect on all targets I've tested (x86, ARM and MIPS).
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 --- Comment #9 from Daniel Santos daniel.santos at pobox dot com --- I appologize for my late response. (In reply to Richard Earnshaw from comment #8) Unfortunately, computers don't to infinite precision arithmetic by default. That would perform a different comparison in that it checks that r0 r1, not whether r0 - r1 0. The difference, for signed comparisons, is when overflow occurs. Consider the case where (in your original code) a has the value INT_MIN (ie -2147483648) and b has the value 1. Now clearly a b and by the normal rules of arithmetic (infinite precision) we would expect a - b to be less than zero. However, INT_MIN - 1 cannot be represented in a 32-bit long value and becomes INT_MAX due to overflow; the result is that for these values a - b 0! On ARM and x86, the flag setting that results from a subtract operation is, in effect a comparison of the original operands, rather than a comparison of the result; that is on ARM subs rd, rn, rm is equivalent to cmp rn, rm except that the register rd is not written by the comparison. Power PC is different: it's subtract and compare instruction really does use the result of the subtraction to form the comparison. Thank you very much for your work on this. In re-examining, I'm suspecting that this may be an invalid bug. :( I have modified the test program slightly: extern print_gt(void); extern print_lt(void); extern print_eq(void); void cmp_and_branch(long a, long b) { long diff = a b ? 1 : (a b ? -1 : 0); if (diff 0) { print_gt(); } else if (diff 0) { print_lt(); } else { print_eq(); } } I thought that I had originally tried this and gotten worse results (although the diff was being done via a complicated -findirect-inline situation), but this version of the program leaves a finite number of options. When compiled on x86_64 and ARM, both are flawless: ARM cmp_and_branch: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r0, r1 bgt .L2 blt .L5 b print_eq .L2: b print_gt .L5: b print_lt .size cmp_and_branch, .-cmp_and_branch .ident GCC: (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3 .section.note.GNU-stack,,%progbits x86_64 cmp_and_branch: .LFB0: .cfi_startproc cmpq%rsi, %rdi jg .L2 jl .L5 jmp print_eq .p2align 4,,10 .p2align 3 .L2: jmp print_gt .p2align 4,,10 .p2align 3 .L5: jmp print_lt .cfi_endproc I don't want to close this bug just yet, I want to reset in my other code. This will certainly help clean up some of my code!!
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 --- Comment #8 from Richard Earnshaw rearnsha at gcc dot gnu.org --- (In reply to Daniel Santos from comment #7) First off, I apologize for my late response here. (In reply to comment #5) I'm going to respond a little backwards.. In fact, on ARM there is no branch instruction that can be used for 0 as a side effect of a subtract. To get the desired effect the code would have to be completely re-arranged to factor out the 0 (bmi) and then == 0 (beq) cases first. I'm not an ARM programmer, but I'm looking at my reference book and it would appear that BGT would perform a branch of greater than for signed comparison and and BHI for unsigned comparison. Again, convert the subtraction into a comparison (subtract, but discard the result) and branch based upon the flags (for signed numbers): cmpr0, r1 bgt.L1 bne.L2 ;handle equality here Unfortunately, computers don't to infinite precision arithmetic by default. That would perform a different comparison in that it checks that r0 r1, not whether r0 - r1 0. The difference, for signed comparisons, is when overflow occurs. Consider the case where (in your original code) a has the value INT_MIN (ie -2147483648) and b has the value 1. Now clearly a b and by the normal rules of arithmetic (infinite precision) we would expect a - b to be less than zero. However, INT_MIN - 1 cannot be represented in a 32-bit long value and becomes INT_MAX due to overflow; the result is that for these values a - b 0! On ARM and x86, the flag setting that results from a subtract operation is, in effect a comparison of the original operands, rather than a comparison of the result; that is on ARM subs rd, rn, rm is equivalent to cmp rn, rm except that the register rd is not written by the comparison. Power PC is different: it's subtract and compare instruction really does use the result of the subtraction to form the comparison.
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 --- Comment #7 from Daniel Santos daniel.santos at pobox dot com 2012-11-15 21:56:02 UTC --- First off, I apologize for my late response here. (In reply to comment #5) I'm going to respond a little backwards.. In fact, on ARM there is no branch instruction that can be used for 0 as a side effect of a subtract. To get the desired effect the code would have to be completely re-arranged to factor out the 0 (bmi) and then == 0 (beq) cases first. I'm not an ARM programmer, but I'm looking at my reference book and it would appear that BGT would perform a branch of greater than for signed comparison and and BHI for unsigned comparison. Again, convert the subtraction into a comparison (subtract, but discard the result) and branch based upon the flags (for signed numbers): cmpr0, r1 bgt.L1 bne.L2 ;handle equality here Alternately, if the code to execute for each condition will not change the result flags, then the use of condition instructions could replace the branches. In this example there is no need to check for equality because the previous two branch instructions eliminate all other possibilities. The reason that my C code example above does *not* check for equality is that it is the least likely condition and can be determined by eliminating the two most likely conditions (negative or positive and not equal). The result of the comparison is used in more than one instruction, so combine cannot safely rework the branch instructions that follow to ensure that the result of the subtract is used correctly. I suppose I'm going to have to learn the gcc internals. It will probably be good for me anyway. However, If what you say is correct, then the *problem* lies at a higher level than the combine, but it does not invalidate the problem its self. Where do we make the decision that a result can be discarded? That would seem to be where the cause of this problem lies. So to break it down more accurately, if all of these conditions are true: 1.) we perform an integral subtraction, 2.) and every use of the result can be replaced with fewer instructions that use the CPU flags that were changed by the subtraction instead of the result its self, 3.) and no instructions between the subtraction and the last use of its result modify those flags then we should replace the subtract operation with a compare and use the CPU flags instead. I am not aware of any case, when both a and b are of the same signed integral type, where (a - b) 0 and (a - b) 0 cannot be replaced with a b and a b, respectively.
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 --- Comment #5 from Richard Earnshaw rearnsha at gcc dot gnu.org 2012-10-13 16:04:55 UTC --- The result of the comparison is used in more than one instruction, so combine cannot safely rework the branch instructions that follow to ensure that the result of the subtract is used correctly. In fact, on ARM there is no branch instruction that can be used for 0 as a side effect of a subtract. To get the desired effect the code would have to be completely re-arranged to factor out the 0 (bmi) and then == 0 (beq) cases first.
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 --- Comment #6 from Richard Earnshaw rearnsha at gcc dot gnu.org 2012-10-13 16:18:03 UTC --- Note also that flag setting behaviour of the PPC instruction essentially is a comparison of the result against zero. On ARM the flags are set as if the two input operands were compared; that's not the same as comparing the result with zero.
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 Daniel Santos daniel.santos at pobox dot com changed: What|Removed |Added Summary|bad optimization: sub |bad optimization: sub |followed by cmp w/ zero |followed by cmp w/ zero |(ARM) |(x86 ARM) --- Comment #3 from Daniel Santos daniel.santos at pobox dot com 2012-10-06 15:55:47 UTC --- Sure it can: extern print_gt(void); extern print_lt(void); extern print_eq(void); void cmp_and_branch(long a, long b) { if (a b) { print_gt(); } else if (a b) { print_lt(); } else { print_eq(); } } Here's x86_64: cmp_and_branch: .LFB0: .cfi_startproc cmpq%rsi, %rdi jg .L5 jl .L6 jmp print_eq .p2align 4,,10 .p2align 3 .L6: jmp print_lt .p2align 4,,10 .p2align 3 .L5: jmp print_gt .cfi_endproc And here's i386: cmp_and_branch: .LFB0: .cfi_startproc movl8(%esp), %eax cmpl%eax, 4(%esp) jg .L5 jl .L6 jmp print_eq .p2align 2,,3 .L6: jmp print_lt .p2align 2,,3 .L5: jmp print_gt .cfi_endproc
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 --- Comment #4 from Daniel Santos daniel.santos at pobox dot com 2012-10-06 15:57:15 UTC --- Please help me out here if I am missing something.
[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 ARM)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2012-10-05 Ever Confirmed|0 |1 Severity|normal |enhancement --- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org 2012-10-05 23:39:56 UTC --- Confirmed: Trying 7 - 8: Failed to match this instruction: (set (reg:CC 17 flags) (compare:CC (minus:DI (reg/v:DI 60 [ a ]) (reg/v:DI 61 [ b ])) (const_int 0 [0])))