[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2016-08-11 Thread daniel.santos at pobox dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

Daniel Santos <daniel.santos at pobox dot com> changed:

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #10 from Daniel Santos <daniel.santos at pobox dot com> ---
(In reply to Richard Earnshaw from comment #8)
> Unfortunately, computers don't do infinite precision arithmetic by default.
> That would perform a different comparison in that it checks that r0 > r1,
> not whether r0 - r1 > 0.  The difference, for signed comparisons, is when
> overflow occurs.

Looks like I've forgotten to close this bug after retesting. Thank you again
Richard for clarifying this. When modifying the compare function as described
above, optimization is perfect on all targets I've tested (x86, ARM and MIPS).

[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2015-02-14 Thread daniel.santos at pobox dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

--- Comment #9 from Daniel Santos <daniel.santos at pobox dot com> ---
I apologize for my late response.

(In reply to Richard Earnshaw from comment #8)
> Unfortunately, computers don't do infinite precision arithmetic by default.
> That would perform a different comparison in that it checks that r0 > r1,
> not whether r0 - r1 > 0.  The difference, for signed comparisons, is when
> overflow occurs.
>
> Consider the case where (in your original code) a has the value INT_MIN (ie
> -2147483648) and b has the value 1.
>
> Now clearly a < b and by the normal rules of arithmetic (infinite precision)
> we would expect a - b to be less than zero.
>
> However, INT_MIN - 1 cannot be represented in a 32-bit long value and
> becomes INT_MAX due to overflow; the result is that for these values a - b >
> 0!
>
> On ARM and x86, the flag setting that results from a subtract operation is,
> in effect, a comparison of the original operands, rather than a comparison of
> the result; that is on ARM
>
>    subs rd, rn, rm
>
> is equivalent to
>
>    cmp rn, rm
>
> except that the register rd is not written by the comparison.
>
> Power PC is different: its subtract and compare instruction really does use
> the result of the subtraction to form the comparison.

Thank you very much for your work on this. In re-examining, I'm suspecting that
this may be an invalid bug. :( I have modified the test program slightly:

extern print_gt(void);
extern print_lt(void);
extern print_eq(void);

void cmp_and_branch(long a, long b)
{
    long diff = a > b ? 1 : (a < b ? -1 : 0);

    if (diff > 0) {
        print_gt();
    } else if (diff < 0) {
        print_lt();
    } else {
        print_eq();
    }
}

I thought that I had originally tried this and gotten worse results (although
the diff was being done via a complicated -findirect-inlining situation), but
this version of the program leaves a finite number of options. When compiled on
x86_64 and ARM, both are flawless:

ARM
cmp_and_branch:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        cmp     r0, r1
        bgt     .L2
        blt     .L5
        b       print_eq
.L2:
        b       print_gt
.L5:
        b       print_lt
        .size   cmp_and_branch, .-cmp_and_branch
        .ident  "GCC: (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3"
        .section        .note.GNU-stack,"",%progbits


x86_64
cmp_and_branch:
.LFB0:
        .cfi_startproc
        cmpq    %rsi, %rdi
        jg      .L2
        jl      .L5
        jmp     print_eq
        .p2align 4,,10
        .p2align 3
.L2:
        jmp     print_gt
        .p2align 4,,10
        .p2align 3
.L5:
        jmp     print_lt
        .cfi_endproc

I don't want to close this bug just yet; I want to retest in my other code. This
will certainly help clean up some of my code!!


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2013-08-05 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

--- Comment #8 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Daniel Santos from comment #7)
> First off, I apologize for my late response here.
>
> (In reply to comment #5)
> I'm going to respond a little backwards..
>
> > In fact, on ARM there is no branch instruction that can be used for > 0
> > as a side effect of a subtract.  To get the desired effect the code would
> > have to be completely re-arranged to factor out the < 0 (bmi) and then
> > == 0 (beq) cases first.
>
> I'm not an ARM programmer, but I'm looking at my reference book and it would
> appear that BGT would perform a branch on greater than for signed comparison
> and BHI for unsigned comparison.  Again, convert the subtraction into a
> comparison (subtract, but discard the result) and branch based upon the
> flags (for signed numbers):
>
>     cmp r0, r1
>     bgt .L1
>     bne .L2
>     ;handle equality here
 

Unfortunately, computers don't do infinite precision arithmetic by default.
That would perform a different comparison in that it checks that r0 > r1, not
whether r0 - r1 > 0.  The difference, for signed comparisons, is when overflow
occurs.

Consider the case where (in your original code) a has the value INT_MIN (ie
-2147483648) and b has the value 1.

Now clearly a < b and by the normal rules of arithmetic (infinite precision) we
would expect a - b to be less than zero.

However, INT_MIN - 1 cannot be represented in a 32-bit long value and becomes
INT_MAX due to overflow; the result is that for these values a - b > 0!

On ARM and x86, the flag setting that results from a subtract operation is, in
effect, a comparison of the original operands, rather than a comparison of the
result; that is on ARM

   subs rd, rn, rm

is equivalent to

   cmp rn, rm

except that the register rd is not written by the comparison.

Power PC is different: its subtract and compare instruction really does use
the result of the subtraction to form the comparison.
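
The difference between the two readings of the flags can be sketched in C.
The model below is a simplification that tracks only the N, Z and V flags of
a 32-bit subtract; struct flags, subs_flags, cond_gt and result_positive are
hypothetical names introduced for illustration. cond_gt models the ARM GT
condition (Z clear and N equal to V), while result_positive models a test of
the stored result against zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the flags after "subs rd, rn, rm" (carry omitted). */
struct flags {
    bool n; /* result is negative (sign bit set) */
    bool z; /* result is zero */
    bool v; /* signed overflow occurred */
};

static struct flags subs_flags(int32_t rn, int32_t rm)
{
    uint32_t ua = (uint32_t)rn;
    uint32_t ub = (uint32_t)rm;
    uint32_t ur = ua - ub; /* unsigned arithmetic wraps modulo 2^32 */
    struct flags f;

    f.n = (ur >> 31) != 0;
    f.z = (ur == 0);
    /* Overflow: operands have different signs and the result's sign
     * differs from that of the first operand. */
    f.v = (((ua ^ ub) & (ua ^ ur)) >> 31) != 0;
    return f;
}

/* ARM GT condition: a signed "rn > rm" on the original operands. */
static bool cond_gt(struct flags f)
{
    return !f.z && f.n == f.v;
}

/* "Result > 0": looks only at the wrapped result, ignoring overflow. */
static bool result_positive(struct flags f)
{
    return !f.z && !f.n;
}
```

With rn = INT32_MIN and rm = 1, cond_gt is false (INT32_MIN is not greater
than 1) while result_positive is true, because the wrapped result is
INT32_MAX. For operands whose difference does not overflow, the two agree.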


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2012-11-15 Thread daniel.santos at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

--- Comment #7 from Daniel Santos <daniel.santos at pobox dot com> 2012-11-15 21:56:02 UTC ---
First off, I apologize for my late response here.

(In reply to comment #5)
I'm going to respond a little backwards..

> In fact, on ARM there is no branch instruction that can be used for > 0 as a
> side effect of a subtract.  To get the desired effect the code would have to
> be completely re-arranged to factor out the < 0 (bmi) and then == 0 (beq)
> cases first.

I'm not an ARM programmer, but I'm looking at my reference book and it would
appear that BGT would perform a branch on greater than for signed comparison
and BHI for unsigned comparison.  Again, convert the subtraction into a
comparison (subtract, but discard the result) and branch based upon the flags
(for signed numbers):

    cmp r0, r1
    bgt .L1
    bne .L2
    ;handle equality here

Alternately, if the code to execute for each condition will not change the
result flags, then the use of conditional instructions could replace the
branches.  In this example there is no need to check for equality because the
previous two branch instructions eliminate all other possibilities.  The reason
that my C code example above does *not* check for equality is that it is the
least likely condition and can be determined by eliminating the two most likely
conditions (negative or positive and not equal).

> The result of the comparison is used in more than one instruction, so combine
> cannot safely rework the branch instructions that follow to ensure that the
> result of the subtract is used correctly.

I suppose I'm going to have to learn the gcc internals.  It will probably be
good for me anyway.

However, if what you say is correct, then the *problem* lies at a higher level
than the combine, but it does not invalidate the problem itself.  Where do
we make the decision that a result can be discarded?  That would seem to be
where the cause of this problem lies.

So to break it down more accurately, if all of these conditions are true:

1.) we perform an integral subtraction,
2.) and every use of the result can be replaced with fewer instructions that
use the CPU flags that were changed by the subtraction instead of the result
itself,
3.) and no instructions between the subtraction and the last use of its result
modify those flags

then we should replace the subtract operation with a compare and use the CPU
flags instead.  I am not aware of any case, when both a and b are of the same
signed integral type, where (a - b) > 0 and (a - b) < 0 cannot be replaced
with a > b and a < b, respectively.
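
The claim in the last paragraph can be checked exhaustively at a small width.
The sketch below is an illustration under stated assumptions: it uses 8-bit
signed values in place of long and emulates two's-complement wraparound by
hand, because signed overflow is undefined behaviour in C. The function name
count_mismatches is hypothetical.

```c
#include <stdint.h>

/* Counts the (a, b) pairs of 8-bit signed values for which "(a - b) > 0",
 * evaluated with 8-bit wraparound, disagrees with "a > b". */
static int count_mismatches(void)
{
    int mismatches = 0;

    for (int a = INT8_MIN; a <= INT8_MAX; a++) {
        for (int b = INT8_MIN; b <= INT8_MAX; b++) {
            int d = a - b; /* exact: the true difference fits in int */

            /* Emulate what an 8-bit two's-complement register would hold. */
            if (d > INT8_MAX)
                d -= 256;
            else if (d < INT8_MIN)
                d += 256;

            if ((d > 0) != (a > b))
                mismatches++;
        }
    }
    return mismatches;
}
```

Under these assumptions the two tests disagree for 16384 of the 65536 pairs,
namely exactly those whose true difference falls outside the range -128..127.
The replacement is therefore only valid when the subtraction cannot overflow.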


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2012-10-13 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

--- Comment #5 from Richard Earnshaw <rearnsha at gcc dot gnu.org> 2012-10-13 16:04:55 UTC ---
The result of the comparison is used in more than one instruction, so combine
cannot safely rework the branch instructions that follow to ensure that the
result of the subtract is used correctly.

In fact, on ARM there is no branch instruction that can be used for > 0 as a
side effect of a subtract.  To get the desired effect the code would have to be
completely re-arranged to factor out the < 0 (bmi) and then == 0 (beq)
cases first.


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2012-10-13 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> 2012-10-13 16:18:03 UTC ---
Note also that flag setting behaviour of the PPC instruction essentially is a
comparison of the result against zero.  On ARM the flags are set as if the two
input operands were compared; that's not the same as comparing the result with
zero.


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2012-10-06 Thread daniel.santos at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

Daniel Santos <daniel.santos at pobox dot com> changed:

           What    |Removed                     |Added
            Summary|bad optimization: sub       |bad optimization: sub
                   |followed by cmp w/ zero     |followed by cmp w/ zero
                   |(ARM)                       |(x86 & ARM)

--- Comment #3 from Daniel Santos <daniel.santos at pobox dot com> 2012-10-06 15:55:47 UTC ---
Sure it can:

extern print_gt(void);
extern print_lt(void);
extern print_eq(void);

void cmp_and_branch(long a, long b)
{
    if (a > b) {
        print_gt();
    } else if (a < b) {
        print_lt();
    } else {
        print_eq();
    }
}

Here's x86_64:

cmp_and_branch:
.LFB0:
        .cfi_startproc
        cmpq    %rsi, %rdi
        jg      .L5
        jl      .L6
        jmp     print_eq
        .p2align 4,,10
        .p2align 3
.L6:
        jmp     print_lt
        .p2align 4,,10
        .p2align 3
.L5:
        jmp     print_gt
        .cfi_endproc

And here's i386:

cmp_and_branch:
.LFB0:
        .cfi_startproc
        movl    8(%esp), %eax
        cmpl    %eax, 4(%esp)
        jg      .L5
        jl      .L6
        jmp     print_eq
        .p2align 2,,3
.L6:
        jmp     print_lt
        .p2align 2,,3
.L5:
        jmp     print_gt
        .cfi_endproc


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2012-10-06 Thread daniel.santos at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

--- Comment #4 from Daniel Santos <daniel.santos at pobox dot com> 2012-10-06 15:57:15 UTC ---
Please help me out here if I am missing something.


[Bug target/54829] bad optimization: sub followed by cmp w/ zero (x86 & ARM)

2012-10-05 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54829

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-10-05
     Ever Confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-10-05 23:39:56 UTC ---
Confirmed:
Trying 7 -> 8:
Failed to match this instruction:
(set (reg:CC 17 flags)
    (compare:CC (minus:DI (reg/v:DI 60 [ a ])
            (reg/v:DI 61 [ b ]))
        (const_int 0 [0])))