Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-10-10 Thread Bernd Edlinger
On 10/09/17 15:02, Richard Earnshaw (lists) wrote:
> On 06/09/17 14:17, Bernd Edlinger wrote:
>> Index: gcc/doc/rtl.texi
>> ===
>> --- gcc/doc/rtl.texi (revision 251752)
>> +++ gcc/doc/rtl.texi (working copy)
>> @@ -2252,6 +2252,13 @@
>>   If one of the operands is a constant, it should be placed in the
>>   second operand and the comparison code adjusted as appropriate.
>>   
>> +There may be exceptions to this rule if the mode @var{m} does not
>> +carry enough information for the swapped comparison operator, or
>> +if we try to detect overflow from the subtraction.  That means, while
>> +0-X may overfow X-0 can never overflow.  Under these conditions

s/overfow/overflow/

>> +a compare may have the constant expression at the first operand.
>> +Examples are the ARM negdi2_compare pattern and similar.
>> +
>>   A @code{compare} specifying two @code{VOIDmode} constants is not valid
>>   since there is no way to know in what mode the comparison is to be
>>   performed; the comparison must either be folded during the compilation
>>
> 
> 
> Er, hold on.  Comparisons don't 'overflow'.  They compare two values and
> tell you something about their relationship.  If A cmp B doesn't tell me
> the same basic things as B swapped(cmp) A, then the world will fall
> apart big time.  So your documentation patch can't be right.
> 

Hi Richard,

I think a CC-mode like CC_NCV is inherently unsymmetrical
which means that it does in general not provide enough information for
the swapped(cmp), and I think the truth is that the N-flag
is the sign of A-B, and V-flag is 1 if A-B overflows otherwise 0.
And if you ask for Branch if less-than that is Branch if N ^ V.


But in the moment I have no idea how to proceed.


Thanks
Bernd.


Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-10-09 Thread Richard Earnshaw (lists)
On 06/09/17 14:17, Bernd Edlinger wrote:
> On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
>> On 06/09/17 13:44, Bernd Edlinger wrote:
>>> On 09/04/17 21:54, Bernd Edlinger wrote:
 Hi Kyrill,

 Thanks for your review!


 On 09/04/17 15:55, Kyrill Tkachov wrote:
> Hi Bernd,
>
> On 18/01/17 15:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
 On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
> On 18/12/16 12:58, Bernd Edlinger wrote:
>> Hi,
>>
>> this is related to PR77308, the follow-up patch will depend on this
>> one.
>>
>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>> before reload, a mis-compilation in libgcc function
>> __gnu_satfractdasq
>> was discovered, see [1] for more details.
>>
>> The reason seems to be that when the *arm_cmpdi_insn is directly
>> followed by a *arm_cmpdi_unsigned instruction, both are split
>> up into this:
>>
>>  [(set (reg:CC CC_REGNUM)
>>(compare:CC (match_dup 0) (match_dup 1)))
>>   (parallel [(set (reg:CC CC_REGNUM)
>>   (compare:CC (match_dup 3) (match_dup 4)))
>>  (set (match_dup 2)
>>   (minus:SI (match_dup 5)
>>(ltu:SI (reg:CC_C CC_REGNUM)
>> (const_int
>> 0])]
>>
>>  [(set (reg:CC CC_REGNUM)
>>(compare:CC (match_dup 2) (match_dup 3)))
>>   (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>  (set (reg:CC CC_REGNUM)
>>   (compare:CC (match_dup 0) (match_dup 1]
>>
>> The problem is that the reg:CC from the *subsi3_carryin_compare
>> is not mentioning that the reg:CC is also dependent on the reg:CC
>> from before.  Therefore the *arm_cmpsi_insn appears to be
>> redundant and thus got removed, because the data values are
>> identical.
>>
>> I think that applies to a number of similar pattern where data
>> flow is happening through the CC reg.
>>
>> So this is a kind of correctness issue, and should be fixed
>> independently from the optimization issue PR77308.
>>
>> Therefore I think the patterns need to specify the true
>> value that will be in the CC reg, in order for cse to
>> know what the instructions are really doing.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
> I agree you've found a valid problem here, but I have some issues
> with
> the patch itself.
>
>
> (define_insn_and_split "subdi3_compare1"
> [(set (reg:CC_NCV CC_REGNUM)
>   (compare:CC_NCV
> (match_operand:DI 1 "register_operand" "r")
> (match_operand:DI 2 "register_operand" "r")))
>  (set (match_operand:DI 0 "register_operand" "=&r")
>   (minus:DI (match_dup 1) (match_dup 2)))]
> "TARGET_32BIT"
> "#"
> "&& reload_completed"
> [(parallel [(set (reg:CC CC_REGNUM)
>  (compare:CC (match_dup 1) (match_dup 2)))
> (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
> 2)))])
>  (parallel [(set (reg:CC_C CC_REGNUM)
>  (compare:CC_C
>(zero_extend:DI (match_dup 4))
>(plus:DI (zero_extend:DI (match_dup 5))
> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
> (set (match_dup 3)
>  (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>
>
> This pattern is now no-longer self consistent in that before the
> split
> the overall result for the condition register is in mode CC_NCV, but
> afterwards it is just CC_C.
>
> I think CC_NCV is correct mode (the N, C and V bits all correctly
> reflect the result of the 64-bit comparison), but that then
> implies that
> the cc mode of subsi3_carryin_compare is incorrect as well and
> should in
> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
> agree
> that CC_NCV is the correct mode for this operation
>
> I'm not sure if there are other consequences that will fall out from
> fixing this (it's possible that we might need a change to
> select_cc_mode
> as well).
>
 Yes, this is still a bit awkward

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-06 Thread Kyrill Tkachov


On 06/09/17 14:17, Bernd Edlinger wrote:

On 09/06/17 14:51, Richard Earnshaw (lists) wrote:

On 06/09/17 13:44, Bernd Edlinger wrote:

On 09/04/17 21:54, Bernd Edlinger wrote:

Hi Kyrill,

Thanks for your review!


On 09/04/17 15:55, Kyrill Tkachov wrote:

Hi Bernd,

On 18/01/17 15:36, Bernd Edlinger wrote:

On 01/13/17 19:28, Bernd Edlinger wrote:

On 01/13/17 17:10, Bernd Edlinger wrote:

On 01/13/17 14:50, Richard Earnshaw (lists) wrote:

On 18/12/16 12:58, Bernd Edlinger wrote:

Hi,

this is related to PR77308, the follow-up patch will depend on this
one.

When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
before reload, a mis-compilation in libgcc function
__gnu_satfractdasq
was discovered, see [1] for more details.

The reason seems to be that when the *arm_cmpdi_insn is directly
followed by a *arm_cmpdi_unsigned instruction, both are split
up into this:

  [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 0) (match_dup 1)))
   (parallel [(set (reg:CC CC_REGNUM)
   (compare:CC (match_dup 3) (match_dup 4)))
  (set (match_dup 2)
   (minus:SI (match_dup 5)
(ltu:SI (reg:CC_C CC_REGNUM)
(const_int
0])]

  [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 2) (match_dup 3)))
   (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
  (set (reg:CC CC_REGNUM)
   (compare:CC (match_dup 0) (match_dup 1]

The problem is that the reg:CC from the *subsi3_carryin_compare
is not mentioning that the reg:CC is also dependent on the reg:CC
from before.  Therefore the *arm_cmpsi_insn appears to be
redundant and thus got removed, because the data values are
identical.

I think that applies to a number of similar pattern where data
flow is happening through the CC reg.

So this is a kind of correctness issue, and should be fixed
independently from the optimization issue PR77308.

Therefore I think the patterns need to specify the true
value that will be in the CC reg, in order for cse to
know what the instructions are really doing.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


I agree you've found a valid problem here, but I have some issues
with
the patch itself.


(define_insn_and_split "subdi3_compare1"
 [(set (reg:CC_NCV CC_REGNUM)
   (compare:CC_NCV
 (match_operand:DI 1 "register_operand" "r")
 (match_operand:DI 2 "register_operand" "r")))
  (set (match_operand:DI 0 "register_operand" "=&r")
   (minus:DI (match_dup 1) (match_dup 2)))]
 "TARGET_32BIT"
 "#"
 "&& reload_completed"
 [(parallel [(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 1) (match_dup 2)))
 (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
2)))])
  (parallel [(set (reg:CC_C CC_REGNUM)
  (compare:CC_C
(zero_extend:DI (match_dup 4))
(plus:DI (zero_extend:DI (match_dup 5))
 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
 (set (match_dup 3)
  (minus:SI (minus:SI (match_dup 4) (match_dup 5))
(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]


This pattern is now no-longer self consistent in that before the
split
the overall result for the condition register is in mode CC_NCV, but
afterwards it is just CC_C.

I think CC_NCV is correct mode (the N, C and V bits all correctly
reflect the result of the 64-bit comparison), but that then
implies that
the cc mode of subsi3_carryin_compare is incorrect as well and
should in
fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
agree
that CC_NCV is the correct mode for this operation

I'm not sure if there are other consequences that will fall out from
fixing this (it's possible that we might need a change to
select_cc_mode
as well).


Yes, this is still a bit awkward...

The N and V bit will be the correct result for the subdi3_compare1
a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
only gets the C bit correct, the expression for N and V is a different
one.

It probably works, because the subsi3_carryin_compare instruction sets
more CC bits than the pattern does explicitly specify the value.
We know the subsi3_carryin_compare also computes the NV bits, but
it is
hard to write down the correct rtl expression for it.

In theory the pattern should describe everything correctly,
maybe, like:

set (reg:CC_C CC_REGNUM)
   (compare:CC_C
 (zero_extend:DI (match_dup 4))
 (plus:DI (zero_extend:DI (match_dup 5))
  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
set (reg:CC_NV CC_REGNUM)
   (compare:CC_NV
(match_dup 4))
(plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
(const_int 0)))
set (match_dup 3)
   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)


But I doubt that wil

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-06 Thread Bernd Edlinger
On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
> On 06/09/17 13:44, Bernd Edlinger wrote:
>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>> Hi Kyrill,
>>>
>>> Thanks for your review!
>>>
>>>
>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
 Hi Bernd,

 On 18/01/17 15:36, Bernd Edlinger wrote:
> On 01/13/17 19:28, Bernd Edlinger wrote:
>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
 On 18/12/16 12:58, Bernd Edlinger wrote:
> Hi,
>
> this is related to PR77308, the follow-up patch will depend on this
> one.
>
> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
> before reload, a mis-compilation in libgcc function
> __gnu_satfractdasq
> was discovered, see [1] for more details.
>
> The reason seems to be that when the *arm_cmpdi_insn is directly
> followed by a *arm_cmpdi_unsigned instruction, both are split
> up into this:
>
>  [(set (reg:CC CC_REGNUM)
>(compare:CC (match_dup 0) (match_dup 1)))
>   (parallel [(set (reg:CC CC_REGNUM)
>   (compare:CC (match_dup 3) (match_dup 4)))
>  (set (match_dup 2)
>   (minus:SI (match_dup 5)
>(ltu:SI (reg:CC_C CC_REGNUM)
> (const_int
> 0])]
>
>  [(set (reg:CC CC_REGNUM)
>(compare:CC (match_dup 2) (match_dup 3)))
>   (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>  (set (reg:CC CC_REGNUM)
>   (compare:CC (match_dup 0) (match_dup 1]
>
> The problem is that the reg:CC from the *subsi3_carryin_compare
> is not mentioning that the reg:CC is also dependent on the reg:CC
> from before.  Therefore the *arm_cmpsi_insn appears to be
> redundant and thus got removed, because the data values are
> identical.
>
> I think that applies to a number of similar pattern where data
> flow is happening through the CC reg.
>
> So this is a kind of correctness issue, and should be fixed
> independently from the optimization issue PR77308.
>
> Therefore I think the patterns need to specify the true
> value that will be in the CC reg, in order for cse to
> know what the instructions are really doing.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
>
 I agree you've found a valid problem here, but I have some issues
 with
 the patch itself.


 (define_insn_and_split "subdi3_compare1"
 [(set (reg:CC_NCV CC_REGNUM)
   (compare:CC_NCV
 (match_operand:DI 1 "register_operand" "r")
 (match_operand:DI 2 "register_operand" "r")))
  (set (match_operand:DI 0 "register_operand" "=&r")
   (minus:DI (match_dup 1) (match_dup 2)))]
 "TARGET_32BIT"
 "#"
 "&& reload_completed"
 [(parallel [(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 1) (match_dup 2)))
 (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
 2)))])
  (parallel [(set (reg:CC_C CC_REGNUM)
  (compare:CC_C
(zero_extend:DI (match_dup 4))
(plus:DI (zero_extend:DI (match_dup 5))
 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
 (set (match_dup 3)
  (minus:SI (minus:SI (match_dup 4) (match_dup 5))
(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]


 This pattern is now no-longer self consistent in that before the
 split
 the overall result for the condition register is in mode CC_NCV, but
 afterwards it is just CC_C.

 I think CC_NCV is correct mode (the N, C and V bits all correctly
 reflect the result of the 64-bit comparison), but that then
 implies that
 the cc mode of subsi3_carryin_compare is incorrect as well and
 should in
 fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
 agree
 that CC_NCV is the correct mode for this operation

 I'm not sure if there are other consequences that will fall out from
 fixing this (it's possible that we might need a change to
 select_cc_mode
 as well).

>>> Yes, this is still a bit awkward...
>>>
>>> The N and V bit will be the correct result for the subdi3_compare1
>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-06 Thread Bernd Edlinger
On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
> On 06/09/17 13:44, Bernd Edlinger wrote:
>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>> Hi Kyrill,
>>>
>>> Thanks for your review!
>>>
>>>
>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
 Hi Bernd,

 On 18/01/17 15:36, Bernd Edlinger wrote:
> On 01/13/17 19:28, Bernd Edlinger wrote:
>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
 On 18/12/16 12:58, Bernd Edlinger wrote:
> Hi,
>
> this is related to PR77308, the follow-up patch will depend on this
> one.
>
> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
> before reload, a mis-compilation in libgcc function
> __gnu_satfractdasq
> was discovered, see [1] for more details.
>
> The reason seems to be that when the *arm_cmpdi_insn is directly
> followed by a *arm_cmpdi_unsigned instruction, both are split
> up into this:
>
>  [(set (reg:CC CC_REGNUM)
>(compare:CC (match_dup 0) (match_dup 1)))
>   (parallel [(set (reg:CC CC_REGNUM)
>   (compare:CC (match_dup 3) (match_dup 4)))
>  (set (match_dup 2)
>   (minus:SI (match_dup 5)
>(ltu:SI (reg:CC_C CC_REGNUM)
> (const_int
> 0])]
>
>  [(set (reg:CC CC_REGNUM)
>(compare:CC (match_dup 2) (match_dup 3)))
>   (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>  (set (reg:CC CC_REGNUM)
>   (compare:CC (match_dup 0) (match_dup 1]
>
> The problem is that the reg:CC from the *subsi3_carryin_compare
> is not mentioning that the reg:CC is also dependent on the reg:CC
> from before.  Therefore the *arm_cmpsi_insn appears to be
> redundant and thus got removed, because the data values are
> identical.
>
> I think that applies to a number of similar pattern where data
> flow is happening through the CC reg.
>
> So this is a kind of correctness issue, and should be fixed
> independently from the optimization issue PR77308.
>
> Therefore I think the patterns need to specify the true
> value that will be in the CC reg, in order for cse to
> know what the instructions are really doing.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
>
 I agree you've found a valid problem here, but I have some issues
 with
 the patch itself.


 (define_insn_and_split "subdi3_compare1"
 [(set (reg:CC_NCV CC_REGNUM)
   (compare:CC_NCV
 (match_operand:DI 1 "register_operand" "r")
 (match_operand:DI 2 "register_operand" "r")))
  (set (match_operand:DI 0 "register_operand" "=&r")
   (minus:DI (match_dup 1) (match_dup 2)))]
 "TARGET_32BIT"
 "#"
 "&& reload_completed"
 [(parallel [(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 1) (match_dup 2)))
 (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
 2)))])
  (parallel [(set (reg:CC_C CC_REGNUM)
  (compare:CC_C
(zero_extend:DI (match_dup 4))
(plus:DI (zero_extend:DI (match_dup 5))
 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
 (set (match_dup 3)
  (minus:SI (minus:SI (match_dup 4) (match_dup 5))
(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]


 This pattern is now no-longer self consistent in that before the
 split
 the overall result for the condition register is in mode CC_NCV, but
 afterwards it is just CC_C.

 I think CC_NCV is correct mode (the N, C and V bits all correctly
 reflect the result of the 64-bit comparison), but that then
 implies that
 the cc mode of subsi3_carryin_compare is incorrect as well and
 should in
 fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
 agree
 that CC_NCV is the correct mode for this operation

 I'm not sure if there are other consequences that will fall out from
 fixing this (it's possible that we might need a change to
 select_cc_mode
 as well).

>>> Yes, this is still a bit awkward...
>>>
>>> The N and V bit will be the correct result for the subdi3_compare1
>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-06 Thread Richard Earnshaw (lists)
On 06/09/17 13:44, Bernd Edlinger wrote:
> On 09/04/17 21:54, Bernd Edlinger wrote:
>> Hi Kyrill,
>>
>> Thanks for your review!
>>
>>
>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>> Hi Bernd,
>>>
>>> On 18/01/17 15:36, Bernd Edlinger wrote:
 On 01/13/17 19:28, Bernd Edlinger wrote:
> On 01/13/17 17:10, Bernd Edlinger wrote:
>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>> On 18/12/16 12:58, Bernd Edlinger wrote:
 Hi,

 this is related to PR77308, the follow-up patch will depend on this
 one.

 When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
 before reload, a mis-compilation in libgcc function 
 __gnu_satfractdasq
 was discovered, see [1] for more details.

 The reason seems to be that when the *arm_cmpdi_insn is directly
 followed by a *arm_cmpdi_unsigned instruction, both are split
 up into this:

 [(set (reg:CC CC_REGNUM)
   (compare:CC (match_dup 0) (match_dup 1)))
  (parallel [(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 3) (match_dup 4)))
 (set (match_dup 2)
  (minus:SI (match_dup 5)
   (ltu:SI (reg:CC_C CC_REGNUM) 
 (const_int
 0])]

 [(set (reg:CC CC_REGNUM)
   (compare:CC (match_dup 2) (match_dup 3)))
  (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
 (set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 0) (match_dup 1]

 The problem is that the reg:CC from the *subsi3_carryin_compare
 is not mentioning that the reg:CC is also dependent on the reg:CC
 from before.  Therefore the *arm_cmpsi_insn appears to be
 redundant and thus got removed, because the data values are 
 identical.

 I think that applies to a number of similar pattern where data
 flow is happening through the CC reg.

 So this is a kind of correctness issue, and should be fixed
 independently from the optimization issue PR77308.

 Therefore I think the patterns need to specify the true
 value that will be in the CC reg, in order for cse to
 know what the instructions are really doing.


 Bootstrapped and reg-tested on arm-linux-gnueabihf.
 Is it OK for trunk?

>>> I agree you've found a valid problem here, but I have some issues 
>>> with
>>> the patch itself.
>>>
>>>
>>> (define_insn_and_split "subdi3_compare1"
>>>[(set (reg:CC_NCV CC_REGNUM)
>>>  (compare:CC_NCV
>>>(match_operand:DI 1 "register_operand" "r")
>>>(match_operand:DI 2 "register_operand" "r")))
>>> (set (match_operand:DI 0 "register_operand" "=&r")
>>>  (minus:DI (match_dup 1) (match_dup 2)))]
>>>"TARGET_32BIT"
>>>"#"
>>>"&& reload_completed"
>>>[(parallel [(set (reg:CC CC_REGNUM)
>>> (compare:CC (match_dup 1) (match_dup 2)))
>>>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup 
>>> 2)))])
>>> (parallel [(set (reg:CC_C CC_REGNUM)
>>> (compare:CC_C
>>>   (zero_extend:DI (match_dup 4))
>>>   (plus:DI (zero_extend:DI (match_dup 5))
>>>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>>>(set (match_dup 3)
>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>>>
>>>
>>> This pattern is now no-longer self consistent in that before the 
>>> split
>>> the overall result for the condition register is in mode CC_NCV, but
>>> afterwards it is just CC_C.
>>>
>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>> reflect the result of the 64-bit comparison), but that then 
>>> implies that
>>> the cc mode of subsi3_carryin_compare is incorrect as well and 
>>> should in
>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to 
>>> agree
>>> that CC_NCV is the correct mode for this operation
>>>
>>> I'm not sure if there are other consequences that will fall out from
>>> fixing this (it's possible that we might need a change to 
>>> select_cc_mode
>>> as well).
>>>
>> Yes, this is still a bit awkward...
>>
>> The N and V bit will be the correct result for the subdi3_compare1
>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>> only gets the C bit correct, the expression for N and V is a different
>> one.
>>
>> It probably works, because the subsi3_carryin_compare instruction sets
>> mor

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-06 Thread Bernd Edlinger
On 09/04/17 21:54, Bernd Edlinger wrote:
> Hi Kyrill,
> 
> Thanks for your review!
> 
> 
> On 09/04/17 15:55, Kyrill Tkachov wrote:
>> Hi Bernd,
>>
>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>> On 01/13/17 19:28, Bernd Edlinger wrote:
 On 01/13/17 17:10, Bernd Edlinger wrote:
> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this is related to PR77308, the follow-up patch will depend on this
>>> one.
>>>
>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>> before reload, a mis-compilation in libgcc function 
>>> __gnu_satfractdasq
>>> was discovered, see [1] for more details.
>>>
>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>> up into this:
>>>
>>> [(set (reg:CC CC_REGNUM)
>>>   (compare:CC (match_dup 0) (match_dup 1)))
>>>  (parallel [(set (reg:CC CC_REGNUM)
>>>  (compare:CC (match_dup 3) (match_dup 4)))
>>> (set (match_dup 2)
>>>  (minus:SI (match_dup 5)
>>>   (ltu:SI (reg:CC_C CC_REGNUM) 
>>> (const_int
>>> 0])]
>>>
>>> [(set (reg:CC CC_REGNUM)
>>>   (compare:CC (match_dup 2) (match_dup 3)))
>>>  (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>> (set (reg:CC CC_REGNUM)
>>>  (compare:CC (match_dup 0) (match_dup 1]
>>>
>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>> redundant and thus got removed, because the data values are 
>>> identical.
>>>
>>> I think that applies to a number of similar pattern where data
>>> flow is happening through the CC reg.
>>>
>>> So this is a kind of correctness issue, and should be fixed
>>> independently from the optimization issue PR77308.
>>>
>>> Therefore I think the patterns need to specify the true
>>> value that will be in the CC reg, in order for cse to
>>> know what the instructions are really doing.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>>
>> I agree you've found a valid problem here, but I have some issues 
>> with
>> the patch itself.
>>
>>
>> (define_insn_and_split "subdi3_compare1"
>>[(set (reg:CC_NCV CC_REGNUM)
>>  (compare:CC_NCV
>>(match_operand:DI 1 "register_operand" "r")
>>(match_operand:DI 2 "register_operand" "r")))
>> (set (match_operand:DI 0 "register_operand" "=&r")
>>  (minus:DI (match_dup 1) (match_dup 2)))]
>>"TARGET_32BIT"
>>"#"
>>"&& reload_completed"
>>[(parallel [(set (reg:CC CC_REGNUM)
>> (compare:CC (match_dup 1) (match_dup 2)))
>>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup 
>> 2)))])
>> (parallel [(set (reg:CC_C CC_REGNUM)
>> (compare:CC_C
>>   (zero_extend:DI (match_dup 4))
>>   (plus:DI (zero_extend:DI (match_dup 5))
>>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>>(set (match_dup 3)
>> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>>
>>
>> This pattern is now no-longer self consistent in that before the 
>> split
>> the overall result for the condition register is in mode CC_NCV, but
>> afterwards it is just CC_C.
>>
>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>> reflect the result of the 64-bit comparison), but that then 
>> implies that
>> the cc mode of subsi3_carryin_compare is incorrect as well and 
>> should in
>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to 
>> agree
>> that CC_NCV is the correct mode for this operation
>>
>> I'm not sure if there are other consequences that will fall out from
>> fixing this (it's possible that we might need a change to 
>> select_cc_mode
>> as well).
>>
> Yes, this is still a bit awkward...
>
> The N and V bit will be the correct result for the subdi3_compare1
> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
> only gets the C bit correct, the expression for N and V is a different
> one.
>
> It probably works, because the subsi3_carryin_compare instruction sets
> more CC bits than the pattern does explicitly specify the value.
> We know the subsi3_carryin_compare also computes the NV bits, but 
> it is
> hard to

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-04 Thread Bernd Edlinger
Hi Kyrill,

Thanks for your review!


On 09/04/17 15:55, Kyrill Tkachov wrote:
> Hi Bernd,
> 
> On 18/01/17 15:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
 On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
> On 18/12/16 12:58, Bernd Edlinger wrote:
>> Hi,
>>
>> this is related to PR77308, the follow-up patch will depend on this
>> one.
>>
>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>> before reload, a mis-compilation in libgcc function 
>> __gnu_satfractdasq
>> was discovered, see [1] for more details.
>>
>> The reason seems to be that when the *arm_cmpdi_insn is directly
>> followed by a *arm_cmpdi_unsigned instruction, both are split
>> up into this:
>>
>> [(set (reg:CC CC_REGNUM)
>>   (compare:CC (match_dup 0) (match_dup 1)))
>>  (parallel [(set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 3) (match_dup 4)))
>> (set (match_dup 2)
>>  (minus:SI (match_dup 5)
>>   (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>> 0])]
>>
>> [(set (reg:CC CC_REGNUM)
>>   (compare:CC (match_dup 2) (match_dup 3)))
>>  (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>> (set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 0) (match_dup 1]
>>
>> The problem is that the reg:CC from the *subsi3_carryin_compare
>> is not mentioning that the reg:CC is also dependent on the reg:CC
>> from before.  Therefore the *arm_cmpsi_insn appears to be
>> redundant and thus got removed, because the data values are 
>> identical.
>>
>> I think that applies to a number of similar pattern where data
>> flow is happening through the CC reg.
>>
>> So this is a kind of correctness issue, and should be fixed
>> independently from the optimization issue PR77308.
>>
>> Therefore I think the patterns need to specify the true
>> value that will be in the CC reg, in order for cse to
>> know what the instructions are really doing.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
> I agree you've found a valid problem here, but I have some issues with
> the patch itself.
>
>
> (define_insn_and_split "subdi3_compare1"
>[(set (reg:CC_NCV CC_REGNUM)
>  (compare:CC_NCV
>(match_operand:DI 1 "register_operand" "r")
>(match_operand:DI 2 "register_operand" "r")))
> (set (match_operand:DI 0 "register_operand" "=&r")
>  (minus:DI (match_dup 1) (match_dup 2)))]
>"TARGET_32BIT"
>"#"
>"&& reload_completed"
>[(parallel [(set (reg:CC CC_REGNUM)
> (compare:CC (match_dup 1) (match_dup 2)))
>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup 
> 2)))])
> (parallel [(set (reg:CC_C CC_REGNUM)
> (compare:CC_C
>   (zero_extend:DI (match_dup 4))
>   (plus:DI (zero_extend:DI (match_dup 5))
>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>(set (match_dup 3)
> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>
>
> This pattern is now no-longer self consistent in that before the split
> the overall result for the condition register is in mode CC_NCV, but
> afterwards it is just CC_C.
>
> I think CC_NCV is correct mode (the N, C and V bits all correctly
> reflect the result of the 64-bit comparison), but that then implies 
> that
> the cc mode of subsi3_carryin_compare is incorrect as well and 
> should in
> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to 
> agree
> that CC_NCV is the correct mode for this operation
>
> I'm not sure if there are other consequences that will fall out from
> fixing this (it's possible that we might need a change to 
> select_cc_mode
> as well).
>
 Yes, this is still a bit awkward...

 The N and V bit will be the correct result for the subdi3_compare1
 a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
 only gets the C bit correct, the expression for N and V is a different
 one.

 It probably works, because the subsi3_carryin_compare instruction sets
 more CC bits than the pattern does explicitly specify the value.
 We know the subsi3_carryin_compare also computes the NV bits, but it is
 hard to write down the correct rtl expression for it.

 In theory the pattern should describe everything correctly,
 maybe, like:

 set (reg:CC_C CC_REGNUM)
  (compare:CC_C
>>>

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-04 Thread Kyrill Tkachov

Hi Bernd,

On 18/01/17 15:36, Bernd Edlinger wrote:

On 01/13/17 19:28, Bernd Edlinger wrote:

On 01/13/17 17:10, Bernd Edlinger wrote:

On 01/13/17 14:50, Richard Earnshaw (lists) wrote:

On 18/12/16 12:58, Bernd Edlinger wrote:

Hi,

this is related to PR77308, the follow-up patch will depend on this
one.

When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
before reload, a mis-compilation in libgcc function __gnu_satfractdasq
was discovered, see [1] for more details.

The reason seems to be that when the *arm_cmpdi_insn is directly
followed by a *arm_cmpdi_unsigned instruction, both are split
up into this:

[(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 0) (match_dup 1)))
 (parallel [(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 3) (match_dup 4)))
(set (match_dup 2)
 (minus:SI (match_dup 5)
  (ltu:SI (reg:CC_C CC_REGNUM) (const_int
0])]

[(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 2) (match_dup 3)))
 (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 0) (match_dup 1]

The problem is that the reg:CC from the *subsi3_carryin_compare
is not mentioning that the reg:CC is also dependent on the reg:CC
from before.  Therefore the *arm_cmpsi_insn appears to be
redundant and thus got removed, because the data values are identical.

I think that applies to a number of similar pattern where data
flow is happening through the CC reg.

So this is a kind of correctness issue, and should be fixed
independently from the optimization issue PR77308.

Therefore I think the patterns need to specify the true
value that will be in the CC reg, in order for cse to
know what the instructions are really doing.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


I agree you've found a valid problem here, but I have some issues with
the patch itself.


(define_insn_and_split "subdi3_compare1"
   [(set (reg:CC_NCV CC_REGNUM)
 (compare:CC_NCV
   (match_operand:DI 1 "register_operand" "r")
   (match_operand:DI 2 "register_operand" "r")))
(set (match_operand:DI 0 "register_operand" "=&r")
 (minus:DI (match_dup 1) (match_dup 2)))]
   "TARGET_32BIT"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 1) (match_dup 2)))
   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
(parallel [(set (reg:CC_C CC_REGNUM)
(compare:CC_C
  (zero_extend:DI (match_dup 4))
  (plus:DI (zero_extend:DI (match_dup 5))
   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
   (set (match_dup 3)
(minus:SI (minus:SI (match_dup 4) (match_dup 5))
  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]


This pattern is now no-longer self consistent in that before the split
the overall result for the condition register is in mode CC_NCV, but
afterwards it is just CC_C.

I think CC_NCV is correct mode (the N, C and V bits all correctly
reflect the result of the 64-bit comparison), but that then implies that
the cc mode of subsi3_carryin_compare is incorrect as well and should in
fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
that CC_NCV is the correct mode for this operation

I'm not sure if there are other consequences that will fall out from
fixing this (it's possible that we might need a change to select_cc_mode
as well).


Yes, this is still a bit awkward...

The N and V bit will be the correct result for the subdi3_compare1
a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
only gets the C bit correct, the expression for N and V is a different
one.

It probably works, because the subsi3_carryin_compare instruction sets
more CC bits than the pattern does explicitly specify the value.
We know the subsi3_carryin_compare also computes the NV bits, but it is
hard to write down the correct rtl expression for it.

In theory the pattern should describe everything correctly,
maybe, like:

set (reg:CC_C CC_REGNUM)
 (compare:CC_C
   (zero_extend:DI (match_dup 4))
   (plus:DI (zero_extend:DI (match_dup 5))
(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
set (reg:CC_NV CC_REGNUM)
 (compare:CC_NV
  (match_dup 4))
  (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
set (match_dup 3)
 (minus:SI (minus:SI (match_dup 4) (match_dup 5))
   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)


But I doubt that will work to set CC_REGNUM with two different modes
in parallel?

Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
defines C from the DImode result, and NV from the SImode result,
similar to the CC_NOOVmode, that also leaves something open what
bits it really defines?


What do you think?


Thanks
Bernd.

I think maybe the 

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-01-18 Thread Bernd Edlinger
On 01/13/17 19:28, Bernd Edlinger wrote:
> On 01/13/17 17:10, Bernd Edlinger wrote:
>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>> On 18/12/16 12:58, Bernd Edlinger wrote:
 Hi,

 this is related to PR77308, the follow-up patch will depend on this
 one.

 When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
 before reload, a mis-compilation in libgcc function __gnu_satfractdasq
 was discovered, see [1] for more details.

 The reason seems to be that when the *arm_cmpdi_insn is directly
 followed by a *arm_cmpdi_unsigned instruction, both are split
 up into this:

[(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 0) (match_dup 1)))
 (parallel [(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 3) (match_dup 4)))
(set (match_dup 2)
 (minus:SI (match_dup 5)
  (ltu:SI (reg:CC_C CC_REGNUM) (const_int
 0])]

[(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 2) (match_dup 3)))
 (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 0) (match_dup 1]

 The problem is that the reg:CC from the *subsi3_carryin_compare
 is not mentioning that the reg:CC is also dependent on the reg:CC
 from before.  Therefore the *arm_cmpsi_insn appears to be
 redundant and thus got removed, because the data values are identical.

 I think that applies to a number of similar pattern where data
 flow is happening through the CC reg.

 So this is a kind of correctness issue, and should be fixed
 independently from the optimization issue PR77308.

 Therefore I think the patterns need to specify the true
 value that will be in the CC reg, in order for cse to
 know what the instructions are really doing.


 Bootstrapped and reg-tested on arm-linux-gnueabihf.
 Is it OK for trunk?

>>>
>>> I agree you've found a valid problem here, but I have some issues with
>>> the patch itself.
>>>
>>>
>>> (define_insn_and_split "subdi3_compare1"
>>>   [(set (reg:CC_NCV CC_REGNUM)
>>> (compare:CC_NCV
>>>   (match_operand:DI 1 "register_operand" "r")
>>>   (match_operand:DI 2 "register_operand" "r")))
>>>(set (match_operand:DI 0 "register_operand" "=&r")
>>> (minus:DI (match_dup 1) (match_dup 2)))]
>>>   "TARGET_32BIT"
>>>   "#"
>>>   "&& reload_completed"
>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>(compare:CC (match_dup 1) (match_dup 2)))
>>>   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>(parallel [(set (reg:CC_C CC_REGNUM)
>>>(compare:CC_C
>>>  (zero_extend:DI (match_dup 4))
>>>  (plus:DI (zero_extend:DI (match_dup 5))
>>>   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>>>   (set (match_dup 3)
>>>(minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>>>
>>>
>>> This pattern is now no-longer self consistent in that before the split
>>> the overall result for the condition register is in mode CC_NCV, but
>>> afterwards it is just CC_C.
>>>
>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>> reflect the result of the 64-bit comparison), but that then implies that
>>> the cc mode of subsi3_carryin_compare is incorrect as well and should in
>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
>>> that CC_NCV is the correct mode for this operation
>>>
>>> I'm not sure if there are other consequences that will fall out from
>>> fixing this (it's possible that we might need a change to select_cc_mode
>>> as well).
>>>
>>
>> Yes, this is still a bit awkward...
>>
>> The N and V bit will be the correct result for the subdi3_compare1
>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>> only gets the C bit correct, the expression for N and V is a different
>> one.
>>
>> It probably works, because the subsi3_carryin_compare instruction sets
>> more CC bits than the pattern does explicitly specify the value.
>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>> hard to write down the correct rtl expression for it.
>>
>> In theory the pattern should describe everything correctly,
>> maybe, like:
>>
>> set (reg:CC_C CC_REGNUM)
>> (compare:CC_C
>>   (zero_extend:DI (match_dup 4))
>>   (plus:DI (zero_extend:DI (match_dup 5))
>>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>> set (reg:CC_NV CC_REGNUM)
>> (compare:CC_NV
>>  (match_dup 4))
>>  (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
>> set (match_dup 3)
>> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>   (ltu:SI (reg:CC_C CC_REGNUM) (const_in

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-01-13 Thread Bernd Edlinger
On 01/13/17 17:10, Bernd Edlinger wrote:
> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this is related to PR77308, the follow-up patch will depend on this one.
>>>
>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>>> was discovered, see [1] for more details.
>>>
>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>> up into this:
>>>
>>>[(set (reg:CC CC_REGNUM)
>>>  (compare:CC (match_dup 0) (match_dup 1)))
>>> (parallel [(set (reg:CC CC_REGNUM)
>>> (compare:CC (match_dup 3) (match_dup 4)))
>>>(set (match_dup 2)
>>> (minus:SI (match_dup 5)
>>>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>>> 0])]
>>>
>>>[(set (reg:CC CC_REGNUM)
>>>  (compare:CC (match_dup 2) (match_dup 3)))
>>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>(set (reg:CC CC_REGNUM)
>>> (compare:CC (match_dup 0) (match_dup 1]
>>>
>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>> redundant and thus got removed, because the data values are identical.
>>>
>>> I think that applies to a number of similar pattern where data
>>> flow is happening through the CC reg.
>>>
>>> So this is a kind of correctness issue, and should be fixed
>>> independently from the optimization issue PR77308.
>>>
>>> Therefore I think the patterns need to specify the true
>>> value that will be in the CC reg, in order for cse to
>>> know what the instructions are really doing.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>>
>>
>> I agree you've found a valid problem here, but I have some issues with
>> the patch itself.
>>
>>
>> (define_insn_and_split "subdi3_compare1"
>>   [(set (reg:CC_NCV CC_REGNUM)
>> (compare:CC_NCV
>>   (match_operand:DI 1 "register_operand" "r")
>>   (match_operand:DI 2 "register_operand" "r")))
>>(set (match_operand:DI 0 "register_operand" "=&r")
>> (minus:DI (match_dup 1) (match_dup 2)))]
>>   "TARGET_32BIT"
>>   "#"
>>   "&& reload_completed"
>>   [(parallel [(set (reg:CC CC_REGNUM)
>>(compare:CC (match_dup 1) (match_dup 2)))
>>   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>(parallel [(set (reg:CC_C CC_REGNUM)
>>(compare:CC_C
>>  (zero_extend:DI (match_dup 4))
>>  (plus:DI (zero_extend:DI (match_dup 5))
>>   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>>   (set (match_dup 3)
>>(minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>>
>>
>> This pattern is now no-longer self consistent in that before the split
>> the overall result for the condition register is in mode CC_NCV, but
>> afterwards it is just CC_C.
>>
>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>> reflect the result of the 64-bit comparison), but that then implies that
>> the cc mode of subsi3_carryin_compare is incorrect as well and should in
>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
>> that CC_NCV is the correct mode for this operation
>>
>> I'm not sure if there are other consequences that will fall out from
>> fixing this (it's possible that we might need a change to select_cc_mode
>> as well).
>>
>
> Yes, this is still a bit awkward...
>
> The N and V bit will be the correct result for the subdi3_compare1
> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
> only gets the C bit correct, the expression for N and V is a different
> one.
>
> It probably works, because the subsi3_carryin_compare instruction sets
> more CC bits than the pattern does explicitly specify the value.
> We know the subsi3_carryin_compare also computes the NV bits, but it is
> hard to write down the correct rtl expression for it.
>
> In theory the pattern should describe everything correctly,
> maybe, like:
>
> set (reg:CC_C CC_REGNUM)
> (compare:CC_C
>   (zero_extend:DI (match_dup 4))
>   (plus:DI (zero_extend:DI (match_dup 5))
>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
> set (reg:CC_NV CC_REGNUM)
> (compare:CC_NV
>  (match_dup 4))
>  (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
> set (match_dup 3)
> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)
>
>
> But I doubt that will work to set CC_REGNUM with two different modes
> in parallel?
>
> Another idea would be to invent a CC_CNV_NOOV mode, that implic

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-01-13 Thread Bernd Edlinger
On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
> On 18/12/16 12:58, Bernd Edlinger wrote:
>> Hi,
>>
>> this is related to PR77308, the follow-up patch will depend on this one.
>>
>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
>> was discovered, see [1] for more details.
>>
>> The reason seems to be that when the *arm_cmpdi_insn is directly
>> followed by a *arm_cmpdi_unsigned instruction, both are split
>> up into this:
>>
>>[(set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 0) (match_dup 1)))
>> (parallel [(set (reg:CC CC_REGNUM)
>> (compare:CC (match_dup 3) (match_dup 4)))
>>(set (match_dup 2)
>> (minus:SI (match_dup 5)
>>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int
>> 0])]
>>
>>[(set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 2) (match_dup 3)))
>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>(set (reg:CC CC_REGNUM)
>> (compare:CC (match_dup 0) (match_dup 1]
>>
>> The problem is that the reg:CC from the *subsi3_carryin_compare
>> is not mentioning that the reg:CC is also dependent on the reg:CC
>> from before.  Therefore the *arm_cmpsi_insn appears to be
>> redundant and thus got removed, because the data values are identical.
>>
>> I think that applies to a number of similar pattern where data
>> flow is happening through the CC reg.
>>
>> So this is a kind of correctness issue, and should be fixed
>> independently from the optimization issue PR77308.
>>
>> Therefore I think the patterns need to specify the true
>> value that will be in the CC reg, in order for cse to
>> know what the instructions are really doing.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>
> I agree you've found a valid problem here, but I have some issues with
> the patch itself.
>
>
> (define_insn_and_split "subdi3_compare1"
>   [(set (reg:CC_NCV CC_REGNUM)
>   (compare:CC_NCV
> (match_operand:DI 1 "register_operand" "r")
> (match_operand:DI 2 "register_operand" "r")))
>(set (match_operand:DI 0 "register_operand" "=&r")
>   (minus:DI (match_dup 1) (match_dup 2)))]
>   "TARGET_32BIT"
>   "#"
>   "&& reload_completed"
>   [(parallel [(set (reg:CC CC_REGNUM)
>  (compare:CC (match_dup 1) (match_dup 2)))
> (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>(parallel [(set (reg:CC_C CC_REGNUM)
>  (compare:CC_C
>(zero_extend:DI (match_dup 4))
>(plus:DI (zero_extend:DI (match_dup 5))
> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
> (set (match_dup 3)
>  (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>
>
> This pattern is now no-longer self consistent in that before the split
> the overall result for the condition register is in mode CC_NCV, but
> afterwards it is just CC_C.
>
> I think CC_NCV is correct mode (the N, C and V bits all correctly
> reflect the result of the 64-bit comparison), but that then implies that
> the cc mode of subsi3_carryin_compare is incorrect as well and should in
> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
> that CC_NCV is the correct mode for this operation
>
> I'm not sure if there are other consequences that will fall out from
> fixing this (it's possible that we might need a change to select_cc_mode
> as well).
>

Yes, this is still a bit awkward...

The N and V bit will be the correct result for the subdi3_compare1
a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
only gets the C bit correct, the expression for N and V is a different
one.

It probably works, because the subsi3_carryin_compare instruction sets
more CC bits than the pattern does explicitly specify the value.
We know the subsi3_carryin_compare also computes the NV bits, but it is
hard to write down the correct rtl expression for it.

In theory the pattern should describe everything correctly,
maybe, like:

set (reg:CC_C CC_REGNUM)
 (compare:CC_C
   (zero_extend:DI (match_dup 4))
   (plus:DI (zero_extend:DI (match_dup 5))
(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
set (reg:CC_NV CC_REGNUM)
 (compare:CC_NV
  (match_dup 4))
  (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
set (match_dup 3)
 (minus:SI (minus:SI (match_dup 4) (match_dup 5))
   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)


But I doubt that will work to set CC_REGNUM with two different modes
in parallel?

Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
defines C from the DImode result, and NV from the SImode result,
similar to the CC_NOOVmode, that also leaves someth

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-01-13 Thread Richard Earnshaw (lists)
On 18/12/16 12:58, Bernd Edlinger wrote:
> Hi,
> 
> this is related to PR77308, the follow-up patch will depend on this one.
> 
> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
> was discovered, see [1] for more details.
> 
> The reason seems to be that when the *arm_cmpdi_insn is directly
> followed by a *arm_cmpdi_unsigned instruction, both are split
> up into this:
> 
>[(set (reg:CC CC_REGNUM)
>  (compare:CC (match_dup 0) (match_dup 1)))
> (parallel [(set (reg:CC CC_REGNUM)
> (compare:CC (match_dup 3) (match_dup 4)))
>(set (match_dup 2)
> (minus:SI (match_dup 5)
>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 
> 0])]
> 
>[(set (reg:CC CC_REGNUM)
>  (compare:CC (match_dup 2) (match_dup 3)))
> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>(set (reg:CC CC_REGNUM)
> (compare:CC (match_dup 0) (match_dup 1]
> 
> The problem is that the reg:CC from the *subsi3_carryin_compare
> is not mentioning that the reg:CC is also dependent on the reg:CC
> from before.  Therefore the *arm_cmpsi_insn appears to be
> redundant and thus got removed, because the data values are identical.
> 
> I think that applies to a number of similar pattern where data
> flow is happening through the CC reg.
> 
> So this is a kind of correctness issue, and should be fixed
> independently from the optimization issue PR77308.
> 
> Therefore I think the patterns need to specify the true
> value that will be in the CC reg, in order for cse to
> know what the instructions are really doing.
> 
> 
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
> 

I agree you've found a valid problem here, but I have some issues with
the patch itself.


(define_insn_and_split "subdi3_compare1"
  [(set (reg:CC_NCV CC_REGNUM)
(compare:CC_NCV
  (match_operand:DI 1 "register_operand" "r")
  (match_operand:DI 2 "register_operand" "r")))
   (set (match_operand:DI 0 "register_operand" "=&r")
(minus:DI (match_dup 1) (match_dup 2)))]
  "TARGET_32BIT"
  "#"
  "&& reload_completed"
  [(parallel [(set (reg:CC CC_REGNUM)
   (compare:CC (match_dup 1) (match_dup 2)))
  (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
   (parallel [(set (reg:CC_C CC_REGNUM)
   (compare:CC_C
 (zero_extend:DI (match_dup 4))
 (plus:DI (zero_extend:DI (match_dup 5))
  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
  (set (match_dup 3)
   (minus:SI (minus:SI (match_dup 4) (match_dup 5))
 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]


This pattern is now no-longer self consistent in that before the split
the overall result for the condition register is in mode CC_NCV, but
afterwards it is just CC_C.

I think CC_NCV is correct mode (the N, C and V bits all correctly
reflect the result of the 64-bit comparison), but that then implies that
the cc mode of subsi3_carryin_compare is incorrect as well and should in
fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
that CC_NCV is the correct mode for this operation

I'm not sure if there are other consequences that will fall out from
fixing this (it's possible that we might need a change to select_cc_mode
as well).

R.

> 
> Thanks
> Bernd.
> 
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00680.html
> 
> 
> patch-pr77308-5.diff
> 
> 
> 2016-12-10  Bernd Edlinger  
> 
>   PR target/77308
>   * config/arm/arm.md (adddi3_compareV, *addsi3_compareV_upper,
>   adddi3_compareC, *addsi3_compareC_upper, subdi3_compare1,
>   subsi3_carryin_compare, subsi3_carryin_compare_const,
>   negdi2_compare, *negsi2_carryin_compare,
>   *arm_cmpdi_insn): Fix the CC reg dataflow.
> 
> Index: gcc/config/arm/arm.md
> ===
> --- gcc/config/arm/arm.md (revision 243515)
> +++ gcc/config/arm/arm.md (working copy)
> @@ -669,17 +669,15 @@
> (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
> (parallel [(set (reg:CC_V CC_REGNUM)
>  (ne:CC_V
> - (plus:DI (plus:DI
> -   (sign_extend:DI (match_dup 4))
> -   (sign_extend:DI (match_dup 5)))
> -  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
> - (plus:DI (sign_extend:DI
> -   (plus:SI (match_dup 4) (match_dup 5)))
> -  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
> -  (set (match_dup 3) (plus:SI (plus:SI
> -   (match_dup 4) (match_dup 5))
> -  (ltu:SI (reg:CC_C