Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated part 2

2017-10-20 Thread Richard Earnshaw (lists)
On 13/10/17 08:41, Christophe Lyon wrote:
> Hi,
> 
> The attached small patch solves PR 67591 and removes occurrences of
> "IT blocks containing 32-bit Thumb instructions are deprecated in
> ARMv8". It is similar to the patch I committed recently and updates
> the 3 remaining patterns that can generate such instructions. I
> checked gcc.log, g++.log, libstdc++.log and gfortran.log and found no
> occurrence of the warning with this patch applied.
> 
> Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
> and --with-cpu=cortex-a57 --with-mode=thumb, and also bootstrapped
> successfully on armv8 HW in thumb mode.
> 
> Benchmarking shows no noticeable difference.
> 
> OK for trunk?
> 

OK.

R.

> Thanks,
> 
> Christophe
> 
> 
> depr-it-2.chlog.txt
> 
> 
> 2017-10-13  Christophe Lyon  
> 
>   PR target/67591
>   * config/arm/arm.md (*sub_shiftsi): Add predicable_short_it
>   attribute.
>   (*cmp_ite0): Add enabled_for_depr_it attribute.
>   (*cmp_ite1): Likewise.
> 
> 
> depr-it-2.patch.txt
> 
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index f241f9d..093db74 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -8960,6 +8960,7 @@
>"TARGET_32BIT"
>"sub%?\\t%0, %1, %3%S2"
>[(set_attr "predicable" "yes")
> +   (set_attr "predicable_short_it" "no")
> (set_attr "shift" "3")
> (set_attr "arch" "32,a")
> (set_attr "type" "alus_shift_imm,alus_shift_reg")])
> @@ -9398,6 +9399,7 @@
>}"
>[(set_attr "conds" "set")
> (set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
> +   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
> (set_attr "type" "multiple")
> (set_attr_alternative "length"
>[(const_int 6)
> @@ -9481,6 +9483,7 @@
>}"
>[(set_attr "conds" "set")
> (set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
> +   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
> (set_attr_alternative "length"
>[(const_int 6)
> (const_int 8)
> 



[ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated part 2

2017-10-13 Thread Christophe Lyon
Hi,

The attached small patch solves PR 67591 and removes occurrences of
"IT blocks containing 32-bit Thumb instructions are deprecated in
ARMv8". It is similar to the patch I committed recently and updates
the 3 remaining patterns that can generate such instructions. I
checked gcc.log, g++.log, libstdc++.log and gfortran.log and found no
occurrence of the warning with this patch applied.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, and also bootstrapped
successfully on armv8 HW in thumb mode.

Benchmarking shows no noticeable difference.

OK for trunk?

Thanks,

Christophe
2017-10-13  Christophe Lyon  

PR target/67591
* config/arm/arm.md (*sub_shiftsi): Add predicable_short_it
attribute.
(*cmp_ite0): Add enabled_for_depr_it attribute.
(*cmp_ite1): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f241f9d..093db74 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8960,6 +8960,7 @@
   "TARGET_32BIT"
   "sub%?\\t%0, %1, %3%S2"
   [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
(set_attr "shift" "3")
(set_attr "arch" "32,a")
(set_attr "type" "alus_shift_imm,alus_shift_reg")])
@@ -9398,6 +9399,7 @@
   }"
   [(set_attr "conds" "set")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr "type" "multiple")
(set_attr_alternative "length"
   [(const_int 6)
@@ -9481,6 +9483,7 @@
   }"
   [(set_attr "conds" "set")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr_alternative "length"
   [(const_int 6)
(const_int 8)


Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2017-09-15 Thread Christophe Lyon
On 13 September 2017 at 18:33, Kyrill  Tkachov
 wrote:
> Hi Christophe,
>
>
> On 13/09/17 16:23, Christophe Lyon wrote:
>>
>> Hi,
>>
>> On 12 October 2016 at 11:22, Christophe Lyon 
>> wrote:
>>>
>>> On 12 October 2016 at 11:14, Kyrill Tkachov 
>>> wrote:

 On 12/10/16 09:59, Christophe Lyon wrote:
>
> Hi Kyrill,
>
> On 7 October 2016 at 17:00, Kyrill Tkachov
> 
> wrote:
>>
>> Hi Christophe,
>>
>>
>> On 07/09/16 21:05, Christophe Lyon wrote:
>>>
>>> Hi,
>>>
>>> The attached patch is a first part to solve PR 67591: it removes
>>> several occurrences of "IT blocks containing 32-bit Thumb
>>> instructions are deprecated in ARMv8" messages in the
>>> gcc/g++/libstdc++/fortran testsuites.
>>>
>>> It does not remove them all yet. This patch only modifies the
>>> *cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
>>> *and_scc_scc and *and_scc_scc_cmp patterns.
>>> Additional work is required in sub_shiftsi etc, at least.
>>> I've started looking at these, but I decided I could already
>>> post this self-contained patch to check if this implementation
>>> is OK.
>>>
>>> Regarding *cmp_and and *cmp_ior patterns, the addition of the
>>> enabled_for_depr_it attribute is aggressive in the sense that it
>>> keeps
>>> only the alternatives with 'l' and 'Py' constraints, while in some
>>> cases the constraints could be relaxed. Indeed, these 2 patterns can
>>> swap their input comparisons, meaning that any of them can be emitted
>>> in the IT-block, and is thus subject to the ARMv8 deprecation.
>>> The generated code is possibly suboptimal in the cases where the
>>> operands are not swapped, since 'r' could be used.
>>>
>>> Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
>>> and --with-cpu=cortex-a57 --with-mode=thumb, showing only
>>> improvements:
>>>
>>>
>>>
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html
>>>
>>> Bootstrapped OK on armv8l HW.
>>>
>>> Is this OK?
>>>
>>> Thanks,
>>>
>>> Christophe
>>
>>
>>(define_insn_and_split "*ior_scc_scc"
>> -  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
>> +  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
>>   (ior:SI (match_operator:SI 3 "arm_comparison_operator"
>> -[(match_operand:SI 1 "s_register_operand" "r")
>> - (match_operand:SI 2 "arm_add_operand" "rIL")])
>> +[(match_operand:SI 1 "s_register_operand" "r,l")
>> + (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
>>   (match_operator:SI 6 "arm_comparison_operator"
>> -[(match_operand:SI 4 "s_register_operand" "r")
>> - (match_operand:SI 5 "arm_add_operand" "rIL")])))
>> +[(match_operand:SI 4 "s_register_operand" "r,l")
>> + (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))
>>
>> Can you please put the more restrictive alternatives (lPy) first?
>> Same with the other patterns your patch touches.
>> Ok with that change if a rebased testing run is ok.
>> Sorry for the delay in reviewing.
>>
> OK, I will update my patch accordingly.
>
> However, when I discussed this with Ramana during the Cauldron,
> he requested benchmark results. So far, I was able to run spec2006
> on an APM machine, and I'm seeing performance changes in the
> range 11% improvement (465.tonto) to 7% degradation (433.milc).
>
> Would that be acceptable?


 Those sound like quite large swings.
>>>
>>> Indeed, but most are in the -1%-+1% range.
>>>
 Are you sure the machine was not running anything else at the time
 or playing tricks with frequency scaling?
>>>
>>> No, I had no such guarantee. I used this machine temporarily,
>>> first to check that bootstrap worked. I planed to use another
>>> board with an A57 "standard" microarch for proper
>>> benchmarking, but I'm not sure when I'll have access to it
>>> (wrt to e/o gcc stage1), that's why I reported these early
>>> figures.
>>>
 Did all iterations of SPEC show a consistent difference?

 If the changes are consistent, could you have a look at the codegen
 to see if there are any clues to the differences?
>>>
>>> I will update my patch according to your comment, re-run the bench
>>> and have a deeper look at the codegen differences.
>>>
>> I have finally been able to run benchmarks with my patch updated
>> according to your comment, on new machines where we have
>> better control of the environment (frequency, etc...).
>>
>> These machines use cortex-a57 CPUs 

Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2017-09-13 Thread Kyrill Tkachov

Hi Christophe,

On 13/09/17 16:23, Christophe Lyon wrote:

Hi,

On 12 October 2016 at 11:22, Christophe Lyon  wrote:

On 12 October 2016 at 11:14, Kyrill Tkachov  wrote:

On 12/10/16 09:59, Christophe Lyon wrote:

Hi Kyrill,

On 7 October 2016 at 17:00, Kyrill Tkachov 
wrote:

Hi Christophe,


On 07/09/16 21:05, Christophe Lyon wrote:

Hi,

The attached patch is a first part to solve PR 67591: it removes
several occurrences of "IT blocks containing 32-bit Thumb
instructions are deprecated in ARMv8" messages in the
gcc/g++/libstdc++/fortran testsuites.

It does not remove them all yet. This patch only modifies the
*cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
*and_scc_scc and *and_scc_scc_cmp patterns.
Additional work is required in sub_shiftsi etc, at least.
I've started looking at these, but I decided I could already
post this self-contained patch to check if this implementation
is OK.

Regarding *cmp_and and *cmp_ior patterns, the addition of the
enabled_for_depr_it attribute is aggressive in the sense that it keeps
only the alternatives with 'l' and 'Py' constraints, while in some
cases the constraints could be relaxed. Indeed, these 2 patterns can
swap their input comparisons, meaning that any of them can be emitted
in the IT-block, and is thus subject to the ARMv8 deprecation.
The generated code is possibly suboptimal in the cases where the
operands are not swapped, since 'r' could be used.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:


http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html

Bootstrapped OK on armv8l HW.

Is this OK?

Thanks,

Christophe


   (define_insn_and_split "*ior_scc_scc"
-  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
  (ior:SI (match_operator:SI 3 "arm_comparison_operator"
-[(match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "arm_add_operand" "rIL")])
+[(match_operand:SI 1 "s_register_operand" "r,l")
+ (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
  (match_operator:SI 6 "arm_comparison_operator"
-[(match_operand:SI 4 "s_register_operand" "r")
- (match_operand:SI 5 "arm_add_operand" "rIL")])))
+[(match_operand:SI 4 "s_register_operand" "r,l")
+ (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))

Can you please put the more restrictive alternatives (lPy) first?
Same with the other patterns your patch touches.
Ok with that change if a rebased testing run is ok.
Sorry for the delay in reviewing.


OK, I will update my patch accordingly.

However, when I discussed this with Ramana during the Cauldron,
he requested benchmark results. So far, I was able to run spec2006
on an APM machine, and I'm seeing performance changes in the
range 11% improvement (465.tonto) to 7% degradation (433.milc).

Would that be acceptable?


Those sound like quite large swings.

Indeed, but most are in the -1%-+1% range.


Are you sure the machine was not running anything else at the time
or playing tricks with frequency scaling?

No, I had no such guarantee. I used this machine temporarily,
first to check that bootstrap worked. I planed to use another
board with an A57 "standard" microarch for proper
benchmarking, but I'm not sure when I'll have access to it
(wrt to e/o gcc stage1), that's why I reported these early
figures.


Did all iterations of SPEC show a consistent difference?

If the changes are consistent, could you have a look at the codegen
to see if there are any clues to the differences?

I will update my patch according to your comment, re-run the bench
and have a deeper look at the codegen differences.


I have finally been able to run benchmarks with my patch updated
according to your comment, on new machines where we have
better control of the environment (frequency, etc...).

These machines use cortex-a57 CPUs and spec2006 shows little
difference with and without this patch.

Comparing several runs with and without the patch:
- gcc is about 1% slower
- povray about 1% faster
- omnetpp about 1.5% faster


Yeah, that looks line with what I'd expect for this change.


The others are in the noise level.

OK for trunk?


This is ok if a bootstrap and test run on top of a recent compiler 
configured for --with-thumb

and for armv8-a passes okay (to exercise the new code).
Thanks for persevering with this!

Kyrill



Thanks,

Christophe



I'd like to get an explanation for these differences before committing
this patch. If they are just an effect of the more restrictive constraints
then there may be not much we can do, but I'd like to make sure there's not
anything else unexpected going on.


Thanks,


Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2017-09-13 Thread Christophe Lyon
Hi,

On 12 October 2016 at 11:22, Christophe Lyon  wrote:
> On 12 October 2016 at 11:14, Kyrill Tkachov  
> wrote:
>>
>> On 12/10/16 09:59, Christophe Lyon wrote:
>>>
>>> Hi Kyrill,
>>>
>>> On 7 October 2016 at 17:00, Kyrill Tkachov 
>>> wrote:

 Hi Christophe,


 On 07/09/16 21:05, Christophe Lyon wrote:
>
> Hi,
>
> The attached patch is a first part to solve PR 67591: it removes
> several occurrences of "IT blocks containing 32-bit Thumb
> instructions are deprecated in ARMv8" messages in the
> gcc/g++/libstdc++/fortran testsuites.
>
> It does not remove them all yet. This patch only modifies the
> *cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
> *and_scc_scc and *and_scc_scc_cmp patterns.
> Additional work is required in sub_shiftsi etc, at least.
> I've started looking at these, but I decided I could already
> post this self-contained patch to check if this implementation
> is OK.
>
> Regarding *cmp_and and *cmp_ior patterns, the addition of the
> enabled_for_depr_it attribute is aggressive in the sense that it keeps
> only the alternatives with 'l' and 'Py' constraints, while in some
> cases the constraints could be relaxed. Indeed, these 2 patterns can
> swap their input comparisons, meaning that any of them can be emitted
> in the IT-block, and is thus subject to the ARMv8 deprecation.
> The generated code is possibly suboptimal in the cases where the
> operands are not swapped, since 'r' could be used.
>
> Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
> and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:
>
>
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html
>
> Bootstrapped OK on armv8l HW.
>
> Is this OK?
>
> Thanks,
>
> Christophe


   (define_insn_and_split "*ior_scc_scc"
 -  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
 +  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
  (ior:SI (match_operator:SI 3 "arm_comparison_operator"
 -[(match_operand:SI 1 "s_register_operand" "r")
 - (match_operand:SI 2 "arm_add_operand" "rIL")])
 +[(match_operand:SI 1 "s_register_operand" "r,l")
 + (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
  (match_operator:SI 6 "arm_comparison_operator"
 -[(match_operand:SI 4 "s_register_operand" "r")
 - (match_operand:SI 5 "arm_add_operand" "rIL")])))
 +[(match_operand:SI 4 "s_register_operand" "r,l")
 + (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))

 Can you please put the more restrictive alternatives (lPy) first?
 Same with the other patterns your patch touches.
 Ok with that change if a rebased testing run is ok.
 Sorry for the delay in reviewing.

>>> OK, I will update my patch accordingly.
>>>
>>> However, when I discussed this with Ramana during the Cauldron,
>>> he requested benchmark results. So far, I was able to run spec2006
>>> on an APM machine, and I'm seeing performance changes in the
>>> range 11% improvement (465.tonto) to 7% degradation (433.milc).
>>>
>>> Would that be acceptable?
>>
>>
>> Those sound like quite large swings.
> Indeed, but most are in the -1%-+1% range.
>
>> Are you sure the machine was not running anything else at the time
>> or playing tricks with frequency scaling?
> No, I had no such guarantee. I used this machine temporarily,
> first to check that bootstrap worked. I planed to use another
> board with an A57 "standard" microarch for proper
> benchmarking, but I'm not sure when I'll have access to it
> (wrt to e/o gcc stage1), that's why I reported these early
> figures.
>
>> Did all iterations of SPEC show a consistent difference?
>>
>> If the changes are consistent, could you have a look at the codegen
>> to see if there are any clues to the differences?
> I will update my patch according to your comment, re-run the bench
> and have a deeper look at the codegen differences.
>

I have finally been able to run benchmarks with my patch updated
according to your comment, on new machines where we have
better control of the environment (frequency, etc...).

These machines use cortex-a57 CPUs and spec2006 shows little
difference with and without this patch.

Comparing several runs with and without the patch:
- gcc is about 1% slower
- povray about 1% faster
- omnetpp about 1.5% faster

The others are in the noise level.

OK for trunk?

Thanks,

Christophe


>> I'd like to get an explanation for these differences before committing
>> this patch. If they are just an effect of the more restrictive constraints
>> then 

Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2016-10-12 Thread Christophe Lyon
On 12 October 2016 at 11:14, Kyrill Tkachov  wrote:
>
> On 12/10/16 09:59, Christophe Lyon wrote:
>>
>> Hi Kyrill,
>>
>> On 7 October 2016 at 17:00, Kyrill Tkachov 
>> wrote:
>>>
>>> Hi Christophe,
>>>
>>>
>>> On 07/09/16 21:05, Christophe Lyon wrote:

 Hi,

 The attached patch is a first part to solve PR 67591: it removes
 several occurrences of "IT blocks containing 32-bit Thumb
 instructions are deprecated in ARMv8" messages in the
 gcc/g++/libstdc++/fortran testsuites.

 It does not remove them all yet. This patch only modifies the
 *cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
 *and_scc_scc and *and_scc_scc_cmp patterns.
 Additional work is required in sub_shiftsi etc, at least.
 I've started looking at these, but I decided I could already
 post this self-contained patch to check if this implementation
 is OK.

 Regarding *cmp_and and *cmp_ior patterns, the addition of the
 enabled_for_depr_it attribute is aggressive in the sense that it keeps
 only the alternatives with 'l' and 'Py' constraints, while in some
 cases the constraints could be relaxed. Indeed, these 2 patterns can
 swap their input comparisons, meaning that any of them can be emitted
 in the IT-block, and is thus subject to the ARMv8 deprecation.
 The generated code is possibly suboptimal in the cases where the
 operands are not swapped, since 'r' could be used.

 Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
 and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:


 http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html

 Bootstrapped OK on armv8l HW.

 Is this OK?

 Thanks,

 Christophe
>>>
>>>
>>>   (define_insn_and_split "*ior_scc_scc"
>>> -  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
>>> +  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
>>>  (ior:SI (match_operator:SI 3 "arm_comparison_operator"
>>> -[(match_operand:SI 1 "s_register_operand" "r")
>>> - (match_operand:SI 2 "arm_add_operand" "rIL")])
>>> +[(match_operand:SI 1 "s_register_operand" "r,l")
>>> + (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
>>>  (match_operator:SI 6 "arm_comparison_operator"
>>> -[(match_operand:SI 4 "s_register_operand" "r")
>>> - (match_operand:SI 5 "arm_add_operand" "rIL")])))
>>> +[(match_operand:SI 4 "s_register_operand" "r,l")
>>> + (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))
>>>
>>> Can you please put the more restrictive alternatives (lPy) first?
>>> Same with the other patterns your patch touches.
>>> Ok with that change if a rebased testing run is ok.
>>> Sorry for the delay in reviewing.
>>>
>> OK, I will update my patch accordingly.
>>
>> However, when I discussed this with Ramana during the Cauldron,
>> he requested benchmark results. So far, I was able to run spec2006
>> on an APM machine, and I'm seeing performance changes in the
>> range 11% improvement (465.tonto) to 7% degradation (433.milc).
>>
>> Would that be acceptable?
>
>
> Those sound like quite large swings.
Indeed, but most are in the -1%-+1% range.

> Are you sure the machine was not running anything else at the time
> or playing tricks with frequency scaling?
No, I had no such guarantee. I used this machine temporarily,
first to check that bootstrap worked. I planed to use another
board with an A57 "standard" microarch for proper
benchmarking, but I'm not sure when I'll have access to it
(wrt to e/o gcc stage1), that's why I reported these early
figures.

> Did all iterations of SPEC show a consistent difference?
>
> If the changes are consistent, could you have a look at the codegen
> to see if there are any clues to the differences?
I will update my patch according to your comment, re-run the bench
and have a deeper look at the codegen differences.

> I'd like to get an explanation for these differences before committing
> this patch. If they are just an effect of the more restrictive constraints
> then there may be not much we can do, but I'd like to make sure there's not
> anything else unexpected going on.
>
Thanks,

Christophe

>>
>> The number of warnings (IT blocks containing 32-bit Thumb instructions
>> are deprecated in ARMv8)
>> was 712 without my patch and 122 with it. (using the hosts's binutils
>> 2.24/ubuntu).
>> I expected some warning, since as I said earlier other patterns need
>> to be updated.
>
>
> Understood. That's fine.
>
> Thanks,
> Kyrill
>
>
>>
>> Christophe
>>
>>
>>> Thanks for improving this area!
>>> Kyrill
>>>
>


Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2016-10-12 Thread Kyrill Tkachov


On 12/10/16 09:59, Christophe Lyon wrote:

Hi Kyrill,

On 7 October 2016 at 17:00, Kyrill Tkachov  wrote:

Hi Christophe,


On 07/09/16 21:05, Christophe Lyon wrote:

Hi,

The attached patch is a first part to solve PR 67591: it removes
several occurrences of "IT blocks containing 32-bit Thumb
instructions are deprecated in ARMv8" messages in the
gcc/g++/libstdc++/fortran testsuites.

It does not remove them all yet. This patch only modifies the
*cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
*and_scc_scc and *and_scc_scc_cmp patterns.
Additional work is required in sub_shiftsi etc, at least.
I've started looking at these, but I decided I could already
post this self-contained patch to check if this implementation
is OK.

Regarding *cmp_and and *cmp_ior patterns, the addition of the
enabled_for_depr_it attribute is aggressive in the sense that it keeps
only the alternatives with 'l' and 'Py' constraints, while in some
cases the constraints could be relaxed. Indeed, these 2 patterns can
swap their input comparisons, meaning that any of them can be emitted
in the IT-block, and is thus subject to the ARMv8 deprecation.
The generated code is possibly suboptimal in the cases where the
operands are not swapped, since 'r' could be used.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:

http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html

Bootstrapped OK on armv8l HW.

Is this OK?

Thanks,

Christophe


  (define_insn_and_split "*ior_scc_scc"
-  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
 (ior:SI (match_operator:SI 3 "arm_comparison_operator"
-[(match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "arm_add_operand" "rIL")])
+[(match_operand:SI 1 "s_register_operand" "r,l")
+ (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
 (match_operator:SI 6 "arm_comparison_operator"
-[(match_operand:SI 4 "s_register_operand" "r")
- (match_operand:SI 5 "arm_add_operand" "rIL")])))
+[(match_operand:SI 4 "s_register_operand" "r,l")
+ (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))

Can you please put the more restrictive alternatives (lPy) first?
Same with the other patterns your patch touches.
Ok with that change if a rebased testing run is ok.
Sorry for the delay in reviewing.


OK, I will update my patch accordingly.

However, when I discussed this with Ramana during the Cauldron,
he requested benchmark results. So far, I was able to run spec2006
on an APM machine, and I'm seeing performance changes in the
range 11% improvement (465.tonto) to 7% degradation (433.milc).

Would that be acceptable?


Those sound like quite large swings.
Are you sure the machine was not running anything else at the time
or playing tricks with frequency scaling?
Did all iterations of SPEC show a consistent difference?

If the changes are consistent, could you have a look at the codegen
to see if there are any clues to the differences?

I'd like to get an explanation for these differences before committing
this patch. If they are just an effect of the more restrictive constraints
then there may be not much we can do, but I'd like to make sure there's not
anything else unexpected going on.



The number of warnings (IT blocks containing 32-bit Thumb instructions
are deprecated in ARMv8)
was 712 without my patch and 122 with it. (using the hosts's binutils
2.24/ubuntu).
I expected some warning, since as I said earlier other patterns need
to be updated.


Understood. That's fine.

Thanks,
Kyrill



Christophe



Thanks for improving this area!
Kyrill





Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2016-10-12 Thread Christophe Lyon
Hi Kyrill,

On 7 October 2016 at 17:00, Kyrill Tkachov  wrote:
> Hi Christophe,
>
>
> On 07/09/16 21:05, Christophe Lyon wrote:
>>
>> Hi,
>>
>> The attached patch is a first part to solve PR 67591: it removes
>> several occurrences of "IT blocks containing 32-bit Thumb
>> instructions are deprecated in ARMv8" messages in the
>> gcc/g++/libstdc++/fortran testsuites.
>>
>> It does not remove them all yet. This patch only modifies the
>> *cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
>> *and_scc_scc and *and_scc_scc_cmp patterns.
>> Additional work is required in sub_shiftsi etc, at least.
>> I've started looking at these, but I decided I could already
>> post this self-contained patch to check if this implementation
>> is OK.
>>
>> Regarding *cmp_and and *cmp_ior patterns, the addition of the
>> enabled_for_depr_it attribute is aggressive in the sense that it keeps
>> only the alternatives with 'l' and 'Py' constraints, while in some
>> cases the constraints could be relaxed. Indeed, these 2 patterns can
>> swap their input comparisons, meaning that any of them can be emitted
>> in the IT-block, and is thus subject to the ARMv8 deprecation.
>> The generated code is possibly suboptimal in the cases where the
>> operands are not swapped, since 'r' could be used.
>>
>> Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
>> and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:
>>
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html
>>
>> Bootstrapped OK on armv8l HW.
>>
>> Is this OK?
>>
>> Thanks,
>>
>> Christophe
>
>
>  (define_insn_and_split "*ior_scc_scc"
> -  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
> +  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
> (ior:SI (match_operator:SI 3 "arm_comparison_operator"
> -[(match_operand:SI 1 "s_register_operand" "r")
> - (match_operand:SI 2 "arm_add_operand" "rIL")])
> +[(match_operand:SI 1 "s_register_operand" "r,l")
> + (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
> (match_operator:SI 6 "arm_comparison_operator"
> -[(match_operand:SI 4 "s_register_operand" "r")
> - (match_operand:SI 5 "arm_add_operand" "rIL")])))
> +[(match_operand:SI 4 "s_register_operand" "r,l")
> + (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))
>
> Can you please put the more restrictive alternatives (lPy) first?
> Same with the other patterns your patch touches.
> Ok with that change if a rebased testing run is ok.
> Sorry for the delay in reviewing.
>

OK, I will update my patch accordingly.

However, when I discussed this with Ramana during the Cauldron,
he requested benchmark results. So far, I was able to run spec2006
on an APM machine, and I'm seeing performance changes in the
range 11% improvement (465.tonto) to 7% degradation (433.milc).

Would that be acceptable?

The number of warnings (IT blocks containing 32-bit Thumb instructions
are deprecated in ARMv8)
was 712 without my patch and 122 with it. (using the hosts's binutils
2.24/ubuntu).
I expected some warning, since as I said earlier other patterns need
to be updated.

Christophe


> Thanks for improving this area!
> Kyrill
>


Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2016-10-07 Thread Kyrill Tkachov

Hi Christophe,

On 07/09/16 21:05, Christophe Lyon wrote:

Hi,

The attached patch is a first part to solve PR 67591: it removes
several occurrences of "IT blocks containing 32-bit Thumb
instructions are deprecated in ARMv8" messages in the
gcc/g++/libstdc++/fortran testsuites.

It does not remove them all yet. This patch only modifies the
*cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
*and_scc_scc and *and_scc_scc_cmp patterns.
Additional work is required in sub_shiftsi etc, at least.
I've started looking at these, but I decided I could already
post this self-contained patch to check if this implementation
is OK.

Regarding *cmp_and and *cmp_ior patterns, the addition of the
enabled_for_depr_it attribute is aggressive in the sense that it keeps
only the alternatives with 'l' and 'Py' constraints, while in some
cases the constraints could be relaxed. Indeed, these 2 patterns can
swap their input comparisons, meaning that any of them can be emitted
in the IT-block, and is thus subject to the ARMv8 deprecation.
The generated code is possibly suboptimal in the cases where the
operands are not swapped, since 'r' could be used.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html

Bootstrapped OK on armv8l HW.

Is this OK?

Thanks,

Christophe


 (define_insn_and_split "*ior_scc_scc"
-  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
(ior:SI (match_operator:SI 3 "arm_comparison_operator"
-[(match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "arm_add_operand" "rIL")])
+[(match_operand:SI 1 "s_register_operand" "r,l")
+ (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
(match_operator:SI 6 "arm_comparison_operator"
-[(match_operand:SI 4 "s_register_operand" "r")
- (match_operand:SI 5 "arm_add_operand" "rIL")])))
+[(match_operand:SI 4 "s_register_operand" "r,l")
+ (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))

Can you please put the more restrictive alternatives (lPy) first?
Same with the other patterns your patch touches.
Ok with that change if a rebased testing run is ok.
Sorry for the delay in reviewing.

Thanks for improving this area!
Kyrill



[ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2016-09-07 Thread Christophe Lyon
Hi,

The attached patch is a first part to solve PR 67591: it removes
several occurrences of "IT blocks containing 32-bit Thumb
instructions are deprecated in ARMv8" messages in the
gcc/g++/libstdc++/fortran testsuites.

It does not remove them all yet. This patch only modifies the
*cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
*and_scc_scc and *and_scc_scc_cmp patterns.
Additional work is required in sub_shiftsi etc, at least.
I've started looking at these, but I decided I could already
post this self-contained patch to check if this implementation
is OK.

Regarding *cmp_and and *cmp_ior patterns, the addition of the
enabled_for_depr_it attribute is aggressive in the sense that it keeps
only the alternatives with 'l' and 'Py' constraints, while in some
cases the constraints could be relaxed. Indeed, these 2 patterns can
swap their input comparisons, meaning that any of them can be emitted
in the IT-block, and is thus subject to the ARMv8 deprecation.
The generated code is possibly suboptimal in the cases where the
operands are not swapped, since 'r' could be used.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html

Bootstrapped OK on armv8l HW.

Is this OK?

Thanks,

Christophe
2016-09-05  Christophe Lyon  

PR target/67591
* config/arm/arm.md (*cmp_and): Add enabled_for_depr_it attribute.
(*cmp_ior): Likewise.
(*ior_scc_scc): Add alternative for enabled_for_depr_it attribute.
(*ior_scc_scc_cmp): Likewise.
(*and_scc_scc): Likewise.
(*and_scc_scc_cmp): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 318db75..0374bdd 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9340,6 +9340,7 @@
   [(set_attr "conds" "set")
(set_attr "predicable" "no")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr_alternative "length"
   [(const_int 6)
(const_int 8)
@@ -9422,6 +9423,7 @@
   "
   [(set_attr "conds" "set")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr_alternative "length"
   [(const_int 6)
(const_int 8)
@@ -9444,13 +9446,13 @@
 )
 
 (define_insn_and_split "*ior_scc_scc"
-  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
(ior:SI (match_operator:SI 3 "arm_comparison_operator"
-[(match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "arm_add_operand" "rIL")])
+[(match_operand:SI 1 "s_register_operand" "r,l")
+ (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
(match_operator:SI 6 "arm_comparison_operator"
-[(match_operand:SI 4 "s_register_operand" "r")
- (match_operand:SI 5 "arm_add_operand" "rIL")])))
+[(match_operand:SI 4 "s_register_operand" "r,l")
+ (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT
&& (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_OR_Y)
@@ -9469,6 +9471,7 @@
  DOM_CC_X_OR_Y),
CC_REGNUM);"
   [(set_attr "conds" "clob")
+   (set_attr "enabled_for_depr_it" "no,yes")
(set_attr "length" "16")
(set_attr "type" "multiple")]
 )
@@ -9478,13 +9481,13 @@
 (define_insn_and_split "*ior_scc_scc_cmp"
   [(set (match_operand 0 "dominant_cc_register" "")
(compare (ior:SI (match_operator:SI 3 "arm_comparison_operator"
- [(match_operand:SI 1 "s_register_operand" "r")
-  (match_operand:SI 2 "arm_add_operand" "rIL")])
+ [(match_operand:SI 1 "s_register_operand" "r,l")
+  (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
 (match_operator:SI 6 "arm_comparison_operator"
- [(match_operand:SI 4 "s_register_operand" "r")
-  (match_operand:SI 5 "arm_add_operand" "rIL")]))
+ [(match_operand:SI 4 "s_register_operand" "r,l")
+  (match_operand:SI 5 "arm_add_operand" "rIL,lPy")]))
 (const_int 0)))
-   (set (match_operand:SI 7 "s_register_operand" "=Ts")
+   (set (match_operand:SI 7 "s_register_operand" "=Ts,Ts")
(ior:SI (match_op_dup 3 [(match_dup 1) (match_dup 2)])
(match_op_dup 6 [(match_dup 4) (match_dup 5)])))]
   "TARGET_32BIT"
@@ -9499,18 +9502,19 @@
(set (match_dup 7) (ne:SI (match_dup 0) (const_int 0)))]
   ""
   [(set_attr "conds" "set")
+