Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-03-07 Thread Christophe Lyon
On 7 March 2016 at 05:43, Ramana Radhakrishnan
 wrote:
> On Wed, Feb 17, 2016 at 10:20 AM, Christophe Lyon
>  wrote:
>> On 17 February 2016 at 11:05, Kyrill Tkachov
>>  wrote:
>>>
>>> On 17/02/16 10:03, Christophe Lyon wrote:

 On 15 February 2016 at 12:32, Kyrill Tkachov
  wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson 
>> wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE
>>> definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.
>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded
> benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-03-06 Thread Ramana Radhakrishnan
On Wed, Feb 17, 2016 at 10:20 AM, Christophe Lyon
 wrote:
> On 17 February 2016 at 11:05, Kyrill Tkachov
>  wrote:
>>
>> On 17/02/16 10:03, Christophe Lyon wrote:
>>>
>>> On 15 February 2016 at 12:32, Kyrill Tkachov
>>>  wrote:

 On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>
> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson 
> wrote:
>>
>> This is my suggested fix for PR 65932, which is a linux kernel
>> miscompile with gcc-5.1.
>>
>> The problem here is caused by a chain of events.  The first is that
>> the relatively new eipa_sra pass creates fake parameters that behave
>> slightly differently than normal parameters.  The second is that the
>> optimizer creates phi nodes that copy local variables to fake
>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>> assumes that it can emit simple move instructions for these phi nodes.
>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>> forces QImode and HImode to unsigned, but a
>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>> short parameters have different in register representations than local
>> variables, and require a conversion when copying between them, a
>> conversion that the out-of-ssa pass can't easily emit.
>>
>> Ultimately, I think this is a problem in the arm backend.  It should
>> not have a PROMOTE_MODE macro that is changing the sign of char and
>> short local variables.  I also think that we should merge the
>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>> prevent this from happening again.
>>
>> I see four general problems with the current ARM PROMOTE_MODE
>> definition.
>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>> instruction was added.  It is a lose for armv6 and later.
>> 2) Unsigned short was only faster for targets that don't support
>> unaligned accesses.  Support for these targets was removed a while
>> ago, and this PROMODE_MODE hunk should have been removed at the same
>> time.  It was accidentally left behind.
>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>> converted to a function, the PROMOTE_MODE code was copied without the
>> UNSIGNEDP changes.  Thus it is only an accident that
>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>> changes to resolve the difference are safe.
>> 4) There is a general principle that you should only change signedness
>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>> in extra conversion instructions that make code slower.  The mips64
>> hardware for instance requires that 32-bit values be sign-extended
>> regardless of type, and instructions may trap if this is not true.
>> However, it has a set of 32-bit instructions that operate on these
>> values, and hence no conversions are required.  There is no similar
>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>> can be seen in the testcases where gcc emits both a zero-extend and a
>> sign-extend inside a loop, as the sign-extend is required for a
>> compare, and the zero-extend is required by PROMOTE_MODE.
>
> Given Kyrill's testing with the patch and the reasonably detailed
> check of the effects of code generation changes - The arm.h hunk is ok
> - I do think we should make this explicit in the documentation that
> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
> better still maybe put in a checking assert for the same in the
> mid-end but that could be the subject of a follow-up patch.
>
> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
> the testsuite fallout separately.

 Hi all,

 I'd like to backport the arm.h from this ( r233130) to the GCC 5
 branch. As the CSE patch from my series had some fallout on x86_64
 due to a deficiency in the AVX patterns that is too invasive to fix
 at this stage (and presumably backport), I'd like to just backport
 this arm.h fix and adjust the tests to XFAIL the fallout that comes
 with not applying the CSE patch. The attached patch does that.

 The code quality fallout on code outside the testsuite is not
 that gread. The SPEC benchmarks are not affected by not applying
 the CSE change, and only a single sequence in a popular embedded
 benchmark
 shows some degradation for -mtune=cortex-a9 in the same way as the
 wmul-1.c and wmul-2.c tests.

 I think that's a fair tradeoff for fixing the wrong code bug on that
 branch.

 Ok to backport r233130 and the 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-18 Thread Christophe Lyon
On 17 February 2016 at 11:22, Kyrill Tkachov
 wrote:
>
> On 17/02/16 10:20, Christophe Lyon wrote:
>>
>> On 17 February 2016 at 11:05, Kyrill Tkachov
>>  wrote:
>>>
>>> On 17/02/16 10:03, Christophe Lyon wrote:

 On 15 February 2016 at 12:32, Kyrill Tkachov
  wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson 
>> wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi
>>> nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than
>>> local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE
>>> definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the
>>> sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it
>>> was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change
>>> signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.
>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded
> benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as the

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-17 Thread Kyrill Tkachov


On 17/02/16 10:20, Christophe Lyon wrote:

On 17 February 2016 at 11:05, Kyrill Tkachov
 wrote:

On 17/02/16 10:03, Christophe Lyon wrote:

On 15 February 2016 at 12:32, Kyrill Tkachov
 wrote:

On 04/02/16 08:58, Ramana Radhakrishnan wrote:

On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson 
wrote:

This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the ouf-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.

Ultimately, I think this is a problem in the arm backend.  It should
not have a PROMOTE_MODE macro that is changing the sign of char and
short local variables.  I also think that we should merge the
PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
prevent this from happening again.

I see four general problems with the current ARM PROMOTE_MODE
definition.
1) Unsigned char is only faster for armv5 and earlier, before the sxtb
instruction was added.  It is a lose for armv6 and later.
2) Unsigned short was only faster for targets that don't support
unaligned accesses.  Support for these targets was removed a while
ago, and this PROMODE_MODE hunk should have been removed at the same
time.  It was accidentally left behind.
3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
converted to a function, the PROMOTE_MODE code was copied without the
UNSIGNEDP changes.  Thus it is only an accident that
TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
changes to resolve the difference are safe.
4) There is a general principle that you should only change signedness
in PROMOTE_MODE if the hardware forces it, as otherwise this results
in extra conversion instructions that make code slower.  The mips64
hardware for instance requires that 32-bit values be sign-extended
regardless of type, and instructions may trap if this is not true.
However, it has a set of 32-bit instructions that operate on these
values, and hence no conversions are required.  There is no similar
case on ARM. Thus the conversions are unnecessary and unwise.  This
can be seen in the testcases where gcc emits both a zero-extend and a
sign-extend inside a loop, as the sign-extend is required for a
compare, and the zero-extend is required by PROMOTE_MODE.

Given Kyrill's testing with the patch and the reasonably detailed
check of the effects of code generation changes - The arm.h hunk is ok
- I do think we should make this explicit in the documentation that
TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
better still maybe put in a checking assert for the same in the
mid-end but that could be the subject of a follow-up patch.

Ok to apply just the arm.h hunk as I think Kyrill has taken care of
the testsuite fallout separately.

Hi all,

I'd like to backport the arm.h from this ( r233130) to the GCC 5
branch. As the CSE patch from my series had some fallout on x86_64
due to a deficiency in the AVX patterns that is too invasive to fix
at this stage (and presumably backport), I'd like to just backport
this arm.h fix and adjust the tests to XFAIL the fallout that comes
with not applying the CSE patch. The attached patch does that.

The code quality fallout on code outside the testsuite is not
that gread. The SPEC benchmarks are not affected by not applying
the CSE change, and only a single sequence in a popular embedded
benchmark
shows some degradation for -mtune=cortex-a9 in the same way as the
wmul-1.c and wmul-2.c tests.

I think that's a fair tradeoff for fixing the wrong code bug on that
branch.

Ok to backport r233130 and the attached testsuite patch to the GCC 5
branch?

Thanks,
Kyrill

2016-02-15  Kyrylo Tkachov  

  PR target/65932
  * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
  xfail the scan-assembler test.
  * gcc.target/arm/wmul-2.c: Likewise.
  * gcc.target/arm/wmul-3.c: Simplify test to generate a single
smulbb.



Hi Kyrill,

I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
configuration to:
--with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
(target arm-none-linux-gnueabihf)

The generated code is:

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-17 Thread Christophe Lyon
On 17 February 2016 at 11:05, Kyrill Tkachov
 wrote:
>
> On 17/02/16 10:03, Christophe Lyon wrote:
>>
>> On 15 February 2016 at 12:32, Kyrill Tkachov
>>  wrote:
>>>
>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:

 On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson 
 wrote:
>
> This is my suggested fix for PR 65932, which is a linux kernel
> miscompile with gcc-5.1.
>
> The problem here is caused by a chain of events.  The first is that
> the relatively new eipa_sra pass creates fake parameters that behave
> slightly differently than normal parameters.  The second is that the
> optimizer creates phi nodes that copy local variables to fake
> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
> assumes that it can emit simple move instructions for these phi nodes.
> And the fourth is that the ARM port has a PROMOTE_MODE macro that
> forces QImode and HImode to unsigned, but a
> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
> short parameters have different in register representations than local
> variables, and require a conversion when copying between them, a
> conversion that the out-of-ssa pass can't easily emit.
>
> Ultimately, I think this is a problem in the arm backend.  It should
> not have a PROMOTE_MODE macro that is changing the sign of char and
> short local variables.  I also think that we should merge the
> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
> prevent this from happening again.
>
> I see four general problems with the current ARM PROMOTE_MODE
> definition.
> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
> instruction was added.  It is a lose for armv6 and later.
> 2) Unsigned short was only faster for targets that don't support
> unaligned accesses.  Support for these targets was removed a while
> ago, and this PROMODE_MODE hunk should have been removed at the same
> time.  It was accidentally left behind.
> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
> converted to a function, the PROMOTE_MODE code was copied without the
> UNSIGNEDP changes.  Thus it is only an accident that
> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
> changes to resolve the difference are safe.
> 4) There is a general principle that you should only change signedness
> in PROMOTE_MODE if the hardware forces it, as otherwise this results
> in extra conversion instructions that make code slower.  The mips64
> hardware for instance requires that 32-bit values be sign-extended
> regardless of type, and instructions may trap if this is not true.
> However, it has a set of 32-bit instructions that operate on these
> values, and hence no conversions are required.  There is no similar
> case on ARM. Thus the conversions are unnecessary and unwise.  This
> can be seen in the testcases where gcc emits both a zero-extend and a
> sign-extend inside a loop, as the sign-extend is required for a
> compare, and the zero-extend is required by PROMOTE_MODE.

 Given Kyrill's testing with the patch and the reasonably detailed
 check of the effects of code generation changes - The arm.h hunk is ok
 - I do think we should make this explicit in the documentation that
 TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
 better still maybe put in a checking assert for the same in the
 mid-end but that could be the subject of a follow-up patch.

 Ok to apply just the arm.h hunk as I think Kyrill has taken care of
 the testsuite fallout separately.
>>>
>>> Hi all,
>>>
>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>>> branch. As the CSE patch from my series had some fallout on x86_64
>>> due to a deficiency in the AVX patterns that is too invasive to fix
>>> at this stage (and presumably backport), I'd like to just backport
>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>>> with not applying the CSE patch. The attached patch does that.
>>>
>>> The code quality fallout on code outside the testsuite is not
>>> that gread. The SPEC benchmarks are not affected by not applying
>>> the CSE change, and only a single sequence in a popular embedded
>>> benchmark
>>> shows some degradation for -mtune=cortex-a9 in the same way as the
>>> wmul-1.c and wmul-2.c tests.
>>>
>>> I think that's a fair tradeoff for fixing the wrong code bug on that
>>> branch.
>>>
>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5
>>> branch?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-02-15  Kyrylo Tkachov  
>>>
>>>  PR target/65932
>>>  * 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-17 Thread Kyrill Tkachov


On 17/02/16 10:03, Christophe Lyon wrote:

On 15 February 2016 at 12:32, Kyrill Tkachov
 wrote:

On 04/02/16 08:58, Ramana Radhakrishnan wrote:

On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson  wrote:

This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the ouf-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.

Ultimately, I think this is a problem in the arm backend.  It should
not have a PROMOTE_MODE macro that is changing the sign of char and
short local variables.  I also think that we should merge the
PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
prevent this from happening again.

I see four general problems with the current ARM PROMOTE_MODE definition.
1) Unsigned char is only faster for armv5 and earlier, before the sxtb
instruction was added.  It is a lose for armv6 and later.
2) Unsigned short was only faster for targets that don't support
unaligned accesses.  Support for these targets was removed a while
ago, and this PROMODE_MODE hunk should have been removed at the same
time.  It was accidentally left behind.
3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
converted to a function, the PROMOTE_MODE code was copied without the
UNSIGNEDP changes.  Thus it is only an accident that
TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
changes to resolve the difference are safe.
4) There is a general principle that you should only change signedness
in PROMOTE_MODE if the hardware forces it, as otherwise this results
in extra conversion instructions that make code slower.  The mips64
hardware for instance requires that 32-bit values be sign-extended
regardless of type, and instructions may trap if this is not true.
However, it has a set of 32-bit instructions that operate on these
values, and hence no conversions are required.  There is no similar
case on ARM. Thus the conversions are unnecessary and unwise.  This
can be seen in the testcases where gcc emits both a zero-extend and a
sign-extend inside a loop, as the sign-extend is required for a
compare, and the zero-extend is required by PROMOTE_MODE.

Given Kyrill's testing with the patch and the reasonably detailed
check of the effects of code generation changes - The arm.h hunk is ok
- I do think we should make this explicit in the documentation that
TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
better still maybe put in a checking assert for the same in the
mid-end but that could be the subject of a follow-up patch.

Ok to apply just the arm.h hunk as I think Kyrill has taken care of
the testsuite fallout separately.

Hi all,

I'd like to backport the arm.h from this ( r233130) to the GCC 5
branch. As the CSE patch from my series had some fallout on x86_64
due to a deficiency in the AVX patterns that is too invasive to fix
at this stage (and presumably backport), I'd like to just backport
this arm.h fix and adjust the tests to XFAIL the fallout that comes
with not applying the CSE patch. The attached patch does that.

The code quality fallout on code outside the testsuite is not
that gread. The SPEC benchmarks are not affected by not applying
the CSE change, and only a single sequence in a popular embedded benchmark
shows some degradation for -mtune=cortex-a9 in the same way as the
wmul-1.c and wmul-2.c tests.

I think that's a fair tradeoff for fixing the wrong code bug on that branch.

Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?

Thanks,
Kyrill

2016-02-15  Kyrylo Tkachov  

 PR target/65932
 * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
 xfail the scan-assembler test.
 * gcc.target/arm/wmul-2.c: Likewise.
 * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.



Hi Kyrill,

I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
configuration to:
--with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
(target arm-none-linux-gnueabihf)

The generated code is:
 sxthr0, r0
 sxthr1, r1
 mul r0, r1, r0
instead of
 smulbb  r0, r1, r0
on trunk.

I guess 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-17 Thread Christophe Lyon
On 15 February 2016 at 12:32, Kyrill Tkachov
 wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson  wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.
>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as the
> wmul-1.c and wmul-2.c tests.
>
> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>
> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>
> Thanks,
> Kyrill
>
> 2016-02-15  Kyrylo Tkachov  
>
> PR target/65932
> * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
> xfail the scan-assembler test.
> * gcc.target/arm/wmul-2.c: Likewise.
> * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>
>
Hi Kyrill,

I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
configuration to:
--with-cpu 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-16 Thread Ramana Radhakrishnan
On Mon, Feb 15, 2016 at 11:32 AM, Kyrill Tkachov
 wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson  wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.

I would hope that someone looks to fix the AVX port for GCC 7  - as
the CSE patch is something that is useful on ARM for performance in
real terms and could have benefits elsewhere.

OK to apply if no regressions - thanks for pushing this through.

Thanks,
Ramana

>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as the
> wmul-1.c and wmul-2.c tests.
>
> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>
> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>
> Thanks,
> Kyrill
>
> 2016-02-15  Kyrylo Tkachov  
>
> PR target/65932
> * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
> 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-15 Thread Kyrill Tkachov


On 04/02/16 08:58, Ramana Radhakrishnan wrote:

On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson  wrote:

This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the ouf-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.

Ultimately, I think this is a problem in the arm backend.  It should
not have a PROMOTE_MODE macro that is changing the sign of char and
short local variables.  I also think that we should merge the
PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
prevent this from happening again.

I see four general problems with the current ARM PROMOTE_MODE definition.
1) Unsigned char is only faster for armv5 and earlier, before the sxtb
instruction was added.  It is a lose for armv6 and later.
2) Unsigned short was only faster for targets that don't support
unaligned accesses.  Support for these targets was removed a while
ago, and this PROMODE_MODE hunk should have been removed at the same
time.  It was accidentally left behind.
3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
converted to a function, the PROMOTE_MODE code was copied without the
UNSIGNEDP changes.  Thus it is only an accident that
TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
changes to resolve the difference are safe.
4) There is a general principle that you should only change signedness
in PROMOTE_MODE if the hardware forces it, as otherwise this results
in extra conversion instructions that make code slower.  The mips64
hardware for instance requires that 32-bit values be sign-extended
regardless of type, and instructions may trap if this is not true.
However, it has a set of 32-bit instructions that operate on these
values, and hence no conversions are required.  There is no similar
case on ARM. Thus the conversions are unnecessary and unwise.  This
can be seen in the testcases where gcc emits both a zero-extend and a
sign-extend inside a loop, as the sign-extend is required for a
compare, and the zero-extend is required by PROMOTE_MODE.

Given Kyrill's testing with the patch and the reasonably detailed
check of the effects of code generation changes - The arm.h hunk is ok
- I do think we should make this explicit in the documentation that
TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
better still maybe put in a checking assert for the same in the
mid-end but that could be the subject of a follow-up patch.

Ok to apply just the arm.h hunk as I think Kyrill has taken care of
the testsuite fallout separately.

Hi all,

I'd like to backport the arm.h from this ( r233130) to the GCC 5
branch. As the CSE patch from my series had some fallout on x86_64
due to a deficiency in the AVX patterns that is too invasive to fix
at this stage (and presumably backport), I'd like to just backport
this arm.h fix and adjust the tests to XFAIL the fallout that comes
with not applying the CSE patch. The attached patch does that.

The code quality fallout on code outside the testsuite is not
that gread. The SPEC benchmarks are not affected by not applying
the CSE change, and only a single sequence in a popular embedded benchmark
shows some degradation for -mtune=cortex-a9 in the same way as the
wmul-1.c and wmul-2.c tests.

I think that's a fair tradeoff for fixing the wrong code bug on that branch.

Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?

Thanks,
Kyrill

2016-02-15  Kyrylo Tkachov  

PR target/65932
* gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
xfail the scan-assembler test.
* gcc.target/arm/wmul-2.c: Likewise.
* gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.





regards
Ramana





My change was tested with an arm bootstrap, make check, and SPEC
CPU2000 run.  The original poster verified that this gives a linux
kernel that boots correctly.

The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
are tests to verify that smulbb and/or smlabb are generated.
Eliminating the unnecessary sign conversions causes us to get better
code that doesn't include the smulbb and smlabb instructions.  I had
to modify the testcases 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-04 Thread Ramana Radhakrishnan
On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson  wrote:
> This is my suggested fix for PR 65932, which is a linux kernel
> miscompile with gcc-5.1.
>
> The problem here is caused by a chain of events.  The first is that
> the relatively new eipa_sra pass creates fake parameters that behave
> slightly differently than normal parameters.  The second is that the
> optimizer creates phi nodes that copy local variables to fake
> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
> assumes that it can emit simple move instructions for these phi nodes.
> And the fourth is that the ARM port has a PROMOTE_MODE macro that
> forces QImode and HImode to unsigned, but a
> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
> short parameters have different in register representations than local
> variables, and require a conversion when copying between them, a
> conversion that the out-of-ssa pass can't easily emit.
>
> Ultimately, I think this is a problem in the arm backend.  It should
> not have a PROMOTE_MODE macro that is changing the sign of char and
> short local variables.  I also think that we should merge the
> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
> prevent this from happening again.
>
> I see four general problems with the current ARM PROMOTE_MODE definition.
> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
> instruction was added.  It is a lose for armv6 and later.
> 2) Unsigned short was only faster for targets that don't support
> unaligned accesses.  Support for these targets was removed a while
> ago, and this PROMODE_MODE hunk should have been removed at the same
> time.  It was accidentally left behind.
> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
> converted to a function, the PROMOTE_MODE code was copied without the
> UNSIGNEDP changes.  Thus it is only an accident that
> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
> changes to resolve the difference are safe.
> 4) There is a general principle that you should only change signedness
> in PROMOTE_MODE if the hardware forces it, as otherwise this results
> in extra conversion instructions that make code slower.  The mips64
> hardware for instance requires that 32-bit values be sign-extended
> regardless of type, and instructions may trap if this is not true.
> However, it has a set of 32-bit instructions that operate on these
> values, and hence no conversions are required.  There is no similar
> case on ARM. Thus the conversions are unnecessary and unwise.  This
> can be seen in the testcases where gcc emits both a zero-extend and a
> sign-extend inside a loop, as the sign-extend is required for a
> compare, and the zero-extend is required by PROMOTE_MODE.

Given Kyrill's testing with the patch and the reasonably detailed
check of the effects of code generation changes - The arm.h hunk is ok
- I do think we should make this explicit in the documentation that
TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
better still maybe put in a checking assert for the same in the
mid-end but that could be the subject of a follow-up patch.

Ok to apply just the arm.h hunk as I think Kyrill has taken care of
the testsuite fallout separately.


regards
Ramana




>
> My change was tested with an arm bootstrap, make check, and SPEC
> CPU2000 run.  The original poster verified that this gives a linux
> kernel that boots correctly.
>
> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
> are tests to verify that smulbb and/or smlabb are generated.
> Eliminating the unnecessary sign conversions causes us to get better
> code that doesn't include the smulbb and smlabb instructions.  I had
> to modify the testcases to get them to emit the desired instructions.
> With the testcase changes there are no additional testsuite failures,
> though I'm concerned that these testcases with the changes may be
> fragile, and future changes may break them again.



>
> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
> then maybe we should try to add new patterns to get the instructions
> emitted again for the unmodified testcases.
>
> Jim


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-16 Thread Richard Earnshaw
On 15/07/15 16:54, Jim Wilson wrote:
 On Wed, Jul 15, 2015 at 6:04 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Tue, 14 Jul 2015, Jim Wilson wrote:

 Now that we do have the problem, we can't fix it without an ARM port ABI
 change, which is undesirable, so we may have to fix it with a MI change.

 What's the ABI implication of fixing the inconsistency?
 

I think that's the wrong question.  We wouldn't change the ABI to fix an
internal problem in GCC.  So the real question is what's the performance
impact of changing PROMOTE_MODE to be the same as the ABI requirements?

There's no easy answer to that since we'd have to consider all the
different architecture variants we currently have to support.  At the
very least we'd have to think about ARMv3 (no signed byte or half-word
loads), ARMv4 (signed byte and half-word loads), ARMv4t (thumb1) and
ARMv7 (thumb2).

R.

 Currently signed chars and signed shorts are passed sign-extended.  If
 we make TARGET_PROMOTE_FUNCTION_MODE work the same as PROMOTE_MODE,
 then they will be passed zero-extended.
 
 Given the testcase:
 
 int sub (int) __attribute__ ((noinline));
 int sub2 (signed char) __attribute__ ((noinline));
 int sub (int i) { return sub2 (i); }
 int sub2 (signed char c) { return c  0xff; }
 
 Currently sub will do a char sign-extend to convert the int to signed
 char, and sub2 will do a char zero-extend for the and.  With the
 change, sub will do a char zero-extend to convert the int to unsigned
 char, and sub2 will do nothing.  If you compile sub without the change
 and sub2 with the change, then you lose the and operation and get a
 sign-extended char at the end.
 
 There were two MI changes suggested, one was fixing the out-of-ssa pass
 to handle SUBREG_PROMOTED_P promotions.  The other was to disallow
 creating PHI nodes between parms and locals.  I haven't had a chance to
 try implementing the second one yet; I hope to work on that today.

 Don't bother with the latter, it doesn't have a chance of being accepted.
 
 I tried looking at it anyways, as I need to learn more about this
 stuff.  It didn't seem feasible without changing a lot of optimization
 passes which doesn't seem reasonable.
 
 If the terrible hack in outof-ssa really will be necessary (and I really
 really hope it won't) then I think I prefer the approach you partly tried
 in comment #12 of PR 65932 already.  Let partition_to_pseudo[] refer to
 the promoted subreg and deal with that situation in emit_partition_copy;
 I'd then hope that the unsignedsrcp parameter could go away (unfortunately
 the sizeexp will have to stay).
 
 Yes, I think that is a cleaner way to do it, but I had trouble getting
 that to work as I don't know enough about the code yet.  Doing it
 directly in emit_partition_copy was easier, just to prove it could
 work.  I can go back and try to make this work again.
 
 Jim
 



Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-16 Thread Richard Earnshaw
On 16/07/15 16:00, Michael Matz wrote:
 Hi,
 
 On Thu, 16 Jul 2015, Richard Earnshaw wrote:
 
 Now that we do have the problem, we can't fix it without an ARM port 
 ABI change, which is undesirable, so we may have to fix it with a MI 
 change.

 What's the ABI implication of fixing the inconsistency?


 I think that's the wrong question.  We wouldn't change the ABI to fix an 
 internal problem in GCC.  So the real question is what's the performance 
 impact of changing PROMOTE_MODE to be the same as the ABI requirements?
 
 Perhaps, I really only wanted to get a feeling what type of changes 
 in code generation would result with the flip.  I wonder why this ABI 
 implication was no problem back when PROMOTE_MODE and 
 target.promote_function_mode were seperated and the inconsistency 
 introduced.
 
 
 Ciao,
 Michael.
 

Promote_function_mode requirements were new with the EABI.  However, I
think it's probably only recent changes in the mid-end that have exposed
the problem.

R.


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-16 Thread Michael Matz
Hi,

On Thu, 16 Jul 2015, Richard Earnshaw wrote:

  Now that we do have the problem, we can't fix it without an ARM port 
  ABI change, which is undesirable, so we may have to fix it with a MI 
  change.
 
  What's the ABI implication of fixing the inconsistency?
  
 
 I think that's the wrong question.  We wouldn't change the ABI to fix an 
 internal problem in GCC.  So the real question is what's the performance 
 impact of changing PROMOTE_MODE to be the same as the ABI requirements?

Perhaps, I really only wanted to get a feeling what type of changes 
in code generation would result with the flip.  I wonder why this ABI 
implication was no problem back when PROMOTE_MODE and 
target.promote_function_mode were seperated and the inconsistency 
introduced.


Ciao,
Michael.


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-15 Thread Jim Wilson
On Wed, Jul 15, 2015 at 6:04 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Tue, 14 Jul 2015, Jim Wilson wrote:

 Now that we do have the problem, we can't fix it without an ARM port ABI
 change, which is undesirable, so we may have to fix it with a MI change.

 What's the ABI implication of fixing the inconsistency?

Currently signed chars and signed shorts are passed sign-extended.  If
we make TARGET_PROMOTE_FUNCTION_MODE work the same as PROMOTE_MODE,
then they will be passed zero-extended.

Given the testcase:

int sub (int) __attribute__ ((noinline));
int sub2 (signed char) __attribute__ ((noinline));
int sub (int i) { return sub2 (i); }
int sub2 (signed char c) { return c  0xff; }

Currently sub will do a char sign-extend to convert the int to signed
char, and sub2 will do a char zero-extend for the and.  With the
change, sub will do a char zero-extend to convert the int to unsigned
char, and sub2 will do nothing.  If you compile sub without the change
and sub2 with the change, then you lose the and operation and get a
sign-extended char at the end.

 There were two MI changes suggested, one was fixing the out-of-ssa pass
 to handle SUBREG_PROMOTED_P promotions.  The other was to disallow
 creating PHI nodes between parms and locals.  I haven't had a chance to
 try implementing the second one yet; I hope to work on that today.

 Don't bother with the latter, it doesn't have a chance of being accepted.

I tried looking at it anyways, as I need to learn more about this
stuff.  It didn't seem feasible without changing a lot of optimization
passes which doesn't seem reasonable.

 If the terrible hack in outof-ssa really will be necessary (and I really
 really hope it won't) then I think I prefer the approach you partly tried
 in comment #12 of PR 65932 already.  Let partition_to_pseudo[] refer to
 the promoted subreg and deal with that situation in emit_partition_copy;
 I'd then hope that the unsignedsrcp parameter could go away (unfortunately
 the sizeexp will have to stay).

Yes, I think that is a cleaner way to do it, but I had trouble getting
that to work as I don't know enough about the code yet.  Doing it
directly in emit_partition_copy was easier, just to prove it could
work.  I can go back and try to make this work again.

Jim


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-15 Thread Michael Matz
Hi,

On Tue, 14 Jul 2015, Richard Earnshaw wrote:

  I think it's a backend bug that parameters and locals are extended 
  differently.  The code in tree-outof-ssa was written with the 
  assumption that the modes of RTL objects might be different (larger) 
  than the tree types suggest, but that they be _consistent_, i.e. that 
  the particular extension depends on only the types, not on the 
  tree-type of the decl.
 
 We went through this a couple of weeks back.  The backend documentation 
 for PROMOTE_MODE says:
 
  For most machines, the macro definition does not change 
 @var{unsignedp}. However, some machines, have instructions that 
 preferentially handle either signed or unsigned quantities of certain 
 modes.  For example, on the DEC Alpha, 32-bit loads from memory and 
 32-bit add instructions sign-extend the result to 64 bits.  On such 
 machines, set @var{unsignedp} according to which kind of extension is 
 more efficient.

We aren't talking about what PROMOTE_MODE is or is not allowed, but about 
an inconsistency between what PROMOTE_MODE does and what 
target.promote_function_mode does.  This is quite clearly a historic 
accident in the ARM port, namely that they extend parameters and locals in 
opposite ways, which originally wasn't even possible to describe in GCC.

 So clearly it anticipates that all permitted extensions should work, and 
 in particular it makes no mention of this having to match some 
 abi-mandated promotions.  That makes this a MI bug not a target one.

All extensions do work just fine, except when the LHS and RHS of a copy 
instruction with the same types require different extension types 
depending on if the rhs is a PARM_DECL or not.  It's also fundamental in 
gimple that copies can't change signedness, and this insonsistency breaks 
that assumption.  Jim says that this is actually harmless in reality, 
because even a copy propagation will at most replace a use of a VAR_DECL 
with a PARM_DECL, and at the point of such use the other promotion would 
be added by expand.  While the latter is true, I haven't really convinced 
myself that this really leads to correct code in all cases.


Ciao,
Michael.


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-15 Thread Michael Matz
Hi,

On Tue, 14 Jul 2015, Jim Wilson wrote:

 Now that we do have the problem, we can't fix it without an ARM port ABI 
 change, which is undesirable, so we may have to fix it with a MI change.

What's the ABI implication of fixing the inconsistency?

 There were two MI changes suggested, one was fixing the out-of-ssa pass 
 to handle SUBREG_PROMOTED_P promotions.  The other was to disallow 
 creating PHI nodes between parms and locals.  I haven't had a chance to 
 try implementing the second one yet; I hope to work on that today.

Don't bother with the latter, it doesn't have a chance of being accepted.

If the terrible hack in outof-ssa really will be necessary (and I really 
really hope it won't) then I think I prefer the approach you partly tried 
in comment #12 of PR 65932 already.  Let partition_to_pseudo[] refer to 
the promoted subreg and deal with that situation in emit_partition_copy; 
I'd then hope that the unsignedsrcp parameter could go away (unfortunately 
the sizeexp will have to stay).


Ciao,
Michael.


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-14 Thread Richard Earnshaw
On 13/07/15 16:29, Michael Matz wrote:
 Hi,
 
 On Mon, 13 Jul 2015, Richard Biener wrote:
 
 On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson jim.wil...@linaro.org wrote:
 On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org 
 wrote:
 signed sub-word locals.  Thus to detect the need for a conversion, you
 have to have the decls, and we don't have them here.  There is also

 It probably is.  The decks for the parameter based SSA names are 
 available, for the PHI destination there might be no decl.

 I tried looking again, and found the decls.  I'm able to get correct
 code for my testcase with the attached patch to force the conversion.
 It is rather inelegant, but I think I can cache the values I need to
 make this simpler and cleaner.  I still don't have decls from
 insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
 looks like those are for breaking cycles, and hence might not need
 conversions.

 Yes, that looks like a defect.  CCing Micha who wrote this code
 
 I think it's a backend bug that parameters and locals are extended 
 differently.  The code in tree-outof-ssa was written with the assumption 
 that the modes of RTL objects might be different (larger) than the tree 
 types suggest, but that they be _consistent_, i.e. that the particular 
 extension depends on only the types, not on the tree-type of the decl.
 

We went through this a couple of weeks back.  The backend documentation
for PROMOTE_MODE says:

 For most machines, the macro definition does not change @var{unsignedp}.
However, some machines, have instructions that preferentially handle
either signed or unsigned quantities of certain modes.  For example, on
the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
sign-extend the result to 64 bits.  On such machines, set
@var{unsignedp} according to which kind of extension is more efficient.

So clearly it anticipates that all permitted extensions should work, and
in particular it makes no mention of this having to match some
abi-mandated promotions.  That makes this a MI bug not a target one.

R.


 I think the above assumption does make sense because it's also a 
 fundamental assumption in the whole gimple pipeline, types matter, not the 
 objects (or better, we slowly but surely work towards this).  Hence such 
 mismatches should either not exist (changing the backend), or should be 
 exposed explicitely during gimplification already.  The latter is a large 
 change, though.
 
 I think dealing with this situation in outof-ssa is a hack and I don't 
 like it.  It would extend the ugliness of different modes for same types 
 even more, and that's something we should (gradually) move away from.
 
 
 Ciao,
 Michael.
 



Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-14 Thread Richard Biener
On July 14, 2015 6:13:13 PM GMT+02:00, Richard Earnshaw 
richard.earns...@foss.arm.com wrote:
On 13/07/15 16:29, Michael Matz wrote:
 Hi,
 
 On Mon, 13 Jul 2015, Richard Biener wrote:
 
 On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson jim.wil...@linaro.org
wrote:
 On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson
jim.wil...@linaro.org wrote:
 signed sub-word locals.  Thus to detect the need for a
conversion, you
 have to have the decls, and we don't have them here.  There is
also

 It probably is.  The decks for the parameter based SSA names are
available, for the PHI destination there might be no decl.

 I tried looking again, and found the decls.  I'm able to get
correct
 code for my testcase with the attached patch to force the
conversion.
 It is rather inelegant, but I think I can cache the values I need
to
 make this simpler and cleaner.  I still don't have decls from
 insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
 looks like those are for breaking cycles, and hence might not need
 conversions.

 Yes, that looks like a defect.  CCing Micha who wrote this code
 
 I think it's a backend bug that parameters and locals are extended 
 differently.  The code in tree-outof-ssa was written with the
assumption 
 that the modes of RTL objects might be different (larger) than the
tree 
 types suggest, but that they be _consistent_, i.e. that the
particular 
 extension depends on only the types, not on the tree-type of the
decl.
 

We went through this a couple of weeks back.  The backend documentation
for PROMOTE_MODE says:

 For most machines, the macro definition does not change
@var{unsignedp}.
However, some machines, have instructions that preferentially handle
either signed or unsigned quantities of certain modes.  For example, on
the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
sign-extend the result to 64 bits.  On such machines, set
@var{unsignedp} according to which kind of extension is more
efficient.

So clearly it anticipates that all permitted extensions should work,
and
in particular it makes no mention of this having to match some
abi-mandated promotions.  That makes this a MI bug not a target one.

We could also decide that it is a documentation defect.  Are there any other 
targets with this inconsistency?

FWIW I'd prefer to expose the promoted incoming decls after gimplification. 
Independent on any inconsistency.

Richard.

R.


 I think the above assumption does make sense because it's also a 
 fundamental assumption in the whole gimple pipeline, types matter,
not the 
 objects (or better, we slowly but surely work towards this).  Hence
such 
 mismatches should either not exist (changing the backend), or should
be 
 exposed explicitely during gimplification already.  The latter is a
large 
 change, though.
 
 I think dealing with this situation in outof-ssa is a hack and I
don't 
 like it.  It would extend the ugliness of different modes for same
types 
 even more, and that's something we should (gradually) move away from.
 
 
 Ciao,
 Michael.
 




Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-14 Thread Jim Wilson
On Tue, Jul 14, 2015 at 9:13 AM, Richard Earnshaw
richard.earns...@foss.arm.com wrote:
 We went through this a couple of weeks back.  The backend documentation
 for PROMOTE_MODE says:

I disagree that this is a fully resolved issue.  I see clear problems
with how the ARM port uses PROMOTE_MODE.

  For most machines, the macro definition does not change @var{unsignedp}.
 However, some machines, have instructions that preferentially handle
 either signed or unsigned quantities of certain modes.  For example, on
 the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
 sign-extend the result to 64 bits.  On such machines, set
 @var{unsignedp} according to which kind of extension is more efficient.

The Alpha case is different than the ARM case.  The Alpha only changes
sign for 32-bit values in 64-bit registers.  The alpha happens to have
a nice set of instructions that operates on 32-bit values, that
accepts these wrong-signed values and handle them correctly.  Thus on
the alpha, it appears that there are no extra zero/sign extends
required, and everything is the same speed or faster with the wrong
sign.  The MIPS port works the same way.  This is not true on the arm
though.  The arm port is changing sign on 8 and 16 bit value, but does
not have instructions that directly operate on 8 and 16 bit values.
This requires the compiler to emit extra zero/sign extend instructions
that affect the performance of the code.  Other than the thumb1 8-bit
load, and the pre-armv6 use of andi for 8-bit zero-extend, I haven't
seen anything that is faster in the wrong sign, and there are a number
of operations that are actually slower because of the extra zero/sign
extends required by the arm PROMOTE_MODE definition.  I've given some
examples.

I have since noticed that the relatively new pass to optimize away
unnecessary zero/sign extensions is actually fixing some of the damage
caused by the ARM PROMOTE_MODE definition.  But there are still some
cases that won't get fixed, as per my earlier examples.  It is better
to emit the fast code at the beginning than to rely on an optimization
pass to convert the slow code into fast code.  So I think the ARM
PROMOTE_MODE definition still needs to be fixed.

 So clearly it anticipates that all permitted extensions should work, and
 in particular it makes no mention of this having to match some
 abi-mandated promotions.  That makes this a MI bug not a target one.

If you look at older gcc releases, like gcc-2.95.3, you will see that
there is only PROMOTE_MODE and a boolean PROMOTE_FUNCTION_ARGS which
says whether PROMOTE_MODE is applied to function arguments or not.
You can't have different zero/sign extensions for parameters and
locals in gcc-2.95.3.  The fact that you can have it today is more an
accident than a deliberate choice.  When PROMOTE_FUNCTION_ARGS was
hookized, it was made into a separate function, and for the ARM port,
the code for PROMOTE_MODE was not correctly copied into the new hook,
resulting in the accidental difference for parameter and local
zero/sign promotion for ARM.  The PROMOTE_MODE docs were written back
when different sign/zero extensions for parms/locals weren't possible,
and hence did not consider the case.

Now that we do have the problem, we can't fix it without an ARM port
ABI change, which is undesirable, so we may have to fix it with a MI
change.

There were two MI changes suggested, one was fixing the out-of-ssa
pass to handle SUBREG_PROMOTED_P promotions.  The other was to
disallow creating PHI nodes between parms and locals.  I haven't had a
chance to try implementing the second one yet; I hope to work on that
today.

Jim


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-14 Thread Richard Biener
On July 14, 2015 6:49:18 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org wrote:
On Tue, Jul 14, 2015 at 9:13 AM, Richard Earnshaw
richard.earns...@foss.arm.com wrote:
 We went through this a couple of weeks back.  The backend
documentation
 for PROMOTE_MODE says:

I disagree that this is a fully resolved issue.  I see clear problems
with how the ARM port uses PROMOTE_MODE.

  For most machines, the macro definition does not change
@var{unsignedp}.
 However, some machines, have instructions that preferentially handle
 either signed or unsigned quantities of certain modes.  For example,
on
 the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
 sign-extend the result to 64 bits.  On such machines, set
 @var{unsignedp} according to which kind of extension is more
efficient.

The Alpha case is different than the ARM case.  The Alpha only changes
sign for 32-bit values in 64-bit registers.  The alpha happens to have
a nice set of instructions that operates on 32-bit values, that
accepts these wrong-signed values and handle them correctly.  Thus on
the alpha, it appears that there are no extra zero/sign extends
required, and everything is the same speed or faster with the wrong
sign.  The MIPS port works the same way.  This is not true on the arm
though.  The arm port is changing sign on 8 and 16 bit value, but does
not have instructions that directly operate on 8 and 16 bit values.
This requires the compiler to emit extra zero/sign extend instructions
that affect the performance of the code.  Other than the thumb1 8-bit
load, and the pre-armv6 use of andi for 8-bit zero-extend, I haven't
seen anything that is faster in the wrong sign, and there are a number
of operations that are actually slower because of the extra zero/sign
extends required by the arm PROMOTE_MODE definition.  I've given some
examples.

I have since noticed that the relatively new pass to optimize away
unnecessary zero/sign extensions is actually fixing some of the damage
caused by the ARM PROMOTE_MODE definition.  But there are still some
cases that won't get fixed, as per my earlier examples.  It is better
to emit the fast code at the beginning than to rely on an optimization
pass to convert the slow code into fast code.  So I think the ARM
PROMOTE_MODE definition still needs to be fixed.

 So clearly it anticipates that all permitted extensions should work,
and
 in particular it makes no mention of this having to match some
 abi-mandated promotions.  That makes this a MI bug not a target one.

If you look at older gcc releases, like gcc-2.95.3, you will see that
there is only PROMOTE_MODE and a boolean PROMOTE_FUNCTION_ARGS which
says whether PROMOTE_MODE is applied to function arguments or not.
You can't have different zero/sign extensions for parameters and
locals in gcc-2.95.3.  The fact that you can have it today is more an
accident than a deliberate choice.  When PROMOTE_FUNCTION_ARGS was
hookized, it was made into a separate function, and for the ARM port,
the code for PROMOTE_MODE was not correctly copied into the new hook,
resulting in the accidental difference for parameter and local
zero/sign promotion for ARM.  The PROMOTE_MODE docs were written back
when different sign/zero extensions for parms/locals weren't possible,
and hence did not consider the case.

Now that we do have the problem, we can't fix it without an ARM port
ABI change, which is undesirable, so we may have to fix it with a MI
change.

There were two MI changes suggested, one was fixing the out-of-ssa
pass to handle SUBREG_PROMOTED_P promotions.  The other was to
disallow creating PHI nodes between parms and locals.  I haven't had a
chance to try implementing the second one yet; I hope to work on that
today.

I will definitely veto such restriction.

Richard.

Jim




Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-13 Thread H.J. Lu
On Mon, Jul 13, 2015 at 8:29 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 13 Jul 2015, Richard Biener wrote:

 On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson jim.wil...@linaro.org wrote:
  On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
  richard.guent...@gmail.com wrote:
  On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org 
  wrote:
 signed sub-word locals.  Thus to detect the need for a conversion, you
 have to have the decls, and we don't have them here.  There is also
 
  It probably is.  The decks for the parameter based SSA names are 
  available, for the PHI destination there might be no decl.
 
  I tried looking again, and found the decls.  I'm able to get correct
  code for my testcase with the attached patch to force the conversion.
  It is rather inelegant, but I think I can cache the values I need to
  make this simpler and cleaner.  I still don't have decls from
  insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
  looks like those are for breaking cycles, and hence might not need
  conversions.

 Yes, that looks like a defect.  CCing Micha who wrote this code

 I think it's a backend bug that parameters and locals are extended
 differently.  The code in tree-outof-ssa was written with the assumption
 that the modes of RTL objects might be different (larger) than the tree
 types suggest, but that they be _consistent_, i.e. that the particular
 extension depends on only the types, not on the tree-type of the decl.

 I think the above assumption does make sense because it's also a
 fundamental assumption in the whole gimple pipeline, types matter, not the
 objects (or better, we slowly but surely work towards this).  Hence such
 mismatches should either not exist (changing the backend), or should be
 exposed explicitely during gimplification already.  The latter is a large
 change, though.

 I think dealing with this situation in outof-ssa is a hack and I don't
 like it.  It would extend the ugliness of different modes for same types
 even more, and that's something we should (gradually) move away from.


Different parts of GCC have their own ideas how parameters should
be promoted, which leads to subtle bugs like PR 64037.


-- 
H.J.


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-13 Thread Michael Matz
Hi,

On Mon, 13 Jul 2015, Richard Biener wrote:

 On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson jim.wil...@linaro.org wrote:
  On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
  richard.guent...@gmail.com wrote:
  On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org 
  wrote:
 signed sub-word locals.  Thus to detect the need for a conversion, you
 have to have the decls, and we don't have them here.  There is also
 
  It probably is.  The decks for the parameter based SSA names are 
  available, for the PHI destination there might be no decl.
 
  I tried looking again, and found the decls.  I'm able to get correct
  code for my testcase with the attached patch to force the conversion.
  It is rather inelegant, but I think I can cache the values I need to
  make this simpler and cleaner.  I still don't have decls from
  insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
  looks like those are for breaking cycles, and hence might not need
  conversions.
 
 Yes, that looks like a defect.  CCing Micha who wrote this code

I think it's a backend bug that parameters and locals are extended 
differently.  The code in tree-outof-ssa was written with the assumption 
that the modes of RTL objects might be different (larger) than the tree 
types suggest, but that they be _consistent_, i.e. that the particular 
extension depends on only the types, not on the tree-type of the decl.

I think the above assumption does make sense because it's also a 
fundamental assumption in the whole gimple pipeline, types matter, not the 
objects (or better, we slowly but surely work towards this).  Hence such 
mismatches should either not exist (changing the backend), or should be 
exposed explicitely during gimplification already.  The latter is a large 
change, though.

I think dealing with this situation in outof-ssa is a hack and I don't 
like it.  It would extend the ugliness of different modes for same types 
even more, and that's something we should (gradually) move away from.


Ciao,
Michael.


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-13 Thread Richard Biener
On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson jim.wil...@linaro.org wrote:
 On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org 
 wrote:
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also

 It probably is.  The decks for the parameter based SSA names are available, 
 for the PHI destination there might be no decl.

 I tried looking again, and found the decls.  I'm able to get correct
 code for my testcase with the attached patch to force the conversion.
 It is rather inelegant, but I think I can cache the values I need to
 make this simpler and cleaner.  I still don't have decls from
 insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
 looks like those are for breaking cycles, and hence might not need
 conversions.

Yes, that looks like a defect.  CCing Micha who wrote this code

Richard.

 Jim


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-10 Thread Jim Wilson
On Wed, Jul 8, 2015 at 3:54 PM, Jeff Law l...@redhat.com wrote:
 On 07/07/2015 10:29 AM, Jim Wilson wrote:
 This is critically important as various parts of the compiler will take a
 degenerate PHI node and propagate the RHS of the PHI into the uses of the
 LHS of the PHI -- without doing any conversions.

I think this is OK, because tree-outof-ssa does send code in basic
blocks through expand_expr, which will emit conversions if necessary.
it is only the conversion of PHI nodes to RTL that is the problem, as
it doesn't use expand_expr, and hence doesn't get the
SUBREG_PROMOTED_P conversions.

Jim


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-10 Thread Jim Wilson
On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org 
 wrote:
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also

 It probably is.  The decks for the parameter based SSA names are available, 
 for the PHI destination there might be no decl.

I tried looking again, and found the decls.  I'm able to get correct
code for my testcase with the attached patch to force the conversion.
It is rather inelegant, but I think I can cache the values I need to
make this simpler and cleaner.  I still don't have decls from
insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
looks like those are for breaking cycles, and hence might not need
conversions.

Jim
Index: tree-outof-ssa.c
===
--- tree-outof-ssa.c	(revision 225477)
+++ tree-outof-ssa.c	(working copy)
@@ -230,11 +230,32 @@ set_location_for_edge (edge e)
SRC/DEST might be BLKmode memory locations SIZEEXP is a tree from
which we deduce the size to copy in that case.  */
 
-static inline rtx_insn *
-emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp)
+rtx_insn *
+emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp,
+		 tree var2 ATTRIBUTE_UNUSED)
 {
   start_sequence ();
 
+  /* If var2 is set, then sizeexp is the src decl and var2 is the dest decl.  */
+  if (var2)
+{
+  tree src_var = (TREE_CODE (sizeexp) == SSA_NAME
+		  ? SSA_NAME_VAR (sizeexp) : sizeexp);
+  tree dest_var = (TREE_CODE (var2) == SSA_NAME
+		   ? SSA_NAME_VAR (var2) : var2);
+  int src_unsignedp = TYPE_UNSIGNED (TREE_TYPE (src_var));
+  int dest_unsignedp = TYPE_UNSIGNED (TREE_TYPE (dest_var));
+  machine_mode src_mode = promote_decl_mode (src_var, src_unsignedp);
+  machine_mode dest_mode = promote_decl_mode (dest_var, dest_unsignedp);
+  if (src_unsignedp != dest_unsignedp
+	   src_mode != DECL_MODE (src_var)
+	   dest_mode != DECL_MODE (dest_var))
+	{
+	  src = gen_lowpart_common (DECL_MODE (src_var), src);
+	  unsignedsrcp = dest_unsignedp;
+	}
+}
+
   if (GET_MODE (src) != VOIDmode  GET_MODE (src) != GET_MODE (dest))
 src = convert_to_mode (GET_MODE (dest), src, unsignedsrcp);
   if (GET_MODE (src) == BLKmode)
@@ -256,7 +277,7 @@ emit_partition_copy (rtx dest, rtx src,
 static void
 insert_partition_copy_on_edge (edge e, int dest, int src, source_location locus)
 {
-  tree var;
+  tree var, var2;
   if (dump_file  (dump_flags  TDF_DETAILS))
 {
   fprintf (dump_file,
@@ -276,10 +297,11 @@ insert_partition_copy_on_edge (edge e, i
 set_curr_insn_location (locus);
 
   var = partition_to_var (SA.map, src);
+  var2 = partition_to_var (SA.map, dest);
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
    copy_rtx (SA.partition_to_pseudo[src]),
    TYPE_UNSIGNED (TREE_TYPE (var)),
-   var);
+   var, var2);
 
   insert_insn_on_edge (seq, e);
 }
@@ -373,7 +395,8 @@ insert_rtx_to_part_on_edge (edge e, int
  involved), so it doesn't matter.  */
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
    src, unsignedsrcp,
-   partition_to_var (SA.map, dest));
+   partition_to_var (SA.map, dest), 0);
+
 
   insert_insn_on_edge (seq, e);
 }
@@ -406,7 +429,7 @@ insert_part_to_rtx_on_edge (edge e, rtx
   rtx_insn *seq = emit_partition_copy (dest,
    copy_rtx (SA.partition_to_pseudo[src]),
    TYPE_UNSIGNED (TREE_TYPE (var)),
-   var);
+   var, 0);
 
   insert_insn_on_edge (seq, e);
 }


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-08 Thread Jeff Law

On 07/07/2015 10:29 AM, Jim Wilson wrote:


My years at Cisco didn't give me a chance to work on the SSA passes,
so I don't know much about how they work.  But looking at this, I see
that PHI nodes are eventually handed by emit_partition_copy in
tree-outof-ssa.c, which calls convert_to_mode, so it appears that
conversions between different (closely related?) types are OK in a PHI
node.
It'd be real interesting to see if that call ever does any real 
conversions -- ie, is it ever called with types that are not useless 
type conversions (useless_type_conversion_p has essentially become the 
definition of the gimple type system, for better or worse).



This is critically important as various parts of the compiler will take 
a degenerate PHI node and propagate the RHS of the PHI into the uses of 
the LHS of the PHI -- without doing any conversions.


So if you had something like

X = PHI (Y);

Where X is a sub-word local and Y is a sub-word parameter, the 
propagators will blindly replace uses of X with Y, which may be wrong if 
I understand your next paragraph correctly.





  The problem in this case is that we have the exact same type

for the src and dest.  The only difference is that the ARM forces
sign-extension for signed sub-word parameters and zero-extension for
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also
the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
handling into cfglyout/tree-outof-ssa, or modify them to call
expand_expr for PHI nodes, and I haven't had any luck getting that to
work yet. I still need to learn more about the code to figure out if
this is possible.
And this would seem to imply that those objects can't be set/referenced 
in the same PHI because of the zero vs sign extension issue.




I haven't looked at trying to forbid the optimizer from creating PHI
nodes connecting parameters to locals.  That just sounds like a
strange thing to forbid, and seems likely to result in worse code by
disabling too many optimizations.  But maybe it is the right solution
if neither of the other two options work.  This should only be done
when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
the copy to require a conversion.
It's less about connecting parameters to locals and more about ensuring 
that objects connected are close enough in their types that one could be 
substituted for the other.


jeff


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Jim Wilson
On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote:
 On 06/29/2015 07:15 PM, Jim Wilson wrote:
 So if these copies require a  conversion, then isn't it fundamentally
 wrong to have a PHI node which copies between them?  That would seem to
 implicate the eipa_sra pass as needing to be aware of these promotions and
 avoid having these objects with different representations appearing on the
 lhs/rhs of a PHI node.

My years at Cisco didn't give me a chance to work on the SSA passes,
so I don't know much about how they work.  But looking at this, I see
that PHI nodes are eventually handed by emit_partition_copy in
tree-outof-ssa.c, which calls convert_to_mode, so it appears that
conversions between different (closely related?) types are OK in a PHI
node.  The problem in this case is that we have the exact same type
for the src and dest.  The only difference is that the ARM forces
sign-extension for signed sub-word parameters and zero-extension for
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also
the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
handling into cfglyout/tree-outof-ssa, or modify them to call
expand_expr for PHI nodes, and I haven't had any luck getting that to
work yet. I still need to learn more about the code to figure out if
this is possible.

I also think that the ARM handling of PROMOTE_MODE is wrong.  Treating
a signed sub-word and unsigned can lead to inefficient code.  This is
part of the problem is much easier for me to fix.  It may be hard to
convince ARM maintainers that it should be changed though.  I need
more time to work on this too.

I haven't looked at trying to forbid the optimizer from creating PHI
nodes connecting parameters to locals.  That just sounds like a
strange thing to forbid, and seems likely to result in worse code by
disabling too many optimizations.  But maybe it is the right solution
if neither of the other two options work.  This should only be done
when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
the copy to require a conversion.

Jim


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Jim Wilson
On Thu, Jul 2, 2015 at 2:07 AM, Richard Earnshaw
richard.earns...@foss.arm.com wrote:
 Not quite, ARM state still has more flexible addressing modes for
 unsigned byte loads than for signed byte loads.  It's even worse with
 thumb1 where some signed loads have no single-register addressing mode
 (ie you have to copy zero into another register to use as an index
 before doing the load).

I wasn't aware of the load address problem.  That was something I
hadn't considered, and will have to look at that.  Load is just one
instruction though.  For most other instructions, a zero-extend
results in less efficient code, because it then forces a sign-extend
before a signed operation.  The fact that parameters and locals are
handled differently which requires conversions when copying between
them results in more inefficient code.  And changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, and hence would be
unwise, so changing PROMOTE_MODE is the safer option.

Consider this testcase
extern signed short gs;

short
sub (void)
{
  signed short s = gs;
  int i;

  for (i = 0; i  10; i++)
{
  s += 1;
  if (s  10) break;
}

  return s;
}

The inner loop ends up as
.L3:
adds r3, r3, #1
mov r0, r1
uxth r3, r3
sxth r2, r3
cmp r2, #10
bgt .L8
cmp r2, r1
bne .L3
bx lr

We need the sign-extension for the compare.  We need the
zero-extension for the loop carried dependency.  We have two
extensions in every loop iteration, plus some extra register usage and
register movement.  We get better code for this example if we aren't
forcing signed shorts to be zero-extended via PROMOTE_MODE.

The lack of a reg+immediate address mode for ldrs[bh] in thumb1 does
look like a problem though.  But this means the difference between
generating
movs r2, #0
ldrsh r3, [r3, r2]
with my patch, or
ldrh r3, [r3]
lsls r2, r3, #16
asrs r2, r2, #16
without my patch.  It isn't clear which sequence is better.  The
sign-extends in the second sequence can sometimes be optimized away,
and sometimes they can't be optimized away.  Similarly, in the first
sequence, loading zero into a reg can sometimes be optimized, and
sometimes it can't.  There is also no guarantee that you get the first
sequence with the patch or the second sequence without the patch.
There is a splitter for ldrsh, so you can get the second pattern
sometimes with the patch.  Similarly, it might be possible to get the
first pattern without the patch in some cases, though I don't have one
at the moment.

Jim


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Richard Biener
On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org wrote:
On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote:
 On 06/29/2015 07:15 PM, Jim Wilson wrote:
 So if these copies require a  conversion, then isn't it
fundamentally
 wrong to have a PHI node which copies between them?  That would seem
to
 implicate the eipa_sra pass as needing to be aware of these
promotions and
 avoid having these objects with different representations appearing
on the
 lhs/rhs of a PHI node.

My years at Cisco didn't give me a chance to work on the SSA passes,
so I don't know much about how they work.  But looking at this, I see
that PHI nodes are eventually handed by emit_partition_copy in
tree-outof-ssa.c, which calls convert_to_mode, so it appears that
conversions between different (closely related?) types are OK in a PHI
node.  The problem in this case is that we have the exact same type
for the src and dest.  The only difference is that the ARM forces
sign-extension for signed sub-word parameters and zero-extension for
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also
the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
handling into cfglyout/tree-outof-ssa, or modify them to call
expand_expr for PHI nodes, and I haven't had any luck getting that to
work yet. I still need to learn more about the code to figure out if
this is possible.

It probably is.  The decks for the parameter based SSA names are available, for 
the PHI destination there might be no decl.

I also think that the ARM handling of PROMOTE_MODE is wrong.  Treating
a signed sub-word and unsigned can lead to inefficient code.  This is
part of the problem is much easier for me to fix.  It may be hard to
convince ARM maintainers that it should be changed though.  I need
more time to work on this too.

I haven't looked at trying to forbid the optimizer from creating PHI
nodes connecting parameters to locals.  That just sounds like a
strange thing to forbid, and seems likely to result in worse code by
disabling too many optimizations.  But maybe it is the right solution
if neither of the other two options work.  This should only be done
when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
the copy to require a conversion.

I don't think disallowing such PHI nodes is the right thing to do.  I'd rather 
expose the TARGET_PROMOTE_FUNCTION_MODE effect earlier by modifying the 
parameter types during, say, gimplification.

Richard.

Jim




Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Jeff Law

On 06/29/2015 07:15 PM, Jim Wilson wrote:

This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the ouf-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.
So if these copies require a  conversion, then isn't it fundamentally 
wrong to have a PHI node which copies between them?  That would seem to 
implicate the eipa_sra pass as needing to be aware of these promotions 
and avoid having these objects with different representations appearing 
on the lhs/rhs of a PHI node.


Jeff


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-02 Thread Richard Earnshaw
On 30/06/15 02:15, Jim Wilson wrote:
 This is my suggested fix for PR 65932, which is a linux kernel
 miscompile with gcc-5.1.
 
 The problem here is caused by a chain of events.  The first is that
 the relatively new eipa_sra pass creates fake parameters that behave
 slightly differently than normal parameters.  The second is that the
 optimizer creates phi nodes that copy local variables to fake
 parameters and/or vice versa.  The third is that the ouf-of-ssa pass
 assumes that it can emit simple move instructions for these phi nodes.
 And the fourth is that the ARM port has a PROMOTE_MODE macro that
 forces QImode and HImode to unsigned, but a
 TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
 short parameters have different in register representations than local
 variables, and require a conversion when copying between them, a
 conversion that the out-of-ssa pass can't easily emit.
 
 Ultimately, I think this is a problem in the arm backend.  It should
 not have a PROMOTE_MODE macro that is changing the sign of char and
 short local variables.  I also think that we should merge the
 PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
 prevent this from happening again.
 

The documentation for PROMOTE_MODE says:

For most machines, the macro definition does not change @var{unsignedp}.
However, some machines, have instructions that preferentially handle
either signed or unsigned quantities of certain modes.  For example, on
the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
sign-extend the result to 64 bits.  On such machines, set
@var{unsignedp} according to which kind of extension is more efficient.

So it seems to me that the ARM backend is only doing what the
documentation says should work.

 I see four general problems with the current ARM PROMOTE_MODE definition.
 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
 instruction was added.  It is a lose for armv6 and later.

Not quite, ARM state still has more flexible addressing modes for
unsigned byte loads than for signed byte loads.  It's even worse with
thumb1 where some signed loads have no single-register addressing mode
(ie you have to copy zero into another register to use as an index
before doing the load).


R.

 2) Unsigned short was only faster for targets that don't support
 unaligned accesses.  Support for these targets was removed a while
 ago, and this PROMODE_MODE hunk should have been removed at the same
 time.  It was accidentally left behind.
 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
 converted to a function, the PROMOTE_MODE code was copied without the
 UNSIGNEDP changes.  Thus it is only an accident that
 TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
 TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
 changes to resolve the difference are safe.
 4) There is a general principle that you should only change signedness
 in PROMOTE_MODE if the hardware forces it, as otherwise this results
 in extra conversion instructions that make code slower.  The mips64
 hardware for instance requires that 32-bit values be sign-extended
 regardless of type, and instructions may trap if this is not true.
 However, it has a set of 32-bit instructions that operate on these
 values, and hence no conversions are required.  There is no similar
 case on ARM. Thus the conversions are unnecessary and unwise.  This
 can be seen in the testcases where gcc emits both a zero-extend and a
 sign-extend inside a loop, as the sign-extend is required for a
 compare, and the zero-extend is required by PROMOTE_MODE.
 
 My change was tested with an arm bootstrap, make check, and SPEC
 CPU2000 run.  The original poster verified that this gives a linux
 kernel that boots correctly.
 
 The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
 are tests to verify that smulbb and/or smlabb are generated.
 Eliminating the unnecessary sign conversions causes us to get better
 code that doesn't include the smulbb and smlabb instructions.  I had
 to modify the testcases to get them to emit the desired instructions.
 With the testcase changes there are no additional testsuite failures,
 though I'm concerned that these testcases with the changes may be
 fragile, and future changes may break them again.
 
 If there are ARM parts where smulbb/smlabb are faster than mul/mla,
 then maybe we should try to add new patterns to get the instructions
 emitted again for the unmodified testcases.
 
 Jim
 
 
 pr65932-3.patch
 
 
 gcc/
 2015-06-29  Jim Wilson  jim.wil...@linaro.org
 
   PR target/65932
   * config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and
   HImode.
 
 gcc/testsuite/
 2015-06-29  Jim Wilson  jim.wil...@linaro.org
 
   PR target/65932
   * gcc.target/arm/wmul-1.c (mac): Change a and b to int pointers.  Cast
   multiply operands to short.
   * gcc.target/arm/wmul-2.c (vec_mpy): Cast 

[PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-06-29 Thread Jim Wilson
This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the ouf-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.

Ultimately, I think this is a problem in the arm backend.  It should
not have a PROMOTE_MODE macro that is changing the sign of char and
short local variables.  I also think that we should merge the
PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
prevent this from happening again.

I see four general problems with the current ARM PROMOTE_MODE definition.
1) Unsigned char is only faster for armv5 and earlier, before the sxtb
instruction was added.  It is a lose for armv6 and later.
2) Unsigned short was only faster for targets that don't support
unaligned accesses.  Support for these targets was removed a while
ago, and this PROMODE_MODE hunk should have been removed at the same
time.  It was accidentally left behind.
3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
converted to a function, the PROMOTE_MODE code was copied without the
UNSIGNEDP changes.  Thus it is only an accident that
TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
changes to resolve the difference are safe.
4) There is a general principle that you should only change signedness
in PROMOTE_MODE if the hardware forces it, as otherwise this results
in extra conversion instructions that make code slower.  The mips64
hardware for instance requires that 32-bit values be sign-extended
regardless of type, and instructions may trap if this is not true.
However, it has a set of 32-bit instructions that operate on these
values, and hence no conversions are required.  There is no similar
case on ARM. Thus the conversions are unnecessary and unwise.  This
can be seen in the testcases where gcc emits both a zero-extend and a
sign-extend inside a loop, as the sign-extend is required for a
compare, and the zero-extend is required by PROMOTE_MODE.

My change was tested with an arm bootstrap, make check, and SPEC
CPU2000 run.  The original poster verified that this gives a linux
kernel that boots correctly.

The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
are tests to verify that smulbb and/or smlabb are generated.
Eliminating the unnecessary sign conversions causes us to get better
code that doesn't include the smulbb and smlabb instructions.  I had
to modify the testcases to get them to emit the desired instructions.
With the testcase changes there are no additional testsuite failures,
though I'm concerned that these testcases with the changes may be
fragile, and future changes may break them again.

If there are ARM parts where smulbb/smlabb are faster than mul/mla,
then maybe we should try to add new patterns to get the instructions
emitted again for the unmodified testcases.

Jim
gcc/
2015-06-29  Jim Wilson  jim.wil...@linaro.org

	PR target/65932
	* config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and
	HImode.

gcc/testsuite/
2015-06-29  Jim Wilson  jim.wil...@linaro.org

	PR target/65932
	* gcc.target/arm/wmul-1.c (mac): Change a and b to int pointers.  Cast
	multiply operands to short.
	* gcc.target/arm/wmul-2.c (vec_mpy): Cast multiply result to short.
	* gcc.target/arm/wmul-3.c (mac): Cast multiply results to short.

Index: config/arm/arm.h
===
--- config/arm/arm.h	(revision 224672)
+++ config/arm/arm.h	(working copy)
@@ -523,16 +523,10 @@ extern int arm_arch_crc;
type, but kept valid in the wider mode.  The signedness of the
extension may differ from that of the type.  */
 
-/* It is far faster to zero extend chars than to sign extend them */
-
 #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)	\
   if (GET_MODE_CLASS (MODE) == MODE_INT		\
GET_MODE_SIZE (MODE)  4)  	\
 {		\
-  if (MODE == QImode)			\
-	UNSIGNEDP = 1;\
-  else if (MODE == HImode)			\
-	UNSIGNEDP = 1;\
   (MODE) = SImode;\
 }
 
Index: testsuite/gcc.target/arm/wmul-1.c
===
--- testsuite/gcc.target/arm/wmul-1.c	(revision 224672)
+++ testsuite/gcc.target/arm/wmul-1.c	(working copy)
@@ -2,14 +2,14 @@