Re: [google gcc-4_9]: Backport trunk:r232727 fix for PR/69403.

2016-01-28 Thread Carrot Wei
LGTM

thanks
Carrot

On Thu, Jan 28, 2016 at 11:53 AM, Han Shen  wrote:
> Backport trunk:r232727 fix for PR/69403 - wrong
> thumb2_ior_scc_strict_it insn pattern.
>
> Note this only affect armv7-a tuned for armv8 arch, tested / booted
> affected ChromeOS book.
>
> Ok for google/gcc-4_9 branch?
>
> --
> Han Shen


[PATCH powerpc64] Add a new constraint to insn movdi_internal64

2014-08-06 Thread Carrot Wei
Hi

When compiling an internal application I got an ICE
due to an invalid instruction generated by reload.

Before IRA, I have following insns:

(insn 139 136 581 10 (set (reg:DI 567)
(const_int 0 [0])) ./strings/stringpiece.h:205 discrim 1 520
{*movdi_internal64}
 (expr_list:REG_EQUIV (const_int 0 [0])
(nil)))

...

(insn 231 1062 237 24 (set (reg:V2DI 401 [ vect_cst_.7586 ])
(vec_concat:V2DI (reg:DI 235 [ fprint$lo_ ])
(reg:DI 567))) 1066 {vsx_concat_v2di}
 (expr_list:REG_DEAD (reg:DI 235 [ fprint$lo_ ])
(expr_list:REG_EQUAL (vec_concat:V2DI (reg:DI 235 [ fprint$lo_ ])
(const_int 0 [0]))
(nil

IRA decides register r567 should be spilled into memory

   a48(r567,l0)  -- assign memory
48:r567 l0   mem

Later when reload pass try to reload the value 0 into VSX register it
calls the hook function rs6000_preferred_reload_class, this function
specifically check the case that reload 0 into a VSX register, then
the target reload register class is VSX register. Then function
gen_reload calls gen_move_insn to generate the reload instruction,
which actually generates a movdi_internal64 insn, and it doesn't
contain a constraint to handle the 0->VSX register case, and causes
ICE.

VSX instructions can't load a constant into VSX registers directly,
but we can use XOR instruction to generate a 0 value. This patch add a
new constraint to insn pattern movdi_internal64 to load 0 into VSX
register.

Passed regression test without failure. OK for trunk and 4.9 branch?

thanks
Guozhi Wei


2014-08-06  Guozhi Wei  

* config/rs6000/rs6000.md (*movdi_internal64): Add a new constraint.


patch
Description: Binary data


Re: [PATCH powerpc64] Add a new constraint to insn movdi_internal64

2014-08-08 Thread Carrot Wei
Thank you for the comment, I've updated the patch.

OK for trunk and 4.9 branch?


2014-08-08  Guozhi Wei  

* config/rs6000/rs6000.md (*movdi_internal64): Add a new constraint.


On Wed, Aug 6, 2014 at 7:28 PM, Segher Boessenkool
 wrote:
> On Wed, Aug 06, 2014 at 04:48:26PM -0700, Carrot Wei wrote:
>> -   mtvsrd %x0,%1"
>> -  [(set_attr "type" 
>> "store,load,*,*,*,*,fpstore,fpload,fp,mfjmpr,mtjmpr,*,mftgpr,mffgpr,mftgpr,mffgpr")
>> -   (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4,4,4,4")])
>> +   mtvsrd %x0,%1
>> +   xxlxor %x0,%x0"
>> +  [(set_attr "type" 
>> "store,load,*,*,*,*,fpstore,fpload,fp,mfjmpr,mtjmpr,*,mftgpr,mffgpr,mftgpr,mffgpr,*")
>> +   (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4,4,4,4,4")])
>
>
> "type" should be "vecsimple" for xxlxor, not "integer".
>
>
> Segher


patch
Description: Binary data


[Patch AArch64] Fix for PR62040

2014-08-19 Thread Carrot Wei
Hi

Current AArch64 backend can generate rtl expressions like
(vec_duplicate:DI (const_int 0 [0])), which causes ICE in
simplify_const_unary_operation because vec_duplicate should generate
vector mode only.

As suggested by Andrew in the bug entry, I split the original insn
patterns to avoid scalar mode vec_duplicate expression.

Passed regression tests on qemu without failure.
OK for trunk and 4.9 branch?

thanks
Guozhi Wei

2014-08-19  Guozhi Wei  

PR target/62040
* config/aarch64/iterators.md (VQ_NO2E, VQ_2E): New iterators.
* config/aarch64/aarch64-simd.md (move_lo_quad_internal_): Split
it into two patterns.
(move_lo_quad_internal_be_): Likewise.


patch
Description: Binary data


Re: [Patch AArch64] Fix for PR62040

2014-08-20 Thread Carrot Wei
Good suggestion. Add the testcase.

thanks
Guozhi Wei

2014-08-20  Guozhi Wei  

PR target/62040
* gcc.target/aarch64/pr62040.c: New test.

Index: pr62040.c
===
--- pr62040.c (revision 0)
+++ pr62040.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-g -Os" } */
+
+#include "arm_neon.h"
+
+extern bar(int32x4_t);
+
+void foo() {
+  int32x4x4_t rows;
+  uint64x2x2_t row01;
+
+  row01.val[0] = vreinterpretq_u64_s32(rows.val[0]);
+  row01.val[1] = vreinterpretq_u64_s32(rows.val[1]);
+  uint64x1_t row3l = vget_low_u64(row01.val[0]);
+  row01.val[0] = vcombine_u64(vget_low_u64(row01.val[1]), row3l);
+  int32x4_t xxx = vreinterpretq_s32_u64(row01.val[0]);
+  int32x4_t out = vtrn1q_s32 (xxx, xxx);
+  bar(out);
+}

On Wed, Aug 20, 2014 at 4:26 AM, Kyrill Tkachov  wrote:
> Hi Carrot,
>
> cc'ing the aarch64 maintainers...
>
>
> On 20/08/14 00:43, Carrot Wei wrote:
>>
>> Hi
>>
>> Current AArch64 backend can generate rtl expressions like
>> (vec_duplicate:DI (const_int 0 [0])), which causes ICE in
>> simplify_const_unary_operation because vec_duplicate should generate
>> vector mode only.
>>
>> As suggested by Andrew in the bug entry, I split the original insn
>> patterns to avoid scalar mode vec_duplicate expression.
>
>
> The documentation does say that vec_concat can work on scalars, so it seems
> ok to me at a glance (but I can't approve it myself).
>
> Would be nice to have an addition to the testsuite though...
>
> Kyrill
>
>
>> Passed regression tests on qemu without failure.
>> OK for trunk and 4.9 branch?
>>
>> thanks
>> Guozhi Wei
>>
>> 2014-08-19  Guozhi Wei  
>>
>>  PR target/62040
>>  * config/aarch64/iterators.md (VQ_NO2E, VQ_2E): New iterators.
>>  * config/aarch64/aarch64-simd.md (move_lo_quad_internal_):
>> Split
>>  it into two patterns.
>>  (move_lo_quad_internal_be_): Likewise.
>
>
>


[Patch AArch64] Fix for PR62262

2014-08-26 Thread Carrot Wei
Hi

In insn pattern "*andim_ashift_bfiz", if the operands[2] is larger than
the size of register, gcc may generate invalid assembler code. If operands[2]
is larger than the size of the underlying type of INTVAL, the following insn
condition may also be undefined.

"exact_log2 ((INTVAL (operands[3]) >> INTVAL (operands[2])) + 1) >= 0
 && (INTVAL (operands[3]) & ((1 << INTVAL (operands[2])) - 1)) == 0"

It can be fixed by checking the value of operands[2] before using it.

Passed regression test without failure. OK for trunk and 4.9 branch?

thanks
Guozhi Wei


2014-08-26  Guozhi Wei  

PR target/62262
* config/aarch64/aarch64.md (*andim_ashift_bfiz): Check the shift
amount before using it.


2014-08-26  Guozhi Wei  

PR target/62262
* gcc.target/aarch64/pr62262.c: New test.


patch
Description: Binary data


Re: [Patch AArch64] Fix for PR62040

2014-08-28 Thread Carrot Wei
AArch64 maintainers, could you help to review following patches?

https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01966.html
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02060.html

thanks
Guozhi Wei


On Wed, Aug 20, 2014 at 12:51 PM, Carrot Wei  wrote:
> Good suggestion. Add the testcase.
>
> thanks
> Guozhi Wei
>
> 2014-08-20  Guozhi Wei  
>
> PR target/62040
> * gcc.target/aarch64/pr62040.c: New test.
>
> Index: pr62040.c
> ===
> --- pr62040.c (revision 0)
> +++ pr62040.c (revision 0)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g -Os" } */
> +
> +#include "arm_neon.h"
> +
> +extern bar(int32x4_t);
> +
> +void foo() {
> +  int32x4x4_t rows;
> +  uint64x2x2_t row01;
> +
> +  row01.val[0] = vreinterpretq_u64_s32(rows.val[0]);
> +  row01.val[1] = vreinterpretq_u64_s32(rows.val[1]);
> +  uint64x1_t row3l = vget_low_u64(row01.val[0]);
> +  row01.val[0] = vcombine_u64(vget_low_u64(row01.val[1]), row3l);
> +  int32x4_t xxx = vreinterpretq_s32_u64(row01.val[0]);
> +  int32x4_t out = vtrn1q_s32 (xxx, xxx);
> +  bar(out);
> +}
>
> On Wed, Aug 20, 2014 at 4:26 AM, Kyrill Tkachov  
> wrote:
>> Hi Carrot,
>>
>> cc'ing the aarch64 maintainers...
>>
>>
>> On 20/08/14 00:43, Carrot Wei wrote:
>>>
>>> Hi
>>>
>>> Current AArch64 backend can generate rtl expressions like
>>> (vec_duplicate:DI (const_int 0 [0])), which causes ICE in
>>> simplify_const_unary_operation because vec_duplicate should generate
>>> vector mode only.
>>>
>>> As suggested by Andrew in the bug entry, I split the original insn
>>> patterns to avoid scalar mode vec_duplicate expression.
>>
>>
>> The documentation does say that vec_concat can work on scalars, so it seems
>> ok to me at a glance (but I can't approve it myself).
>>
>> Would be nice to have an addition to the testsuite though...
>>
>> Kyrill
>>
>>
>>> Passed regression tests on qemu without failure.
>>> OK for trunk and 4.9 branch?
>>>
>>> thanks
>>> Guozhi Wei
>>>
>>> 2014-08-19  Guozhi Wei  
>>>
>>>  PR target/62040
>>>  * config/aarch64/iterators.md (VQ_NO2E, VQ_2E): New iterators.
>>>  * config/aarch64/aarch64-simd.md (move_lo_quad_internal_):
>>> Split
>>>  it into two patterns.
>>>  (move_lo_quad_internal_be_): Likewise.
>>
>>
>>


Re: [Patch AArch64] Fix for PR62040

2014-09-03 Thread Carrot Wei
Changed the coding style.


2014-09-03  Guozhi Wei  

PR target/62040
* gcc.target/aarch64/pr62040.c: New test.


Index: pr62040.c
===
--- pr62040.c (revision 0)
+++ pr62040.c (revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-g -Os" } */
+
+#include "arm_neon.h"
+
+extern void bar (int32x4_t);
+
+void
+foo ()
+{
+  int32x4x4_t rows;
+  uint64x2x2_t row01;
+
+  row01.val[0] = vreinterpretq_u64_s32 (rows.val[0]);
+  row01.val[1] = vreinterpretq_u64_s32 (rows.val[1]);
+  uint64x1_t row3l = vget_low_u64 (row01.val[0]);
+  row01.val[0] = vcombine_u64 (vget_low_u64 (row01.val[1]), row3l);
+  int32x4_t xxx = vreinterpretq_s32_u64 (row01.val[0]);
+  int32x4_t out = vtrn1q_s32 (xxx, xxx);
+  bar (out);
+}


On Wed, Sep 3, 2014 at 6:04 AM, Marcus Shawcroft
 wrote:
> On 20 August 2014 20:51, Carrot Wei  wrote:
>> Good suggestion. Add the testcase.
>>
>> thanks
>> Guozhi Wei
>>
>> 2014-08-20  Guozhi Wei  
>>
>> PR target/62040
>> * gcc.target/aarch64/pr62040.c: New test.
>>
>> Index: pr62040.c
>> ===
>> --- pr62040.c (revision 0)
>> +++ pr62040.c (revision 0)
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-g -Os" } */
>> +
>> +#include "arm_neon.h"
>> +
>> +extern bar(int32x4_t);
>> +
>> +void foo() {
>> +  int32x4x4_t rows;
>> +  uint64x2x2_t row01;
>> +
>> +  row01.val[0] = vreinterpretq_u64_s32(rows.val[0]);
>> +  row01.val[1] = vreinterpretq_u64_s32(rows.val[1]);
>> +  uint64x1_t row3l = vget_low_u64(row01.val[0]);
>> +  row01.val[0] = vcombine_u64(vget_low_u64(row01.val[1]), row3l);
>> +  int32x4_t xxx = vreinterpretq_s32_u64(row01.val[0]);
>> +  int32x4_t out = vtrn1q_s32 (xxx, xxx);
>> +  bar(out);
>> +}
>
>
> GNU coding style please.
>
> /Marcus


Re: [Google 4.9] Backport of r210828

2014-10-09 Thread Carrot Wei
LGTM.
Your description could be more detail, such as which tests on which target.

On Tue, Oct 7, 2014 at 2:06 PM, Sterling Augustine
 wrote:
> The enclosed patch for google 4.9 is a backport of r210828 from
> trunk.
>
> googleref:b/14623977
>
> The given tests now pass when run by hand, but timeout under dejagnu
> I will be sending a different change to fix that.
>
> OK for google 4.9?


[PATCH PR63530] Fix the pointer alignment in vectorization

2014-10-17 Thread Carrot Wei
Hi

In current vectorization pass, when a new vector pointer is created,
its alignment is not set correctly. We should use DR_MISALIGNMENT (dr)
since only this alignment is adjusted when loop peeling or multi
version is occurred.

This patch passed following tests:
x86_64 bootstrap.
x86_64 regression test.
armv7 regression test.

OK for trunk and 4.9 branch?

thanks
Guozhi Wei

2014-10-17  Guozhi Wei  

PR tree-optimization/63530
tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Set
pointer alignment according to DR_MISALIGNMENT.


patch
Description: Binary data


Re: [PATCH PR63530] Fix the pointer alignment in vectorization

2014-10-20 Thread Carrot Wei
Hi Richard

An arm testcase that can reproduce this bug is attached.

2014-10-20  Guozhi Wei  

PR tree-optimization/63530
gcc.target/arm/pr63530.c: New testcase.


Index: pr63530.c
===
--- pr63530.c (revision 0)
+++ pr63530.c (revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon } */
+/* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2
-ftree-vectorize -funroll-loops --param
\"max-completely-peeled-insns=400\"" } */
+
+typedef struct {
+  unsigned char map[256];
+  int i;
+} A, *AP;
+
+void* calloc(int, int);
+
+AP foo (int n)
+{
+  AP b = (AP)calloc (1, sizeof (A));
+  int i;
+  for (i = n; i < 256; i++)
+b->map[i] = i;
+  return b;
+}
+
+/* { dg-final { scan-assembler-not "vst1.64" } } */

On Mon, Oct 20, 2014 at 1:19 AM, Richard Biener
 wrote:
> On Fri, Oct 17, 2014 at 7:58 PM, Carrot Wei  wrote:
>
> I miss a testcase.  I also miss a comment before this code explaining
> why DR_MISALIGNMENT if not -1 is valid and why it is not valid if

DR_MISALIGNMENT (dr) == -1 means some unknown misalignment, otherwise
it means some known misalignment.
See the usage in file tree-vect-stmts.c.

> 'offset' is supplied (what about 'byte_offset' btw?).  Also if peeling

It is for conservative, so it doesn't change the logic when offset is supplied.
I've checked that most of the passed in offset are caused by negative
step, its impact to DR_MISALIGNMENT should have already be considered
in function vect_update_misalignment_for_peel, but the comments of
vect_create_addr_base_for_vector_ref does not guarantee this usage of
offset.

The usage of byte_offset is quite broken, many direct or indirect
callers don't provide the parameters. So only the author can comment
this.

> for alignment aligned this ref (misalign == 0) you don't set the alignment.
>
I assume if no misalignment is specified, the natural alignment of the
vector type is used, and caused the wrong code in our case, is it
right?

> Thus you may fix a bug (not sure without a testcase) but the new code
> certainly doesn't look 100% correct.
>
> That said, I would have expected that we can unconditionally do
>
>  set_ptr_info_alignment (..., align, misalign)
>
> if misalign is != -1 and if we adjust misalign by offset * step + byte_offset
> (usually both are constants).
>
> Also we can still trust the alignment copied from addr_base modulo
> vector element size even if DR_MISALIGN is -1.  This may matter
> for targets that require element-alignment for vector accesses.
>


Re: [PATCH PR63530] Fix the pointer alignment in vectorization

2014-10-22 Thread Carrot Wei
Thanks for the review.
Following patch has been committed. I will port them to 4.9 branch
several days later.

2014-10-22  Guozhi Wei  

PR tree-optimization/63530
tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Set
pointer alignment according to DR_MISALIGNMENT.

2014-10-22  Guozhi Wei  

PR tree-optimization/63530
gcc.dg/vect/pr63530.c: New testcase.



On Tue, Oct 21, 2014 at 1:04 AM, Richard Biener
 wrote:
> On Mon, Oct 20, 2014 at 10:10 PM, Carrot Wei  wrote:
>> Hi Richard
>>
>> An arm testcase that can reproduce this bug is attached.
>>
>> 2014-10-20  Guozhi Wei  
>>
>> PR tree-optimization/63530
>> gcc.target/arm/pr63530.c: New testcase.
>>
>>
>> Index: pr63530.c
>> ===
>> --- pr63530.c (revision 0)
>> +++ pr63530.c (revision 0)
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon } */
>> +/* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2
>> -ftree-vectorize -funroll-loops --param
>> \"max-completely-peeled-insns=400\"" } */
>> +
>> +typedef struct {
>> +  unsigned char map[256];
>> +  int i;
>> +} A, *AP;
>> +
>> +void* calloc(int, int);
>> +
>> +AP foo (int n)
>> +{
>> +  AP b = (AP)calloc (1, sizeof (A));
>> +  int i;
>> +  for (i = n; i < 256; i++)
>> +b->map[i] = i;
>> +  return b;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vst1.64" } } */
>
> Can you make it a runtime testcase that fails?  This way it would be
> less target specific.
>
>> On Mon, Oct 20, 2014 at 1:19 AM, Richard Biener
>>  wrote:
>>> On Fri, Oct 17, 2014 at 7:58 PM, Carrot Wei  wrote:
>>>
>>> I miss a testcase.  I also miss a comment before this code explaining
>>> why DR_MISALIGNMENT if not -1 is valid and why it is not valid if
>>
>> DR_MISALIGNMENT (dr) == -1 means some unknown misalignment, otherwise
>> it means some known misalignment.
>> See the usage in file tree-vect-stmts.c.
>
> I know that.
>
>>> 'offset' is supplied (what about 'byte_offset' btw?).  Also if peeling
>>
>> It is for conservative, so it doesn't change the logic when offset is 
>> supplied.
>> I've checked that most of the passed in offset are caused by negative
>> step, its impact to DR_MISALIGNMENT should have already be considered
>> in function vect_update_misalignment_for_peel, but the comments of
>> vect_create_addr_base_for_vector_ref does not guarantee this usage of
>> offset.
>>
>> The usage of byte_offset is quite broken, many direct or indirect
>> callers don't provide the parameters. So only the author can comment
>> this.
>
> Well - please make it consistent at least, (offset || byte_offset).
>
>>> for alignment aligned this ref (misalign == 0) you don't set the alignment.
>>>
>> I assume if no misalignment is specified, the natural alignment of the
>> vector type is used, and caused the wrong code in our case, is it
>> right?
>
> No, DR_MISALIGNMENT == 0 means "aligned".
>
> OTOH it's quite unnecessary to do all the dance with the alignment
> part of the SSA name info (unnecessary for the actual memory references
> created by the vectorizer).  The type of the access ultimatively provides
> the larger alignment - the SSA name info only may enlarge it even
> further (thus it's dangerous to specify larger than valid there).
>
> So if you don't want to make it really optimal wrt offset/byte_offset please
> do
>
>if (offset || byte_offset || misalign == -1)
> mark_ptr_info_alignment_unknown (...)
>else
> set_ptr_info_alignment (..., align, misalign);
>
> The patch is ok with this change and the testcase turned into a runtime one
> and moved to gcc.dg/vect/
>
> Thanks,
> Richard.
>
>>> Thus you may fix a bug (not sure without a testcase) but the new code
>>> certainly doesn't look 100% correct.
>>>
>>> That said, I would have expected that we can unconditionally do
>>>
>>>  set_ptr_info_alignment (..., align, misalign)
>>>
>>> if misalign is != -1 and if we adjust misalign by offset * step + 
>>> byte_offset
>>> (usually both are constants).
>>>
>>> Also we can still trust the alignment copied from addr_base modulo
>>> vector element size even if DR_MISALIGN is -1.  This may matter
>>> for targets that require element-alignment for vector accesses.
>>>


patch
Description: Binary data


patch2
Description: Binary data


[PATCH, GOOGLE] Backport patch r212222 to google 4.9 branch

2015-05-13 Thread Carrot Wei
Hi

The more strict devirtualization condition in this patch helps to fix
google bug b/19872411.

Bootstraped and regression tested on x86-64.
OK for google 4.9 branch?


patch
Description: Binary data


[GOOGLE] Avoid calling walk_aliased_vdefs in O0 function

2015-03-23 Thread Carrot Wei
This patch fixes google internal bug b/19277289. It can only be
reproduced in google 4.9 branch.

In function param_change_prob, there is following function call

walk_aliased_vdefs (&refd, gimple_vuse (stmt), record_modified, &info, NULL);

If the source code is compiled with optimization, but cfun is compiled
with -O0, gimple_vuse (stmt) can be null, and walk_aliased_vdefs will
crash.

Previously we didn't reach walk_aliased_vdefs because bb->frequency is
0, and following code always return early.

  if (!bb->frequency)
return REG_BR_PROB_BASE;

Dehao's patch r210989 propagates some non-zero value into
bb->frequency, so now it doesn't return early and reaches the crash
point.

An obvious fix is skipping O0 functions in inline_generate_summary,
but many other places will access data structures created in
inline_analyze_function without check O0 for individual functions, and
will crash. So this patch simply checks if O0 is specified then return
early in function param_change_prob, same behavior as previous.

Boot strapped on x86-64, passed regression test on x86-64 and arm.

OK for google 4.9 branch?


patch
Description: Binary data


[PATCH, Google] Notify df framework when removing an insn in simplify-got

2015-06-09 Thread Carrot Wei
Hi

I forgot to notify df framework when I removed an insn, it caused df
verification failure described in google bug b/16155462.

The following patch passed regression test on arm qemu in both thumb
and arm modes.
OK for google 4.9 branch?


Index: simplify-got.c
===
--- simplify-got.c (revision 224174)
+++ simplify-got.c (working copy)
@@ -169,7 +169,10 @@

   /* Since there is no usage of pic_reg now, we can remove it.  */
   if (use)
-remove_insn (use);
+{
+  df_insn_delete (use);
+  remove_insn (use);
+}
   targetm.got_access.clear_pic_reg ();
   free (got_accesses);
   htab_delete (var_table);


Re: [PATCH, Google] Notify df framework when removing an insn in simplify-got

2015-06-10 Thread Carrot Wei
On Tue, Jun 9, 2015 at 11:43 PM, Richard Sandiford
 wrote:
> Carrot Wei  writes:
>> Index: simplify-got.c
>> ===
>> --- simplify-got.c (revision 224174)
>> +++ simplify-got.c (working copy)
>> @@ -169,7 +169,10 @@
>>
>>/* Since there is no usage of pic_reg now, we can remove it.  */
>>if (use)
>> -remove_insn (use);
>> +{
>> +  df_insn_delete (use);
>> +  remove_insn (use);
>> +}
>>targetm.got_access.clear_pic_reg ();
>>free (got_accesses);
>>htab_delete (var_table);
>
> Why not just use delete_insn ()?
>
> Thanks,
> Richard
>

Good suggestion, testing following patch,

Index: simplify-got.c
===
--- simplify-got.c (revision 224174)
+++ simplify-got.c (working copy)
@@ -169,7 +169,7 @@

   /* Since there is no usage of pic_reg now, we can remove it.  */
   if (use)
-remove_insn (use);
+delete_insn (use);
   targetm.got_access.clear_pic_reg ();
   free (got_accesses);
   htab_delete (var_table);


[PATCH, Google] Backport trunk patch r220860 to google/4.9 branch

2015-06-17 Thread Carrot Wei
Hi

In aarch64 backend of google/4.9 branch, the split pattern for insn
aarch64_lshr_sisd_or_int_3 destroys one of the source operands,
causes the later usage of the operand get a wrong value (google bug
17907351).

The bug has been fixed in trunk by r220860. This patch backports it to
google/4.9 branch. It passed regression test on aarch64-qemu.

OK for google/4.9?

thanks
Guozhi Wei


Index: config/aarch64/aarch64.md
===
--- config/aarch64/aarch64.md (revision 224524)
+++ config/aarch64/aarch64.md (working copy)
@@ -2786,7 +2786,7 @@

 ;; Logical right shift using SISD or Integer instruction
 (define_insn "*aarch64_lshr_sisd_or_int_3"
-  [(set (match_operand:GPI 0 "register_operand" "=w,w,r")
+  [(set (match_operand:GPI 0 "register_operand" "=w,&w,r")
 (lshiftrt:GPI
   (match_operand:GPI 1 "register_operand" "w,w,r")
   (match_operand:QI 2 "aarch64_reg_or_shift_imm_"
"Us,w,rUs")))]
@@ -2805,11 +2805,13 @@
(match_operand:DI 1 "aarch64_simd_register")
(match_operand:QI 2 "aarch64_simd_register")))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
 (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
-(unspec:DI [(match_dup 1) (match_dup 2)] UNSPEC_SISD_USHL))]
-  ""
+(unspec:DI [(match_dup 1) (match_dup 3)] UNSPEC_SISD_USHL))]
+  {
+operands[3] = gen_lowpart (QImode, operands[0]);
+  }
 )

 (define_split
@@ -2818,11 +2820,13 @@
(match_operand:SI 1 "aarch64_simd_register")
(match_operand:QI 2 "aarch64_simd_register")))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
 (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
-(unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))]
-  ""
+(unspec:SI [(match_dup 1) (match_dup 3)] UNSPEC_USHL_2S))]
+  {
+operands[3] = gen_lowpart (QImode, operands[0]);
+  }
 )

 ;; Arithmetic right shift using SISD or Integer instruction
Index: testsuite/gcc.target/aarch64/sisd-shft-neg_1.c
===
--- testsuite/gcc.target/aarch64/sisd-shft-neg_1.c (revision 0)
+++ testsuite/gcc.target/aarch64/sisd-shft-neg_1.c (working copy)
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+extern void abort (void);
+
+#define force_simd_si(v) asm volatile ("mov %s0, %1.s[0]" :"=w" (v) :"w" (v) :)
+
+unsigned int
+shft_add (unsigned int a, unsigned int b)
+{
+  unsigned int c;
+
+  force_simd_si (a);
+  force_simd_si (b);
+  c = a >> b;
+  force_simd_si (c);
+
+  return c + b;
+}
+
+int
+main (void)
+{
+  unsigned int i = 0;
+  unsigned int a = 0xdeadbeef;
+
+  for (i = 0; i < 32; i++)
+  {
+unsigned int exp = (a / (1 << i) + i);
+unsigned int got = shft_add (a, i);
+
+if (exp != got)
+  abort ();
+  }
+
+  return 0;
+}
+


[Google] Port patch r215585 to Google/4.9 branch

2014-12-16 Thread Carrot Wei
Hi

In Google application we hit the same problem as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63341, so we also need
the patch r215585 for Google/4.9 branch.

It passed following tests:
bootstrap and regression test on x86-64.
regression test on ppc.

Google reference 18687126.

OK for Google/4.9 branch?


patch
Description: Binary data


Re: [Google] Port patch r215585 to Google/4.9 branch

2014-12-16 Thread Carrot Wei
Yes, it has been long time since last merge, so it is good idea to do
another merge.

On Tue, Dec 16, 2014 at 11:32 AM, Xinliang David Li  wrote:
> The fix is already in upstream gcc-4.9 branch? If yes, we just need a merge.
>
> David
>
> On Tue, Dec 16, 2014 at 11:30 AM, Carrot Wei  wrote:
>> Hi
>>
>> In Google application we hit the same problem as
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63341, so we also need
>> the patch r215585 for Google/4.9 branch.
>>
>> It passed following tests:
>> bootstrap and regression test on x86-64.
>> regression test on ppc.
>>
>> Google reference 18687126.
>>
>> OK for Google/4.9 branch?


[google][4.7] Move the building of gcov constructor function after initialization of gcov_info_var

2013-05-02 Thread Carrot Wei
This patch fixes google bug 8397853 and targets google 4.7 branch.

In LIPO mode, when coverage_obj_init is called, cgraph_state is
CGRAPH_STATE_FINISHED. The variable gcov_info_var is created but not
initialized. When cgraph_build_static_cdtor is called, the new function and
variables are expanded immediately since cgraph_state is CGRAPH_STATE_FINISHED.
It causes gcov_info_var into .bss section. But later in function
coverage_obj_finish we initialize gcov_info_var with non zero contents, so it
should not be put into .bss section.

In FDO mode we don't have this problem because when coverage_obj_init is called,
cgraph_state is CGRAPH_STATE_IPA_SSA. When cgraph_build_static_cdtor is called,
the new function is not immediately expanded. The variable will have been
properly initialized when it is expanded.

It can be fixed by moving the construction of gcov constructor after
initialization of gcov_info_var.

Tested with following testing:
x86-64 bootstrap.
x86-64 regression test.
power64 regression test on qemu.

The only regression for power64 is
FAIL: gcc.dg/torture/tls/tls-test.c  -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
It is a flaky test case in our testing environment since all other executions
with different compiler options failed. All testing of tls-test.c pass native
power64 testing.

thanks
Carrot

2013-05-02  Guozhi Wei  

* coverage.c (gcov_info_type): New global variable.
(coverage_obj_init): Move the construction of gcov constructor to
(build_init_ctor): here.
(coverage_obj_finish): Call build_init_ctor after initialization of
gcov_info_var.


Index: coverage.c
===
--- coverage.c (revision 198425)
+++ coverage.c (working copy)
@@ -123,6 +123,7 @@

 /* Coverage info VAR_DECL and function info type nodes.  */
 static GTY(()) tree gcov_info_var;
+static GTY(()) tree gcov_info_type;
 static GTY(()) tree gcov_fn_info_type;
 static GTY(()) tree gcov_fn_info_ptr_type;

@@ -2478,14 +2479,12 @@
   return build_constructor (info_type, v1);
 }

-/* Create the gcov_info types and object.  Generate the constructor
-   function to call __gcov_init.  Does not generate the initializer
+/* Create the gcov_info types and object. Does not generate the initializer
for the object.  Returns TRUE if coverage data is being emitted.  */

 static bool
 coverage_obj_init (void)
 {
-  tree gcov_info_type, ctor, stmt, init_fn;
   unsigned n_counters = 0;
   unsigned ix;
   struct coverage_data *fn;
@@ -2531,24 +2530,6 @@
   ASM_GENERATE_INTERNAL_LABEL (name_buf, "LPBX", 0);
   DECL_NAME (gcov_info_var) = get_identifier (name_buf);

-  /* Build a decl for __gcov_init.  */
-  init_fn = build_pointer_type (gcov_info_type);
-  init_fn = build_function_type_list (void_type_node, init_fn, NULL);
-  init_fn = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
- get_identifier ("__gcov_init"), init_fn);
-  TREE_PUBLIC (init_fn) = 1;
-  DECL_EXTERNAL (init_fn) = 1;
-  DECL_ASSEMBLER_NAME (init_fn);
-
-  /* Generate a call to __gcov_init(&gcov_info).  */
-  ctor = NULL;
-  stmt = build_fold_addr_expr (gcov_info_var);
-  stmt = build_call_expr (init_fn, 1, stmt);
-  append_to_statement_list (stmt, &ctor);
-
-  /* Generate a constructor to run it.  */
-  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
-
   return true;
 }

@@ -2570,6 +2551,32 @@
   return ctor;
 }

+/* Generate the constructor function to call __gcov_init.  */
+
+static void
+build_init_ctor ()
+{
+  tree ctor, stmt, init_fn;
+
+  /* Build a decl for __gcov_init.  */
+  init_fn = build_pointer_type (gcov_info_type);
+  init_fn = build_function_type_list (void_type_node, init_fn, NULL);
+  init_fn = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+ get_identifier ("__gcov_init"), init_fn);
+  TREE_PUBLIC (init_fn) = 1;
+  DECL_EXTERNAL (init_fn) = 1;
+  DECL_ASSEMBLER_NAME (init_fn);
+
+  /* Generate a call to __gcov_init(&gcov_info).  */
+  ctor = NULL;
+  stmt = build_fold_addr_expr (gcov_info_var);
+  stmt = build_call_expr (init_fn, 1, stmt);
+  append_to_statement_list (stmt, &ctor);
+
+  /* Generate a constructor to run it.  */
+  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
+}
+
 /* Finalize the coverage data.  Generates the array of pointers to
function objects from CTOR.  Generate the gcov_info initializer.  */

@@ -2589,9 +2596,12 @@
   DECL_NAME (fn_info_ary) = get_identifier (name_buf);
   DECL_INITIAL (fn_info_ary) = build_constructor (fn_info_ary_type, ctor);
   varpool_finalize_decl (fn_info_ary);
-
+
   DECL_INITIAL (gcov_info_var)
 = build_info (TREE_TYPE (gcov_info_var), fn_info_ary);
+
+  build_init_ctor ();
+
   varpool_finalize_decl (gcov_info_var);
 }


[PATCH] Refactor coverage.c, outline the construction of gcov constructor

2013-05-02 Thread Carrot Wei
This patch outline the construction of gcov constructor from coverage_obj_init
as a separate function build_init_ctor.

It passed bootstrap and regression test on x86-64.

OK for trunk and google 4.7 branch?

thanks
Carrot


2013-05-02  Guozhi Wei  

* coverage.c (gcov_info_type): New global variable.
(coverage_obj_init): Move the construction of gcov constructor to
(build_init_ctor): here.


Index: coverage.c
===
--- coverage.c (revision 198557)
+++ coverage.c (working copy)
@@ -99,6 +99,7 @@

 /* Coverage info VAR_DECL and function info type nodes.  */
 static GTY(()) tree gcov_info_var;
+static GTY(()) tree gcov_info_type;
 static GTY(()) tree gcov_fn_info_type;
 static GTY(()) tree gcov_fn_info_ptr_type;

@@ -967,6 +968,32 @@
   return build_constructor (info_type, v1);
 }

+/* Generate the constructor function to call __gcov_init.  */
+
+static void
+build_init_ctor ()
+{
+  tree ctor, stmt, init_fn;
+
+  /* Build a decl for __gcov_init.  */
+  init_fn = build_pointer_type (gcov_info_type);
+  init_fn = build_function_type_list (void_type_node, init_fn, NULL);
+  init_fn = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+ get_identifier ("__gcov_init"), init_fn);
+  TREE_PUBLIC (init_fn) = 1;
+  DECL_EXTERNAL (init_fn) = 1;
+  DECL_ASSEMBLER_NAME (init_fn);
+
+  /* Generate a call to __gcov_init(&gcov_info).  */
+  ctor = NULL;
+  stmt = build_fold_addr_expr (gcov_info_var);
+  stmt = build_call_expr (init_fn, 1, stmt);
+  append_to_statement_list (stmt, &ctor);
+
+  /* Generate a constructor to run it.  */
+  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
+}
+
 /* Create the gcov_info types and object.  Generate the constructor
function to call __gcov_init.  Does not generate the initializer
for the object.  Returns TRUE if coverage data is being emitted.  */
@@ -974,7 +1001,6 @@
 static bool
 coverage_obj_init (void)
 {
-  tree gcov_info_type, ctor, stmt, init_fn;
   unsigned n_counters = 0;
   unsigned ix;
   struct coverage_data *fn;
@@ -1020,24 +1046,8 @@
   ASM_GENERATE_INTERNAL_LABEL (name_buf, "LPBX", 0);
   DECL_NAME (gcov_info_var) = get_identifier (name_buf);

-  /* Build a decl for __gcov_init.  */
-  init_fn = build_pointer_type (gcov_info_type);
-  init_fn = build_function_type_list (void_type_node, init_fn, NULL);
-  init_fn = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
- get_identifier ("__gcov_init"), init_fn);
-  TREE_PUBLIC (init_fn) = 1;
-  DECL_EXTERNAL (init_fn) = 1;
-  DECL_ASSEMBLER_NAME (init_fn);
+  build_init_ctor ();

-  /* Generate a call to __gcov_init(&gcov_info).  */
-  ctor = NULL;
-  stmt = build_fold_addr_expr (gcov_info_var);
-  stmt = build_call_expr (init_fn, 1, stmt);
-  append_to_statement_list (stmt, &ctor);
-
-  /* Generate a constructor to run it.  */
-  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
-
   return true;
 }


Re: [PATCH] Refactor coverage.c, outline the construction of gcov constructor

2013-05-03 Thread Carrot Wei
On Fri, May 3, 2013 at 1:03 AM, Richard Biener
 wrote:
> On Thu, May 2, 2013 at 10:41 PM, Carrot Wei  wrote:
>> This patch outline the construction of gcov constructor from 
>> coverage_obj_init
>> as a separate function build_init_ctor.
>>
>> It passed bootstrap and regression test on x86-64.
>>
>> OK for trunk and google 4.7 branch?
>
> Please pass gcov_info_type as parameter to build_init_ctor to avoid
> the new GC root.

Pass gcov_info_type as parameter to build_init_ctor may look more cleaner,
but it may limit build_init_ctor can only be called from coverage_obj_init.
In some situations build_init_ctor may need to be called by other functions,
such as my patch for lipo
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00078.html.

On the other hand, gcov_info_type is the only non-global variable in the
family of variables that have similar purpose. So it is not very bad to make
it also global.

thanks
Carrot

>
> Ok with that change.
>
> Thanks,
> Richard.
>
>> thanks
>> Carrot
>>
>>
>> 2013-05-02  Guozhi Wei  
>>
>> * coverage.c (gcov_info_type): New global variable.
>> (coverage_obj_init): Move the construction of gcov constructor to
>> (build_init_ctor): here.
>>
>>
>> Index: coverage.c
>> ===
>> --- coverage.c (revision 198557)
>> +++ coverage.c (working copy)
>> @@ -99,6 +99,7 @@
>>
>>  /* Coverage info VAR_DECL and function info type nodes.  */
>>  static GTY(()) tree gcov_info_var;
>> +static GTY(()) tree gcov_info_type;
>>  static GTY(()) tree gcov_fn_info_type;
>>  static GTY(()) tree gcov_fn_info_ptr_type;
>>
>> @@ -967,6 +968,32 @@
>>return build_constructor (info_type, v1);
>>  }
>>
>> +/* Generate the constructor function to call __gcov_init.  */
>> +
>> +static void
>> +build_init_ctor ()
>> +{
>> +  tree ctor, stmt, init_fn;
>> +
>> +  /* Build a decl for __gcov_init.  */
>> +  init_fn = build_pointer_type (gcov_info_type);
>> +  init_fn = build_function_type_list (void_type_node, init_fn, NULL);
>> +  init_fn = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> + get_identifier ("__gcov_init"), init_fn);
>> +  TREE_PUBLIC (init_fn) = 1;
>> +  DECL_EXTERNAL (init_fn) = 1;
>> +  DECL_ASSEMBLER_NAME (init_fn);
>> +
>> +  /* Generate a call to __gcov_init(&gcov_info).  */
>> +  ctor = NULL;
>> +  stmt = build_fold_addr_expr (gcov_info_var);
>> +  stmt = build_call_expr (init_fn, 1, stmt);
>> +  append_to_statement_list (stmt, &ctor);
>> +
>> +  /* Generate a constructor to run it.  */
>> +  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
>> +}
>> +
>>  /* Create the gcov_info types and object.  Generate the constructor
>> function to call __gcov_init.  Does not generate the initializer
>> for the object.  Returns TRUE if coverage data is being emitted.  */
>> @@ -974,7 +1001,6 @@
>>  static bool
>>  coverage_obj_init (void)
>>  {
>> -  tree gcov_info_type, ctor, stmt, init_fn;
>>unsigned n_counters = 0;
>>unsigned ix;
>>struct coverage_data *fn;
>> @@ -1020,24 +1046,8 @@
>>ASM_GENERATE_INTERNAL_LABEL (name_buf, "LPBX", 0);
>>DECL_NAME (gcov_info_var) = get_identifier (name_buf);
>>
>> -  /* Build a decl for __gcov_init.  */
>> -  init_fn = build_pointer_type (gcov_info_type);
>> -  init_fn = build_function_type_list (void_type_node, init_fn, NULL);
>> -  init_fn = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> - get_identifier ("__gcov_init"), init_fn);
>> -  TREE_PUBLIC (init_fn) = 1;
>> -  DECL_EXTERNAL (init_fn) = 1;
>> -  DECL_ASSEMBLER_NAME (init_fn);
>> +  build_init_ctor ();
>>
>> -  /* Generate a call to __gcov_init(&gcov_info).  */
>> -  ctor = NULL;
>> -  stmt = build_fold_addr_expr (gcov_info_var);
>> -  stmt = build_call_expr (init_fn, 1, stmt);
>> -  append_to_statement_list (stmt, &ctor);
>> -
>> -  /* Generate a constructor to run it.  */
>> -  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
>> -
>>return true;
>>  }


Re: [PATCH] Refactor coverage.c, outline the construction of gcov constructor

2013-05-03 Thread Carrot Wei
commited as 198591.

On Fri, May 3, 2013 at 11:51 AM, Xinliang David Li  wrote:
> Please do what Richard suggested. gcov_info_type can be obtained from
> gcov_info_var decl.
>
> David
>
>
> On Fri, May 3, 2013 at 11:31 AM, Carrot Wei  wrote:
>> On Fri, May 3, 2013 at 1:03 AM, Richard Biener
>>  wrote:
>>> On Thu, May 2, 2013 at 10:41 PM, Carrot Wei  wrote:
>>>> This patch outline the construction of gcov constructor from 
>>>> coverage_obj_init
>>>> as a separate function build_init_ctor.
>>>>
>>>> It passed bootstrap and regression test on x86-64.
>>>>
>>>> OK for trunk and google 4.7 branch?
>>>
>>> Please pass gcov_info_type as parameter to build_init_ctor to avoid
>>> the new GC root.
>>
>> Pass gcov_info_type as parameter to build_init_ctor may look more cleaner,
>> but it may limit build_init_ctor can only be called from coverage_obj_init.
>> In some situations build_init_ctor may need to be called by other functions,
>> such as my patch for lipo
>> http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00078.html.
>>
>> On the other hand, gcov_info_type is the only non-global variable in the
>> family of variables that have similar purpose. So it is not very bad to make
>> it also global.
>>
>> thanks
>> Carrot
>>
>>>
>>> Ok with that change.
>>>
>>> Thanks,
>>> Richard.
>>>


Re: [google][4.7] Move the building of gcov constructor function after initialization of gcov_info_var

2013-05-06 Thread Carrot Wei
After the refactoring has been checked in, the bug fixing part is simply
a moving a function call.

Tested by running ./buildit with both x86-64 and power64 targets.
The last time regression of tls-tests.c disappeared. So it is really flaky
in our testing environment.

thanks
Carrot

2013-05-02  Guozhi Wei  

* coverage.c (coverage_obj_init): Move the call of build_init_ctor to
(coverage_obj_finish): here.


Index: coverage.c
===
--- coverage.c (revision 198654)
+++ coverage.c (working copy)
@@ -2504,8 +2504,7 @@
   cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
 }

-/* Create the gcov_info types and object.  Generate the constructor
-   function to call __gcov_init.  Does not generate the initializer
+/* Create the gcov_info types and object. Does not generate the initializer
for the object.  Returns TRUE if coverage data is being emitted.  */

 static bool
@@ -2557,8 +2556,6 @@
   ASM_GENERATE_INTERNAL_LABEL (name_buf, "LPBX", 0);
   DECL_NAME (gcov_info_var) = get_identifier (name_buf);

-  build_init_ctor (gcov_info_type);
-
   return true;
 }

@@ -2581,7 +2578,8 @@
 }

 /* Finalize the coverage data.  Generates the array of pointers to
-   function objects from CTOR.  Generate the gcov_info initializer.  */
+   function objects from CTOR.  Generate the gcov_info initializer.
+   Generate the constructor function to call __gcov_init.  */

 static void
 coverage_obj_finish (VEC(constructor_elt,gc) *ctor)
@@ -2599,9 +2597,12 @@
   DECL_NAME (fn_info_ary) = get_identifier (name_buf);
   DECL_INITIAL (fn_info_ary) = build_constructor (fn_info_ary_type, ctor);
   varpool_finalize_decl (fn_info_ary);
-
+
   DECL_INITIAL (gcov_info_var)
 = build_info (TREE_TYPE (gcov_info_var), fn_info_ary);
+
+  build_init_ctor (TREE_TYPE (gcov_info_var));
+
   varpool_finalize_decl (gcov_info_var);
 }



On Thu, May 2, 2013 at 11:15 AM, Xinliang David Li  wrote:
> I suggest submitting the refactoring part of the changes to GCC trunk first.
>
> thanks,
>
> David
>
> On Thu, May 2, 2013 at 11:06 AM, Carrot Wei  wrote:
>> This patch fixes google bug 8397853 and targets google 4.7 branch.
>>
>> In LIPO mode, when coverage_obj_init is called, cgraph_state is
>> CGRAPH_STATE_FINISHED. The variable gcov_info_var is created but not
>> initialized. When cgraph_build_static_cdtor is called, the new function and
>> variables are expanded immediately since cgraph_state is 
>> CGRAPH_STATE_FINISHED.
>> It causes gcov_info_var into .bss section. But later in function
>> coverage_obj_finish we initialize gcov_info_var with non zero contents, so it
>> should not be put into .bss section.
>>
>> In FDO mode we don't have this problem because when coverage_obj_init is 
>> called,
>> cgraph_state is CGRAPH_STATE_IPA_SSA. When cgraph_build_static_cdtor is 
>> called,
>> the new function is not immediately expanded. The variable will have been
>> properly initialized when it is expanded.
>>
>> It can be fixed by moving the construction of gcov constructor after
>> initialization of gcov_info_var.
>>
>> Tested with following testing:
>> x86-64 bootstrap.
>> x86-64 regression test.
>> power64 regression test on qemu.
>>
>> The only regression for power64 is
>> FAIL: gcc.dg/torture/tls/tls-test.c  -O2 -flto -fno-use-linker-plugin
>> -flto-partition=none  execution test
>> It is a flaky test case in our testing environment since all other executions
>> with different compiler options failed. All testing of tls-test.c pass native
>> power64 testing.
>>
>> thanks
>> Carrot
>>
>> 2013-05-02  Guozhi Wei  
>>
>> * coverage.c (gcov_info_type): New global variable.
>> (coverage_obj_init): Move the construction of gcov constructor to
>> (build_init_ctor): here.
>> (coverage_obj_finish): Call build_init_ctor after initialization of
>> gcov_info_var.
>>
>>
>> Index: coverage.c
>> ===
>> --- coverage.c (revision 198425)
>> +++ coverage.c (working copy)
>> @@ -123,6 +123,7 @@
>>
>>  /* Coverage info VAR_DECL and function info type nodes.  */
>>  static GTY(()) tree gcov_info_var;
>> +static GTY(()) tree gcov_info_type;
>>  static GTY(()) tree gcov_fn_info_type;
>>  static GTY(()) tree gcov_fn_info_ptr_type;
>>
>> @@ -2478,14 +2479,12 @@
>>return build_constructor (info_type, v1);
>>  }
>>
>> -/* Create the gcov_info types and object.  Generate the constructor
>> -   function to call __gcov_init.  Does not generate the initializer
>> +/* Create the gcov_info types and obj

Re: [Patch][google/gcc-4_8] Backport trunk@198344 into google/gcc-4_8

2013-05-07 Thread Carrot Wei
OK for google branch. Should it also be in gcc4.8 branch?

thanks
Carrot

On Tue, May 7, 2013 at 12:01 PM, Han Shen(沈涵)  wrote:
> Backport trunk@198344 - another fix to PR rtl-optimization/56847 - to
> google/gcc-4_8 branch.
>
> The first fix was trunk@198101 -
> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01152.html - which was
> backported to google/gcc-4_8 as gcc-4_8@198315
>
> Unfortunately, it resulted in some libstdc++ test failures, so another
> fix was submitted as trunk@198344 -
> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01632.html
>
> Ok for backporting trunk@198344 to google/gcc-4_8?
>
> -Han


[GOOGLE] Check conditions before calling varpool_node

2013-05-09 Thread Carrot Wei
This patch fixed google bug entry 6124850.

The usage of varpool_node has some restrictions on the corresponding var decl.
In LIPO mode function notice_global_symbol may call varpool_node with a decl
that doesn't satisfy these restrictions since the function notice_global_symbol
can be directly or indirectly called from anywhere. So we need to check if
a decl can be safely passed into varpoo_node before calling it.

Tested by ./buildit with targets x86-64 and power64 without regression.

OK for google branches?

thanks
Carrot


2013-05-09  Guozhi Wei  

varasm.c (notice_global_symbol): Check conditions before calling
varpool_node.


Index: varasm.c
===
--- varasm.c (revision 198726)
+++ varasm.c (working copy)
@@ -1515,13 +1515,29 @@
   || !MEM_P (DECL_RTL (decl)))
 return;

-  if (L_IPO_COMP_MODE
-  && ((TREE_CODE (decl) == FUNCTION_DECL
-   && cgraph_is_auxiliary (decl))
-  || (TREE_CODE (decl) == VAR_DECL
-  && varpool_is_auxiliary (varpool_node (decl)
-return;
+  if (L_IPO_COMP_MODE)
+{
+  if (TREE_CODE (decl) == FUNCTION_DECL && cgraph_is_auxiliary (decl))
+ return;

+  if (TREE_CODE (decl) == VAR_DECL)
+ {
+  /* Varpool_node can only accept var decl with flags
+ (TREE_STATIC (decl) || DECL_EXTERNAL (decl))
+ For decl without these flags, we need to
+ check if it is auxiliary manually.  */
+  if (!(TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
+{
+  /* If a new varpool_node can be created,
+ the module id is current_module_id.  */
+  if (current_module_id != primary_module_id)
+ return;
+}
+  else if (varpool_is_auxiliary (varpool_node (decl)))
+return;
+ }
+}
+
   /* We win when global object is found, but it is useful to know about weak
  symbol as well so we can produce nicer unique names.  */
   if (DECL_WEAK (decl) || DECL_ONE_ONLY (decl) || flag_shlib)


Re: [GOOGLE] Check conditions before calling varpool_node

2013-05-09 Thread Carrot Wei
On Thu, May 9, 2013 at 12:53 PM, Xinliang David Li  wrote:
> This is not correct. current_module_id is used only in FE parsing.
>
Suppose the var decl has correct flags and varpool_node can accept it,
a new varpool_node will be created for it, the module_id for the new node
is set to current_module_id. And in function varpool_is_auxiliary the new
node's module_id is compared with primary_module_id. So this code
exactly simulate the behavior of varpool_is_auxiliary (varpool_node (decl)).

> The real question is why the decl is created, neither static nor external?
>
The decl is created in function dw2_output_indirect_constant_1,
it has the following contents

(gdb) p debug_tree (decl)
 >
public unsigned type_6 DI
size 
unit size 
align 64 symtab 0 alias set 3 canonical type 0xf60907e0
pointer_to_this  reference_to_this
>
readonly asm_written public unsigned ignored weak DI file (null)
line 0 col 0 size  unit size 
align 64 initial 
(mem/f/c:DI (symbol_ref/i:DI ("DW.ref.__gxx_personality_v0")
[flags 0x2] ) [3
DW.ref.__gxx_personality_v0+0 S8 A64])>

Function dw2_output_indirect_constant_1 creates a new decl with property of
either PUBLIC or STATIC. Is a PUBLIC but not STATIC var decl legal?

The call chain of this failure is
dw2_output_indirect_constant_1 -> assemble_variable ->
notice_global_symbol -> varpool_node

The last call notice_global_symbol -> varpool_node is added by lipo, before that
these functions can't call into varpool_node. So could it because the original
implementation of these functions didn't consider the restrictions of
varpool_node
since it couldn't be called from there?

thanks
Carrot

> David
>
> On Thu, May 9, 2013 at 11:39 AM, Carrot Wei  wrote:
>> This patch fixed google bug entry 6124850.
>>
>> The usage of varpool_node has some restrictions on the corresponding var 
>> decl.
>> In LIPO mode function notice_global_symbol may call varpool_node with a decl
>> that doesn't satisfy these restrictions since the function 
>> notice_global_symbol
>> can be directly or indirectly called from anywhere. So we need to check if
>> a decl can be safely passed into varpoo_node before calling it.
>>
>> Tested by ./buildit with targets x86-64 and power64 without regression.
>>
>> OK for google branches?
>>
>> thanks
>> Carrot
>>
>>
>> 2013-05-09  Guozhi Wei  
>>
>> varasm.c (notice_global_symbol): Check conditions before calling
>> varpool_node.
>>
>>
>> Index: varasm.c
>> ===
>> --- varasm.c (revision 198726)
>> +++ varasm.c (working copy)
>> @@ -1515,13 +1515,29 @@
>>|| !MEM_P (DECL_RTL (decl)))
>>  return;
>>
>> -  if (L_IPO_COMP_MODE
>> -  && ((TREE_CODE (decl) == FUNCTION_DECL
>> -   && cgraph_is_auxiliary (decl))
>> -  || (TREE_CODE (decl) == VAR_DECL
>> -  && varpool_is_auxiliary (varpool_node (decl)
>> -return;
>> +  if (L_IPO_COMP_MODE)
>> +{
>> +  if (TREE_CODE (decl) == FUNCTION_DECL && cgraph_is_auxiliary (decl))
>> + return;
>>
>> +  if (TREE_CODE (decl) == VAR_DECL)
>> + {
>> +  /* Varpool_node can only accept var decl with flags
>> + (TREE_STATIC (decl) || DECL_EXTERNAL (decl))
>> + For decl without these flags, we need to
>> + check if it is auxiliary manually.  */
>> +  if (!(TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
>> +{
>> +  /* If a new varpool_node can be created,
>> + the module id is current_module_id.  */
>> +  if (current_module_id != primary_module_id)
>> + return;
>> +}
>> +  else if (varpool_is_auxiliary (varpool_node (decl)))
>> +return;
>> + }
>> +}
>> +
>>/* We win when global object is found, but it is useful to know about weak
>>   symbol as well so we can produce nicer unique names.  */
>>if (DECL_WEAK (decl) || DECL_ONE_ONLY (decl) || flag_shlib)


[RFC] Make the new var decl STATIC in function dw2_output_indirect_constant_1

2013-05-10 Thread Carrot Wei
Hi

In function dw2_output_indirect_constant_1 a new var decl is created. Only
When the variable is not PUBLIC it is allocated static storage. Does anybody
know why the variable is not allocated static storage by marking TREE_STATIC
when it is PUBLIC?

The following patch marks the STATIC flag in all cases. It can pass bootstrap
and regression test on x86-64.

Any comments?

thanks
Carrot


2013-05-09  Guozhi Wei  

* dwarf2asm.c (dw2_output_indirect_constant_1): Mark new decl STATIC.


Index: dwarf2asm.c
===
--- dwarf2asm.c (revision 198794)
+++ dwarf2asm.c (working copy)
@@ -906,6 +906,7 @@
   DECL_IGNORED_P (decl) = 1;
   DECL_INITIAL (decl) = decl;
   TREE_READONLY (decl) = 1;
+  TREE_STATIC (decl) = 1;

   if (TREE_PUBLIC (id))
 {
@@ -914,8 +915,6 @@
   if (USE_LINKONCE_INDIRECT)
  DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
 }
-  else
-TREE_STATIC (decl) = 1;

   sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym);
   assemble_variable (decl, 1, 1, 1);


Re: [Patch][google/gcc-4_8] Backport trunk@198547 for pr target/56732

2013-05-13 Thread Carrot Wei
OK for google branches.

On Thu, May 9, 2013 at 1:40 PM, Han Shen(沈涵)  wrote:
> Hi, I'm to backport trunk patch @198547 for pr target/56732 to google
> branch google/gcc-4_8.
>
> This patch fixes arm ICE.
>
> Ok for google/gcc-4_8?
>
> [patch attached]
>
> H.


[GOOGLE] Remove records in powerpc64-grtev3-linux-gnu.xfail

2013-05-29 Thread Carrot Wei
Hi

Since b/8397853 has been fixed, the related tests now passed, so we can remove
them from powerpc64-grtev3-linux-gnu.xfail now.

Tested with ./buildit --run_tests.

OK for google 4.7 branch?

thanks
Carrot


2013-05-29  Guozhi Wei  

* powerpc64-grtev3-linux-gnu.xfail (*** g++): Remove passed records.



Index: powerpc64-grtev3-linux-gnu.xfail
===
--- powerpc64-grtev3-linux-gnu.xfail (revision 199411)
+++ powerpc64-grtev3-linux-gnu.xfail (working copy)
@@ -131,19 +131,8 @@
 FAIL: g++.dg/ext/cleanup-9.C -std=gnu++11 execution test
 FAIL: g++.dg/warn/Wself-assign-2.C -std=gnu++11  (test for warnings, line 12)
 FAIL: g++.dg/tree-prof/lipo/vcall1_0.C scan-ipa-dump-times profile
"Indirect call -> direct call" 2
-# b/8397853, a LIPO bug, causing compilation to fail.
-FAIL: g++.dg/tree-prof/mversn15.C compilation,  -fprofile-generate
-UNRESOLVED: g++.dg/tree-prof/mversn15.C execution,-fprofile-generate
-#FAIL: g++.dg/tree-prof/mversn15.C execution,-fprofile-generate
-UNRESOLVED: g++.dg/tree-prof/mversn15.C compilation,  -fprofile-use
-UNRESOLVED: g++.dg/tree-prof/mversn15.C execution,-fprofile-use
 FAIL: g++.dg/tree-prof/mversn15.C scan-tree-dump optimized "return 0"
 FAIL: g++.dg/tree-prof/mversn15.C scan-tree-dump optimized "main_clone"
-FAIL: g++.dg/tree-prof/mversn15a.C compilation,  -fprofile-generate
-UNRESOLVED: g++.dg/tree-prof/mversn15a.C execution,-fprofile-generate
-#FAIL: g++.dg/tree-prof/mversn15a.C execution,-fprofile-generate
-UNRESOLVED: g++.dg/tree-prof/mversn15a.C compilation,  -fprofile-use
-UNRESOLVED: g++.dg/tree-prof/mversn15a.C execution,-fprofile-use

 # Fortran failures are not important to us so far.
 *** gfortran:


Re: [google/gcc-4_7]Add new validator file for native ppc toolchain

2013-06-05 Thread Carrot Wei
OK for google/gcc-4_7.

On Wed, Jun 5, 2013 at 2:45 PM, Jing Yu  wrote:
> Add new validator manifest xfail file for native powerpc64 toolchain.
> Ok for google/gcc-4_7?
>
> Tested:
> ./validate_failures.py
> --manifest=powerpc64-grtev3-linux-gnu-native.xfail --
> results="gcc/gcc.sum g++/g++.sum gfortran/gfortran.sum"
>
> 2013-06-05
>
>  * powerpc64-grtev3-linux-gnu-native.xfail: New.


[Patch AArch64] Fix extended register width

2014-09-22 Thread Carrot Wei
Hi

The extended register width in add/adds/sub/subs/cmp instructions is
not always the same as target register, it depends on both target
register width and extension type. But in current implementation the
extended register width is always the same as target register. We have
noticed it can generate following wrong assembler code when compiled
an internal application,

add x2, x20, x0, sxtw 3

The correct assembler should be

add x2, x20, w0, sxtw 3

On the other hand I noticed current gcc can only generate following
extension types: xtb, xth, xtw. In these cases the extended register
width can only be 'w'. So this patch changes the the extended register
size attribute to 'w'.

Passed regression tests on qemu without failure.
OK for trunk and 4.9 branch?

thanks
Guozhi Wei


2014-09-22  Guozhi Wei  

* config/aarch64/aarch64.md (*adds__):
Change the extended register width to w.
(*subs__): Likewise.
(*adds__multp2): Likewise.
(*subs__multp2): Likewise.
(*add__): Likewise.
(*add__shft_): Likewise.
(*add__mult_): Likewise.
(*add__multp2): Likewise.
(*add_uxt_multp2): Likewise.
(*sub__): Likewise.
(*sub__shft_): Likewise.
(*sub__multp2): Likewise.
(*sub_uxt_multp2): Likewise.
(*cmp_swp__reg): Likewise.
(*cmp_swp__shft_): Likewise.


2014-09-22  Guozhi Wei  

* gcc.target/aarch64/subs3.c: Change the extended register width to w.
* gcc.target/aarch64/adds3.c: Likewise.
* gcc.target/aarch64/cmp.c: Likewise.


patch1
Description: Binary data


patch2
Description: Binary data


Re: [Patch AArch64] Fix extended register width

2014-09-29 Thread Carrot Wei
Ping.

On Mon, Sep 22, 2014 at 11:41 AM, Carrot Wei  wrote:
> Hi
>
> The extended register width in add/adds/sub/subs/cmp instructions is
> not always the same as target register, it depends on both target
> register width and extension type. But in current implementation the
> extended register width is always the same as target register. We have
> noticed it can generate following wrong assembler code when compiled
> an internal application,
>
> add x2, x20, x0, sxtw 3
>
> The correct assembler should be
>
> add x2, x20, w0, sxtw 3
>
> On the other hand I noticed current gcc can only generate following
> extension types: xtb, xth, xtw. In these cases the extended register
> width can only be 'w'. So this patch changes the the extended register
> size attribute to 'w'.
>
> Passed regression tests on qemu without failure.
> OK for trunk and 4.9 branch?
>
> thanks
> Guozhi Wei
>
>
> 2014-09-22  Guozhi Wei  
>
> * config/aarch64/aarch64.md (*adds__):
> Change the extended register width to w.
> (*subs__): Likewise.
> (*adds__multp2): Likewise.
> (*subs__multp2): Likewise.
> (*add__): Likewise.
> (*add__shft_): Likewise.
> (*add__mult_): Likewise.
> (*add__multp2): Likewise.
> (*add_uxt_multp2): Likewise.
> (*sub__): Likewise.
> (*sub__shft_): Likewise.
> (*sub__multp2): Likewise.
> (*sub_uxt_multp2): Likewise.
> (*cmp_swp__reg): Likewise.
> (*cmp_swp__shft_): Likewise.
>
>
> 2014-09-22  Guozhi Wei  
>
> * gcc.target/aarch64/subs3.c: Change the extended register width to w.
> * gcc.target/aarch64/adds3.c: Likewise.
> * gcc.target/aarch64/cmp.c: Likewise.


[PATCH, AArch64] Fix for PR61202

2014-05-19 Thread Carrot Wei
Hi

The last operand of instruction sqdmulh can only be low fp registers,
so we should use constraint "x". But the intrinsic functions use "w".
This patch fixed the constrains in these intrinsics.

Passed dejagnu test on aarch64 qemu. OK for trunk, 4.9 and 4.8?

thanks
Guozhi Wei


2014-05-19  Guozhi Wei  

* config/aarch64/arm_neon.h (vqdmulh_n_s16): Change
the last operand's constraint.
(vqdmulh_n_s32): Likewise.
(vqdmulhq_n_s16): Likewise.
(vqdmulhq_n_s32): Likewise.


patch
Description: Binary data


Re: [PATCH, AArch64] Fix for PR61202

2014-05-20 Thread Carrot Wei
Hi James

Thank you for pointing this out. In the new patch I removed the
modification of vqdmulh_n_s32 and vqdmulhq_n_s32.

Passed dejagnu testing on aarch64 qemu again. OK for trunk, 4.9 and 4.8?

2014-05-20  Guozhi Wei  

* config/aarch64/arm_neon.h (vqdmulh_n_s16): Change
the last operand's constraint.
(vqdmulhq_n_s16): Likewise.

On Mon, May 19, 2014 at 11:50 PM, James Greenhalgh
 wrote:
> On Tue, May 20, 2014 at 07:18:40AM +0100, Carrot Wei wrote:
>> Hi
>
> Hi,
>
>> The last operand of instruction sqdmulh can only be low fp registers,
>> so we should use constraint "x". But the intrinsic functions use "w".
>> This patch fixed the constrains in these intrinsics.
>
> This restriction is only on the _s16 variants of the intrinsics. From the
> ARMv8 Architecture Reference Manual:
>
>Is the name of the second SIMD&FP source register,
>   [...]
>   Restricted to V0-V15 when element size  is H.
>
> The patch is correct (though I can't approve it) for vqdmulh_n_s16 and
> vqdmulhq_n_s16, but the two hunks for vqdmulh_n_s32 and vqdmulhq_n_s32 should
> be dropped as they are too restrictive.
>
> Thanks,
> James
>
>> 2014-05-19  Guozhi Wei  
>>
>> * config/aarch64/arm_neon.h (vqdmulh_n_s16): Change
>> the last operand's constraint.
>> (vqdmulh_n_s32): Likewise.
>> (vqdmulhq_n_s16): Likewise.
>> (vqdmulhq_n_s32): Likewise.
>


patch
Description: Binary data


Re: [PATCH, AArch64] Fix for PR61202

2014-05-21 Thread Carrot Wei
Committed to trunk, 4.9, and waiting for the release of 4.8.3.

OK for google/main and google/4.9?

thanks
Carrot

On Wed, May 21, 2014 at 1:34 AM, Richard Biener  wrote:
> On Wed, 21 May 2014, Marcus Shawcroft wrote:
>
>> On 21 May 2014 09:28, Marcus Shawcroft  wrote:
>> > On 20 May 2014 18:37, Carrot Wei  wrote:
>> >> Hi James
>> >>
>> >> Thank you for pointing this out. In the new patch I removed the
>> >> modification of vqdmulh_n_s32 and vqdmulhq_n_s32.
>> >>
>> >> Passed dejagnu testing on aarch64 qemu again. OK for trunk, 4.9 and 4.8?
>> >>
>> >> 2014-05-20  Guozhi Wei  
>> >>
>> >> * config/aarch64/arm_neon.h (vqdmulh_n_s16): Change
>> >> the last operand's constraint.
>> >> (vqdmulhq_n_s16): Likewise.
>> >
>> > Thank you.  This is OK to commit on trunk, 4.9 and 4.8
>> > /Marcus
>>
>> Actually, I've jumped the gun by saying OK for 4.8. Please hold off on
>> the 4.8 backport until one of the release maintainers says its OK.
>> Richie can we take this now or do you want us to hold off?
>
> Please wait for 4.8.3 to be released for non-regression fixes or
> regression fixes that are not against a working 4.8.2.
>
> Thanks,
> Richard.


[PATCH] Fix a typo in sparseset_pop

2014-02-23 Thread Carrot Wei
Hi

The following patch fixes an obvious wrong index used to access the
dense array. The patch has passed the bootstrap and regression tests
on x86-64.

OK for trunk?

thanks
Carrot


2014-02-23  Guozhi Wei  

* sparseset.h (sparseset_pop): Fix the wrong index.


Index: sparseset.h
===
--- sparseset.h (revision 208039)
+++ sparseset.h (working copy)
@@ -177,7 +177,7 @@
   gcc_checking_assert (mem != 0);

   s->members = mem - 1;
-  return s->dense[mem];
+  return s->dense[s->members];
 }

 static inline void


Re: [google/gcc-4_7] Backport arm hardfp patch from trunk

2012-08-14 Thread Carrot Wei
OK for google/gcc-4_7.

thanks
Carrot

On Tue, Aug 14, 2012 at 7:14 AM, Han Shen(沈涵)  wrote:
>
> Hi Carrot, could you take a look at this patch? Thanks!
>
> The modification is in upstream trunk patch revision - 186859.
>
> The same patch has been back ported to google/gcc-4_6
> (http://codereview.appspot.com/6206055/), this is to apply on
> google/gcc-4_7
>
> Regards,
> -Han
>
> 2012-08-13  Han Shen  
>
> Backport from mainline.
> 2012-05-01  Richard Earnshaw  
>
> * arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_DEFAULT): Avoid ifdef
> comparing enumeration values.  Update comments.
>
> 2012-04-26  Michael Hope  
> Richard Earnshaw  
>
> * config/arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_SOFT_FLOAT):
> Define.
> (GLIBC_DYNAMIC_LINKER_HARD_FLOAT): Define.
> (GLIBC_DYNAMIC_LINKER_DEFAULT): Define.
> (GLIBC_DYNAMIC_LINKER): Redefine to use the hard float path.
>
> diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
> index c0cfde3..142054f 100644
> --- a/gcc/config/arm/linux-eabi.h
> +++ b/gcc/config/arm/linux-eabi.h
> @@ -32,7 +32,8 @@
>while (false)
>
>  /* We default to a soft-float ABI so that binaries can run on all
> -   target hardware.  */
> +   target hardware.  If you override this to use the hard-float ABI then
> +   change the setting of GLIBC_DYNAMIC_LINKER_DEFAULT as well.  */
>  #undef  TARGET_DEFAULT_FLOAT_ABI
>  #define TARGET_DEFAULT_FLOAT_ABI ARM_FLOAT_ABI_SOFT
>
> @@ -59,10 +60,25 @@
>  #undef  SUBTARGET_EXTRA_LINK_SPEC
>  #define SUBTARGET_EXTRA_LINK_SPEC " -m " TARGET_LINKER_EMULATION
>
> -/* Use ld-linux.so.3 so that it will be possible to run "classic"
> -   GNU/Linux binaries on an EABI system.  */
> +/* GNU/Linux on ARM currently supports three dynamic linkers:
> +   - ld-linux.so.2 - for the legacy ABI
> +   - ld-linux.so.3 - for the EABI-derived soft-float ABI
> +   - ld-linux-armhf.so.3 - for the EABI-derived hard-float ABI.
> +   All the dynamic linkers live in /lib.
> +   We default to soft-float, but this can be overridden by changing both
> +   GLIBC_DYNAMIC_LINKER_DEFAULT and TARGET_DEFAULT_FLOAT_ABI.  */
> +
>  #undef  GLIBC_DYNAMIC_LINKER
> -#define GLIBC_DYNAMIC_LINKER RUNTIME_ROOT_PREFIX "/lib/ld-linux.so.3"
> +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT \
> +  RUNTIME_ROOT_PREFIX "/lib/ld-linux.so.3"
> +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT \
> +  RUNTIME_ROOT_PREFIX "/lib/ld-linux-armhf.so.3"
> +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_SOFT_FLOAT
> +
> +#define GLIBC_DYNAMIC_LINKER \
> +   "%{mfloat-abi=hard:" GLIBC_DYNAMIC_LINKER_HARD_FLOAT "} \
> +%{mfloat-abi=soft*:" GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "} \
> +%{!mfloat-abi=*:" GLIBC_DYNAMIC_LINKER_DEFAULT "}"
>
>  /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
> use the GNU/Linux version, not the generic BPABI version.  */


Re: [PATCH] Prevent cselib substitution of FP, SP, SFP

2012-09-12 Thread Carrot Wei
Hi Jakub

The same problem also affects gcc4.6,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398. Could this be
ported to 4.6 branch?

thanks
Carrot

On Mon, Feb 13, 2012 at 11:54 AM, Jakub Jelinek  wrote:
>
> On Wed, Jan 04, 2012 at 05:21:38PM +, Marcus Shawcroft wrote:
> > Alias analysis by DSE based on CSELIB expansion assumes that
> > references to the stack frame from different base registers (ie FP, SP)
> > never alias.
> >
> > The comment block in cselib explains that cselib does not allow
> > substitution of FP, SP or SFP specifically in order not to break DSE.
>
> Looks reasonable, appart from coding style (no spaces around -> and
> no {} around return p->loc;), I just wonder if having a separate
> loop in expand_loc just for this isn't too expensive.  On sane targets
> IMHO hard frame pointer in the prologue should be initialized from sp, not
> the other way around, thus hard frame pointer based VALUEs should have
> hard frame pointer earlier in the locs list (when there is
> hfp = sp (+ optionally some const)
> insn, we first cselib_lookup_from_insn the rhs and add to locs
> of the new VALUE (plus (VALUE of sp) (const_int)), then process the
> lhs and add it to locs, moving the plus to locs->next).
> So I think the following patch could be enough (bootstrapped/regtested
> on x86_64-linux and i686-linux).
> There is AVR though, which has really weirdo prologue - PR50063,
> but I think it should just use UNSPEC for that or something similar,
> setting sp from hfp seems unnecessary and especially for values with long
> locs chains could make cselib more expensive.
>
> Richard, what do you think about this?
>
> 2012-02-13  Jakub Jelinek  
>
> * cselib.c (expand_loc): Return sp, fp, hfp or cfa base reg right
> away if seen.
>
> --- gcc/cselib.c.jj 2012-02-13 11:07:15.0 +0100
> +++ gcc/cselib.c2012-02-13 18:15:17.531776145 +0100
> @@ -1372,8 +1372,18 @@ expand_loc (struct elt_loc_list *p, stru
>unsigned int regno = UINT_MAX;
>struct elt_loc_list *p_in = p;
>
> -  for (; p; p = p -> next)
> +  for (; p; p = p->next)
>  {
> +  /* Return these right away to avoid returning stack pointer based
> +expressions for frame pointer and vice versa, which is something
> +that would confuse DSE.  See the comment in
> cselib_expand_value_rtx_1
> +for more details.  */
> +  if (REG_P (p->loc)
> + && (REGNO (p->loc) == STACK_POINTER_REGNUM
> + || REGNO (p->loc) == FRAME_POINTER_REGNUM
> + || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM
> + || REGNO (p->loc) == cfa_base_preserved_regno))
> +   return p->loc;
>/* Avoid infinite recursion trying to expand a reg into a
>  the same reg.  */
>if ((REG_P (p->loc))
>
>
> Jakub


Re: [PATCH] Prevent cselib substitution of FP, SP, SFP

2012-09-14 Thread Carrot Wei
Hi Jakub

I have run it on 4.6, it passes the following testing:

x86-64 bootstrap
x86-64 regression test
regression test on arm qemu

Is it OK for gcc4.6?

Ahmad, is it OK for google/gcc-4_6/ and google/gcc-4_6-mobile ?

thanks
Carrot

On Wed, Sep 12, 2012 at 2:01 PM, Carrot Wei  wrote:
> Hi Jakub
>
> The same problem also affects gcc4.6,
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398. Could this be
> ported to 4.6 branch?
>
> thanks
> Carrot
>
> On Mon, Feb 13, 2012 at 11:54 AM, Jakub Jelinek  wrote:
>>
>> On Wed, Jan 04, 2012 at 05:21:38PM +, Marcus Shawcroft wrote:
>> > Alias analysis by DSE based on CSELIB expansion assumes that
>> > references to the stack frame from different base registers (ie FP, SP)
>> > never alias.
>> >
>> > The comment block in cselib explains that cselib does not allow
>> > substitution of FP, SP or SFP specifically in order not to break DSE.
>>
>> Looks reasonable, appart from coding style (no spaces around -> and
>> no {} around return p->loc;), I just wonder if having a separate
>> loop in expand_loc just for this isn't too expensive.  On sane targets
>> IMHO hard frame pointer in the prologue should be initialized from sp, not
>> the other way around, thus hard frame pointer based VALUEs should have
>> hard frame pointer earlier in the locs list (when there is
>> hfp = sp (+ optionally some const)
>> insn, we first cselib_lookup_from_insn the rhs and add to locs
>> of the new VALUE (plus (VALUE of sp) (const_int)), then process the
>> lhs and add it to locs, moving the plus to locs->next).
>> So I think the following patch could be enough (bootstrapped/regtested
>> on x86_64-linux and i686-linux).
>> There is AVR though, which has really weirdo prologue - PR50063,
>> but I think it should just use UNSPEC for that or something similar,
>> setting sp from hfp seems unnecessary and especially for values with long
>> locs chains could make cselib more expensive.
>>
>> Richard, what do you think about this?
>>
>> 2012-02-13  Jakub Jelinek  
>>
>> * cselib.c (expand_loc): Return sp, fp, hfp or cfa base reg right
>> away if seen.
>>
>> --- gcc/cselib.c.jj 2012-02-13 11:07:15.0 +0100
>> +++ gcc/cselib.c2012-02-13 18:15:17.531776145 +0100
>> @@ -1372,8 +1372,18 @@ expand_loc (struct elt_loc_list *p, stru
>>unsigned int regno = UINT_MAX;
>>struct elt_loc_list *p_in = p;
>>
>> -  for (; p; p = p -> next)
>> +  for (; p; p = p->next)
>>  {
>> +  /* Return these right away to avoid returning stack pointer based
>> +expressions for frame pointer and vice versa, which is something
>> +that would confuse DSE.  See the comment in
>> cselib_expand_value_rtx_1
>> +for more details.  */
>> +  if (REG_P (p->loc)
>> + && (REGNO (p->loc) == STACK_POINTER_REGNUM
>> + || REGNO (p->loc) == FRAME_POINTER_REGNUM
>> + || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM
>> + || REGNO (p->loc) == cfa_base_preserved_regno))
>> +   return p->loc;
>>/* Avoid infinite recursion trying to expand a reg into a
>>  the same reg.  */
>>if ((REG_P (p->loc))
>>
>>
>> Jakub


[PING ARM Patches] PR53447: optimizations of 64bit ALU operation with constant

2012-06-18 Thread Carrot Wei
Hi

Could ARM maintainers review following patches?

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00497.html
64bit add/sub constants.

http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01834.html
64bit and with constants.

http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01974.html
64bit xor with constants.

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00287.html
64bit ior with constants.

thanks
Carrot


Re: [PING ARM Patches] PR53447: optimizations of 64bit ALU operation with constant

2012-06-20 Thread Carrot Wei
Hi Michael

It seems the wiki page describes 64bit operations on NEON only. My
patches improves 64bit operations on core registers only. I touched
the neon patterns simply because those DI mode operations are enabled
separately according to the TARGET_NEON value, so in the neon patterns
I duplicated the alternatives in normal cases.

thanks
Carrot

On Wed, Jun 20, 2012 at 9:58 AM, Michael Hope  wrote:
> On 18 June 2012 22:17, Carrot Wei  wrote:
>> Hi
>>
>> Could ARM maintainers review following patches?
>>
>> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00497.html
>> 64bit add/sub constants.
>>
>> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01834.html
>> 64bit and with constants.
>>
>> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01974.html
>> 64bit xor with constants.
>>
>> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00287.html
>> 64bit ior with constants.
>
> Hi Carrot.  Out of interest, how do these interact with the 64 bit in
> NEON patches that Andrew has been doing?  They seem to touch many of
> the same patterns and I'm concerned that they'd cause GCC to prefer
> core registers instead of NEON, especially as the constant values you
> can use in a vmov are limited.
>
> There's a (in progress) summary of the current state for the standard
> C operators here:
>  https://wiki.linaro.org/MichaelHope/Sandbox/64BitOperations
>
> -- Michael


Re: [PING ARM Patches] PR53447: optimizations of 64bit ALU operation with constant

2012-06-25 Thread Carrot Wei
ping^2

thanks
Carrot

On Mon, Jun 18, 2012 at 6:17 PM, Carrot Wei  wrote:
> Hi
>
> Could ARM maintainers review following patches?
>
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00497.html
> 64bit add/sub constants.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01834.html
> 64bit and with constants.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01974.html
> 64bit xor with constants.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00287.html
> 64bit ior with constants.
>
> thanks
> Carrot


Re: [ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-06-28 Thread Carrot Wei
Hi Ramana

Thanks for the review, please see my inlined comments.

On Thu, Jun 28, 2012 at 12:02 AM, Ramana Radhakrishnan
 wrote:
>
> On 8 June 2012 10:12, Carrot Wei  wrote:
> > Hi
> >
> > In rtl expression, substract a constant c is expressed as add a value -c, 
> > so it
> > is alse processed by adddi3, and I extend it more to handle a subtraction of
> > 64bit constant. I created an insn pattern arm_subdi3_immediate to 
> > specifically
> > represent substraction with 64bit constant while continue keeping the add 
> > rtl
> > expression.
> >
>
> Sorry about the time it has taken to review this patch -Thanks for
> tackling this but I'm not convinced that this patch is correct and
> definitely can be more efficient.
>
> The range of valid 64 bit constants allowed would be in my opinion are
> the following- obtained by dividing the 64 bit constant into 2 32 bit
> halves (upper32 and lower32 referred to as upper and lower below)
>
>  arm_not_operand (upper) && arm_add_operand (lower) which boils down
> to the valid combination of
>
>  adds lo : adc hi - both positive constants.
>  adds lo ; sbc hi  - lower positive, upper negative
I assume you mean "sbc -hi" or "sbc abs(hi)", similar for following instructions

>
>  subs lo ; sbc hi - lower negative, upper negative
>  subs lo ; adc hi  - lower negative, upper positive
>
My first version did the similar thing, but in some cases subs and
adds may generate different carry flag. Assume the low word is 0 and
high word is negative, your method will generate

adds r0, r0, 0
sbc   r1, r1, abs(hi)

My method generates

subs r0, r0, 0
sbc   r1, r1, abs(hi)

ARM's definition of subs is

(result, carry, overflow) = AddWithCarry(R[n], NOT(imm32), ‘1’);

So the subs instruction will set carry flag, but adds clear carry
flag, and finally generate different result in r1.

>
> Therefore I'd do the following -
>
> * Don't make *arm_adddi3 a named pattern - we don't need that.
> * Change the *addsi3_carryin_ pattern to be something like this :
>
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -1001,12 +1001,14 @@
>  )
>
>  (define_insn "*addsi3_carryin_"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r")
> -       (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r")
> -                         (match_operand:SI 2 "arm_rhs_operand" "rI"))
> +  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> +       (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r,r"
> +                         (match_operand:SI 2 "arm_not_operand" "rI,K"

Do you mean arm_add_operand?

>                 (LTUGEU:SI (reg: CC_REGNUM) (const_int 0]
>   "TARGET_32BIT"
> -  "adc%?\\t%0, %1, %2"
> +  "@
> +  adc%?\\t%0, %1, %2
> +  sbc%?\\t%0, %1, %#n2"
>   [(set_attr "conds" "use")]
>  )
>
> * I'd like a new const_ok_for_dimode_op function that dealt with each
> of these operations, thus your plus operation with a DImode constant
> would just be a check similar to what I've said above.

Good idea, it will make the interface cleaner. I will do it later.

> * You then don't need the new subdi3_immediate pattern and the split
> can happen after reload. Adjust predicates and constraints
> accordingly, delete it. Also please use CONST_INT_P instead of

Even if I delete subdi3_immediate pattern, we still need the
predicates and constraints to represent the negative di numbers in
other patterns.

thanks
Carrot


Re: [ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-06-28 Thread Carrot Wei
On Thu, Jun 28, 2012 at 5:37 PM, Ramana Radhakrishnan
 wrote:
> On 28 June 2012 10:03, Carrot Wei  wrote:
>> Hi Ramana
>>
>> Thanks for the review, please see my inlined comments.
>>
>> On Thu, Jun 28, 2012 at 12:02 AM, Ramana Radhakrishnan
>>  wrote:
>>>
>>> On 8 June 2012 10:12, Carrot Wei  wrote:
>>> > Hi
>>> >
>>> > In rtl expression, substract a constant c is expressed as add a value -c, 
>>> > so it
>>> > is alse processed by adddi3, and I extend it more to handle a subtraction 
>>> > of
>>> > 64bit constant. I created an insn pattern arm_subdi3_immediate to 
>>> > specifically
>>> > represent substraction with 64bit constant while continue keeping the add 
>>> > rtl
>>> > expression.
>>> >
>>>
>>> Sorry about the time it has taken to review this patch -Thanks for
>>> tackling this but I'm not convinced that this patch is correct and
>>> definitely can be more efficient.
>>>
>>> The range of valid 64 bit constants allowed would be in my opinion are
>>> the following- obtained by dividing the 64 bit constant into 2 32 bit
>>> halves (upper32 and lower32 referred to as upper and lower below)
>>>
>>>  arm_not_operand (upper) && arm_add_operand (lower) which boils down
>>> to the valid combination of
>>>
>>>  adds lo : adc hi - both positive constants.
>>>  adds lo ; sbc hi  - lower positive, upper negative
>
>> I assume you mean "sbc -hi" or "sbc abs(hi)", similar for following 
>> instructions
>
> hi = ~upper32
>
> lower = lower 32 bits of the constant
> hi =  ~ (upper32 bits) of the constant ( bitwise twiddle not a negate :) )
>
> For e.g.
>
> unsigned long long foo4 (unsigned long long x)
> {
>  return x - 0x25ULL;
> }
>
> should be
> subs r0, r0, #37
> sbc   r1, r1, #0
>
> Notice that it's #0 and not 1 . :)
>
>
>
>>
>>>
>>>  subs lo ; sbc hi - lower negative, upper negative
>>>  subs lo ; adc hi  - lower negative, upper positive
>>>

Thank you for the detailed explanation. So the four cases should be

 adds lo : adc hi - both positive constants.
 adds lo ; sbc ~hi  - lower positive, upper negative
 subs -lo ; sbc ~hi - lower negative, upper negative
 subs -lo ; adc hi  - lower negative, upper positive


>> My first version did the similar thing, but in some cases subs and
>> adds may generate different carry flag. Assume the low word is 0 and
>> high word is negative, your method will generate
>>
>> adds r0, r0, 0
>> sbc   r1, r1, abs(hi)
>
> No it will generate
>
> adds r0, r0, #0
> sbc    r1, r1, ~hi
>
> and not abs (hi)
>
>
>
>>
>> My method generates
>>
>> subs r0, r0, 0
>> sbc   r1, r1, abs(hi)
>>
>> ARM's definition of subs is
>>
>> (result, carry, overflow) = AddWithCarry(R[n], NOT(imm32), ‘1’);
>>
>> So the subs instruction will set carry flag, but adds clear carry
>> flag, and finally generate different result in r1.
>>
>>>
>>> Therefore I'd do the following -
>>>
>>> * Don't make *arm_adddi3 a named pattern - we don't need that.
>>> * Change the *addsi3_carryin_ pattern to be something like this :
>>>
>>> --- a/gcc/config/arm/arm.md
>>> +++ b/gcc/config/arm/arm.md
>>> @@ -1001,12 +1001,14 @@
>>>  )
>>>
>>>  (define_insn "*addsi3_carryin_"
>>> -  [(set (match_operand:SI 0 "s_register_operand" "=r")
>>> -       (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r")
>>> -                         (match_operand:SI 2 "arm_rhs_operand" "rI"))
>>> +  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
>>> +       (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r,r"
>>> +                         (match_operand:SI 2 "arm_not_operand" "rI,K"
>>
>> Do you mean arm_add_operand?
>
> No I mean arm_not_operand and it was a deliberate choice as explained above.
>
>>
>>>                 (LTUGEU:SI (reg: CC_REGNUM) (const_int 0]
>>>   "TARGET_32BIT"
>>> -  "adc%?\\t%0, %1, %2"
>>> +  "@
>>> +  adc%?\\t%0, %1, %2
>>> +  sbc%?\\t%0, %1, %#n2"

Since constraint "K" is logical not, not negative, should the last
line be following?

+  sbc%?\\t%0, %1, #%B2"

thanks
Carrot


Re: [ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-06-29 Thread Carrot Wei
Hi

So the following is updated patch. Tested on qemu with arm/thumb modes
without regression.

thanks
Carrot


2012-06-29  Wei Guozhi  

PR target/53447
* gcc.target/arm/pr53447-1.c: New testcase.
* gcc.target/arm/pr53447-2.c: New testcase.


2012-06-29  Wei Guozhi  

PR target/53447
* config/arm/arm-protos.h (const_ok_for_dimode_op): New prototype.
* config/arm/arm.c (const_ok_for_dimode_op): New function.
* config/arm/constraints.md (Dd): New constraint.
* config/arm/predicates.md (arm_adddi_operand): New predicate.
* config/arm/arm.md (adddi3): Extend it to handle constants.
(arm_adddi3): Likewise.
(addsi3_carryin_): Extend it to handle sbc case.
* config/arm/neon.md (adddi3_neon): Extend it to handle constants.


Index: testsuite/gcc.target/arm/pr53447-1.c
===
--- testsuite/gcc.target/arm/pr53447-1.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-1.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p += 0x10001;
+}
Index: testsuite/gcc.target/arm/pr53447-2.c
===
--- testsuite/gcc.target/arm/pr53447-2.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-2.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p -= 0x10008;
+}
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 187751)
+++ config/arm/arm.c(working copy)
@@ -2497,6 +2497,28 @@
 }
 }

+/* Return true if I is a valid di mode constant for the operation CODE.  */
+int
+const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code code)
+{
+  HOST_WIDE_INT hi_val = (i >> 32) & 0x;
+  HOST_WIDE_INT lo_val = i & 0x;
+  rtx hi = GEN_INT (hi_val);
+  rtx lo = GEN_INT (lo_val);
+
+  if (TARGET_THUMB1)
+return 0;
+
+  switch (code)
+{
+case PLUS:
+  return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);
+
+default:
+  return 0;
+}
+}
+
 /* Emit a sequence of insns to handle a large constant.
CODE is the code of the operation required, it can be any of SET, PLUS,
IOR, AND, XOR, MINUS;
Index: config/arm/arm-protos.h
===
--- config/arm/arm-protos.h (revision 187751)
+++ config/arm/arm-protos.h (working copy)
@@ -49,6 +49,7 @@
 extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
+extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx,
   HOST_WIDE_INT, rtx, rtx, int);
 extern RTX_CODE arm_canonicalize_comparison (RTX_CODE, rtx *, rtx *);
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 187751)
+++ config/arm/neon.md  (working copy)
@@ -588,9 +588,9 @@
 )

 (define_insn "adddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w")
-(plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w")
- (match_operand:DI 2 "s_register_operand" "w,r,0,w")))
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
+(plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
+ (match_operand:DI 2 "arm_adddi_operand"
"w,r,0,w,r,Dd,Dd")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_NEON"
 {
@@ -600,13 +600,16 @@
 case 3: return "vadd.i64\t%P0, %P1, %P2";
 case 1: return "#";
 case 2: return "#";
+case 4: return "#";
+case 5: return "#";
+case 6: return "#";
 default: gcc_unreachable ();
 }
 }
-  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "conds" "*,clob,clob,*")
-   (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
+   (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
+   (set_attr "length" "*,8,8,*,8,8,8")
+   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
 )

 (define_insn "*sub3_neon"
Index: config/arm/constraints.md
===
--- config/arm/constraints.md   (revision 187751)
+++ config/arm/constraints.md   (working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O

 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, D

Re: [ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-07-01 Thread Carrot Wei
On Fri, Jun 29, 2012 at 9:57 PM, Ramana Radhakrishnan
 wrote:
>
> On 29 June 2012 12:23, Carrot Wei  wrote:
> > Hi
> >
> > So the following is updated patch. Tested on qemu with arm/thumb modes
>
> Assuming this testing was with and without neon ? Because the patterns
> changed are different whether you use Neon or not.
>
Now the patch has been tested with all combination of arm/thumb
neon/non-neon modes.

> > without regression.
>
> Can you add some tests for all 4 cases ? See comments inline below for
> some changes ?
>
New test cases added.

> Ok with those changes if no regressions for above mentioned testing.
>
> > Index: config/arm/arm.md
> > ===
> > --- config/arm/arm.md   (revision 187751)
> > +++ config/arm/arm.md   (working copy)
> > @@ -574,7 +574,7 @@
> >  [(parallel
> >    [(set (match_operand:DI           0 "s_register_operand" "")
> >          (plus:DI (match_operand:DI 1 "s_register_operand" "")
> > -                  (match_operand:DI 2 "s_register_operand" "")))
> > +                  (match_operand:DI 2 "arm_adddi_operand"  "")))
> >     (clobber (reg:CC CC_REGNUM))])]
> >   "TARGET_EITHER"
> >   "
> > @@ -609,10 +609,10 @@
> >   [(set_attr "length" "4")]
> >  )
> >
> > -(define_insn_and_split "*arm_adddi3"
> > -  [(set (match_operand:DI          0 "s_register_operand" "=&r,&r")
> > -       (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0")
> > -                (match_operand:DI 2 "s_register_operand" "r,  0")))
> > +(define_insn_and_split "arm_adddi3"
> > +  [(set (match_operand:DI          0 "s_register_operand" 
> > "=&r,&r,&r,&r,&r")
> > +       (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0, r, 0, r")
> > +                (match_operand:DI 2 "arm_adddi_operand"  "r,  0, r, Dd, 
> > Dd")))
> >    (clobber (reg:CC CC_REGNUM))]
> >   "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_MAVERICK) && !TARGET_NEON"
> >   "#"
> > @@ -630,8 +630,17 @@
> >     operands[0] = gen_lowpart (SImode, operands[0]);
> >     operands[4] = gen_highpart (SImode, operands[1]);
> >     operands[1] = gen_lowpart (SImode, operands[1]);
> > -    operands[5] = gen_highpart (SImode, operands[2]);
> > -    operands[2] = gen_lowpart (SImode, operands[2]);
> > +    if (GET_CODE (operands[2]) == CONST_INT)
> > +      {
> > +       HOST_WIDE_INT v = INTVAL (operands[2]);
> > +       operands[5] = GEN_INT (ARM_SIGN_EXTEND ((v >> 32) & 0x));
> > +       operands[2] = GEN_INT (ARM_SIGN_EXTEND (v & 0x));
> > +      }
> > +    else
> > +      {
> > +       operands[5] = gen_highpart (SImode, operands[2]);
> > +       operands[2] = gen_lowpart (SImode, operands[2]);
> > +      }
>
> Instead
>
>
>  operands[5] = gen_highpart_mode (SImode, DImode, operands[2]);
>  operands[2] = gen_lowpart (SImode, operands[2]);
>
> So you don't need a check there.
>
A good method.
>
>
> >   }"
> >   [(set_attr "conds" "clob")
> >    (set_attr "length" "8")]
> > @@ -980,12 +989,14 @@
> >  )
> >
> >  (define_insn "*addsi3_carryin_"
> > -  [(set (match_operand:SI 0 "s_register_operand" "=r")
> > -       (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r")
> > -                         (match_operand:SI 2 "arm_rhs_operand" "rI"))
> > +  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> > +       (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r,r")
> > +                         (match_operand:SI 2 "arm_not_operand" "rI,K"))
> >                 (LTUGEU:SI (reg: CC_REGNUM) (const_int 0]
> >   "TARGET_32BIT"
> > -  "adc%?\\t%0, %1, %2"
> > +  "@
> > +   adc%?\\t%0, %1, %2
> > +   sbc%?\\t%0, %1, #%B2"
> >   [(set_attr "conds" "use")]
> >  )
>
> Any reason why you didn't consider making these changes to the
> *addsi3_carryin_alt2 pattern ?
>
It's simply because of my ignorance.

thanks
Carrot

The actually committed patch is as following


2012-0

Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant

2012-07-03 Thread Carrot Wei
On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
 wrote:
> On 28 May 2012 11:08, Carrot Wei  wrote:
>> Hi
>>
>> This is the second part of the patches that deals with 64bit and. It directly
>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>> constant operands.
>>
>
> Comments about const_di_ok_for_op still apply from my review of your add 
> patch.
>
> However I don't see why and /ior / xor with constants that have either
> the low or high parts set can't be expanded directly into ands of
> subregs with moves of zero's or the original value especially if you
> aren't looking at doing 64 bit operations in neon .With Neon being
> used for 64 bit arithmetic it gets more interesting.
>
> Finally this should target PR target/53189.
>

Hi Ramana

Thanks for the review. Following is the updated patch according to
your comments.

Tested on arm qemu with all arm/thumb neon/non-neon mode combination
without regression.

thanks
Carrot


2012-07-03  Wei Guozhi  

PR target/53189
* gcc.target/arm/pr53189-1.c: New testcase.
* gcc.target/arm/pr53189-2.c: New testcase.
* gcc.target/arm/pr53189-3.c: New testcase.


2012-07-03  Wei Guozhi  

PR target/53189
* config/arm/arm.c (const_ok_for_dimode_op): Handle AND op.
* config/arm/constraints.md (De): New constraint.
* config/arm/predicates.md (arm_anddi_operand): New predicate.
(arm_immediate_anddi_operand): Likewise.
(anddi_operand): Likewise.
* config/arm/arm.md (arm_andsi3_insn): Optimization for special
constants.
(anddi3): Extend it to handle 64bit constants.
(anddi3_insn): Likewise.
* config/arm/neon.md (anddi3_neon): Likewise.



Index: testsuite/gcc.target/arm/pr53189-2.c
===
--- testsuite/gcc.target/arm/pr53189-2.c(revision 0)
+++ testsuite/gcc.target/arm/pr53189-2.c(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-times "mov" 1 } } */
+/* { dg-final { scan-assembler-times "and" 1 } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x1;
+}
Index: testsuite/gcc.target/arm/pr53189-3.c
===
--- testsuite/gcc.target/arm/pr53189-3.c(revision 0)
+++ testsuite/gcc.target/arm/pr53189-3.c(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler-times "and" 1 } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x1;
+}
Index: testsuite/gcc.target/arm/pr53189-1.c
===
--- testsuite/gcc.target/arm/pr53189-1.c(revision 0)
+++ testsuite/gcc.target/arm/pr53189-1.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x10002;
+}
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 189107)
+++ config/arm/arm.c(working copy)
@@ -2524,6 +2524,10 @@
 case PLUS:
   return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);

+case AND:
+  return ((const_ok_for_arm (lo_val) || const_ok_for_arm (~lo_val))
+ && (const_ok_for_arm (hi_val) || const_ok_for_arm (~hi_val)));
+
 default:
   return 0;
 }
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 189107)
+++ config/arm/neon.md  (working copy)
@@ -776,9 +776,9 @@
 )

 (define_insn "anddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w")
-(and:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0")
-   (match_operand:DI 2 "neon_inv_logic_op2" "w,DL,r,r,w,DL")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w,?&r,?&r")
+(and:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0,0,r")
+   (match_operand:DI 2 "anddi_operand" "w,DL,r,r,w,DL,De,De")))]
   "TARGET_NEON"
 {
   switch (which_alternative)
@@ -790,12 +790,14 @@
 DImode, 1, VALID_NEON_QREG_MODE (DImode));
 case 2: return "#";
 case 3: return "#";
+case 6: return "#";
+case 7: return "#";
 default: gcc_unreachable ()

Re: [ARM Patch 2/3]PR53189: optimizations of 64bit logic operation with constant

2012-07-05 Thread Carrot Wei
, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dg, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

@@ -248,6 +248,12 @@
  (and (match_code "const_int")
   (match_test "TARGET_32BIT && const_ok_for_dimode_op (ival, PLUS)")))

+(define_constraint "Dg"
+ "@internal
+  In ARM/Thumb-2 state a const_int that can be used by insn xordi."
+ (and (match_code "const_int")
+  (match_test "TARGET_32BIT && const_ok_for_dimode_op (ival, XOR)")))
+
 (define_constraint "Di"
  "@internal
   In ARM/Thumb-2 state a const_int or const_double where both the high
Index: config/arm/predicates.md
===
--- config/arm/predicates.md(revision 189245)
+++ config/arm/predicates.md(working copy)
@@ -104,6 +104,14 @@
   (and (match_code "const_int,const_double")
(match_test "arm_const_double_by_immediates (op)")))

+(define_predicate "arm_immediate_xordi_operand"
+  (and (match_code "const_int")
+   (match_test "const_ok_for_dimode_op (INTVAL (op), XOR)")))
+
+(define_predicate "arm_xordi_operand"
+  (ior (match_operand 0 "arm_immediate_xordi_operand")
+   (match_operand 0 "s_register_operand")))
+
 (define_predicate "arm_neg_immediate_operand"
   (and (match_code "const_int")
(match_test "const_ok_for_arm (-INTVAL (op))")))
Index: config/arm/arm.md
===
--- config/arm/arm.md   (revision 189245)
+++ config/arm/arm.md   (working copy)
@@ -2968,17 +2968,30 @@
 (define_expand "xordi3"
   [(set (match_operand:DI 0 "s_register_operand" "")
(xor:DI (match_operand:DI 1 "s_register_operand" "")
-   (match_operand:DI 2 "s_register_operand" "")))]
+   (match_operand:DI 2 "arm_xordi_operand" "")))]
   "TARGET_32BIT"
   ""
 )

-(define_insn "*xordi3_insn"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (xor:DI (match_operand:DI 1 "s_register_operand"  "%0,r")
-   (match_operand:DI 2 "s_register_operand"   "r,r")))]
+(define_insn_and_split "*xordi3_insn"
+  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r,&r,&r")
+   (xor:DI (match_operand:DI 1 "s_register_operand" "%0, r, 0, r")
+   (match_operand:DI 2 "arm_xordi_operand"  "r,  r, Dg,Dg")))]
   "TARGET_32BIT && !TARGET_IWMMXT && !TARGET_NEON"
   "#"
+  "TARGET_32BIT && !TARGET_IWMMXT && reload_completed
+   && !(TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 0) (xor:SI (match_dup 1) (match_dup 2)))
+   (set (match_dup 3) (xor:SI (match_dup 4) (match_dup 5)))]
+  "
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[4] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+operands[5] = gen_highpart_mode (SImode, DImode, operands[2]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
   [(set_attr "length" "8")
(set_attr "predicable" "yes")]
 )
@@ -3040,9 +3053,27 @@
(xor:SI (match_operand:SI 1 "s_register_operand" "%r,r")
(match_operand:SI 2 "reg_or_int_operand" "rI,?n")))]
   "TARGET_32BIT"
-  "@
-   eor%?\\t%0, %1, %2
-   #"
+  "*
+  {
+if (CONST_INT_P (operands[2]))
+  {
+   HOST_WIDE_INT i = INTVAL (operands[2]) & 0x;
+   if (i == 0x)
+ return \"mvn%?\\t%0, %1\";
+   if (i == 0)
+ {
+   if (!rtx_equal_p (operands[0], operands[1]))
+ return \"mov%?\\t%0, %1\";
+   else
+ return \"\";
+ }
+  }
+
+if (which_alternative == 0)
+  return \"eor%?\\t%0, %1, %2\";
+else
+  return \"#\";
+  }"
   "TARGET_32BIT
&& GET_CODE (operands[2]) == CONST_INT
&& !const_ok_for_arm (INTVAL (operands[2]))"


On Wed, May 30, 2012 at 5:22 PM, Carrot Wei  wrote:
> Hi
>
> This is the third part of the patches that deals with 64bit xor. It extends
> the patterns xordi3, xordi3_insn and xordi3_neon to handle 64bit constant
> operands.
>
> Tested on arm qemu without regression.
>
> OK for trunk?
>
&g

Re: [ARM Patch 3/3]PR53189: optimizations of 64bit logic operation with constant

2012-07-06 Thread Carrot Wei
 and
Index: gcc/config/arm/constraints.md
===
--- gcc/config/arm/constraints.md   (revision 189278)
+++ gcc/config/arm/constraints.md   (working copy)
@@ -31,7 +31,7 @@
 ;; 'H' was previously used for FPA.

 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Df, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

@@ -248,6 +248,12 @@
  (and (match_code "const_int")
   (match_test "TARGET_32BIT && const_ok_for_dimode_op (ival, PLUS)")))

+(define_constraint "Df"
+ "@internal
+  In ARM/Thumb-2 state a const_int that can be used by iordi3_insn."
+ (and (match_code "const_int")
+  (match_test "TARGET_32BIT && const_ok_for_dimode_op (ival, IOR)")))
+
 (define_constraint "Di"
  "@internal
   In ARM/Thumb-2 state a const_int or const_double where both the high
Index: gcc/config/arm/predicates.md
===
--- gcc/config/arm/predicates.md(revision 189278)
+++ gcc/config/arm/predicates.md(working copy)
@@ -104,6 +104,14 @@
   (and (match_code "const_int,const_double")
(match_test "arm_const_double_by_immediates (op)")))

+(define_predicate "arm_immediate_iordi_operand"
+  (and (match_code "const_int")
+   (match_test "const_ok_for_dimode_op (INTVAL (op), IOR)")))
+
+(define_predicate "arm_iordi_operand"
+  (ior (match_operand 0 "arm_immediate_iordi_operand")
+   (match_operand 0 "s_register_operand")))
+
 (define_predicate "arm_neg_immediate_operand"
   (and (match_code "const_int")
(match_test "const_ok_for_arm (-INTVAL (op))")))
@@ -520,6 +528,10 @@
   (ior (match_operand 0 "imm_for_neon_logic_operand")
(match_operand 0 "s_register_operand")))

+(define_predicate "iordi_operand"
+  (ior (match_operand 0 "neon_logic_op2")
+   (match_operand 0 "arm_iordi_operand")))
+
 (define_predicate "neon_inv_logic_op2"
   (ior (match_operand 0 "imm_for_neon_inv_logic_operand")
(match_operand 0 "s_register_operand")))
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 189278)
+++ gcc/config/arm/arm.md   (working copy)
@@ -2828,17 +2828,30 @@
 (define_expand "iordi3"
   [(set (match_operand:DI 0 "s_register_operand" "")
(ior:DI (match_operand:DI 1 "s_register_operand" "")
-   (match_operand:DI 2 "neon_logic_op2" "")))]
+   (match_operand:DI 2 "iordi_operand" "")))]
   "TARGET_32BIT"
   ""
 )

-(define_insn "*iordi3_insn"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (ior:DI (match_operand:DI 1 "s_register_operand"  "%0,r")
-   (match_operand:DI 2 "s_register_operand"   "r,r")))]
+(define_insn_and_split "*iordi3_insn"
+  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r,&r,&r")
+   (ior:DI (match_operand:DI 1 "s_register_operand"  "%0,r, 0, r")
+   (match_operand:DI 2 "arm_iordi_operand"   "r, r, Df,Df")))]
   "TARGET_32BIT && !TARGET_IWMMXT && !TARGET_NEON"
   "#"
+  "TARGET_32BIT && !TARGET_IWMMXT && reload_completed
+   && !(TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 0) (ior:SI (match_dup 1) (match_dup 2)))
+   (set (match_dup 3) (ior:SI (match_dup 4) (match_dup 5)))]
+  "
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[4] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+operands[5] = gen_highpart_mode (SImode, DImode, operands[2]);
+operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
   [(set_attr "length" "8")
(set_attr "predicable" "yes")]
 )
@@ -2902,10 +2915,29 @@
(ior:SI (match_operand:SI 1 "s_register_operand" "%r,r,r")
(match_operand:SI 2 "reg_or_int_operand" "rI,K,?n")))]
   "TARGET_32BIT"
-  "@
-   orr%?\\t%0, %1, %2
-   orn%?\\t%0, %1, #%B2
-   #"
+  "*
+  {
+if (CONST_INT_P 

[Ping, ARM]PR53189: optimizations of 64bit logic operation with constant

2012-07-15 Thread Carrot Wei
Hi

The following patches implemented the optimizations suggested by
PR53189, optimizations of 64bit logic operation with constant. Could
any maintainer help to review it?

http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00087.html
http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00169.html
http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00226.html

thanks
Carrot


Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant

2012-07-18 Thread Carrot Wei
On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
 wrote:
> Carrot,
>
> Sorry about the delayed response.
>
> On 3 July 2012 12:28, Carrot Wei  wrote:
>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>  wrote:
>>> On 28 May 2012 11:08, Carrot Wei  wrote:
>>>> Hi
>>>>
>>>> This is the second part of the patches that deals with 64bit and. It 
>>>> directly
>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>> constant operands.
>>>>
>>>
>>> Comments about const_di_ok_for_op still apply from my review of your add 
>>> patch.
>>>
>>> However I don't see why and /ior / xor with constants that have either
>>> the low or high parts set can't be expanded directly into ands of
>>> subregs with moves of zero's or the original value especially if you
>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>> used for 64 bit arithmetic it gets more interesting.
>>>
>>> Finally this should target PR target/53189.
>>>
>>
>> Hi Ramana
>>
>> Thanks for the review. Following is the updated patch according to
>> your comments.
>
> You've missed answering this part of my review :)
>
>>> However I don't see why and /ior / xor with constants that have either
>>> the low or high parts set can't be expanded directly into ands of
>>> subregs with moves of zero's or the original value especially if you
>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>> used for 64 bit arithmetic it gets more interesting.
>
It has been handled by the const_ok_for_dimode_op and the output part
of corresponding SI mode insn. Let's take the IOR case as an example.

In the const_ok_for_dimode_op patch

--- arm.c   (revision 189278)
+++ arm.c   (working copy)
@@ -2524,6 +2524,16 @@
 case PLUS:
   return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);

+case IOR:
+  if ((const_ok_for_arm (hi_val) || (hi_val == 0x))
+ && (const_ok_for_arm (lo_val) || (lo_val == 0x)))
+   return 1;
+  if (TARGET_THUMB2
+ && (const_ok_for_arm (lo_val) || const_ok_for_arm (~lo_val))
+ && (const_ok_for_arm (hi_val) || const_ok_for_arm (~hi_val)))
+   return 1;
+  return 0;
+
 default:
   return 0;
 }

The 0x is not valid arm mode immediate, but ior 0X
results in all 1's, so it is also allowed in an iordi3 insn. And the
patch in iorsi3_insn pattern explicitly check the all 0's and all 1's
cases, and output either a simple register mov instruction or
instruction mov all 1's to the destination.

@@ -2902,10 +2915,29 @@
(ior:SI (match_operand:SI 1 "s_register_operand" "%r,r,r")
(match_operand:SI 2 "reg_or_int_operand" "rI,K,?n")))]
   "TARGET_32BIT"
-  "@
-   orr%?\\t%0, %1, %2
-   orn%?\\t%0, %1, #%B2
-   #"
+  "*
+  {
+if (CONST_INT_P (operands[2]))
+  {
+   HOST_WIDE_INT i = INTVAL (operands[2]) & 0x;
+   if (i == 0x)
+ return \"mvn%?\\t%0, #0\";
+   if (i == 0)
+ {
+   if (!rtx_equal_p (operands[0], operands[1]))
+ return \"mov%?\\t%0, %1\";
+   else
+ return \"\";
+ }
+  }
+
+switch (which_alternative)
+  {
+  case 0: return \"orr%?\\t%0, %1, %2\";
+  case 1: return \"orn%?\\t%0, %1, #%B2\";
+  case 2: return \"#\";
+  }
+  }"
   "TARGET_32BIT
&& GET_CODE (operands[2]) == CONST_INT
&& !(const_ok_for_arm (INTVAL (operands[2]))


> Is there any reason why we don't split such cases earlier into the
> constituent moves and the associated ands earlier than reload in the
> non-Neon case?
>
I referenced pattern arm_adddi3 which is split after reload_completed.
And the pattern arm_subdi3 is even not split. I guess they keep the
original constant longer may benefit some optimizations involving
constants. But it may also lose flexibility in other cases.

>  In addition, it would be good to have some tests for Thumb2 that deal
> with the replicated constants for Thumb2 . Can you have a look at
> creating some tests similar to the thumb2*replicated*.c tests in
> gcc.target/arm but for 64 bit constants ?
>

The new test cases involving thumb2 replicated constants are added as following.

thanks
Carrot



2012-07-18  Wei Guozhi  

PR target/53189

Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant

2012-07-18 Thread Carrot Wei
On Wed, Jul 18, 2012 at 5:39 PM, Ramana Radhakrishnan
 wrote:
> On 18 July 2012 09:20, Carrot Wei  wrote:
>> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
>>  wrote:
>>> Carrot,
>>>
>>> Sorry about the delayed response.
>>>
>>> On 3 July 2012 12:28, Carrot Wei  wrote:
>>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>>>  wrote:
>>>>> On 28 May 2012 11:08, Carrot Wei  wrote:
>>>>>> Hi
>>>>>>
>>>>>> This is the second part of the patches that deals with 64bit and. It 
>>>>>> directly
>>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>>>> constant operands.
>>>>>>
>>>>>
>>>>> Comments about const_di_ok_for_op still apply from my review of your add 
>>>>> patch.
>>>>>
>>>>> However I don't see why and /ior / xor with constants that have either
>>>>> the low or high parts set can't be expanded directly into ands of
>>>>> subregs with moves of zero's or the original value especially if you
>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>> used for 64 bit arithmetic it gets more interesting.
>>>>>
>>>>> Finally this should target PR target/53189.
>>>>>
>>>>
>>>> Hi Ramana
>>>>
>>>> Thanks for the review. Following is the updated patch according to
>>>> your comments.
>>>
>>> You've missed answering this part of my review :)
>>>
>>>>> However I don't see why and /ior / xor with constants that have either
>>>>> the low or high parts set can't be expanded directly into ands of
>>>>> subregs with moves of zero's or the original value especially if you
>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>> used for 64 bit arithmetic it gets more interesting.
>>>
>> It has been handled by the const_ok_for_dimode_op and the output part
>> of corresponding SI mode insn. Let's take the IOR case as an example.
>>
>
> I noticed that - If I wasn't clear enough, the question was more
> towards generating a subreg move at expand time rather than a split
> and handling while outputting asm if you see what I mean.
>
I see your point now. I don't know how much better if we handle it
earlier. Even if I generates subreg move for non-neon code at expand
time, the latter output handling is still necessary for neon insns. Do
you think it deserves the extra expand handling?

thanks
Carrot


Re: [google/gcc-4_6_2-mobile] Port of Android target support in i386 for google/gcc-4_6_2-mobile branch

2012-05-07 Thread Carrot Wei
OK for Google branches.

On Mon, May 7, 2012 at 12:21 PM, Jing Yu  wrote:
> I would like to port this patch to google/gcc-4_6 and also
> google/gcc-4_6_2-mobile.
>
> From reading the patch, it does not change config for non-Android target.
>
> bootstrap,crosstool tests finished successfully on google/gcc-4_6.
> Built ARM android toolchain successfully.
>
> OK?
>
> Thanks,
> Jing
>
> On Thu, May 3, 2012 at 1:51 AM, Ilya Enkovich  wrote:
>> Hi,
>>
>> here is a port of Android support patch
>> (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for
>> google/gcc-4_6_2-mobile branch. Is it OK?
>>
>> Bootstrapped on linux-x86_64. Successfullly used for NDK release build
>> and Android ICS build.
>>
>> Thanks,
>> Ilya
>> ---
>> 2012-05-03  Enkovich Ilya  
>>
>>        * config/linux-android.h (ANDROID_STARTFILE_SPEC): Fix
>>        shared case.
>>        (ANDROID_ENDFILE_SPEC): Likewise.
>>
>>        * config/i386/linux.h (TARGET_OS_CPP_BUILTINS): Add Android
>>        builtins.
>>        (LINUX_TARGET_CC1_SPEC): New.
>>        (CC1_SPEC): Support Android.
>>        (LINUX_TARGET_LINK_SPEC): New.
>>        (LINK_SPEC): Support Android.
>>        (LIB_SPEC): New.
>>        (STARTFILE_SPEC): New.
>>        (LINUX_TARGET_ENDFILE_SPEC): New.
>>        (ENDFILE_SPEC): Support Android.
>>        * config/i386/linux64.h: Likewise.


[ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-05-26 Thread Carrot Wei
Hi,

As described in PR53447, many 64bit ALU operations with constant can be
optimized to use corresponding 32bit instructions with immediate operands.

This is the first part of the patches that deals with 64bit add. It directly
extends the patterns adddi3, arm_adddi3 and adddi3_neon to handle constant
operands.

Tested on arm qemu without regression.

OK for trunk?

thanks
Carrot

2012-05-26  Wei Guozhi  

PR target/53447
* gcc.target/arm/pr53447-1.c: New testcase.


2012-05-26  Wei Guozhi  

PR target/53447
* config/arm/arm-protos.h (const_ok_for_adddi): New prototype.
* config/arm/arm.c (const_ok_for_adddi): New function.
* config/arm/constraints.md (Dd): New constraint.
* config/arm/arm.md (adddi3): Extend it to handle constants.
(arm_adddi3): Likewise.
* config/arm/neon.md (adddi3_neon): Likewise.


Index: testsuite/gcc.target/arm/pr53447-1.c
===
--- testsuite/gcc.target/arm/pr53447-1.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-1.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p += 0x10001;
+}
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 187751)
+++ config/arm/arm.c(working copy)
@@ -2497,6 +2497,17 @@
 }
 }

+/* Return TRUE if int I is a valid immediate constant used by pattern
+   arm_adddi3.  */
+int
+const_ok_for_adddi (HOST_WIDE_INT i)
+{
+  HOST_WIDE_INT high = (i >> 32) & 0x;
+  HOST_WIDE_INT low = i & 0x;
+  return (const_ok_for_arm (high)
+ && (const_ok_for_arm (low) || const_ok_for_arm (-low)));
+}
+
 /* Emit a sequence of insns to handle a large constant.
CODE is the code of the operation required, it can be any of SET, PLUS,
IOR, AND, XOR, MINUS;
Index: config/arm/arm-protos.h
===
--- config/arm/arm-protos.h (revision 187751)
+++ config/arm/arm-protos.h (working copy)
@@ -47,6 +47,7 @@
 extern bool arm_small_register_classes_for_mode_p (enum machine_mode);
 extern int arm_hard_regno_mode_ok (unsigned int, enum machine_mode);
 extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode);
+extern int const_ok_for_adddi (HOST_WIDE_INT);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx,
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 187751)
+++ config/arm/neon.md  (working copy)
@@ -588,9 +588,9 @@
 )

 (define_insn "adddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w")
-(plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w")
- (match_operand:DI 2 "s_register_operand" "w,r,0,w")))
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
+(plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
+ (match_operand:DI 2 "reg_or_int_operand" "w,r,0,w,r,Dd,Dd")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_NEON"
 {
@@ -600,13 +600,16 @@
 case 3: return "vadd.i64\t%P0, %P1, %P2";
 case 1: return "#";
 case 2: return "#";
+case 4: return "#";
+case 5: return "#";
+case 6: return "#";
 default: gcc_unreachable ();
 }
 }
-  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "conds" "*,clob,clob,*")
-   (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
+   (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
+   (set_attr "length" "*,8,8,*,*,*,*")
+   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
 )

 (define_insn "*sub3_neon"
Index: config/arm/constraints.md
===
--- config/arm/constraints.md   (revision 187751)
+++ config/arm/constraints.md   (working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O

 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

@@ -251,6 +251,12 @@
   (match_test "TARGET_32BIT && arm_const_double_inline_cost (op) == 4
   && !(optimize_size || arm_ld_sched)")))

+(define_constraint "Dd"
+ "@internal
+ In ARM/Thumb-2 state a const_int that can be used by insn adddi."
+ (and (match_code "const_int")
+  (match_test "TARGET_32BIT && const_ok_for_adddi (ival)")))
+
 (define_constraint "Di"
  "@internal
 

[ARM Patch 2/n]PR53447: optimizations of 64bit ALU operation with constant

2012-05-28 Thread Carrot Wei
Hi

This is the second part of the patches that deals with 64bit and. It directly
extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
constant operands.

Tested on arm qemu without regression.

OK for trunk?

thanks
Carrot

2012-05-28  Wei Guozhi  

PR target/53447
* gcc.target/arm/pr53447-2.c: New testcase.


2012-05-28  Wei Guozhi  

PR target/53447
* config/arm/arm-protos.h (const_ok_for_anddi): New prototype.
* config/arm/arm.c (const_ok_for_anddi): New function.
* config/arm/constraints.md (De): New constraint.
* config/arm/predicates.md (arm_anddi_operand): New predicate.
(arm_immediate_anddi_operand): Likewise.
(anddi_operand): Likewise.
* config/arm/arm.md (anddi3): Extend it to handle 64bit constants.
(anddi3_insn): Likewise.
* config/arm/neon.md (anddi3_neon): Likewise.



Index: testsuite/gcc.target/arm/pr53447-2.c
===
--- testsuite/gcc.target/arm/pr53447-2.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-2.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x10002;
+}
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 187927)
+++ config/arm/arm.c(working copy)
@@ -2497,6 +2497,18 @@
 }
 }

+/* Return TRUE if int I is a valid immediate constant used by pattern
+   anddi3_insn.  */
+int
+const_ok_for_anddi (HOST_WIDE_INT i)
+{
+  HOST_WIDE_INT high = ARM_SIGN_EXTEND ((i >> 32) & 0x);
+  HOST_WIDE_INT low = ARM_SIGN_EXTEND (i & 0x);
+
+  return (TARGET_32BIT && (const_ok_for_arm (low) || const_ok_for_arm (~low))
+ && (const_ok_for_arm (high) || const_ok_for_arm (~high)));
+}
+
 /* Emit a sequence of insns to handle a large constant.
CODE is the code of the operation required, it can be any of SET, PLUS,
IOR, AND, XOR, MINUS;
Index: config/arm/arm-protos.h
===
--- config/arm/arm-protos.h (revision 187927)
+++ config/arm/arm-protos.h (working copy)
@@ -47,6 +47,7 @@
 extern bool arm_small_register_classes_for_mode_p (enum machine_mode);
 extern int arm_hard_regno_mode_ok (unsigned int, enum machine_mode);
 extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode);
+extern int const_ok_for_anddi (HOST_WIDE_INT);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx,
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 187927)
+++ config/arm/neon.md  (working copy)
@@ -774,9 +774,9 @@
 )

 (define_insn "anddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w")
-(and:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0")
-   (match_operand:DI 2 "neon_inv_logic_op2" "w,DL,r,r,w,DL")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w,?&r,?&r")
+(and:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0,0,r")
+   (match_operand:DI 2 "anddi_operand" "w,DL,r,r,w,DL,De,De")))]
   "TARGET_NEON"
 {
   switch (which_alternative)
@@ -788,12 +788,14 @@
 DImode, 1, VALID_NEON_QREG_MODE (DImode));
 case 2: return "#";
 case 3: return "#";
+case 6: return "#";
+case 7: return "#";
 default: gcc_unreachable ();
 }
 }
-  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
-   (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1,*,*")
+   (set_attr "length" "*,*,8,8,*,*,8,8")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8,*,*")]
 )

 (define_insn "orn3_neon"
Index: config/arm/constraints.md
===
--- config/arm/constraints.md   (revision 187927)
+++ config/arm/constraints.md   (working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O

 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, De, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

@@ -251,6 +251,12 @@
   (match_test "TARGET_32BIT && arm_const_double_inline_cost (op) == 4
   && !(optimize_size || arm_ld_sched)")))

+(define_constraint "De"
+ "@internal
+  In ARM/Thumb-2 state a const_int that can be used by anddi3_insn.  "
+ (and (match_code "const_int")
+  (match_tes

[ARM Patch 3/n]PR53447: optimizations of 64bit ALU operation with constant

2012-05-30 Thread Carrot Wei
Hi

This is the third part of the patches that deals with 64bit xor. It extends
the patterns xordi3, xordi3_insn and xordi3_neon to handle 64bit constant
operands.

Tested on arm qemu without regression.

OK for trunk?

thanks
Carrot

2012-05-30  Wei Guozhi  

PR target/53447
* gcc.target/arm/pr53447-3.c: New testcase.


2012-05-30  Wei Guozhi  

PR target/53447
* config/arm/arm.md (xordi3): Extend it to handle 64bit constants.
(xordi3_insn): Likewise.
* config/arm/neon.md (xordi3_neon): Likewise.




Index: testsuite/gcc.target/arm/pr53447-3.c
===
--- testsuite/gcc.target/arm/pr53447-3.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-3.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p ^= 0x10003;
+}
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 187998)
+++ config/arm/neon.md  (working copy)
@@ -878,18 +878,20 @@
 )

 (define_insn "xordi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w")
-(xor:DI (match_operand:DI 1 "s_register_operand" "%w,0,r,w")
-   (match_operand:DI 2 "s_register_operand" "w,r,r,w")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r")
+(xor:DI (match_operand:DI 1 "s_register_operand" "%w,0,r,w,0,r")
+   (match_operand:DI 2 "arm_di_operand" "w,r,r,w,Di,Di")))]
   "TARGET_NEON"
   "@
veor\t%P0, %P1, %P2
#
#
-   veor\t%P0, %P1, %P2"
-  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+   veor\t%P0, %P1, %P2
+   #
+   #"
+  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*")
+   (set_attr "length" "*,8,8,*,8,8")
+   (set_attr "arch" "nota8,*,*,onlya8,*,*")]
 )

 (define_insn "one_cmpl2"
Index: config/arm/arm.md
===
--- config/arm/arm.md   (revision 187998)
+++ config/arm/arm.md   (working copy)
@@ -2994,17 +2994,38 @@
 (define_expand "xordi3"
   [(set (match_operand:DI 0 "s_register_operand" "")
(xor:DI (match_operand:DI 1 "s_register_operand" "")
-   (match_operand:DI 2 "s_register_operand" "")))]
+   (match_operand:DI 2 "arm_di_operand" "")))]
   "TARGET_32BIT"
   ""
 )

-(define_insn "*xordi3_insn"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (xor:DI (match_operand:DI 1 "s_register_operand"  "%0,r")
-   (match_operand:DI 2 "s_register_operand"   "r,r")))]
+(define_insn_and_split "*xordi3_insn"
+  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r,&r,&r")
+   (xor:DI (match_operand:DI 1 "s_register_operand"  "%0,r,0,r")
+   (match_operand:DI 2 "arm_di_operand"   "r,r,Di,Di")))]
   "TARGET_32BIT && !TARGET_IWMMXT && !TARGET_NEON"
   "#"
+  "TARGET_32BIT && !TARGET_IWMMXT && reload_completed"
+  [(set (match_dup 0) (xor:SI (match_dup 1) (match_dup 2)))
+   (set (match_dup 3) (xor:SI (match_dup 4) (match_dup 5)))]
+  "
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[4] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+if (GET_CODE (operands[2]) == CONST_INT)
+  {
+   HOST_WIDE_INT v = INTVAL (operands[2]);
+   operands[5] = GEN_INT (ARM_SIGN_EXTEND ((v >> 32) & 0x));
+   operands[2] = GEN_INT (ARM_SIGN_EXTEND (v & 0x));
+  }
+else
+  {
+   operands[5] = gen_highpart (SImode, operands[2]);
+   operands[2] = gen_lowpart (SImode, operands[2]);
+  }
+  }"
   [(set_attr "length" "8")
(set_attr "predicable" "yes")]
 )


Re: [ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-06-04 Thread Carrot Wei
2 state a const_int that can be used by insn adddi."
+ (and (match_code "const_int")
+  (match_test "TARGET_32BIT && const_ok_for_adddi (ival)")))
+
 (define_constraint "Di"
  "@internal
   In ARM/Thumb-2 state a const_int or const_double where both the high
Index: config/arm/arm.md
===
--- config/arm/arm.md   (revision 187751)
+++ config/arm/arm.md   (working copy)
@@ -574,10 +574,21 @@
  [(parallel
[(set (match_operand:DI   0 "s_register_operand" "")
  (plus:DI (match_operand:DI 1 "s_register_operand" "")
-  (match_operand:DI 2 "s_register_operand" "")))
+  (match_operand:DI 2 "reg_or_int_operand" "")))
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
   "
+  if (GET_CODE (operands[2]) == CONST_INT)
+{
+  if (TARGET_32BIT && const_ok_for_adddi (INTVAL (operands[2])))
+   {
+ emit_insn (gen_arm_adddi3 (operands[0], operands[1], operands[2]));
+ DONE;
+   }
+  else
+   operands[2] = force_reg (DImode, operands[2]);
+}
+
   if (TARGET_HARD_FLOAT && TARGET_MAVERICK)
 {
   if (!cirrus_fp_register (operands[0], DImode))
@@ -609,10 +620,10 @@
   [(set_attr "length" "4")]
 )

-(define_insn_and_split "*arm_adddi3"
-  [(set (match_operand:DI  0 "s_register_operand" "=&r,&r")
-   (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0")
-(match_operand:DI 2 "s_register_operand" "r,  0")))
+(define_insn_and_split "arm_adddi3"
+  [(set (match_operand:DI  0 "s_register_operand" "=&r,&r,&r,&r,&r")
+   (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0, r, 0, r")
+(match_operand:DI 2 "reg_or_int_operand" "r,  0, r, Dd,Dd")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_MAVERICK) && !TARGET_NEON"
   "#"
@@ -630,8 +641,17 @@
 operands[0] = gen_lowpart (SImode, operands[0]);
 operands[4] = gen_highpart (SImode, operands[1]);
 operands[1] = gen_lowpart (SImode, operands[1]);
-operands[5] = gen_highpart (SImode, operands[2]);
-operands[2] = gen_lowpart (SImode, operands[2]);
+if (GET_CODE (operands[2]) == CONST_INT)
+  {
+   HOST_WIDE_INT v = INTVAL (operands[2]);
+   operands[5] = GEN_INT (ARM_SIGN_EXTEND ((v >> 32) & 0x));
+   operands[2] = GEN_INT (ARM_SIGN_EXTEND (v & 0x));
+  }
+else
+  {
+   operands[5] = gen_highpart (SImode, operands[2]);
+   operands[2] = gen_lowpart (SImode, operands[2]);
+  }
   }"
   [(set_attr "conds" "clob")
(set_attr "length" "8")]



On Sat, May 26, 2012 at 9:42 PM, Carrot Wei  wrote:
> Hi,
>
> As described in PR53447, many 64bit ALU operations with constant can be
> optimized to use corresponding 32bit instructions with immediate operands.
>
> This is the first part of the patches that deals with 64bit add. It directly
> extends the patterns adddi3, arm_adddi3 and adddi3_neon to handle constant
> operands.
>
> Tested on arm qemu without regression.
>
> OK for trunk?
>
> thanks
> Carrot
>
> 2012-05-26  Wei Guozhi  
>
>        PR target/53447
>        * gcc.target/arm/pr53447-1.c: New testcase.
>
>
> 2012-05-26  Wei Guozhi  
>
>        PR target/53447
>        * config/arm/arm-protos.h (const_ok_for_adddi): New prototype.
>        * config/arm/arm.c (const_ok_for_adddi): New function.
>        * config/arm/constraints.md (Dd): New constraint.
>        * config/arm/arm.md (adddi3): Extend it to handle constants.
>        (arm_adddi3): Likewise.
>        * config/arm/neon.md (adddi3_neon): Likewise.
>
>
> Index: testsuite/gcc.target/arm/pr53447-1.c
> ===
> --- testsuite/gcc.target/arm/pr53447-1.c        (revision 0)
> +++ testsuite/gcc.target/arm/pr53447-1.c        (revision 0)
> @@ -0,0 +1,8 @@
> +/* { dg-options "-O2" }  */
> +/* { dg-require-effective-target arm32 } */
> +/* { dg-final { scan-assembler-not "mov" } } */
> +
> +void t0p(long long * p)
> +{
> +  *p += 0x10001;
> +}
> Index: config/arm/arm.c
> ===
> --- config/arm/arm.c    (revision 187751)
> +++ config/arm/arm.c    (working copy)
> @@ -2497,6 +2497,17 @@
>     }
>  }
>
> +/* Return TRUE if int I is a valid immediate const

[ARM Patch 4/n]PR53447: optimizations of 64bit ALU operation with constant

2012-06-05 Thread Carrot Wei
Hi

This is the fourth part of the patches that deals with 64bit ior. It directly
extends the patterns iordi3, iordi3_insn and iordi3_neon to handle 64bit
constant operands.

Tested on arm qemu without regression.

OK for trunk?

thanks
Carrot

2012-06-05  Wei Guozhi  

PR target/53447
* gcc.target/arm/pr53447-4.c: New testcase.


2012-06-05  Wei Guozhi  

PR target/53447
* config/arm/arm-protos.h (const_ok_for_iordi): New prototype.
* config/arm/arm.c (const_ok_for_iordi): New function.
* config/arm/constraints.md (Df): New constraint.
* config/arm/predicates.md (arm_iordi_operand): New predicate.
(arm_immediate_iordi_operand): Likewise.
(iordi_operand): Likewise.
* config/arm/arm.md (iordi3): Extend it to handle 64bit constants.
(iordi3_insn): Likewise.
* config/arm/neon.md (iordi3_neon): Likewise.


Index: testsuite/gcc.target/arm/pr53447-4.c
===
--- testsuite/gcc.target/arm/pr53447-4.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-4.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p |= 0x10008;
+}


Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 188048)
+++ config/arm/arm.c(working copy)
@@ -2496,6 +2496,24 @@
 }
 }

+/* Return TRUE if int I is a valid immediate constant used by pattern
+   iordi3_insn.  */
+int
+const_ok_for_iordi (HOST_WIDE_INT i)
+{
+  HOST_WIDE_INT high = ARM_SIGN_EXTEND ((i >> 32) & 0x);
+  HOST_WIDE_INT low = ARM_SIGN_EXTEND (i & 0x);
+
+  if (TARGET_32BIT && const_ok_for_arm (low) && const_ok_for_arm (high))
+return 1;
+
+  if (TARGET_THUMB2 && (const_ok_for_arm (low) || const_ok_for_arm (~low))
+  && (const_ok_for_arm (high) || const_ok_for_arm (~high)))
+return 1;
+
+  return 0;
+}
+
 /* Emit a sequence of insns to handle a large constant.
CODE is the code of the operation required, it can be any of SET, PLUS,
IOR, AND, XOR, MINUS;
Index: config/arm/arm-protos.h
===
--- config/arm/arm-protos.h (revision 188048)
+++ config/arm/arm-protos.h (working copy)
@@ -47,6 +47,7 @@
 extern bool arm_small_register_classes_for_mode_p (enum machine_mode);
 extern int arm_hard_regno_mode_ok (unsigned int, enum machine_mode);
 extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode);
+extern int const_ok_for_iordi (HOST_WIDE_INT);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx,
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 188048)
+++ config/arm/neon.md  (working copy)
@@ -729,9 +729,9 @@
 )

 (define_insn "iordi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w")
-(ior:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0")
-   (match_operand:DI 2 "neon_logic_op2" "w,Dl,r,r,w,Dl")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w,?&r,?&r")
+(ior:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0,0,r")
+   (match_operand:DI 2 "iordi_operand" "w,Dl,r,r,w,Dl,Df,Df")))]
   "TARGET_NEON"
 {
   switch (which_alternative)
@@ -743,12 +743,14 @@
 DImode, 0, VALID_NEON_QREG_MODE (DImode));
 case 2: return "#";
 case 3: return "#";
+case 6: return "#";
+case 7: return "#";
 default: gcc_unreachable ();
 }
 }
-  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
-   (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1,*,*")
+   (set_attr "length" "*,*,8,8,*,*,8,8")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8,*,*")]
 )

 ;; The concrete forms of the Neon immediate-logic instructions are vbic and
Index: config/arm/constraints.md
===
--- config/arm/constraints.md   (revision 188048)
+++ config/arm/constraints.md   (working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O

 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Df, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

@@ -251,6 +251,12 @@
   (match_test "TARGET_32BIT && arm_const_double_inline_cost (op) == 4
   && !(optimize_size || arm_ld_sched)"))

Re: [ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-06-06 Thread Carrot Wei
In the original patch, if "add r0, c" is not possible, but "sub r0,
-c" is possible, it will use the sub instruction. Although they
generate same result, but they may generate different CF flag, and
cause subsequent adc to compute out wrong result. So I updated the
patch to avoid using sub instruction.

Tested on arm qemu with both arm/thumb mode.

thanks
Carrot


2012-06-06  Wei Guozhi  

PR target/53447
* gcc.target/arm/pr53447-1.c: New testcase.


2012-06-06  Wei Guozhi  

PR target/53447
* config/arm/arm.md (adddi3): Extend it to handle constants.
(arm_adddi3): Likewise.
* config/arm/neon.md (adddi3_neon): Likewise.


Index: testsuite/gcc.target/arm/pr53447-1.c
===
--- testsuite/gcc.target/arm/pr53447-1.c(revision 0)
+++ testsuite/gcc.target/arm/pr53447-1.c(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p += 0x10001;
+}
Index: config/arm/neon.md
===
--- config/arm/neon.md  (revision 187751)
+++ config/arm/neon.md  (working copy)
@@ -588,9 +588,9 @@
 )

 (define_insn "adddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w")
-(plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w")
- (match_operand:DI 2 "s_register_operand" "w,r,0,w")))
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
+(plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
+ (match_operand:DI 2 "reg_or_int_operand" "w,r,0,w,r,Di,Di")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_NEON"
 {
@@ -600,13 +600,16 @@
 case 3: return "vadd.i64\t%P0, %P1, %P2";
 case 1: return "#";
 case 2: return "#";
+case 4: return "#";
+case 5: return "#";
+case 6: return "#";
 default: gcc_unreachable ();
 }
 }
-  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "conds" "*,clob,clob,*")
-   (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
+   (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
+   (set_attr "length" "*,8,8,*,8,8,8")
+   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
 )

 (define_insn "*sub3_neon"
Index: config/arm/arm.md
===
--- config/arm/arm.md   (revision 187751)
+++ config/arm/arm.md   (working copy)
@@ -574,10 +574,21 @@
  [(parallel
[(set (match_operand:DI   0 "s_register_operand" "")
  (plus:DI (match_operand:DI 1 "s_register_operand" "")
-  (match_operand:DI 2 "s_register_operand" "")))
+  (match_operand:DI 2 "reg_or_int_operand" "")))
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
   "
+  if (GET_CODE (operands[2]) == CONST_INT)
+{
+  if (TARGET_32BIT && arm_const_double_by_immediates (operands[2]))
+   {
+ emit_insn (gen_arm_adddi3 (operands[0], operands[1], operands[2]));
+ DONE;
+   }
+  else
+   operands[2] = force_reg (DImode, operands[2]);
+}
+
   if (TARGET_HARD_FLOAT && TARGET_MAVERICK)
 {
   if (!cirrus_fp_register (operands[0], DImode))
@@ -609,10 +620,10 @@
   [(set_attr "length" "4")]
 )

-(define_insn_and_split "*arm_adddi3"
-  [(set (match_operand:DI  0 "s_register_operand" "=&r,&r")
-   (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0")
-(match_operand:DI 2 "s_register_operand" "r,  0")))
+(define_insn_and_split "arm_adddi3"
+  [(set (match_operand:DI  0 "s_register_operand" "=&r,&r,&r,&r,&r")
+   (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0, r, 0, r")
+(match_operand:DI 2 "reg_or_int_operand" "r,  0, r, Di,Di")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_MAVERICK) && !TARGET_NEON"
   "#"
@@ -630,8 +641,17 @@
 operands[0] = gen_lowpart (SImode, operands[0]);
 operands[4] = 

Re: [ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant

2012-06-08 Thread Carrot Wei
md(revision 187751)
+++ config/arm/predicates.md(working copy)
@@ -117,6 +117,10 @@
   (and (match_code "const_int,const_double")
(match_test "arm_const_double_by_immediates (op)")))

+(define_predicate "arm_neg_immediate_di_operand"
+  (and (match_code "const_int")
+   (match_test "arm_const_double_by_immediates (GEN_INT (-INTVAL (op)))")))
+
 (define_predicate "arm_neg_immediate_operand"
   (and (match_code "const_int")
(match_test "const_ok_for_arm (-INTVAL (op))")))
Index: config/arm/arm.md
===
--- config/arm/arm.md   (revision 187751)
+++ config/arm/arm.md   (working copy)
@@ -574,10 +574,29 @@
  [(parallel
[(set (match_operand:DI   0 "s_register_operand" "")
  (plus:DI (match_operand:DI 1 "s_register_operand" "")
-  (match_operand:DI 2 "s_register_operand" "")))
+  (match_operand:DI 2 "reg_or_int_operand" "")))
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
   "
+  if (GET_CODE (operands[2]) == CONST_INT)
+{
+  rtx neg_val = GEN_INT (-INTVAL (operands[2]));
+  if (TARGET_32BIT && arm_const_double_by_immediates (operands[2]))
+   {
+ emit_insn (gen_arm_adddi3 (operands[0], operands[1], operands[2]));
+ DONE;
+   }
+  else if (TARGET_32BIT && arm_const_double_by_immediates (neg_val))
+   {
+ emit_insn (gen_arm_subdi3_immediate (operands[0],
+  operands[1],
+  operands[2]));
+ DONE;
+   }
+  else
+   operands[2] = force_reg (DImode, operands[2]);
+}
+
   if (TARGET_HARD_FLOAT && TARGET_MAVERICK)
 {
   if (!cirrus_fp_register (operands[0], DImode))
@@ -609,10 +628,10 @@
   [(set_attr "length" "4")]
 )

-(define_insn_and_split "*arm_adddi3"
-  [(set (match_operand:DI  0 "s_register_operand" "=&r,&r")
-   (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0")
-(match_operand:DI 2 "s_register_operand" "r,  0")))
+(define_insn_and_split "arm_adddi3"
+  [(set (match_operand:DI  0 "s_register_operand" "=&r,&r,&r,&r,&r")
+   (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0, r, 0, r")
+(match_operand:DI 2 "arm_di_operand" "r,  0, r, Di,Di")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_MAVERICK) && !TARGET_NEON"
   "#"
@@ -630,8 +649,17 @@
 operands[0] = gen_lowpart (SImode, operands[0]);
 operands[4] = gen_highpart (SImode, operands[1]);
 operands[1] = gen_lowpart (SImode, operands[1]);
-operands[5] = gen_highpart (SImode, operands[2]);
-operands[2] = gen_lowpart (SImode, operands[2]);
+if (GET_CODE (operands[2]) == CONST_INT)
+  {
+   HOST_WIDE_INT v = INTVAL (operands[2]);
+   operands[5] = GEN_INT (ARM_SIGN_EXTEND ((v >> 32) & 0x));
+   operands[2] = GEN_INT (ARM_SIGN_EXTEND (v & 0x));
+  }
+else
+  {
+   operands[5] = gen_highpart (SImode, operands[2]);
+   operands[2] = gen_lowpart (SImode, operands[2]);
+  }
   }"
   [(set_attr "conds" "clob")
(set_attr "length" "8")]
@@ -1122,6 +1150,25 @@
(set_attr "length" "8")]
 )

+(define_insn "arm_subdi3_immediate"
+  [(set (match_operand:DI  0 "s_register_operand"   "=&r,&r")
+(plus:DI (match_operand:DI 1 "s_register_operand"   "0, r")
+ (match_operand:DI 2 "arm_neg_immediate_di_operand" "Dd,Dd")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_32BIT"
+  "*
+  {
+HOST_WIDE_INT v = -INTVAL (operands[2]);
+operands[3] = GEN_INT (ARM_SIGN_EXTEND ((v >> 32) & 0x));
+operands[2] = GEN_INT (ARM_SIGN_EXTEND (v & 0x));
+output_asm_insn (\"subs\\t%Q0, %Q1, %2\", operands);
+output_asm_insn (\"sbc\\t%R0, %R1, %3\", operands);
+return \"\";
+  }"
+  [(set_attr "conds" "clob")
+   (set_attr "length" "8")]
+)
+
 (define_insn "*thumb_subdi3"
   [(set (match_operand:DI   0 "register_operand" "=l")
(minus:DI (match_operand:DI 1 "register_operand"  "0")



On Wed, Jun 6, 2012 at 7:16 PM, Carrot Wei  wrote:
> In the original patch, if "add r0, c" is not possible, but "

Re: [google] Backport r174965 from trunk to google/gcc-4_6 (issue4852046)

2011-08-15 Thread Carrot Wei
ping

On Mon, Aug 8, 2011 at 11:00 AM, Guozhi Wei  wrote:
>
> Hi
>
> I want to backport r174965 from trunk to google/gcc-4_6, which fixed vect-72.c
> failure in target arm, as described in
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00927.html
>
> Tested with buildit and regression test on arm qemu.
>
> OK for google/gcc-4_6 ?
>
> thanks
> Carrot
>
>
> 2011-08-08  Guozhi Wei  
>
>        Backport r174965 from trunk.
>
>        2011-06-12  Ira Rosen  
>
>                * tree-vect-data-refs.c (vect_peeling_hash_get_most_frequent):
>                Take number of iterations to peel into account for equally 
> frequent
>                misalignment values.
>
>
> Index: tree-vect-data-refs.c
> ===
> --- tree-vect-data-refs.c       (revision 177320)
> +++ tree-vect-data-refs.c       (working copy)
> @@ -1250,7 +1250,9 @@ vect_peeling_hash_get_most_frequent (voi
>   vect_peel_info elem = (vect_peel_info) *slot;
>   vect_peel_extended_info max = (vect_peel_extended_info) data;
>
> -  if (elem->count > max->peel_info.count)
> +  if (elem->count > max->peel_info.count
> +      || (elem->count == max->peel_info.count
> +          && max->peel_info.npeel > elem->npeel))
>     {
>       max->peel_info.npeel = elem->npeel;
>       max->peel_info.count = elem->count;
>
>
> --
> This patch is available for review at http://codereview.appspot.com/4852046


[PATCH] PR49452, prevent post reload combining cross blockage insn

2011-09-19 Thread Carrot Wei
Hi

We should not combine insns cross volatile insns. Combine.c already does this
check, postreload should also do so.

Bootstrapped and regtested on x86_64-unknown-linux-gnu.
Regtested on arm qemu.

thanks
Carrot


ChangeLog:
2011-09-19  Wei Guozhi  

PR 49452
* postreload.c (reload_combine): Invalidate use information when across
volatile insn.


Index: postreload.c
===
--- postreload.c(revision 178938)
+++ postreload.c(working copy)
@@ -1312,7 +1312,8 @@ reload_combine (void)
 is and then later disable any optimization that would cross it.  */
   if (LABEL_P (insn))
last_label_ruid = reload_combine_ruid;
-  else if (BARRIER_P (insn))
+  else if (BARRIER_P (insn)
+  || (INSN_P (insn) && volatile_insn_p (PATTERN (insn
for (r = 0; r < FIRST_PSEUDO_REGISTER; r++)
  if (! fixed_regs[r])
  reg_state[r].use_index = RELOAD_COMBINE_MAX_USES;


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-28 Thread Carrot Wei
Hi Tom

What's the behavior of your patch to the following case

typedef int int_unaligned __attribute__((aligned(1)));
int foo (int_unaligned *p)
{
  return *p;
}

thanks
Carrot

On Tue, Sep 20, 2011 at 7:13 PM, Tom de Vries  wrote:
> Hi Richard,
>
> I have a patch for PR43814. It introduces an option that assumes that function
> arguments of pointer type are aligned, and uses that information in
> tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined.
>
> I tested the patch successfully on-by-default on x86_64 and i686 (both gcc 
> only
> builds).
>
> I also tested the patch on-by-default for ARM (gcc/glibc build). The patch
> generated wrong code for uselocale.c:
> ...
> glibc/locale/locale.h:
> ...
> /* This value can be passed to `uselocale' and may be returned by
>   it. Passing this value to any other function has undefined behavior.  */
> # define LC_GLOBAL_LOCALE       ((__locale_t) -1L)
> ...
> glibc/locale/uselocale.c:
> ...
> locale_t
> __uselocale (locale_t newloc)
> {
>  locale_t oldloc = _NL_CURRENT_LOCALE;
>
>  if (newloc != NULL)
>    {
>      const locale_t locobj
>        = newloc == LC_GLOBAL_LOCALE ? &_nl_global_locale : newloc;
>
> ...
> The assumption that function arguments of pointer type are aligned, allowed 
> the
> test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false.
> But the usage of ((__locale_t) -1L) as function argument in uselocale violates
> that assumption.
>
> Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without
> regressions for ARM.
>
> Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM,
> discussed here:
> - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html
> - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html
>
> But, since glibc uses this construct currently, the option is off-by-default 
> for
> now.
>
> OK for trunk?
>
> Thanks,
> - Tom
>
> 2011-09-20  Tom de Vries 
>
>        PR target/43814
>        * tree-ssa-ccp.c (get_align_value): New function, factored out of
>        get_value_from_alignment.
>        (get_value_from_alignment): Use get_align_value.
>        (get_value_for_expr): Use get_align_value to handle alignment of
>        function argument pointers.
>        * common.opt (faligned-pointer-argument): New option.
>        * doc/invoke.texi (Optimization Options): Add
>        -faligned-pointer-argument.
>        (-faligned-pointer-argument): New item.
>
>        * gcc/testsuite/gcc.dg/pr43814.c: New test.
>        * gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
>


Re: RFA: Avoid unnecessary clearing in union initialisers

2011-12-08 Thread Carrot Wei
Since it also affects 4.6 branch, can this and r176270 also be ported to gcc4.6?

thanks
Carrot

On Wed, Jul 13, 2011 at 12:34 AM, Richard Sandiford
 wrote:
> PR 48183 is caused by the fact that we don't really support integers
> (or least integer constants) wider than 2*HOST_BITS_PER_WIDE_INT:
>
>   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01220.html
>
> However, such constants shouldn't be needed in normal use.
> They came from an unnecessary zero-initialisation of a union such as:
>
>   union { a f1; b f2; } u = { init_f1 };
>
> where f1 and f2 are the full width of the union.  The zero-initialisation
> gets optimised away for "real" insns, but persists in debug insns:
>
>   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01585.html
>
> This patch takes up Richard's idea here:
>
>   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01987.html
>
> categorize_ctor_elements currently tries to work out how many scalars a
> constructor initialises (IE) and how many of those scalars are zero (ZE).
> Callers can then call count_type_elements to find out how many scalars (TE)
> ought to be initialised if the constructor is "complete" (i.e. if it
> explicitly initialises every meaningful byte, rather than relying on
> default zero-initialisation).  The constructor is complete if TE == ZE,
> except as noted in [A] below.
>
> However, count_type_elements can't return the required TE for unions,
> because it would need to know which of the union's fields was initialised
> by the constructor (if any).  This choice of field is reflected in IE and
> ZE, so would need to be reflected in TE as well.
>
> count_type_elements therefore punts on unions.  However, the caller
> can't easily tell whether it punts because of that, because of overflow,
> of because of variable-sized types.
>
> [A] One particular case of interest is when a union constructor initialises
> a field that is shorter than the union.  In this case, the rest of the
> union must be zeroed in order to ensure that the other fields have
> predictable values.  categorize_ctor_elements has a special out-parameter
> to reccord this situation.
>
> This leads to quite a complicated interface.  The patch tries to
> simplify it by making categorize_ctor_elements keep track of whether
> a constructor is complete.  This also has the minor advantage of
> avoiding double recursion: first through the constructor,
> then through its type tree.
>
> After this change, ZE and IE are only needed when deciding how best to
> implement "complete" initialisers (such as whether to do a bulk zero
> initialisation anyway, and just write the nonzero elements individually).
> For cases where a "leaf" constructor element is itself an aggregate with
> a union, we can therefore estimate the number of scalars in the union,
> and hopefully make the heuristic a bit more accurate than the current 1:
>
>            HOST_WIDE_INT tc = count_type_elements (TREE_TYPE (value), true);
>            if (tc < 1)
>              tc = 1;
>
> cp/typeck2.c also wants to check whether the variable parts of a
> constructor are complete.  The patch uses the approach to completeness
> there.  This should make it a bit more general than the current code,
> which only deals with non-nested constructors.
>
> Tested on x86_64-linux-gnu (all languages, including Ada), and on
> arm-linux-gnueabi.  OK to install?
>
> Richard
>
>
> gcc/
>        * tree.h (categorize_ctor_elements): Remove comment.  Fix long line.
>        (count_type_elements): Delete.
>        (complete_ctor_at_level_p): Declare.
>        * expr.c (flexible_array_member_p): New function, split out from...
>        (count_type_elements): ...here.  Make static.  Replace allow_flexarr
>        parameter with for_ctor_p.  When for_ctor_p is true, return the
>        number of elements that should appear in the top-level constructor,
>        otherwise return an estimate of the number of scalars.
>        (categorize_ctor_elements): Replace p_must_clear with p_complete.
>        (categorize_ctor_elements_1): Likewise.  Use complete_ctor_at_level_p.
>        (complete_ctor_at_level_p): New function, borrowing union logic
>        from old categorize_ctor_elements_1.
>        (mostly_zeros_p): Return true if the constructor is not complete.
>        (all_zeros_p): Update call to categorize_ctor_elements.
>        * gimplify.c (gimplify_init_constructor): Update call to
>        categorize_ctor_elements.  Don't call count_type_elements.
>        Unconditionally prevent clearing for variable-sized types,
>        otherwise rely on categorize_ctor_elements to detect
>        incomplete initializers.
>
> gcc/cp/
>        * typeck2.c (split_nonconstant_init_1): Pass the initializer directly,
>        rather than a pointer to it.  Return true if the whole of the value
>        was initialized by the generated statements.  Use
>        complete_ctor_at_level_p instead of count_type_elements.
>
> gcc/testsuite/
> 2011-07-12  Chung-Lin Tang  
>
>        * gcc.target/a

Re: RFA: Avoid unnecessary clearing in union initialisers

2011-12-10 Thread Carrot Wei
On Fri, Dec 9, 2011 at 4:56 PM, Richard Sandiford
 wrote:
> Carrot Wei  writes:
>> Since it also affects 4.6 branch, can this and r176270 also be ported
>> to gcc4.6?
>
> Always worth asking, but in this case, I'm not sure it's appropriate.
> The patch is pretty invasive, and I don't think the bug is a regression.
>
> Also, 4.6 generates really lousy code for these intrinsics, so I think
> anyone who's serious about using them would need 4.7 anyway.
>

Sounds reasonable.

thanks
Carrot


Re: [PATCH] Fix sibcall argument overlap checking if pretend_args_size (PR target/52129)

2012-02-06 Thread Carrot Wei
Hi Jakub

Instead of disabling the sibcall, it could also be a valid tail call
optimization by moving the str after ldmia, and change the used
register(It should be handled by RA automatically), as following

 ...
  add r4, r1, r4, lsl #2
  ldmia   r2, {r1, r2}
  str r4, [sp, #48]
  ...

thanks
Carrot

On Mon, Feb 6, 2012 at 9:01 PM, Jakub Jelinek  wrote:
> Hi!
>
> The attached testcase is miscompiled on arm*, by doing a sibcall when setup
> of one argument overwrites incoming arguments used to setup parameters in
> later insns.
> The reason why
> mem_overlaps_already_clobbered_arg_p/check_sibcall_argument_overlap
> fails to detect is that the caller has non-zero
> crtl->args.pretend_args_size, and in that case the base:
>      /* The argument block when performing a sibling call is the
>         incoming argument block.  */
>      if (pass == 0)
>        {
>          argblock = crtl->args.internal_arg_pointer;
>          argblock
> #ifdef STACK_GROWS_DOWNWARD
>            = plus_constant (argblock, crtl->args.pretend_args_size);
> #else
>            = plus_constant (argblock, -crtl->args.pretend_args_size);
> #endif
>          stored_args_map = sbitmap_alloc (args_size.constant);
>          sbitmap_zero (stored_args_map);
>        }
> apparently isn't virtual-incoming-rtx, but that plus pretend_args_size
> (8 in this case).  When we store bits into stored_args_map sbitmap,
> we use arg->locate.slot_offset.constant based values (or something different
> for ARGS_GROW_DOWNWARD, but when mem_overlaps_already_clobbered_arg_p is
> testing those bits, it uses just virtual-incoming-rtx offsets (or something
> different for ARGS_GROW_DOWNWARD).  This patch fixes it by adjusting the
> virtual-incoming-rtx relative offset to be actually argblock relative
> offset.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux and tested on the
> testcase on arm cross.  Ok for trunk?
>
> 2012-02-06  Jakub Jelinek  
>
>        PR target/52129
>        * calls.c (mem_overlaps_already_clobbered_arg_p): If val is
>        CONST_INT_P, subtract resp. add crtl->args.pretend_args_size to it.
>
>        * gcc.c-torture/execute/pr52129.c: New test.
>
> --- gcc/calls.c.jj      2012-02-01 14:44:27.0 +0100
> +++ gcc/calls.c 2012-02-06 10:19:12.112132905 +0100
> @@ -1808,6 +1808,11 @@ mem_overlaps_already_clobbered_arg_p (rt
>     return true;
>   else
>     i = INTVAL (val);
> +#ifdef STACK_GROWS_DOWNWARD
> +  i -= crtl->args.pretend_args_size;
> +#else
> +  i += crtl->args.pretend_args_size;
> +#endif
>
>  #ifdef ARGS_GROW_DOWNWARD
>   i = -i - size;
> --- gcc/testsuite/gcc.c-torture/execute/pr52129.c.jj    2012-02-06 
> 10:27:50.988876791 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr52129.c       2012-02-06 
> 10:25:26.0 +0100
> @@ -0,0 +1,28 @@
> +/* PR target/52129 */
> +
> +extern void abort (void);
> +struct S { void *p; unsigned int q; };
> +struct T { char a[64]; char b[64]; } t;
> +
> +__attribute__((noinline, noclone)) int
> +foo (void *x, struct S s, void *y, void *z)
> +{
> +  if (x != &t.a[2] || s.p != &t.b[5] || s.q != 27 || y != &t.a[17] || z != 
> &t.b[17])
> +    abort ();
> +  return 29;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +bar (void *x, void *y, void *z, struct S s, int t, struct T *u)
> +{
> +  return foo (x, s, &u->a[t], &u->b[t]);
> +}
> +
> +int
> +main ()
> +{
> +  struct S s = { &t.b[5], 27 };
> +  if (bar (&t.a[2], (void *) 0, (void *) 0, s, 17, &t) != 29)
> +    abort ();
> +  return 0;
> +}
>
>        Jakub


Re: [PATCH] Fix sibcall argument overlap checking if pretend_args_size (PR target/52129)

2012-02-09 Thread Carrot Wei
Hi Richard and Jakub

Since 4.6 contains the same bug, I would like to back port it to 4.6
branch. Could you approve it for 4.6?

Jing and Doug

Could you approve it for google/gcc-4_6-mobile branch?

thanks
Carrot

On Mon, Feb 6, 2012 at 9:14 PM, Richard Guenther
 wrote:
> On Mon, Feb 6, 2012 at 2:01 PM, Jakub Jelinek  wrote:
>> Hi!
>>
>> The attached testcase is miscompiled on arm*, by doing a sibcall when setup
>> of one argument overwrites incoming arguments used to setup parameters in
>> later insns.
>> The reason why
>> mem_overlaps_already_clobbered_arg_p/check_sibcall_argument_overlap
>> fails to detect is that the caller has non-zero
>> crtl->args.pretend_args_size, and in that case the base:
>>      /* The argument block when performing a sibling call is the
>>         incoming argument block.  */
>>      if (pass == 0)
>>        {
>>          argblock = crtl->args.internal_arg_pointer;
>>          argblock
>> #ifdef STACK_GROWS_DOWNWARD
>>            = plus_constant (argblock, crtl->args.pretend_args_size);
>> #else
>>            = plus_constant (argblock, -crtl->args.pretend_args_size);
>> #endif
>>          stored_args_map = sbitmap_alloc (args_size.constant);
>>          sbitmap_zero (stored_args_map);
>>        }
>> apparently isn't virtual-incoming-rtx, but that plus pretend_args_size
>> (8 in this case).  When we store bits into stored_args_map sbitmap,
>> we use arg->locate.slot_offset.constant based values (or something different
>> for ARGS_GROW_DOWNWARD, but when mem_overlaps_already_clobbered_arg_p is
>> testing those bits, it uses just virtual-incoming-rtx offsets (or something
>> different for ARGS_GROW_DOWNWARD).  This patch fixes it by adjusting the
>> virtual-incoming-rtx relative offset to be actually argblock relative
>> offset.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux and tested on the
>> testcase on arm cross.  Ok for trunk?
>
> Ok.
>
> Thanks,
> Richard.
>
>> 2012-02-06  Jakub Jelinek  
>>
>>        PR target/52129
>>        * calls.c (mem_overlaps_already_clobbered_arg_p): If val is
>>        CONST_INT_P, subtract resp. add crtl->args.pretend_args_size to it.
>>
>>        * gcc.c-torture/execute/pr52129.c: New test.
>>
>> --- gcc/calls.c.jj      2012-02-01 14:44:27.0 +0100
>> +++ gcc/calls.c 2012-02-06 10:19:12.112132905 +0100
>> @@ -1808,6 +1808,11 @@ mem_overlaps_already_clobbered_arg_p (rt
>>     return true;
>>   else
>>     i = INTVAL (val);
>> +#ifdef STACK_GROWS_DOWNWARD
>> +  i -= crtl->args.pretend_args_size;
>> +#else
>> +  i += crtl->args.pretend_args_size;
>> +#endif
>>
>>  #ifdef ARGS_GROW_DOWNWARD
>>   i = -i - size;
>> --- gcc/testsuite/gcc.c-torture/execute/pr52129.c.jj    2012-02-06 
>> 10:27:50.988876791 +0100
>> +++ gcc/testsuite/gcc.c-torture/execute/pr52129.c       2012-02-06 
>> 10:25:26.0 +0100
>> @@ -0,0 +1,28 @@
>> +/* PR target/52129 */
>> +
>> +extern void abort (void);
>> +struct S { void *p; unsigned int q; };
>> +struct T { char a[64]; char b[64]; } t;
>> +
>> +__attribute__((noinline, noclone)) int
>> +foo (void *x, struct S s, void *y, void *z)
>> +{
>> +  if (x != &t.a[2] || s.p != &t.b[5] || s.q != 27 || y != &t.a[17] || z != 
>> &t.b[17])
>> +    abort ();
>> +  return 29;
>> +}
>> +
>> +__attribute__((noinline, noclone)) int
>> +bar (void *x, void *y, void *z, struct S s, int t, struct T *u)
>> +{
>> +  return foo (x, s, &u->a[t], &u->b[t]);
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  struct S s = { &t.b[5], 27 };
>> +  if (bar (&t.a[2], (void *) 0, (void *) 0, s, 17, &t) != 29)
>> +    abort ();
>> +  return 0;
>> +}
>>
>>        Jakub


Re: [PATCH] Fix sibcall argument overlap checking if pretend_args_size (PR target/52129)

2012-02-10 Thread Carrot Wei
On Fri, Feb 10, 2012 at 2:13 PM, Jing Yu  wrote:
> On Thu, Feb 9, 2012 at 12:54 AM, Carrot Wei  wrote:
>> Hi Richard and Jakub
>>
>> Since 4.6 contains the same bug, I would like to back port it to 4.6
>> branch. Could you approve it for 4.6?
>>
>> Jing and Doug
>>
>> Could you approve it for google/gcc-4_6-mobile branch?
>>
>
> OK for google/gcc-4_6-mobile and gcc-4_6_2-mobile
>
> Jing
>

Bootstrapped/regtested on x86_64-linux and regtested on arm qemu.
Committed to both google/gcc-4_6-mobile and google/gcc-4_6_2-mobile.

thanks
Carrot


[PING] 3 ARM patches

2011-05-23 Thread Carrot Wei
Hi

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01973.html
Use ldrd and strd to access two consecutive words

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00490.html
Compute attr length for thumb2 insns

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01092.html
Replace 32 bit instructions with 16 bit instructions in thumb2

thanks
Carrot


Re: [google] Disable getpagesize() for Android toolchain (issue4515131)

2011-05-26 Thread Carrot Wei
Hi

I've tested the

#ifndef __ANDROID__

on arm qemu without regression. And also built Android toolchain
without this error.

thanks
Carrot


2011-05-26  Jing Yu  

* ChangeLog.google-main: New file.
* getpagesize.c(getpagesize): Disable it for bionic.


Index: ChangeLog.google-main
===
--- ChangeLog.google-main   (revision 0)
+++ ChangeLog.google-main   (revision 0)
@@ -0,0 +1,5 @@
+Copyright (C) 2011 Free Software Foundation, Inc.
+
+Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
Index: getpagesize.c
===
--- getpagesize.c   (revision 174099)
+++ getpagesize.c   (working copy)
@@ -60,11 +60,13 @@ BUGS
 # endif /* PAGESIZE */
 #endif /* GNU_OUR_PAGESIZE */

+#ifndef __ANDROID__
 int
 getpagesize (void)
 {
   return (GNU_OUR_PAGESIZE);
 }
+#endif

 #else /* VMS */


On Wed, May 25, 2011 at 2:07 AM, Doug Kwan (關振德)  wrote:
> Shouldn't we test
>
> ifndef __ANDROID__
>
> instead?
>
> -Doug
>
> On Tue, May 24, 2011 at 2:39 AM, Guozhi Wei  wrote:
>> Hi
>>
>> This patch is for google/main.
>>
>> In order to be compatible with current bionic and sysroot, we need to disable
>> getpagesize(). After getpagesize() in bionic is changed and ndk contains that
>> change, we can reenable it.
>>
>> Jing can give more details about it.
>>
>> This patch has been tested on arm qemu without regression.
>>
>> thanks
>> Carrot
>>
>> 2011-05-24  Jing Yu  
>>
>>        * ChangeLog.google-main: New file.
>>        * getpagesize.c(getpagesize): Disable it for bionic.
>>
>>
>> Index: ChangeLog.google-main
>> ===
>> --- ChangeLog.google-main       (revision 0)
>> +++ ChangeLog.google-main       (revision 0)
>> @@ -0,0 +1,5 @@
>> +Copyright (C) 2011 Free Software Foundation, Inc.
>> +
>> +Copying and distribution of this file, with or without modification,
>> +are permitted in any medium without royalty provided the copyright
>> +notice and this notice are preserved.
>> Index: getpagesize.c
>> ===
>> --- getpagesize.c       (revision 174099)
>> +++ getpagesize.c       (working copy)
>> @@ -60,11 +60,13 @@ BUGS
>>  # endif /* PAGESIZE */
>>  #endif /* GNU_OUR_PAGESIZE */
>>
>> +#if DEFAULT_LIBC != LIBC_BIONIC
>>  int
>>  getpagesize (void)
>>  {
>>   return (GNU_OUR_PAGESIZE);
>>  }
>> +#endif
>>
>>  #else /* VMS */
>>
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/4515131
>>
>


Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-06-01 Thread Carrot Wei
Hi H.J.

This patch is also important to ChromeOS toolchain. Could you also try
to update and test it for google/main?

thanks
Carrot

On Wed, May 18, 2011 at 11:57 PM, H.J. Lu  wrote:
> On Tue, Apr 26, 2011 at 6:05 AM, H.J. Lu  wrote:
>> On Thu, Mar 31, 2011 at 7:57 AM, H.J. Lu  wrote:
>>> On Mon, Mar 21, 2011 at 11:40 AM, H.J. Lu  wrote:
 On Mon, Mar 14, 2011 at 12:28 PM, H.J. Lu  wrote:
> On Thu, Jan 27, 2011 at 2:40 AM, Richard Guenther
>  wrote:
>> On Thu, Jan 27, 2011 at 12:12 AM, H.J. Lu  wrote:
>>> On Tue, Dec 14, 2010 at 05:20:48PM -0800, H.J. Lu wrote:
 This patch uses .init_array/.fini_array sections instead of
 .ctors/.dtors sections if mixing .init_array/.fini_array and
 .ctors/.dtors sections with init_priority works.

 It removes .ctors/.ctors sections from executables and DSOes, which 
 will
 remove one function call at startup time from each executable and DSO.
 It should reduce image size and improve system startup time.

 If a platform with a working .init_array/.fini_array support needs a
 different .init_array/.fini_array implementation, it can set
 use_initfini_array to no.

 Since .init_array/.fini_array is a target feature. 
 --enable-initfini-array
 is default to no unless the native run-time test is passed.

 To pass the native run-time test, a linker with SORT_BY_INIT_PRIORITY
 support is required.  The binutils patch is available at

 http://sourceware.org/ml/binutils/2010-12/msg00466.html
>>>
>>> Linker patch has been checked in.
>>>

 This patch passed 32bit/64bit regression test on Linux/x86-64.  Any
 comments?

>>>
>>> This updated patch fixes build on Linux/ia64 and should work on others.
>>> Any comments?
>>
>> Yes.  This is stage1 material.
>>
>
> Here is the updated patch.  OK for trunk?
>
> Thanks.
>
>
> --
> H.J.
> 
> 2011-03-14  H.J. Lu  
>
>        PR target/46770
>        * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Removed.
>
>        * config.gcc (use_initfini_array): New variable.
>        Use initfini-array.o if supported.
>
>        * crtstuff.c: Don't generate .ctors nor .dtors sections if
>        NO_CTORS_DTORS_SECTIONS is defined.
>
>        * configure.ac: Remove gcc_AC_INITFINI_ARRAY.  Add
>        --enable-initfini-array and check if .init_array can be used with
>        .ctors.
>
>        * configure: Regenerated.
>
>        * config/initfini-array.c: New.
>        * config/initfini-array.h: Likewise.
>        * config/t-initfini-array: Likewise.
>
>        * config/arm/arm.c (arm_asm_init_sections): Call
>        elf_initfini_array_init_sections if NO_CTORS_DTORS_SECTIONS
>        is defined.
>        * config/avr/avr.c (avr_asm_init_sections): Likewise.
>        * config/ia64/ia64.c (ia64_asm_init_sections): Likewise.
>        * config/mep/mep.c (mep_asm_init_sections): Likewise.
>        * config/microblaze/microblaze.c 
> (microblaze_elf_asm_init_sections):
>        Likewise.
>        * config/rs6000/rs6000.c (rs6000_elf_asm_init_sections): Likewise.
>        * config/stormy16/stormy16.c (xstormy16_asm_init_sections):
>        Likewise.
>        * config/v850/v850.c (v850_asm_init_sections): Likewise.
>

 PING:

 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00760.html

>>>
>>> Any comments?  Any objections?
>>>
>>
>> Here is the patch updated for the current trunk.  OK for trunk?
>>
>
> PING,.
>
> --
> H.J.
>


Re: [google]Backport r174549 Fix 3 test cases incorrectly run in Thumb/Xscale (issue4524090)

2011-06-02 Thread Carrot Wei
OK for google/main.

thanks
Carrot

On Thu, Jun 2, 2011 at 12:51 PM, Jing Yu  wrote:
> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00134.html
> Backport r174549 to fix three testcases that are specific to ARM mode
> and therefore should be skipped when compiling for thumb.
>
> Thanks,
> Jing
>
> 2011-06-01  Jing Yu  
>        Backport r174549
>
>        2011-06-01  Sofiane Naci  
>
>        * gcc.target/arm/mmx-1.c: Skip test in -mthumb.
>        * gcc.target/arm/g2.c: Skip test in -mthumb.
>        Skip test unless cpu is xscale.
>        * gcc.target/arm/scd42-2.c: Likewise.
>
> Index: gcc.target/arm/mmx-1.c
> ===
> --- gcc.target/arm/mmx-1.c      (revision 174299)
> +++ gcc.target/arm/mmx-1.c      (working copy)
> @@ -4,6 +4,7 @@
>  /* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mcpu=*" } 
> { "-mcpu=iwmmxt" } } */
>  /* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mabi=*" } 
> { "-mabi=iwmmxt" } } */
>  /* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-march=*" } 
> { "-march=iwmmxt" } } */
> +/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { 
> "" } } */
>  /* { dg-options "-O -mno-apcs-frame -mcpu=iwmmxt -mabi=iwmmxt" } */
>  /* { dg-require-effective-target arm32 } */
>  /* { dg-require-effective-target arm_iwmmxt_ok } */
> Index: gcc.target/arm/g2.c
> ===
> --- gcc.target/arm/g2.c (revision 174299)
> +++ gcc.target/arm/g2.c (working copy)
> @@ -2,6 +2,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mcpu=xscale -O2" } */
>  /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-march=*" } 
> { "-march=xscale" } } */
> +/* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } 
> { "-mcpu=xscale" } } */
> +/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { 
> "" } } */
>  /* { dg-require-effective-target arm32 } */
>
>  /* Brett Gaines' test case. */
> Index: gcc.target/arm/scd42-2.c
> ===
> --- gcc.target/arm/scd42-2.c    (revision 174299)
> +++ gcc.target/arm/scd42-2.c    (working copy)
> @@ -2,6 +2,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mcpu=xscale -O" } */
>  /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-march=*" } 
> { "-march=xscale" } } */
> +/* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } 
> { "-mcpu=xscale" } } */
> +/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { 
> "" } } */
>  /* { dg-require-effective-target arm32 } */
>
>  unsigned load2(void) __attribute__ ((naked));
>
> --
> This patch is available for review at http://codereview.appspot.com/4524090
>


Re: [PING] 3 ARM patches

2011-06-03 Thread Carrot Wei
Hi ARM maintainers

Could you help to review the following patches?

thanks
Carrot

On Tue, May 24, 2011 at 9:31 AM, Carrot Wei  wrote:
> Hi
>
> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01973.html
> Use ldrd and strd to access two consecutive words
>
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00490.html
> Compute attr length for thumb2 insns
>
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01092.html
> Replace 32 bit instructions with 16 bit instructions in thumb2
>
> thanks
> Carrot
>


Re: [google]Skip target-libiberty for arm*-*-linux-androideabi (issue4564050)

2011-06-06 Thread Carrot Wei
OK.

thanks
Carrot

On Tue, Jun 7, 2011 at 1:09 AM,   wrote:
> The trunk version has been approved and committed as r174710. Backport
> it to google/main. The google/main version has the same logic but is
> slightly different since trunk has a different code structure here. OK
> for google/main?
>
> 2011-06-06  Jing Yu  
>
>        Backport trunk r174710:
>
>        * configure.ac: Skip target-libiberty for arm*-*-linux-androideabi.
>        * configure: Regenerated.
>
> http://codereview.appspot.com/4564050/
>


Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-09 Thread Carrot Wei
It also breaks arm backend.

../trunk/configure '--build=x86_64-build_pc-linux-gnu'
'--host=x86_64-build_pc-linux-gnu'
'--target=arm-unknown-linux-gnueabi'
'--with-sysroot=/home/carrot/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sys-root'
'--disable-multilib' '--with-float=soft' '--disable-sjlj-exceptions'
'--enable-__cxa_atexit' '--disable-nls' '--enable-threads=posix'
'--enable-symvers=gnu' '--enable-c99' '--enable-long-long'
'--enable-target-optspace' '--disable-bootstrap'
'build_alias=x86_64-build_pc-linux-gnu'
'host_alias=x86_64-build_pc-linux-gnu'
'target_alias=arm-unknown-linux-gnueabi'
'--enable-languages=c,c++,lto'

make

...

/bin/sh ../libtool --tag CXX   --mode=compile
/usr/local/google/home/carrot/armobj1/./gcc/xgcc -shared-libgcc
-B/usr/local/google/home/carrot/armobj1/./gcc -nostdinc++
-L/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/src
-L/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/src/.libs
-B/usr/local/arm-unknown-linux-gnueabi/bin/
-B/usr/local/arm-unknown-linux-gnueabi/lib/ -isystem
/usr/local/arm-unknown-linux-gnueabi/include -isystem
/usr/local/arm-unknown-linux-gnueabi/sys-include
-I/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include/arm-unknown-linux-gnueabi
-I/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include
-I/usr/local/google/home/carrot/trunk/libstdc++-v3/libsupc++
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual
-fdiagnostics-show-location=once  -ffunction-sections -fdata-sections
-g -Os 
-I/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include/backward
-Wno-deprecated -c ../../../../trunk/libstdc++-v3/src/strstream.cc
libtool: compile:  /usr/local/google/home/carrot/armobj1/./gcc/xgcc
-shared-libgcc -B/usr/local/google/home/carrot/armobj1/./gcc
-nostdinc++ 
-L/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/src
-L/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/src/.libs
-B/usr/local/arm-unknown-linux-gnueabi/bin/
-B/usr/local/arm-unknown-linux-gnueabi/lib/ -isystem
/usr/local/arm-unknown-linux-gnueabi/include -isystem
/usr/local/arm-unknown-linux-gnueabi/sys-include
-I/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include/arm-unknown-linux-gnueabi
-I/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include
-I/usr/local/google/home/carrot/trunk/libstdc++-v3/libsupc++
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual
-fdiagnostics-show-location=once -ffunction-sections -fdata-sections
-g -Os 
-I/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include/backward
-Wno-deprecated -c ../../../../trunk/libstdc++-v3/src/strstream.cc
-fPIC -DPIC -o .libs/strstream.o
In file included from ../../../../trunk/libstdc++-v3/src/strstream.cc:45:0:
/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include/backward/strstream:
In member function 'void*
std::strstream::_ZTv0_n12_NSt9strstreamD1Ev()':
/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/include/backward/strstream:171:13:
internal compiler error: in verify_curr_properties, at passes.c:1660
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[4]: *** [strstream.lo] Error 1
make[4]: Leaving directory
`/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3/src'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory
`/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3'
make[2]: *** [all] Error 2
make[2]: Leaving directory
`/usr/local/google/home/carrot/armobj1/arm-unknown-linux-gnueabi/libstdc++-v3'
make[1]: *** [all-target-libstdc++-v3] Error 2
make[1]: Leaving directory `/usr/local/google/home/carrot/armobj1'
make: *** [all] Error 2


On Fri, Jun 10, 2011 at 6:05 AM, H.J. Lu  wrote:
> On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li  wrote:
>> Please review the attached two patches.
>>
>> In the first patch, gate functions are cleaned up. All the per
>> function legality checks are moved into the executor and the
>> optimization heuristic checks (optimize for size) remain in the
>> gators. These allow the the following overriding order:
>>
>>    common flags (O2, -ftree-vrp, -fgcse etc)   <---  compiler
>> heuristic (optimize for size/speed) <--- -fdisable/enable forcing pass
>> options  <--- legality check
>>
>> Testing under going. Ok for trunk?
>>
>
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350
>
> --
> H.J.
>


Re: [PATCH, ARM] PR47855 Compute attr length for thumb2 insns, 3/3 (issue4475042)

2011-07-07 Thread Carrot Wei
Thanks for the review.

Richard, what's the situation of unaligned memory access and how does
it conflict with this patch?

thanks
Carrot

On Tue, Jun 7, 2011 at 6:42 PM, Nick Clifton  wrote:
> Hi Carrot,
>
>> 2011-05-06  Guozhi Wei  
>>
>>        PR target/47855
>>        * config/arm/thumb2.md (thumb2_movsi_insn): Add length addtribute.
>>        (thumb2_shiftsi3_short and peephole2): Remove 3-register case.
>>        (thumb2_cbz): Refine length computation.
>>        (thumb2_cbnz): Likewise.
>
> Not approved - yet.
>
> The problem is the change to thumb2_movsi_insn.  You are still adding in the
> support for the STM instruction despite the fact that Richard is still
> researching how this will work with unaligned addresses.  Given the fact
> that this change is not mentioned in the ChangeLog entry, I will assume that
> you intended to remove it and just forgot.
>
> I have no issues with the rest of your patch, so if you could submit an
> updated patch I will be happy to review it again.
>
> One small point - when/if you do resubmit the STM part of the patch, you
> could make the code slightly cleaner by enclosing it in curly parentheses,
> thus avoiding the need to escape the double quote marks.  Ie:
>
> +  {
> +  switch (which_alternative)
> +    {
> +    case 0:
> +    case 1:
> +      return "mov%?\t%0, %1";
> +
> +    case 2:
> +      return "mvn%?\t%0, #%B1";
> +
> +    case 3:
> +      return "movw%?\t%0, %1";
> +
> +    case 4:
> +      if (GET_CODE (XEXP (operands[1], 0)) == POST_INC)
> +       {
> +         operands[1] = XEXP (XEXP (operands[1], 0), 0);
> +         return "ldm%(ia%)\t%1!, {%0}";
> +       }
> +     /* Fall through.  */
> +    case 5:
> +      return "ldr%?\t%0, %1";
> +
> +    case 6:
> +      if (GET_CODE (XEXP (operands[0], 0)) == POST_INC)
> +       {
> +         operands[0] = XEXP (XEXP (operands[0], 0), 0);
> +         return "stm%(ia%)\t%0!, {%1}";
> +       }
> +      /* Fall through.  */
> +    case 7:
> +      return "str%?\t%1, %0";
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +  }
>
> Cheers
>  Nick
>
>


Re: [google] Backport patch r175881 from gcc-4_6-branch to google/gcc-4_6 (issue4695051)

2011-07-13 Thread Carrot Wei
Hi Diego

The previous patch was done with svn merge.

This new version is done with svnmerge.py. Again tested with

make check-g++ RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a
dg.exp=anon-ns1.C"
make check-g++ RUNTESTFLAGS="dg.exp=anon-ns1.C"

BTW, there are some unexpected property changes after merge, I don't
how did they come out and how should I deal with them?

thanks
Carrot


2011-07-14   Guozhi Wei  

Backport r175881 from gcc-4_6-branch

2011-07-05  Jason Merrill  

PR testsuite/49643
* g++.dg/rtti/anon-ns1.C: Allow for null-termination.


Property changes on: .
___
Modified: svnmerge-integrated
   - /branches/gcc-4_6-branch:1-175849
/branches/google/integration:1-170988,173923,173959
/branches/google/main:1-175816
   + /branches/gcc-4_6-branch:1-175849,175881
/branches/google/integration:1-170988,173923,173959
/branches/google/main:1-175816
Modified: svn:mergeinfo
   Merged /branches/gcc-4_6-branch:r175881


Property changes on: libjava/classpath
___
Modified: svn:mergeinfo
   Merged /branches/gcc-4_6-branch/libjava/classpath:r175881


Property changes on: gcc/testsuite/gcc.target/powerpc/ppc-round.c
___
Modified: svn:mergeinfo
   Merged 
/branches/gcc-4_6-branch/gcc/testsuite/gcc.target/powerpc/ppc-round.c:r175881

Index: gcc/testsuite/g++.dg/rtti/anon-ns1.C
===
--- gcc/testsuite/g++.dg/rtti/anon-ns1.C(revision 176259)
+++ gcc/testsuite/g++.dg/rtti/anon-ns1.C(working copy)
@@ -2,7 +2,7 @@
 // The typeinfo name for A should start with * so we compare
 // it by address rather than contents.

-// { dg-final { scan-assembler "\"\*N\[^\"\]+1AE\"" } }
+// { dg-final { scan-assembler "\"\*N\[^\"\]+1AE" } }

 namespace
 {

Property changes on: gcc/config/rs6000/rs6000.c
___
Modified: svn:mergeinfo
   Merged /branches/gcc-4_6-branch/gcc/config/rs6000/rs6000.c:r175881


Property changes on: gcc/config/rs6000/rs6000.h
___
Modified: svn:mergeinfo
   Merged /branches/gcc-4_6-branch/gcc/config/rs6000/rs6000.h:r175881



On Wed, Jul 13, 2011 at 7:37 PM, Diego Novillo  wrote:
> On Wed, Jul 13, 2011 at 03:12, Guozhi Wei  wrote:
>> Hi
>>
>> This patch fixes a testing error on arm backend. It has been tested on both
>> x86 and arm target with following commands.
>>
>> make check-g++ RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a 
>> dg.exp=anon-ns1.C"
>> make check-g++ RUNTESTFLAGS="dg.exp=anon-ns1.C"
>
> Carrot, did you backport this patch with svnmerge.py?
>
>
> Thanks.  Diego.
>


[testcase, arm] Adjust the negative offset of fp memory access in vfp-1.c

2011-07-20 Thread Carrot Wei
Hi

The patch r169271 conservatively limits the offset of fp memory access to
(-256..1024), but didn't adjust the related test case, so vfp-1.c fails in
thumb2 mode after the patch. This patch modifies test case vfp-1.c accordingly.

Tested with
make check-gcc RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a
arm.exp=vfp-1.c"

OK for trunk and 4.6 branch?

thanks
Carrot


2011-07-20  Wei Guozhi  

* gcc.target/arm/vfp-1.c (test_ldst): Adjust negative offset.


Index: gcc.target/arm/vfp-1.c
===
--- gcc.target/arm/vfp-1.c  (revision 176495)
+++ gcc.target/arm/vfp-1.c  (working copy)
@@ -127,13 +127,13 @@ void test_convert () {

 void test_ldst (float f[], double d[]) {
   /* { dg-final { scan-assembler "flds.+ \\\[r0, #1020\\\]" } } */
-  /* { dg-final { scan-assembler "flds.+ \\\[r0, #-1020\\\]" } } */
+  /* { dg-final { scan-assembler "flds.+ \\\[r0, #-252\\\]" } } */
   /* { dg-final { scan-assembler "add.+ r0, #1024" } } */
   /* { dg-final { scan-assembler "fsts.+ \\\[r0, #0\\\]\n" } } */
-  f[256] = f[255] + f[-255];
+  f[256] = f[255] + f[-63];

   /* { dg-final { scan-assembler "fldd.+ \\\[r1, #1016\\\]" } } */
-  /* { dg-final { scan-assembler "fldd.+ \\\[r1, #-1016\\\]" } } */
+  /* { dg-final { scan-assembler "fldd.+ \\\[r1, #-248\\\]" } } */
   /* { dg-final { scan-assembler "fstd.+ \\\[r1, #256\\\]" } } */
-  d[32] = d[127] + d[-127];
+  d[32] = d[127] + d[-31];
 }


Re: [testcase, arm] Adjust the negative offset of fp memory access in vfp-1.c

2011-07-20 Thread Carrot Wei
Oops, the ChangeLog should be

2011-07-20  Wei Guozhi  

* gcc.target/arm/vfp-1.c (test_ldst): Adjust negative offset.


thanks
Carrot

On Wed, Jul 20, 2011 at 4:30 PM, Carrot Wei  wrote:
> Hi
>
> The patch r169271 conservatively limits the offset of fp memory access to
> (-256..1024), but didn't adjust the related test case, so vfp-1.c fails in
> thumb2 mode after the patch. This patch modifies test case vfp-1.c 
> accordingly.
>
> Tested with
> make check-gcc RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a
> arm.exp=vfp-1.c"
>
> OK for trunk and 4.6 branch?
>
> thanks
> Carrot
>
>
> 2011-07-20  Wei Guozhi  
>
>        * gcc.target/arm/vfp-1.c (test_ldst): Adjust negative offset.
>
>
> Index: gcc.target/arm/vfp-1.c
> ===
> --- gcc.target/arm/vfp-1.c      (revision 176495)
> +++ gcc.target/arm/vfp-1.c      (working copy)
> @@ -127,13 +127,13 @@ void test_convert () {
>
>  void test_ldst (float f[], double d[]) {
>   /* { dg-final { scan-assembler "flds.+ \\\[r0, #1020\\\]" } } */
> -  /* { dg-final { scan-assembler "flds.+ \\\[r0, #-1020\\\]" } } */
> +  /* { dg-final { scan-assembler "flds.+ \\\[r0, #-252\\\]" } } */
>   /* { dg-final { scan-assembler "add.+ r0, #1024" } } */
>   /* { dg-final { scan-assembler "fsts.+ \\\[r0, #0\\\]\n" } } */
> -  f[256] = f[255] + f[-255];
> +  f[256] = f[255] + f[-63];
>
>   /* { dg-final { scan-assembler "fldd.+ \\\[r1, #1016\\\]" } } */
> -  /* { dg-final { scan-assembler "fldd.+ \\\[r1, #-1016\\\]" } } */
> +  /* { dg-final { scan-assembler "fldd.+ \\\[r1, #-248\\\]" } } */
>   /* { dg-final { scan-assembler "fstd.+ \\\[r1, #256\\\]" } } */
> -  d[32] = d[127] + d[-127];
> +  d[32] = d[127] + d[-31];
>  }
>


[PATCH] PR49799: Don't generate illegal bit field extraction instruction

2011-07-28 Thread Carrot Wei
Hi

In function combine.c:make_compound_operation, it tries to transforms the
expression
 (ashiftrt (ashift foo C1) C2) with C2 >= C1
into SIGN_EXTRACT.

It works pretty well in usual cases. But for the test case in PR49799, there is
an expression
 (X << (tmp-1)) >> 16
tmp is an uninitialized variable, only after init-regs pass, it is set to 0.
Then after several successful combine, it will see following expression

(ashiftrt:SI (ashift:SI (reg:SI 145 [ *K_2(D) ])
(const_int -1 [0x]))
(const_int 16 [0x10]))

and change it to an illegal bit field extraction instruction(sbfx).

Add a check to ensure the bit field is valid before applying the change, so the
wrong sbfx will not be generated.

Bootstrapped and regtested on x86_64-unknown-linux-gnu.
Regtested on arm qemu.

OK for trunk and 4.6?

thanks
Carrot


ChangeLog:
2011-07-28  Wei Guozhi  

PR rtl-optimization/49799
* combine.c (make_compound_operation): Check if the bit field is valid
before change it to bit field extraction.


Index: gcc/combine.c
===
--- gcc/combine.c   (revision 176733)
+++ gcc/combine.c   (working copy)
@@ -7787,6 +7787,7 @@ make_compound_operation (rtx x, enum rtx
  && GET_CODE (lhs) == ASHIFT
  && CONST_INT_P (XEXP (lhs, 1))
  && INTVAL (rhs) >= INTVAL (XEXP (lhs, 1))
+ && INTVAL (XEXP (lhs, 1)) >= 0
  && INTVAL (rhs) < mode_width)
{
  new_rtx = make_compound_operation (XEXP (lhs, 0), next_code);


Re: [PATCH] PR49799: Don't generate illegal bit field extraction instruction

2011-07-28 Thread Carrot Wei
Test case added.

Tested with
make check-gcc RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a
arm.exp=pr49799.c"
make check-gcc RUNTESTFLAGS="--target_board=arm-sim/arch=armv7-a
arm.exp=pr49799.c"

It fails without this patch and passes with this patch.

OK for trunk and 4.6 now?

thanks
Carrot

ChangeLog:
2011-07-28  Wei Guozhi  

PR rtl-optimization/49799
* combine.c (make_compound_operation): Check if the bit field is valid
before change it to bit field extraction.


ChangeLog:
2011-07-28  Wei Guozhi  

PR rtl-optimization/49799
* pr49799.c : New test case.


Index: gcc/combine.c
===
--- gcc/combine.c   (revision 176733)
+++ gcc/combine.c   (working copy)
@@ -7787,6 +7787,7 @@ make_compound_operation (rtx x, enum rtx
  && GET_CODE (lhs) == ASHIFT
  && CONST_INT_P (XEXP (lhs, 1))
  && INTVAL (rhs) >= INTVAL (XEXP (lhs, 1))
+ && INTVAL (XEXP (lhs, 1)) >= 0
  && INTVAL (rhs) < mode_width)
{
  new_rtx = make_compound_operation (XEXP (lhs, 0), next_code);


Index: pr49799.c
===
--- pr49799.c   (revision 0)
+++ pr49799.c   (revision 0)
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/49799 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -w -march=armv7-a" } */
+
+static __inline int bar(int a)
+{
+int tmp;
+
+if (a <= 0) a ^= 0x;
+
+return tmp - 1;
+}
+
+void foo(short *K)
+{
+short tmp;
+short *pptr, P[14];
+
+pptr = P;
+tmp = bar(*K);
+*pptr = (*K << tmp) >> 16;
+
+if (*P < tmp)
+*K++ = 0;
+}


On Thu, Jul 28, 2011 at 4:04 PM, Jakub Jelinek  wrote:
> On Thu, Jul 28, 2011 at 03:38:07PM +0800, Carrot Wei wrote:
>> OK for trunk and 4.6?
>>
>> ChangeLog:
>> 2011-07-28  Wei Guozhi  
>>
>>         PR rtl-optimization/49799
>>         * combine.c (make_compound_operation): Check if the bit field is 
>> valid
>>         before change it to bit field extraction.
>
> Looks good to me, handling SHIFT_COUNT_TRUNCATED here isn't IMHO necessary
> and the checking whether shift count is in the right range matches other rtx
> simplifications (e.g. in simplify-rtx.c).
> Though, you should add a testcase, probably
> /* PR rtl-optimization/49799 */
> /* { dg-do assemble } */
> /* { dg-options "-O2 -w" } */
>
> plus not sure if for arm you don't want to force this -march=armv7-a
> into dg-options too or just leave it as is.
>
>        Jakub
>


Re: [PATCH] PR49799: Don't generate illegal bit field extraction instruction

2011-07-28 Thread Carrot Wei
According to Richard, -march=armv7-a is not required.

So the finally installed is:

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 176910)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2011-07-29  Wei Guozhi  
+
+   PR rtl-optimization/49799
+   * combine.c (make_compound_operation): Check if the bit field is valid
+   before change it to bit field extraction.
+
 2011-07-29  Bernd Schmidt  

PR rtl-optimization/49891
Index: gcc/testsuite/gcc.dg/pr49799.c
===
--- gcc/testsuite/gcc.dg/pr49799.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr49799.c  (revision 0)
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/49799 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -w" } */
+
+static __inline int bar(int a)
+{
+int tmp;
+
+if (a <= 0) a ^= 0x;
+
+return tmp - 1;
+}
+
+void foo(short *K)
+{
+short tmp;
+short *pptr, P[14];
+
+pptr = P;
+tmp = bar(*K);
+*pptr = (*K << tmp) >> 16;
+
+if (*P < tmp)
+*K++ = 0;
+}
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 176910)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2011-07-29  Wei Guozhi  
+
+   PR rtl-optimization/49799
+   * gcc.dg/pr49799.c: New test case.
+
 2011-07-22  Sebastian Pop  

PR middle-end/48648
Index: gcc/combine.c
===
--- gcc/combine.c   (revision 176910)
+++ gcc/combine.c   (working copy)
@@ -7787,6 +7787,7 @@ make_compound_operation (rtx x, enum rtx
  && GET_CODE (lhs) == ASHIFT
  && CONST_INT_P (XEXP (lhs, 1))
  && INTVAL (rhs) >= INTVAL (XEXP (lhs, 1))
+ && INTVAL (XEXP (lhs, 1)) >= 0
  && INTVAL (rhs) < mode_width)
{
  new_rtx = make_compound_operation (XEXP (lhs, 0), next_code);

thanks a lot.
Carrot

On Thu, Jul 28, 2011 at 4:47 PM, Jakub Jelinek  wrote:
> On Thu, Jul 28, 2011 at 04:40:53PM +0800, Carrot Wei wrote:
>> ChangeLog:
>> 2011-07-28  Wei Guozhi  
>>
>>         PR rtl-optimization/49799
>>         * pr49799.c : New test case.
>
> Space shouldn't be between .c and :.  And the filename should be
> relative to gcc/testsuite/ dir, so either gcc.target/arm/pr49799.c, or
> better gcc.dg/pr49799.c.
>
> Putting the testcase just into gcc.target/arm means it won't be tested
> on other targets, while there is nothing arm specific about the testcase
> except that you force -march in dg-options for arm.
> You can do that with
> /* PR rtl-optimization/49799 */
> /* { dg-do assemble } */
> /* { dg-options "-O2 -w" } */
> /* { dg-options "-O2 -w -march=armv7-a" { target arm*-*-* } } */
> or similar.
>
> Ok with those changes.
>
>        Jakub
>


Re: [testcase, arm] Adjust the negative offset of fp memory access in vfp-1.c

2011-07-31 Thread Carrot Wei
Ping

On Wed, Jul 20, 2011 at 4:33 PM, Carrot Wei  wrote:
> Oops, the ChangeLog should be
>
> 2011-07-20  Wei Guozhi  
>
>        * gcc.target/arm/vfp-1.c (test_ldst): Adjust negative offset.
>
>
> thanks
> Carrot
>
> On Wed, Jul 20, 2011 at 4:30 PM, Carrot Wei  wrote:
>> Hi
>>
>> The patch r169271 conservatively limits the offset of fp memory access to
>> (-256..1024), but didn't adjust the related test case, so vfp-1.c fails in
>> thumb2 mode after the patch. This patch modifies test case vfp-1.c 
>> accordingly.
>>
>> Tested with
>> make check-gcc RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a
>> arm.exp=vfp-1.c"
>>
>> OK for trunk and 4.6 branch?
>>
>> thanks
>> Carrot
>>
>>
>> 2011-07-20  Wei Guozhi  
>>
>>        * gcc.target/arm/vfp-1.c (test_ldst): Adjust negative offset.
>>
>>
>> Index: gcc.target/arm/vfp-1.c
>> ===
>> --- gcc.target/arm/vfp-1.c      (revision 176495)
>> +++ gcc.target/arm/vfp-1.c      (working copy)
>> @@ -127,13 +127,13 @@ void test_convert () {
>>
>>  void test_ldst (float f[], double d[]) {
>>   /* { dg-final { scan-assembler "flds.+ \\\[r0, #1020\\\]" } } */
>> -  /* { dg-final { scan-assembler "flds.+ \\\[r0, #-1020\\\]" } } */
>> +  /* { dg-final { scan-assembler "flds.+ \\\[r0, #-252\\\]" } } */
>>   /* { dg-final { scan-assembler "add.+ r0, #1024" } } */
>>   /* { dg-final { scan-assembler "fsts.+ \\\[r0, #0\\\]\n" } } */
>> -  f[256] = f[255] + f[-255];
>> +  f[256] = f[255] + f[-63];
>>
>>   /* { dg-final { scan-assembler "fldd.+ \\\[r1, #1016\\\]" } } */
>> -  /* { dg-final { scan-assembler "fldd.+ \\\[r1, #-1016\\\]" } } */
>> +  /* { dg-final { scan-assembler "fldd.+ \\\[r1, #-248\\\]" } } */
>>   /* { dg-final { scan-assembler "fstd.+ \\\[r1, #256\\\]" } } */
>> -  d[32] = d[127] + d[-127];
>> +  d[32] = d[127] + d[-31];
>>  }
>>
>


Re: [PATCH, ARM] PR47855 Compute attr "length" for some thumb2 insns, 2/3

2011-04-14 Thread Carrot Wei
On Fri, Apr 8, 2011 at 6:51 PM, Ramana Radhakrishnan
 wrote:
> On 08/04/11 10:57, Carrot Wei wrote:
>>
>> Hi
>>
>> This is the second part of the fixing for
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47855
>>
>> This patch contains the length computation for insn patterns
>> "*arm_movqi_insn"
>> and "*arm_addsi3". Since the alternatives and encodings are much more
>> complex,
>> the attribute length is computed in separate C functions.

> Sorry, no. This is potentially a maintenance pain. It hardcodes alternatives
> from a pattern elsewhere in the C file. I don't like doing this unless we
> have to with the sync primitives or with push_multi. In this case I'm not
> convinced we need such functions in the .c file.
>
> Why can't we use the "enabled" attribute here with appropriate constraints
> for everything other than the memory cases (even there we might be able to
> invent some new constraints) ?
>
> Also a note about programming style. There are the helper macros like REG_P,
> CONST_INT_P and MEM_P which remove the necessity for checks like
>
> GET_CODE (x) == y where y E { REG, CONST_INT, MEM}

Hi Ramana

As you suggested I created several new constraints, and use the
"enabled" attribute to split the current alternatives in this new
patch. It has been tested on arm qemu without regression.

thanks
Carrot


ChangeLog:
2011-04-14  Wei Guozhi  

PR target/47855
* config/arm/arm-protos.h (thumb1_legitimate_address_p): New prototype.
* config/arm/arm.c (thumb1_legitimate_address_p): Remove the static
linkage.
* config/arm/constraints.md (Pq, Pr, Pz, Uu): New constraints.
(Pd): Also apply to thumb2.
* config/arm/arm.md (*arm_movqi_insn): Compute attr "length".
(*arm_addsi3): Change "length" computation by splitting alternatives.


Index: arm.c
===
--- arm.c   (revision 172353)
+++ arm.c   (working copy)
@@ -5772,7 +5772,7 @@ thumb1_index_register_rtx_p (rtx x, int
addresses based on the frame pointer or arg pointer until the
reload pass starts.  This is so that eliminating such addresses
into stack based ones won't produce impossible code.  */
-static int
+int
 thumb1_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
 {
   /* ??? Not clear if this is right.  Experiment.  */
Index: arm-protos.h
===
--- arm-protos.h(revision 172353)
+++ arm-protos.h(working copy)
@@ -58,6 +58,7 @@ extern bool arm_legitimize_reload_addres
   int);
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
int);
+extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
Index: constraints.md
===
--- constraints.md  (revision 172353)
+++ constraints.md  (working copy)
@@ -30,12 +30,14 @@

 ;; The following multi-letter normal constraints have been used:
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
-;; in Thumb-1 state: Pa, Pb, Pc, Pd
-;; in Thumb-2 state: Ps, Pt, Pu, Pv, Pw, Px, Py
+;; in Thumb-1 state: Pa, Pb, Pc
+;; in Thumb-2 state: Pq, Pr, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz
+;; in Thumb state: Pd

 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
+;; in Thumb state: Uu


 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
@@ -155,9 +157,23 @@
&& ival > 1020 && ival <= 1275")))

 (define_constraint "Pd"
-  "@internal In Thumb-1 state a constant in the range 0 to 7"
+  "@internal In Thumb state a constant in the range 0 to 7"
   (and (match_code "const_int")
-   (match_test "TARGET_THUMB1 && ival >= 0 && ival <= 7")))
+   (match_test "TARGET_THUMB && ival >= 0 && ival <= 7")))
+
+(define_constraint "Pq"
+  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
+   range 0-1020"
+  (and (match_code "const_int")
+   (match_test "TARGET_THUMB2
+   && ival >= 0 && ival <= 508 && (ival & 3) == 0")))
+
+(define_constraint "Pr"
+  "@internal In Thumb-2 state a constant that is a multiple of 4 in the
+   range 0-508"
+  (and (match_code "const_int")
+   (mat

Re: [PATCH, ARM] PR47855 Compute attr "length" for some thumb2 insns, 2/3

2011-04-15 Thread Carrot Wei
Hi Richard

Thank you for the detailed explanation. It sounds like an inherent
difficulty of rtl passes. Then the only opportunity is ldrb/strb
instructions because they never affect cc registers.

thanks
Carrot

On Fri, Apr 15, 2011 at 9:34 PM, Richard Earnshaw  wrote:
>
> On Thu, 2011-04-14 at 21:19 +0800, Carrot Wei wrote:
>> On Fri, Apr 8, 2011 at 6:51 PM, Ramana Radhakrishnan
>>  wrote:
>> > On 08/04/11 10:57, Carrot Wei wrote:
>> >>
>> >> Hi
>> >>
>> >> This is the second part of the fixing for
>> >>
>> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47855
>> >>
>> >> This patch contains the length computation for insn patterns
>> >> "*arm_movqi_insn"
>> >> and "*arm_addsi3". Since the alternatives and encodings are much more
>> >> complex,
>> >> the attribute length is computed in separate C functions.
>>
>> > Sorry, no. This is potentially a maintenance pain. It hardcodes 
>> > alternatives
>> > from a pattern elsewhere in the C file. I don't like doing this unless we
>> > have to with the sync primitives or with push_multi. In this case I'm not
>> > convinced we need such functions in the .c file.
>> >
>> > Why can't we use the "enabled" attribute here with appropriate constraints
>> > for everything other than the memory cases (even there we might be able to
>> > invent some new constraints) ?
>> >
>> > Also a note about programming style. There are the helper macros like 
>> > REG_P,
>> > CONST_INT_P and MEM_P which remove the necessity for checks like
>> >
>> > GET_CODE (x) == y where y E { REG, CONST_INT, MEM}
>>
>> Hi Ramana
>>
>> As you suggested I created several new constraints, and use the
>> "enabled" attribute to split the current alternatives in this new
>> patch. It has been tested on arm qemu without regression.
>>
>> thanks
>> Carrot
>
>
> Sorry, I don't think this approach can work.  Certainly not with the way
> the compiler currently works, and especially for mov and add insns.
>
> These instructions are only 2 bytes long if either:
> 1) They clobber the condition code register or
> 2) They occur inside an IT block.
>
> We can't tell either of these from the pattern, so you're
> underestimating the length of the instruction in some circumstances by
> claiming that they are only 2 bytes long.  That /will/ lead to broken
> code someday.
>
> We can't add potential clobbers to mov and add patterns because that
> will break reload which relies on these patterns being simple-set insns
> with no added baggage.  It *might* be possible to add clobbers to other
> operations, but that will then most-likely upset instruction scheduling
> (I think the scheduler treats two insns that clobber the same hard reg
> as being strongly ordered).  Putting in the clobber too early will
> certainly affect cond-exec generation.
>
> In short, I'm not aware of a simple way to address this problem so that
> we get accurate length information, but minimal impact on other passes
> in the compiler.
>
> R.
>
>
>


Re: [PATCH, ARM] PR47855 Compute attr "length" for some thumb2 insns, 2/3

2011-04-18 Thread Carrot Wei
On Mon, Apr 18, 2011 at 9:33 PM, Richard Earnshaw  wrote:
>
> On Sat, 2011-04-16 at 12:34 +0800, Carrot Wei wrote:
>> Hi Richard
>>
>> Thank you for the detailed explanation. It sounds like an inherent
>> difficulty of rtl passes. Then the only opportunity is ldrb/strb
>> instructions because they never affect cc registers.
>
> There are also some comparison operations that are also known to be 2
> bytes (because they are known to set the condition codes).  But yes, the
> scope here is quite limited.
>
> R.

So now this version only computes the correct length of ldrd/strb in insn
arm_movqi_insn. Tested on arm qemu without regression.

thanks
Carrot


ChangeLog:
2011-04-18  Wei Guozhi  

PR target/47855
* config/arm/arm-protos.h (thumb1_legitimate_address_p): New prototype.
* config/arm/arm.c (thumb1_legitimate_address_p): Remove the static
linkage.
* config/arm/constraints.md (Uu): New constraint.
* config/arm/arm.md (*arm_movqi_insn): Compute attr "length".


Index: arm.c
===
--- arm.c   (revision 172353)
+++ arm.c   (working copy)
@@ -5772,7 +5772,7 @@ thumb1_index_register_rtx_p (rtx x, int
addresses based on the frame pointer or arg pointer until the
reload pass starts.  This is so that eliminating such addresses
into stack based ones won't produce impossible code.  */
-static int
+int
 thumb1_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
 {
   /* ??? Not clear if this is right.  Experiment.  */
Index: arm-protos.h
===
--- arm-protos.h(revision 172353)
+++ arm-protos.h(working copy)
@@ -58,6 +58,7 @@ extern bool arm_legitimize_reload_addres
   int);
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
int);
+extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
Index: constraints.md
===
--- constraints.md  (revision 172353)
+++ constraints.md  (working copy)
@@ -36,6 +36,7 @@
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
+;; in Thumb state: Uu


 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
@@ -332,6 +333,14 @@
  (and (match_code "mem")
   (match_test "REG_P (XEXP (op, 0))")))

+(define_memory_constraint "Uu"
+ "@internal
+  In Thumb state an address that is valid in 16bit encoding."
+ (and (match_code "mem")
+  (match_test "TARGET_THUMB
+  && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
+  0)")))
+
 ;; We used to have constraint letters for S and R in ARM state, but
 ;; all uses of these now appear to have been removed.

Index: arm.md
===
--- arm.md  (revision 172353)
+++ arm.md  (working copy)
@@ -5946,8 +5946,8 @@


 (define_insn "*arm_movqi_insn"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
-   (match_operand:QI 1 "general_operand" "rI,K,m,r"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,l,Uu,r,m")
+   (match_operand:QI 1 "general_operand" "rI,K,Uu,l,m,r"))]
   "TARGET_32BIT
&& (   register_operand (operands[0], QImode)
|| register_operand (operands[1], QImode))"
@@ -5955,10 +5955,14 @@
mov%?\\t%0, %1
mvn%?\\t%0, #%B1
ldr%(b%)\\t%0, %1
+   str%(b%)\\t%1, %0
+   ldr%(b%)\\t%0, %1
str%(b%)\\t%1, %0"
-  [(set_attr "type" "*,*,load1,store1")
-   (set_attr "insn" "mov,mvn,*,*")
-   (set_attr "predicable" "yes")]
+  [(set_attr "type" "*,*,load1,store1,load1,store1")
+   (set_attr "insn" "mov,mvn,*,*,*,*")
+   (set_attr "predicable" "yes")
+   (set_attr "arch" "any,any,t2,t2,any,any")
+   (set_attr "length" "4,4,2,2,4,4")]
 )

 (define_insn "*thumb1_movqi_insn"


Re: [google] remove redundant push {lr} for -mthumb (issue4441050)

2011-04-19 Thread Carrot Wei
On Tue, Apr 19, 2011 at 5:57 PM, Richard Guenther
 wrote:
> On Tue, Apr 19, 2011 at 11:41 AM, Guozhi Wei  wrote:
>> Reload pass tries to determine the stack frame, so it needs to check the
>> push/pop lr optimization opportunity. One of the criteria is if there is any
>> far jump inside the function. Unfortunately at this time gcc can't decide 
>> each
>> instruction's length and basic block layout, so it can't know the offset of
>> a jump. To be conservative it assumes every jump is a far jump. So any jump
>> in a function will prevent this push/pop lr optimization.
>>
>> To enable the push/pop lr optimization in reload pass, I compute the possible
>> maximum length of the function body. If the length is not large enough, far
>> jump is not necessary, so we can safely do push/pop lr optimization.
>
> What about hot/cold partitioning?  That might cause jumps to different
> sections.
>
> Richard.
>

The hot/cold partitioning is disabled in arm backend.
http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00671.html.

thanks
Carrot


Re: [google] remove redundant push {lr} for -mthumb (issue4441050)

2011-04-20 Thread Carrot Wei
On Tue, Apr 19, 2011 at 8:55 PM, Richard Earnshaw  wrote:
>
> On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
>> Reload pass tries to determine the stack frame, so it needs to check the
>> push/pop lr optimization opportunity. One of the criteria is if there is any
>> far jump inside the function. Unfortunately at this time gcc can't decide 
>> each
>> instruction's length and basic block layout, so it can't know the offset of
>> a jump. To be conservative it assumes every jump is a far jump. So any jump
>> in a function will prevent this push/pop lr optimization.
>>
>> To enable the push/pop lr optimization in reload pass, I compute the possible
>> maximum length of the function body. If the length is not large enough, far
>> jump is not necessary, so we can safely do push/pop lr optimization.
>>
>> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
>>
>> This patch is for google/main.
>>
>> 2011-04-19  Guozhi Wei  
>>
>>       Google ref 40255.
>>       * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>>       (estimate_function_length): New function.
>>       (thumb_far_jump_used_p): No far jump is needed in short function.
>>
>
> Setting aside for the moment Richi's issue with hot/cold sections, this
> isn't safe.  Firstly get_attr_length() doesn't return the worst case
> length; and secondly, it doesn't take into account the size of reload
> insns that are still on the reloads stack -- these are only emitted
> right at the end of the reload pass.  Both of these would need to be
> addressed before this can be safely done.
>
> It's worth noting here that in the dim and distant past we used to try
> to estimate the size of the function and eliminate redundant saves of
> R14, but the code had to be removed because it was too fragile; but it
> looks like some vestiges of the code are still in the compiler.
>
> A slightly less optimistic approach, but one that is much safer is to
> scan the function after reload has completed and see if we can avoid
> having to push LR.  We can do this if:
>
I guess "less optimistic" is relative to the ideal optimization
situation, I believe it is still much better than current result. Do
you think if arm_reorg() is appropriate place to do this?

thanks
Carrot


Re: [google] remove redundant push {lr} for -mthumb (issue4441050)

2011-04-20 Thread Carrot Wei
I will try this method for trunk later.

thanks
Carrot

On Wed, Apr 20, 2011 at 4:48 PM, Richard Earnshaw  wrote:
>
> On Wed, 2011-04-20 at 16:26 +0800, Carrot Wei wrote:
>> On Tue, Apr 19, 2011 at 8:55 PM, Richard Earnshaw  wrote:
>> >
>> > On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
>> >> Reload pass tries to determine the stack frame, so it needs to check the
>> >> push/pop lr optimization opportunity. One of the criteria is if there is 
>> >> any
>> >> far jump inside the function. Unfortunately at this time gcc can't decide 
>> >> each
>> >> instruction's length and basic block layout, so it can't know the offset 
>> >> of
>> >> a jump. To be conservative it assumes every jump is a far jump. So any 
>> >> jump
>> >> in a function will prevent this push/pop lr optimization.
>> >>
>> >> To enable the push/pop lr optimization in reload pass, I compute the 
>> >> possible
>> >> maximum length of the function body. If the length is not large enough, 
>> >> far
>> >> jump is not necessary, so we can safely do push/pop lr optimization.
>> >>
>> >> Tested on arm qemu with options -march=armv5te -mthumb, without 
>> >> regression.
>> >>
>> >> This patch is for google/main.
>> >>
>> >> 2011-04-19  Guozhi Wei  
>> >>
>> >>       Google ref 40255.
>> >>       * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>> >>       (estimate_function_length): New function.
>> >>       (thumb_far_jump_used_p): No far jump is needed in short function.
>> >>
>> >
>> > Setting aside for the moment Richi's issue with hot/cold sections, this
>> > isn't safe.  Firstly get_attr_length() doesn't return the worst case
>> > length; and secondly, it doesn't take into account the size of reload
>> > insns that are still on the reloads stack -- these are only emitted
>> > right at the end of the reload pass.  Both of these would need to be
>> > addressed before this can be safely done.
>> >
>> > It's worth noting here that in the dim and distant past we used to try
>> > to estimate the size of the function and eliminate redundant saves of
>> > R14, but the code had to be removed because it was too fragile; but it
>> > looks like some vestiges of the code are still in the compiler.
>> >
>> > A slightly less optimistic approach, but one that is much safer is to
>> > scan the function after reload has completed and see if we can avoid
>> > having to push LR.  We can do this if:
>> >
>> I guess "less optimistic" is relative to the ideal optimization
>> situation, I believe it is still much better than current result. Do
>> you think if arm_reorg() is appropriate place to do this?
>>
>
> Making the decision in a single pass would certainly be the best
> approach; and arm_reorg is certainly going to come after all other major
> code re-arrangements.  Indeed, you should probably do this after the
> minipool placement so that you can be sure that these don't bulk up the
> body of the function too much.
>
> As you are doing the elimination late on in the compilation you can do a
> better job of estimation by calling shorten_branches() to work out the
> precise length of each insn.  Then you can simply scan over the insns to
> work out if there is a branch that still needs r14.
>
> R.
>
>
>
>


Re: [google] Use R_ARM_GOT_PREL to simplify global address loading from GOT (issue4433079)

2011-04-28 Thread Carrot Wei
On Thu, Apr 28, 2011 at 10:08 PM,   wrote:
> I only have some stylistic comments for this patch.  The new pass looks
> OK to me, but I do not know this area well enough to do a good review.
>
> In your ChangeLog entries, please remove the directory prefix from the
> file names.
>
done.

>
> http://codereview.appspot.com/4433079/diff/1/gcc/hooks.c
> File gcc/hooks.c (right):
>
> http://codereview.appspot.com/4433079/diff/1/gcc/hooks.c#newcode287
> gcc/hooks.c:287: return NULL;
> +hook_rtx_void_null (void)
> +{
> +  return NULL;
>
> s/NULL/NULL_RTX/
>
done.

> http://codereview.appspot.com/4433079/diff/1/gcc/simplify-got.c
> File gcc/simplify-got.c (right):
>
> http://codereview.appspot.com/4433079/diff/1/gcc/simplify-got.c#newcode83
> gcc/simplify-got.c:83: return (optimize > 0) &&
> targetm.got_access.get_pic_reg ();
> +{
> +  return (optimize > 0) && targetm.got_access.get_pic_reg ();
>
> s/(optimize > 0)/optimize/
>
done.

> http://codereview.appspot.com/4433079/diff/1/gcc/simplify-got.c#newcode118
> gcc/simplify-got.c:118: if (!(set && (SET_DEST (set) == pic_reg)))
> +         /* If an insn both set and use pic_reg, it is in the process of
> +            constructing the value of pic_reg. We should also ignore it.
>  */
> +         rtx set = single_set (insn);
> +         if (!(set && (SET_DEST (set) == pic_reg)))
>
> Extra ( ) around SET_DEST are not needed.
>
done.

> http://codereview.appspot.com/4433079/
>

The revised patch is attached.

thanks
Carrot


patch.diff
Description: Binary data


Re: [google] Use R_ARM_GOT_PREL to simplify global address loading from GOT (issue4433079)

2011-04-28 Thread Carrot Wei
Yes, after porting it to google/main.

Carrot

On Thu, Apr 28, 2011 at 10:26 PM, Diego Novillo  wrote:
> Will you be proposing this patch for trunk as well?
>
>
> Diego.
>


Re: [google] Use R_ARM_GOT_PREL to simplify global address loading from GOT (issue4433079)

2011-04-28 Thread Carrot Wei
On Fri, Apr 29, 2011 at 11:17 AM, Carrot Wei  wrote:
> On Thu, Apr 28, 2011 at 10:08 PM,   wrote:
>> I only have some stylistic comments for this patch.  The new pass looks
>> OK to me, but I do not know this area well enough to do a good review.
>>
>> In your ChangeLog entries, please remove the directory prefix from the
>> file names.
>>
> done.
>
>>
>> http://codereview.appspot.com/4433079/diff/1/gcc/hooks.c
>> File gcc/hooks.c (right):
>>
>> http://codereview.appspot.com/4433079/diff/1/gcc/hooks.c#newcode287
>> gcc/hooks.c:287: return NULL;
>> +hook_rtx_void_null (void)
>> +{
>> +  return NULL;
>>
>> s/NULL/NULL_RTX/
>>
> done.
>
Oops. File hooks.c doesn't include rtl.h, so either rtl.h should be
included, or return NULL instead, like other functions in this file.


Re: [PATCH, ARM] PR47855 Compute attr length for thumb2 insns, 3/3 (issue4475042)

2011-05-06 Thread Carrot Wei
On Thu, May 5, 2011 at 5:42 PM, Richard Earnshaw  wrote:
>
> On Thu, 2011-05-05 at 14:51 +0800, Guozhi Wei wrote:
> > Hi
> >
> > This is the third part of the fixing for
> >
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47855
> >
> > This patch contains the length computation/refinement for insn patterns
> > "*thumb2_movsi_insn", "*thumb2_cbz" and "*thumb2_cbnz".
> >
> > At the same time this patch revealed two bugs. The first is the maximum 
> > offset
> > of cbz/cbnz, it should be 126, but it is 128 in patterns "*thumb2_cbz" and
> > "*thumb2_cbnz". The second is that only 2-register form of shift 
> > instructions
> > can be 16 bit, but 3-register form is allowed in "*thumb2_shiftsi3_short" 
> > and
> > related peephole2. The fix is also contained in this patch.
> >
> > The patch has been tested on arm qemu.
> >
> > thanks
> > Carrot
> >
> >
> > 2011-05-05  Guozhi Wei  
> >
> >       PR target/47855
> >       * config/arm/thumb2.md (thumb2_movsi_insn): Add length addtribute.
> >       (thumb2_shiftsi3_short and peephole2): Remove 3-register case.
> >       (thumb2_cbz): Refine length computation.
> >       (thumb2_cbnz): Likewise.
> >
>
> Hmm, although these changes are all related to length calculations, they
> are really three patches that are unrelated to each other.  It would be
> easier to review this if they were kept separate.
>
> 1) thumb2_shiftsi3_short
> This appears to be a straight bug.  We are putting out a 32-bit
> instruction when we are claiming it to be only 16 bits.  This is OK.
>
> 2) thumb2_movsi_insn
> There are two things here.
> a) Thumb2 has a 16-bit move instruction for all core
> register-to-register transfers, so the separation of alternatives 1 and
> 2 is unnecessary -- just code these as "rk".

done.

>
> b) The ldm form does not support unaligned memory accesses.  I'm aware
> that work is being done to add unaligned support to GCC for ARM, so I
> need to find out whether this patch will interfere with those changes.
> I'll try to find out what the situation is here and get back to you.
>
> 3) thumb2_cbz and thumb2_cbnz
> The range calculations look wrong here.  Remember that the 'pc' as far
> as GCC is concerned is the address of the start of the insn.  So for a
> backwards branch you need to account for all the bytes in the insn
> pattern that occur before the branch instruction itself, and secondly
> you also have to remember that the 'pc' that the CPU uses is the address
> of the branch instruction plus 4.  All these conspire to reduce the
> backwards range of a short branch to several bytes less than the 256
> that you currently have coded.

The usage of 'pc' is more complex than I thought. I understood it after
reading the comment in file arm.md. And the description at
http://gcc.gnu.org/onlinedocs/gccint/Insn-Lengths.html#Insn-Lengths is not
right for forward branch cases. Now the ranges are modified accordingly.

It has been tested on arm qemu in thumb2 mode.

thanks
Carrot


2011-05-06  Guozhi Wei  

PR target/47855
* config/arm/thumb2.md (thumb2_movsi_insn): Add length addtribute.
(thumb2_shiftsi3_short and peephole2): Remove 3-register case.
(thumb2_cbz): Refine length computation.
(thumb2_cbnz): Likewise.


Index: config/arm/thumb2.md
===
--- config/arm/thumb2.md(revision 173350)
+++ config/arm/thumb2.md(working copy)
@@ -165,23 +165,46 @@
 ;; regs.  The high register alternatives are not taken into account when
 ;; choosing register preferences in order to reflect their expense.
 (define_insn "*thumb2_movsi_insn"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,l ,*hk,m,*m")
-   (match_operand:SI 1 "general_operand"  "rk ,I,K,j,mi,*mi,l,*hk"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,l ,*rk,Uu,*m")
+   (match_operand:SI 1 "general_operand"  "rk ,I,K,j,Uu,*mi,l ,*rk"))]
   "TARGET_THUMB2 && ! TARGET_IWMMXT
&& !(TARGET_HARD_FLOAT && TARGET_VFP)
&& (   register_operand (operands[0], SImode)
|| register_operand (operands[1], SImode))"
-  "@
-   mov%?\\t%0, %1
-   mov%?\\t%0, %1
-   mvn%?\\t%0, #%B1
-   movw%?\\t%0, %1
-   ldr%?\\t%0, %1
-   ldr%?\\t%0, %1
-   str%?\\t%1, %0
-   str%?\\t%1, %0"
+  "*
+  switch (which_alternative)
+{
+case 0: return \"mov%?\\t%0, %1\";
+case 1: return \"mov%?\\t%0, %1\";
+case 2: return \"mvn%?\\t%0, #%B1\";
+case 3: return \"movw%?\\t%0, %1\";
+
+case 4:
+  if (GET_CODE (XEXP (operands[1], 0)) == POST_INC)
+   {
+ operands[1] = XEXP (XEXP (operands[1], 0), 0);
+ return \"ldm%(ia%)\t%1!, {%0}\";
+   }
+  else
+   return \"ldr%?\\t%0, %1\";
+
+case 5: return \"ldr%?\\t%0, %1\";
+
+case 6:
+  if (GET_CODE (XEXP (operands[0], 0)) == POST_INC)
+   {
+ operands[0] = XEXP (XEXP (operands[0], 0), 0);
+ return \"stm%(ia%)\t%0!, {%1}\";
+   }
+  else
+   

[PING] 2 ARM patches

2011-05-15 Thread Carrot Wei
Hi

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01973.html
Use ldrd and strd to access two consecutive words

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00490.html
Compute attr length for thumb2 insns

thanks
Carrot


Re: [PATCH: PR target/46975] Replace 32 bit instructions with 16 bit instructions in thumb2

2011-05-16 Thread Carrot Wei
On Fri, Dec 17, 2010 at 8:18 PM, Richard Earnshaw  wrote:
>
> On Thu, 2010-12-16 at 14:45 -0800, Carrot Wei wrote:
>> Hi
>>
>> Compile the following c code with options -march=armv7-a -mthumb -Os
>>
>> int foo (int s)
>> {
>>     return s == 1;
>> }
>>
>> GCC 4.6 generates:
>>
>>  :
>>    0:    f1a0 0301     sub.w    r3, r0, #1    // A
>>    4:    4258          negs    r0, r3
>>    6:    eb40 0003     adc.w    r0, r0, r3      // B
>>    a:    4770          bx    lr
>>
>> Notice that instructions A and B are 32 bits. In thumb2 we can use subs and 
>> adcs
>> instead so they will be 16 bits.
>>
>
> This sequence already contains an instruction that sets the flags as a
> necessary part of the sequence.  Why doesn't it also generate
> flag-corrupting variants of the other two instructions when the
> registers selected are suitable?  It seems silly to force the compiler
> to do yet more work to clean up this code.
>

This revised patch uses new adc insn which simply clobber the CC
register. For the sub instruction I still use the CC setting form
since the existing pattern subsi3_compare does this.

Tested on arm qemu without regression.

thanks
Carrot

ChangeLog:
2011-05-16  Wei Guozhi  

        PR target/46975
        * config/arm/arm.md (*addsi3_carryin_compare0_): New pattern.
        (peephole2 for conditional move): Generate 16 bit instructions.

ChangeLog:
2010-05-16  Wei Guozhi  

        PR target/46975
        * gcc.target/arm/pr46975.c: New testcase.

Index: testsuite/gcc.target/arm/pr46975.c
===
--- testsuite/gcc.target/arm/pr46975.c  (revision 0)
+++ testsuite/gcc.target/arm/pr46975.c  (revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-mthumb -Os" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler "subs" } } */
+/* { dg-final { scan-assembler "adcs" } } */
+
+int foo (int s)
+{
+  return s == 1;
+}
Index: config/arm/arm.md
===
--- config/arm/arm.md   (revision 173770)
+++ config/arm/arm.md   (working copy)
@@ -985,6 +985,17 @@
  (const_string "alu_shift_reg")))]
 )

+(define_insn "*addsi3_carryin_clobercc_"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+   (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r")
+ (match_operand:SI 2 "arm_rhs_operand" "rI"))
+(LTUGEU:SI (reg: CC_REGNUM) (const_int 0
+   (clobber (reg:CC CC_REGNUM))]
+   "TARGET_32BIT"
+   "adc%.\\t%0, %1, %2"
+   [(set_attr "conds" "set")]
+)
+
 (define_expand "incscc"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 (plus:SI (match_operator:SI 2 "arm_comparison_operator"
@@ -8788,14 +8799,19 @@
  (set (match_dup 0) (const_int 1)))
(match_scratch:SI 3 "r")]
   "TARGET_32BIT"
-  [(set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))
+  [(parallel
+[(set (reg:CC CC_REGNUM)
+ (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))])
(parallel
 [(set (reg:CC CC_REGNUM)
  (compare:CC (const_int 0) (match_dup 3)))
  (set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
-   (set (match_dup 0)
-   (plus:SI (plus:SI (match_dup 0) (match_dup 3))
-(geu:SI (reg:CC CC_REGNUM) (const_int 0])
+   (parallel
+[(set (match_dup 0)
+ (plus:SI (plus:SI (match_dup 0) (match_dup 3))
+  (geu:SI (reg:CC CC_REGNUM) (const_int 0
+ (clobber (reg:CC CC_REGNUM))])])

 (define_insn "*cond_move"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")


Re: [PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words

2011-03-15 Thread Carrot Wei
The trunk is opened again, could any maintainers continue to review this patch?

thanks
Carrot

On Tue, Jan 18, 2011 at 10:59 PM, Carrot Wei  wrote:
> Ramana's method is to put the instruction output and counting in on place.
> So it's easy to keep them synchronized.
>
> My latest version of patch did the following modifications compared to
> the earlier version: Added support of arm ldrd/strd instructions. Added length
> attribute to insn patterns. Moved the insn patterns to file ldmstm.md.
>
> It has passed the dejagnu testing on arm qemu.
>
> thanks
> Carrot


Re: [PATCH: PR target/46975] Replace 32 bit instructions with 16 bit instructions in thumb2

2011-03-18 Thread Carrot Wei
Ping

On Sat, Dec 18, 2010 at 3:30 AM, Carrot Wei  wrote:
> On Fri, Dec 17, 2010 at 4:18 AM, Richard Earnshaw  wrote:
>>
>> On Thu, 2010-12-16 at 14:45 -0800, Carrot Wei wrote:
>>> Hi
>>>
>>> Compile the following c code with options -march=armv7-a -mthumb -Os
>>>
>>> int foo (int s)
>>> {
>>>     return s == 1;
>>> }
>>>
>>> GCC 4.6 generates:
>>>
>>>  :
>>>    0:    f1a0 0301     sub.w    r3, r0, #1    // A
>>>    4:    4258          negs    r0, r3
>>>    6:    eb40 0003     adc.w    r0, r0, r3      // B
>>>    a:    4770          bx    lr
>>>
>>> Notice that instructions A and B are 32 bits. In thumb2 we can use subs and 
>>> adcs
>>> instead so they will be 16 bits.
>>>
>>
>> This sequence already contains an instruction that sets the flags as a
>> necessary part of the sequence.  Why doesn't it also generate
>> flag-corrupting variants of the other two instructions when the
>
> That is the root cause of this problem. The old pattern simply didn't
> generate the flag setting version.
>
>> registers selected are suitable?  It seems silly to force the compiler
>> to do yet more work to clean up this code.
>>
>
> This patch doesn't force the compiler to do more work. It directly
> modifies the original peephole2 pattern to generate flag setting
> instructions as you suggested. And the new insn pattern is for
> instruction adcs which didn't exist previously.
>
> thanks
> Guozhi
>


[PATCH, ARM] PR47855 Compute attr "length" for some thumb2 insns

2011-03-26 Thread Carrot Wei
Hi

As described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47855, there are
many insn patterns don't compute attribute length correctly. This patch is the
first and simplest part of the fixing.

This patch has been tested on qemu.

thanks
Carrot


ChangeLog:
2011-03-26  Wei Guozhi  

PR target/47855
* config/arm/arm.md (arm_cmpsi_insn): Compute attr "length".
(arm_cond_branch): Likewise.
(arm_cond_branch_reversed): Likewise.
(arm_jump): Likewise.
(push_multi): Likewise.


Index: arm.md
===
--- arm.md  (revision 171337)
+++ arm.md  (working copy)
@@ -7115,7 +7115,18 @@
   "@
cmp%?\\t%0, %1
cmn%?\\t%0, #%n1"
-  [(set_attr "conds" "set")]
+  [(set_attr "conds" "set")
+   (set (attr "length")
+ (if_then_else
+   (and (and (ne (symbol_ref "TARGET_THUMB2") (const_int 0))
+(eq (symbol_ref "which_alternative") (const_int 0)))
+   (ior (ne (symbol_ref "REG_P (operands[1])") (const_int 0))
+   (and (ne (symbol_ref "CONST_INT_P (operands[1])") (const_int 0))
+(and (ge (symbol_ref "INTVAL (operands[1])") (const_int 0))
+ (le (symbol_ref "INTVAL (operands[1])")
+ (const_int 255))
+   (const_int 2)
+   (const_int 4)))]
 )

 (define_insn "*cmpsi_shiftsi"
@@ -7286,7 +7297,14 @@
   return \"b%d1\\t%l0\";
   "
   [(set_attr "conds" "use")
-   (set_attr "type" "branch")]
+   (set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else
+  (and (ne (symbol_ref "TARGET_THUMB2") (const_int 0))
+   (and (ge (minus (match_dup 0) (pc)) (const_int -250))
+(le (minus (match_dup 0) (pc)) (const_int 256
+  (const_int 2)
+  (const_int 4)))]
 )

 (define_insn "*arm_cond_branch_reversed"
@@ -7305,7 +7323,14 @@
   return \"b%D1\\t%l0\";
   "
   [(set_attr "conds" "use")
-   (set_attr "type" "branch")]
+   (set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else
+  (and (ne (symbol_ref "TARGET_THUMB2") (const_int 0))
+   (and (ge (minus (match_dup 0) (pc)) (const_int -250))
+(le (minus (match_dup 0) (pc)) (const_int 256
+  (const_int 2)
+  (const_int 4)))]
 )



@@ -7757,7 +7782,14 @@
 return \"b%?\\t%l0\";
   }
   "
-  [(set_attr "predicable" "yes")]
+  [(set_attr "predicable" "yes")
+   (set (attr "length")
+   (if_then_else
+  (and (ne (symbol_ref "TARGET_THUMB2") (const_int 0))
+   (and (ge (minus (match_dup 0) (pc)) (const_int -2044))
+(le (minus (match_dup 0) (pc)) (const_int 2048
+  (const_int 2)
+  (const_int 4)))]
 )

 (define_insn "*thumb_jump"
@@ -10256,7 +10288,26 @@

 return \"\";
   }"
-  [(set_attr "type" "store4")]
+  [(set_attr "type" "store4")
+   (set (attr "length")
+   (if_then_else
+  (and (ne (symbol_ref "TARGET_THUMB2") (const_int 0))
+   (ne (symbol_ref "{
+   int i, regno, hi_reg;
+   int num_saves = XVECLEN (operands[2], 0);
+   regno = REGNO (operands[1]);
+   hi_reg = (REGNO_REG_CLASS (regno) == HI_REGS)
+&& (regno != LR_REGNUM);
+   for (i = 1; i < num_saves; i++)
+ {
+   regno = REGNO (XEXP (XVECEXP (operands[2], 0, i), 0));
+   hi_reg |= (REGNO_REG_CLASS (regno) == HI_REGS)
+ && (regno != LR_REGNUM);
+ }
+   !hi_reg;}")
+ (const_int 0)))
+  (const_int 2)
+  (const_int 4)))]
 )

 (define_insn "stack_tie"


Re: [PATCH: ARM] PR 45335 Use ldrd and strd to access two consecutive words

2011-03-29 Thread Carrot Wei
Thank you for the knowledge. I've updated the insn patterns
accordingly. Again tested on arm qemu.

thanks
Carrot

ChangeLog:
2011-03-29  Wei Guozhi  

PR target/45335
* gcc/config/arm/ldmstm.md (ldm2_ia, stm2_ia, ldm2_ib, stm2_ib, ldm2_da,
stm2_da, ldm2_db, stm2_db): Add condition !arm_arch7 to these insns.
(ldrd, ldrd_reg1, ldrd_reg2 and peephole2): New insn patterns and
related peephole2.
(strd, strd_reg1, strd_reg2 and peephole2): New insn patterns and
related peephole2.
* gcc/config/arm/arm-protos.h (arm_check_ldrd_operands): New prototype.
(arm_legitimate_ldrd_p): New prototype.
(arm_output_ldrd): New prototype.
* gcc/config/arm/arm.c (arm_check_ldrd_operands): New function.
(arm_legitimate_ldrd_p): New function.
(arm_output_ldrd): New function.


2011-03-29  Wei Guozhi  

PR target/45335
* gcc.target/arm/pr45335.c: New test.
* gcc.target/arm/pr45335-2.c: New test.
* gcc.target/arm/pr45335-3.c: New test.
* gcc.target/arm/pr40457-1.c: Add another possible output "ldrd".
* gcc.target/arm/pr40457-2.c: Changed to store 3 words.
* gcc.target/arm/pr40457-3.c: Changed to store 3 words.


On Thu, Mar 24, 2011 at 8:25 AM, Mike Stump  wrote:
> On Jan 18, 2011, at 6:59 AM, Carrot Wei wrote:
>> +(define_insn "*ldrd"
>> +  [(parallel [(set (match_operand:SI 0 "arm_hard_register_operand" "")
>
> parallel is implicit, you can safely remove it from all define_insns.
>
Index: testsuite/gcc.target/arm/pr40457-3.c
===
--- testsuite/gcc.target/arm/pr40457-3.c(revision 171439)
+++ testsuite/gcc.target/arm/pr40457-3.c(working copy)
@@ -5,6 +5,7 @@ void foo(int* p)
 {
   p[0] = 1;
   p[1] = 0;
+  p[2] = 2;
 }
 
 /* { dg-final { scan-assembler "stm" } } */
Index: testsuite/gcc.target/arm/pr45335-2.c
===
--- testsuite/gcc.target/arm/pr45335-2.c(revision 0)
+++ testsuite/gcc.target/arm/pr45335-2.c(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-options "-Os -march=armv7-a" }  */
+/* { dg-do compile } */
+
+void foo(int a, int b, int* p)
+{
+  p[2] = a;
+  p[3] = b;
+}
+
+/* { dg-final { scan-assembler "strd" } } */
Index: testsuite/gcc.target/arm/pr45335-3.c
===
--- testsuite/gcc.target/arm/pr45335-3.c(revision 0)
+++ testsuite/gcc.target/arm/pr45335-3.c(revision 0)
@@ -0,0 +1,12 @@
+/* { dg-options "-Os -march=armv7-a" }  */
+/* { dg-do compile } */
+
+int foo(int a, int b, int* p, int *q)
+{
+  a = p[2] + p[3];
+  *q = a;
+  *p = a;
+  return a;
+}
+
+/* { dg-final { scan-assembler "ldrd" } } */
Index: testsuite/gcc.target/arm/pr40457-1.c
===
--- testsuite/gcc.target/arm/pr40457-1.c(revision 171439)
+++ testsuite/gcc.target/arm/pr40457-1.c(working copy)
@@ -7,4 +7,4 @@ int bar(int* p)
   return x;
 }
 
-/* { dg-final { scan-assembler "ldm" } } */
+/* { dg-final { scan-assembler "ldm|ldrd" } } */
Index: testsuite/gcc.target/arm/pr40457-2.c
===
--- testsuite/gcc.target/arm/pr40457-2.c(revision 171439)
+++ testsuite/gcc.target/arm/pr40457-2.c(working copy)
@@ -5,6 +5,7 @@ void foo(int* p)
 {
   p[0] = 1;
   p[1] = 0;
+  p[2] = 2;
 }
 
 /* { dg-final { scan-assembler "stm" } } */
Index: testsuite/gcc.target/arm/pr45335.c
===
--- testsuite/gcc.target/arm/pr45335.c  (revision 0)
+++ testsuite/gcc.target/arm/pr45335.c  (revision 0)
@@ -0,0 +1,22 @@
+/* { dg-options "-mthumb -O2" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler "ldrd" } } */
+/* { dg-final { scan-assembler "strd" } } */
+
+struct S
+{
+void* p1;
+void* p2;
+void* p3;
+void* p4;
+};
+
+extern printf(char*, ...);
+
+void foo1(struct S* fp, struct S* otherSaveArea)
+{
+struct S* saveA = fp - 1;
+printf("StackSaveArea for fp %p [%p/%p]:\n", fp, saveA, otherSaveArea);
+printf("prevFrame=%p savedPc=%p meth=%p curPc=%p fp[0]=0x%08x\n",
+saveA->p1, saveA->p2, saveA->p3, saveA->p4, *(unsigned int*)fp);
+}
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 171439)
+++ config/arm/arm.c(working copy)
@@ -23681,4 +23681,234 @@ arm_preferred_rename_class (reg_class_t 
 return NO_REGS;
 }
 
+/* Check the validity of operands in an ldrd/strd instruction.  */
+bool
+arm_check_ldrd_operands (rtx reg1, rtx reg2, rtx 

  1   2   >